ZKM Python

wayback-cache-proxy

A caching proxy for the Wayback Machine with a Redis-backed two-tier cache, a prefetch crawler, modem speed throttling, and an admin interface.

wayback-machine, proxy, redis, museum, caching, web-preservation

What it does

wayback-cache-proxy is an HTTP proxy that fetches archived web pages from the Wayback Machine, caches them locally in Redis, and serves later requests from that cache without contacting the Internet Archive again. It was built for museum exhibitions that need reliable access to historical web content even when the Internet Archive is slow or unavailable.
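The fetch-then-cache flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the `TwoTierCache` class is an in-memory stand-in for Redis, and the names `serve`, `put_hot`, and `HOT_TTL_SECONDS`, the expiry value, and the lookup order (curated tier first, then hot tier) are all hypothetical. The only external fact used is the public Wayback Machine URL scheme, `https://web.archive.org/web/<timestamp>/<original URL>`.

```python
import time
from typing import Callable, Optional

WAYBACK_PREFIX = "https://web.archive.org/web"  # public Wayback Machine URL scheme
HOT_TTL_SECONDS = 7 * 24 * 3600  # hypothetical expiry for the hot tier


def wayback_url(url: str, timestamp: str) -> str:
    """Archive URL for the snapshot closest to `timestamp` (YYYYMMDDhhmmss)."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{url}"


class TwoTierCache:
    """In-memory stand-in for Redis: a permanent curated tier plus an expiring hot tier."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.curated: dict[str, bytes] = {}
        self.hot: dict[str, tuple[float, bytes]] = {}  # url -> (expires_at, body)

    def get(self, url: str) -> Optional[bytes]:
        # Assumption: curated content always wins over visitor-discovered pages.
        if url in self.curated:
            return self.curated[url]
        entry = self.hot.get(url)
        if entry is None:
            return None
        expires_at, body = entry
        if self._clock() >= expires_at:
            del self.hot[url]  # expired: behave like a cache miss
            return None
        return body

    def put_hot(self, url: str, body: bytes, ttl: float = HOT_TTL_SECONDS) -> None:
        self.hot[url] = (self._clock() + ttl, body)


def serve(url: str, timestamp: str, cache: TwoTierCache,
          fetch: Callable[[str], bytes]) -> bytes:
    """Return the cached body, contacting the archive only on a miss."""
    body = cache.get(url)
    if body is None:
        body = fetch(wayback_url(url, timestamp))
        cache.put_hot(url, body)
    return body
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client; once a page is in either tier, `serve` never calls it, which is the property an exhibition needs when the upstream archive is unreachable.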

Key features

  • Two-tier Redis cache: a permanent curated tier for verified content plus an auto-expiring hot tier for visitor-discovered pages
  • Prefetch crawler: an async spider that pre-populates the cache from seed URLs before exhibitions open
  • Modem speed throttling: period-accurate simulation of 14.4k, 28.8k, 56k, ISDN, and DSL connections, selectable by visitors from a dropdown
  • Content transformation: strips the Wayback Machine toolbar, fixes asset URLs, and removes injected scripts
  • Header bar overlay: an injected UI showing the archive date, current URL, and speed selector
  • Admin interfaces: a FastAPI dashboard on port 8080 plus an embedded IE4-compatible interface at /_admin/
  • URL allowlisting: restricts browsable domains for curated exhibition experiences
  • Live config reload: settings changed via the admin UI take effect without restarting the proxy
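As an illustration of the modem speed throttling feature, the sketch below paces a response body by sleeping for the time each chunk would take at a given line rate. It is a minimal sketch, not the project's implementation: the names `SPEEDS_BPS`, `transfer_seconds`, and `throttled_chunks` are hypothetical, the chunk size is arbitrary, and the ISDN and DSL rates are assumed values (the source names the speed classes but not their bit rates).

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

# Line rates in bits per second. The modem entries are the nominal rates
# their names imply; the ISDN and DSL figures are illustrative assumptions.
SPEEDS_BPS = {
    "14.4k": 14_400,
    "28.8k": 28_800,
    "56k": 56_000,
    "isdn": 64_000,   # assumption: a single B channel
    "dsl": 768_000,   # assumption: a hypothetical early-DSL rate
}


def transfer_seconds(num_bytes: int, bps: int) -> float:
    """Time needed to send `num_bytes` over a link of `bps` bits per second."""
    return num_bytes * 8 / bps


async def throttled_chunks(
    body: bytes,
    speed: str,
    chunk_size: int = 512,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> AsyncIterator[bytes]:
    """Yield `body` in chunks, pausing before each one to match the line rate."""
    bps = SPEEDS_BPS[speed]
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        await sleep(transfer_seconds(len(chunk), bps))
        yield chunk
```

Under these assumptions a 50,000-byte page at 28.8k takes 50,000 × 8 / 28,800 ≈ 13.9 seconds to arrive. Small chunks make the page render progressively, which is closer to how a dial-up connection felt than a single long delay before the whole response.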

Tech stack

The proxy is an async Python TCP server that speaks raw HTTP, with Redis for caching and pub/sub and FastAPI for the remote admin service. Deployment uses Docker Compose. The project was built for the "Choose Your Filter!" browser art exhibition at ZKM Karlsruhe.
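To give a sense of what an async TCP server speaking raw HTTP involves, here is a minimal sketch built on the standard library's `asyncio.start_server`. It is not taken from the project: `ProxyRequest`, `parse_request_line`, `handle_client`, and the listening port 8888 are hypothetical, and the handler only echoes the requested URL instead of consulting a cache. It relies on the fact that browsers configured to use an HTTP proxy send the full URL in the request line (for example `GET http://example.com/ HTTP/1.0`).

```python
import asyncio
from typing import NamedTuple


class ProxyRequest(NamedTuple):
    method: str
    url: str
    version: str


def parse_request_line(line: bytes) -> ProxyRequest:
    """Parse a proxy-style request line such as b'GET http://example.com/ HTTP/1.0'."""
    parts = line.decode("latin-1").strip().split()
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        raise ValueError(f"malformed request line: {line!r}")
    return ProxyRequest(*parts)


async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    try:
        request = parse_request_line(await reader.readline())
        # Skip the request headers; a blank line (or EOF) ends them.
        while (await reader.readline()) not in (b"\r\n", b"\n", b""):
            pass
        status, body = "200 OK", f"You asked for {request.url}\n".encode()
    except ValueError:
        status, body = "400 Bad Request", b"Bad request\n"
    head = (
        f"HTTP/1.0 {status}\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n\r\n"
    )
    writer.write(head.encode("latin-1") + body)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main(host: str = "127.0.0.1", port: int = 8888) -> None:
    # The port is an arbitrary choice for this sketch.
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()
```

Running `asyncio.run(main())` starts the listener. Handling the protocol at this level, rather than through a web framework, leaves the server free to answer in plain HTTP/1.0, which matters for the period browsers an exhibition like this is likely to use.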