Wayback Cache Proxy — Browsing the Old Web Offline
When I started restoring 30 years of browser art for ZKM’s Choose Your Filter! exhibition, everything depended on the Wayback Machine. Old websites, dead servers, gone domains — the Internet Archive was the only source. Then in late 2024, it went offline for weeks after DDoS attacks and a data breach.
That was the moment I realized I couldn’t run an exhibition on someone else’s uptime.
The Problem
The exhibition needed visitors to browse archived websites from the 1990s and 2000s inside historical browser environments. Every page load hit the Wayback Machine in real time. No Wayback Machine, no exhibition. And even when it was up, response times were unpredictable — sometimes a page took 10 seconds, sometimes it timed out entirely.
What I Built
Wayback Cache Proxy sits between the browser and the Wayback Machine. It fetches archived pages once, caches them in Redis, and serves them locally from that point on. If the Wayback Machine goes down tomorrow, the exhibition keeps running.
The proxy does more than just cache, though:
Two-tier cache — A permanent “curated” tier for content I’ve verified and want to keep forever, plus an auto-expiring “hot” tier for pages visitors discover on their own. The curated tier survives Redis restarts; the hot tier cleans itself up after 7 days.
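The tiering policy is simple enough to sketch in plain Python. This is a Redis-free stand-in for illustration (the class and method names are my own, not the project’s); in Redis the same split would be a persisted keyspace for curated entries versus `SET ... EX` for hot ones.

```python
import time

HOT_TTL = 7 * 24 * 3600  # hot-tier entries expire after 7 days

class TwoTierCache:
    """Toy model of the curated/hot split: curated entries never
    expire, hot entries carry a TTL and expire lazily on read."""

    def __init__(self, clock=time.time):
        self.curated = {}   # url -> body, permanent
        self.hot = {}       # url -> (body, expires_at)
        self.clock = clock  # injectable clock keeps this testable

    def put(self, url, body, curated=False):
        if curated:
            self.curated[url] = body
            self.hot.pop(url, None)
        else:
            self.hot[url] = (body, self.clock() + HOT_TTL)

    def get(self, url):
        if url in self.curated:
            return self.curated[url]
        entry = self.hot.get(url)
        if entry:
            body, expires = entry
            if self.clock() < expires:
                return body
            del self.hot[url]  # lazy expiry, like a Redis TTL firing
        return None

    def promote(self, url):
        """Move a visitor-discovered page into the permanent tier."""
        entry = self.hot.pop(url, None)
        if entry:
            self.curated[url] = entry[0]
```

Curated promotion is a one-way move: once a page is verified, it should never fall out of the cache again.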
Prefetch crawler — Before the exhibition opened, I crawled seed URLs to pre-populate the cache. You add URLs through the admin interface, set a crawl depth, and let it spider. By opening day, thousands of pages were already cached.
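The crawl itself is a breadth-first expansion from the seed URLs, cut off at the configured depth. Here is a minimal sketch of that logic (my own simplification; `get_links` stands in for fetching a page through the proxy and parsing its hrefs, so the traversal stays testable without a network):

```python
from collections import deque

def crawl_plan(seeds, get_links, max_depth):
    """Breadth-first expansion of seed URLs up to max_depth.

    get_links(url) -> iterable of linked URLs. In the real proxy
    each visited URL would be fetched and written into the cache;
    here we just record the visit order.
    """
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)  # fetch-and-cache would happen here
        if depth < max_depth:
            for link in get_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return order
```

The `seen` set keeps the spider from looping on the heavily cross-linked pages typical of old webrings.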
Modem speed throttling — This was the fun part. You can throttle connections to simulate 14.4k, 28.8k, 56k, ISDN, or DSL speeds. Visitors could pick their speed from a dropdown in the header bar. Watching someone wait 30 seconds for an image to load on simulated 14.4k really drives home how different browsing used to be.
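The throttling boils down to chunked writes with a sleep sized to the nominal line rate. A hedged sketch of that pacing (the preset table and chunk size are my guesses at reasonable values, not the project’s exact numbers):

```python
import asyncio

# Nominal line rates in bits per second (assumed preset values).
MODEM_BPS = {"14.4k": 14_400, "28.8k": 28_800, "56k": 56_000,
             "isdn": 64_000, "dsl": 768_000}

CHUNK = 512  # bytes per write; small chunks make the trickle visible

async def send_throttled(writer, body, preset):
    """Write body in small chunks, sleeping between them so the
    effective throughput matches the selected modem speed."""
    bps = MODEM_BPS[preset]
    delay = CHUNK * 8 / bps  # seconds per chunk at that line rate
    for i in range(0, len(body), CHUNK):
        writer.write(body[i:i + CHUNK])
        await writer.drain()
        await asyncio.sleep(delay)
```

At 14.4k that works out to roughly 0.28 seconds per 512-byte chunk, which is exactly why that image takes 30 seconds.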
Content transformation — The Wayback Machine injects its own toolbar and scripts into archived pages and rewrites their URLs. The proxy strips all of that out and fixes asset URLs so pages render cleanly.
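The core of that cleanup can be approximated with two regular expressions: one that drops the toolbar block the Wayback Machine wraps in `BEGIN/END WAYBACK TOOLBAR INSERT` comments, and one that collapses `web.archive.org/web/<timestamp>/...` asset URLs back to their original form. This is a simplified sketch, not the project’s actual transformation code, which likely handles more cases:

```python
import re

TOOLBAR = re.compile(
    r"<!--\s*BEGIN WAYBACK TOOLBAR INSERT\s*-->.*?"
    r"<!--\s*END WAYBACK TOOLBAR INSERT\s*-->",
    re.DOTALL)

# Matches web.archive.org/web/<14-digit timestamp><optional flag>/
# so the original URL that follows is left behind.
REWRITTEN = re.compile(
    r"https?://web\.archive\.org/web/\d{14}(?:[a-z]{2}_)?/")

def clean(html):
    """Drop the injected toolbar block and restore archive-rewritten
    asset URLs so the proxy can serve them from its own cache."""
    html = TOOLBAR.sub("", html)
    return REWRITTEN.sub("", html)
```

Restoring the original URLs matters: it means a 1997 page asks the proxy for `http://example.com/img.gif`, which the proxy can answer from cache.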
Some Technical Decisions
The proxy is an async TCP server that speaks raw HTTP — no framework, just asyncio reading bytes off a socket. This was deliberate. Exhibition software needs to be simple to debug and predictable under load. When something breaks at 9am on a Saturday and the exhibition opens at 10, you want to be reading your own code, not a framework’s stack trace.
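The whole server fits the shape of a single `asyncio.start_server` callback. A minimal sketch of that handler (my own reduction, assuming the simplest possible request parsing; the real proxy obviously does more before answering):

```python
import asyncio

async def handle(reader, writer):
    """Read the request head off the socket, pull apart the request
    line by hand, and answer with a hand-built HTTP response."""
    head = await reader.readuntil(b"\r\n\r\n")
    request_line = head.split(b"\r\n", 1)[0]
    method, path, _version = request_line.split(b" ", 2)
    body = b"hello from the proxy\n"
    writer.write(b"HTTP/1.1 200 OK\r\n"
                 b"Content-Type: text/plain\r\n"
                 b"Content-Length: " + str(len(body)).encode() +
                 b"\r\n\r\n" + body)
    await writer.drain()
    writer.close()

async def main(port=8000):
    server = await asyncio.start_server(handle, "127.0.0.1", port)
    async with server:
        await server.serve_forever()
```

When this breaks at 9am on a Saturday, the entire request path is those few lines of your own code.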
The admin interface at /_admin/ is built to work in IE4. Not because anyone is using IE4 to administer the proxy, but because the proxy already needed to handle requests from historical browsers running in the exhibition. Making the admin work there too meant one less thing to worry about.
There’s also a separate FastAPI admin service on port 8080 with a proper dark-themed dashboard for remote management — cache statistics, crawler controls, live configuration reload via Redis Pub/Sub.
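The live-reload pattern is the standard redis-py one: subscribe to a channel, merge each JSON payload into the running config. A network-free sketch of the consuming side (channel name and message shape are my assumptions; the message dicts mirror what `pubsub.listen()` yields):

```python
import json

CONFIG_CHANNEL = "wayback:config"  # assumed channel name

def apply_config_updates(config, messages):
    """Consume pub/sub messages and merge JSON payloads into the
    live config dict — the same loop a redis-py pubsub.listen()
    iterator would drive, minus the network."""
    for msg in messages:
        if msg.get("type") != "message":
            continue  # skip subscribe confirmations and the like
        config.update(json.loads(msg["data"]))
    return config
```

Because every proxy process subscribes to the same channel, one publish from the admin dashboard reconfigures all of them without a restart.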
Open Source
I released it under MIT on ZKM’s GitHub: zkmkarlsruhe/wayback-cache-proxy
It’s designed for museums and exhibitions, but the architecture works for anyone who needs reliable, offline-capable access to archived web content. If you’re running a net art show, doing web preservation research, or just want to browse the old web without depending on the Internet Archive’s availability — give it a try.