exhibition-vm-controller — Snapshot-Revert as a Conservation Strategy


I work with a lot of digital art that was built for an OS nobody supports anymore. Flash on Windows XP. Pulse 3D on Mac OS 9. Browser-based works that depend on Java applets running inside Internet Explorer 8. The art is fine. The substrate isn’t, and you cannot port the substrate without changing the art.

The conservation strategy at ZKM for this kind of work has converged on something specific: don’t try to keep the software working — keep the whole operating system frozen, and revert it on any fault. exhibition-vm-controller is the framework that makes that an autonomous, exhibition-grade operation.

The unit of recovery is the OS, not the application

A typical “auto-restart on crash” system watches a single process and respawns it. That works for software written in this decade. It does not work for software whose author is dead and whose runtime no longer exists. By the time the application has crashed, the state of the OS underneath it is also corrupted in ways nobody knows how to clean up. Restarting the binary means starting from a worse state than you were in five minutes ago, and the rot accumulates over the run of an exhibition.

The fix is to treat the VM snapshot as the recovery unit. You capture a “ready” snapshot of the artwork’s environment — Windows XP with everything launched, the artwork running, the splash screens dismissed — and on any failure trigger you revert. The OS is back to the exact state of “ready,” every time. No drift, no accumulated cruft, no manual intervention.
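
To make the revert step concrete, here is a minimal sketch using the libvirt Python bindings. It is not the framework's actual code. The domain name "artwork-xp" and the snapshot name "ready" are hypothetical placeholders, and the connection is passed in so the function can be exercised without a hypervisor.

```python
def revert_to_ready(conn, domain_name, snapshot_name="ready"):
    """Revert a VM to its named snapshot; return True if the guest is active afterwards.

    `conn` is an open libvirt connection (or any object with the same methods).
    The snapshot name "ready" is an assumption made for this sketch.
    """
    dom = conn.lookupByName(domain_name)            # raises if the domain is unknown
    snap = dom.snapshotLookupByName(snapshot_name)  # the captured "ready" state
    dom.revertToSnapshot(snap)                      # discards everything since capture
    return bool(dom.isActive())


def revert_on_host(domain_name, uri="qemu:///system"):
    """Convenience wrapper for a real host; requires the libvirt-python package."""
    import libvirt  # imported lazily so revert_to_ready stays usable without it

    conn = libvirt.open(uri)
    try:
        return revert_to_ready(conn, domain_name)
    finally:
        conn.close()
```

On a real host, something like revert_on_host("artwork-xp") would be the whole recovery action. Whether the guest comes back already running depends on the snapshot having been taken from a running VM, which is what the "ready" state described above implies.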

The triggers

The hard part is knowing when to revert. The framework watches for several things in parallel:

  • Heartbeat timeout — the guest sends a heartbeat every few seconds; no signal for 15 s and the host pulls the trigger
  • VM shutdown — manual or unexpected, both treated the same
  • QEMU guest agent unresponsive — the guest is alive at the hypervisor level but its agent is wedged
  • Application-specific error signals — the artwork itself can declare an error condition via a guest-side script
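
The four triggers above can be sketched as a single decision function on the host. This is an illustration under stated assumptions, not the framework's implementation: the class name, the method names, and the reason strings are all invented here, and the checks run in one polling call rather than truly in parallel. Only the 15 s timeout comes from the text.

```python
import time

HEARTBEAT_TIMEOUT_S = 15.0  # the 15 s window described in the list above


class TriggerMonitor:
    """Tracks heartbeat and error state; check() returns a revert reason or None."""

    def __init__(self, clock=time.monotonic, timeout_s=HEARTBEAT_TIMEOUT_S):
        self.clock = clock            # injectable so the logic can be tested
        self.timeout_s = timeout_s
        self.app_error = None
        self.last_heartbeat = clock()

    def record_heartbeat(self):
        """Call whenever a heartbeat arrives from the guest."""
        self.last_heartbeat = self.clock()

    def report_app_error(self, message):
        """Call when the guest-side script declares an error condition."""
        self.app_error = message

    def reset(self):
        """Call after a revert so stale state does not trigger a second one."""
        self.app_error = None
        self.last_heartbeat = self.clock()

    def check(self, vm_running, agent_responsive):
        """Return the name of the first trigger that fired, or None if healthy."""
        if not vm_running:
            return "vm_shutdown"
        if self.app_error is not None:
            return "app_error"
        if not agent_responsive:
            return "agent_unresponsive"
        if self.clock() - self.last_heartbeat > self.timeout_s:
            return "heartbeat_timeout"
        return None
```

A host loop would call check() every second or so, passing in the VM state and the result of a guest agent ping, and revert whenever the return value is not None. Resetting after the revert matters: otherwise the time the guest spends coming back would itself count toward the next timeout.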

Heartbeats are platform-specific because the guest OSes are. AutoIt scripts on Windows XP. Bash on Linux. AppleScript on Mac OS. They all push a single TCP packet to the host every few seconds, and the absence of that packet is what triggers everything else.
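
For readers who want to see the shape of that exchange, here is a rough sketch of a heartbeat over TCP in Python, using only the standard library. The wire format (a single line reading "HEARTBEAT <guest-id>") and both function names are assumptions made for illustration. The real guest scripts are written in AutoIt, Bash, or AppleScript, and the actual message format may differ.

```python
import socket
import socketserver


def make_heartbeat_server(address, on_heartbeat):
    """Build a TCP server that calls on_heartbeat(guest_id) for each valid message."""

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            # Read at most one short line; anything else is ignored.
            line = self.rfile.readline(256).decode("ascii", "replace").strip()
            if line.startswith("HEARTBEAT"):
                guest_id = line.partition(" ")[2] or self.client_address[0]
                on_heartbeat(guest_id)

    return socketserver.ThreadingTCPServer(address, Handler)


def send_heartbeat(host, port, guest_id, timeout=2.0):
    """What a guest-side script does every few seconds: connect, send, disconnect."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(f"HEARTBEAT {guest_id}\n".encode("ascii"))
```

The host would run serve_forever() on the returned server in a background thread and record the arrival time in the callback. Keeping the sender to a connect, a write, and a close is what makes it easy to reproduce in a scripting language from 2003.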

The architecture

The host controller is a Python FastAPI service. It receives heartbeats, monitors VM state, controls the libvirt lifecycle (snapshot revert, restart), and exposes a REST API so external monitoring can see what’s going on.

Guest components are deliberately minimal: the QEMU guest agent (mandatory in every guest OS), the heartbeat script in whatever language the guest natively understands, an idle monitor for the application, and a process watchdog. Anything more elaborate inside the guest is fragile.

Display in production runs Openbox + virt-viewer --kiosk for full-screen exhibition mode; virt-manager is available during installation and maintenance, when an operator needs the full console and the VM settings.

Why VMs and not emulation

Emulators are a great research tool. They are not a great exhibition tool. They are slower, the input timing is subtly wrong, and they often look slightly off in ways visitors notice without being able to articulate. A KVM/QEMU VM running the actual original OS on modern hardware is faster, more faithful, and, critically, cheaper to keep running for ten years. KVM has been part of the mainline Linux kernel since 2007 and has been stable for close to two decades. The chance that it’s still around in another twenty years is a bet I’m willing to take.

Open source

Released under the MIT license, with a Zenodo DOI: 10.5281/zenodo.18652760. Companion to several historical works in the ZKM collection, including the netart-extinction site.

If you’re conserving software-based art and have been wondering whether the VM-snapshot approach is viable for autonomous exhibition operation: it is, we’ve been running it on real shows, and the code is yours.