ArchiveBox — Runbook

Quick Reference

| Item | Value |
| --- | --- |
| LXC | 128 @ 192.168.1.128 |
| URL | https://archive.eva-00.network |
| Version | v0.7.3 (stable) |
| Health (web) | curl http://192.168.1.128:8000 (200 or 302) |
| Health (API) | curl http://192.168.1.128:8001/health (200) |
| Vault | secret/data/archivebox |
| Deploy | Forgejo Actions -> Deploy ArchiveBox |
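The two health checks above can be wrapped in a small helper that compares the returned status code against the accepted values. This is a minimal sketch; `check_http` is a hypothetical helper name, and the 10-second timeout is an assumption.

```shell
# check_http URL CODE...  -> succeeds if the HTTP status is one of CODE...
# (hypothetical helper, not part of ArchiveBox)
check_http() {
  local url=$1
  shift
  local status ok
  # -o /dev/null discards the body; -w prints only the status code.
  # curl prints 000 when it cannot connect, so ignore its exit code here.
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || true
  for ok in "$@"; do
    if [ "$status" = "$ok" ]; then
      echo "OK   $url ($status)"
      return 0
    fi
  done
  echo "FAIL $url (got $status, expected one of: $*)"
  return 1
}

# Usage:
#   check_http http://192.168.1.128:8000 200 302
#   check_http http://192.168.1.128:8001/health 200
```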

Check Service Status

ssh [email protected] docker compose -f /opt/archivebox/docker-compose.yml ps

Both containers should be running:

  • archivebox — Web UI on port 8000
  • archivebox-api — API wrapper on port 8001
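To check both containers in one step (run on the LXC), something like the following could work. It is a sketch that relies on `docker inspect`; `all_running` is a hypothetical helper name.

```shell
# all_running NAME...  -> succeeds only if every named container is running
# (hypothetical helper; run on the LXC where docker is available)
all_running() {
  local name state rc=0
  for name in "$@"; do
    # docker inspect prints "true" or "false" for .State.Running;
    # it fails if the container does not exist.
    state=$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null) || state=missing
    if [ "$state" = "true" ]; then
      echo "$name: running"
    else
      echo "$name: NOT running ($state)"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   all_running archivebox archivebox-api
```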

Restart Services

ssh [email protected] docker compose -f /opt/archivebox/docker-compose.yml restart

View Logs

Via Loki (preferred)

{container_name="archivebox"}
{container_name="archivebox-api"}
{container_name=~"archivebox.*"} |= "error"
{container_name="archivebox-api"} |= "POST /add"

Via SSH (fallback)

ssh [email protected] docker compose -f /opt/archivebox/docker-compose.yml logs -f --tail 100
ssh [email protected] docker compose -f /opt/archivebox/docker-compose.yml logs -f --tail 100 api-wrapper

Add URLs to Archive

Via API wrapper (preferred)

curl -X POST http://192.168.1.128:8001/add \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://old.reddit.com/r/...", "tag": "subreddit_name"}'
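When the URL or tag comes from a variable, building the JSON body by hand is easy to get wrong. A minimal sketch of a payload builder follows; `json_escape` and `add_payload` are hypothetical helper names, and the escaping only covers backslashes and double quotes, not control characters.

```shell
# json_escape STRING -> STRING with backslashes and double quotes escaped
# (hypothetical helper; does not handle control characters)
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# add_payload URL TAG -> JSON body in the shape used by POST /add above
add_payload() {
  printf '{"url": "%s", "tag": "%s"}' "$(json_escape "$1")" "$(json_escape "$2")"
}

# Usage (TOKEN is a placeholder for the Bearer token):
#   curl -X POST http://192.168.1.128:8001/add \
#     -H "Authorization: Bearer $TOKEN" \
#     -H "Content-Type: application/json" \
#     -d "$(add_payload "$url" "$tag")"
```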

Via CLI (direct)

# Quote the whole remote command so URLs containing & or ? reach archivebox intact
ssh [email protected] "docker exec archivebox archivebox add 'https://example.com'"

With tags

ssh [email protected] "docker exec archivebox archivebox add --tag 'reddit,pics' 'https://old.reddit.com/r/pics/...'"

From a file (one URL per line)

scp urls.txt [email protected]:/tmp/urls.txt
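# Quote the remote command so the redirect reads /tmp/urls.txt on the LXC;
# -i passes that stdin through to the container
ssh [email protected] 'docker exec -i archivebox archivebox add < /tmp/urls.txt'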
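Before copying the file over, it can help to strip blank lines, comments, Windows line endings and duplicates. This step is optional and is only a sketch; `clean_urls` is a hypothetical helper name.

```shell
# clean_urls FILE -> prints one http(s) URL per line
# (hypothetical helper; optional pre-cleaning before scp)
clean_urls() {
  # 1. strip carriage returns (Windows line endings)
  # 2. trim leading and trailing whitespace
  # 3. keep only lines that start with http:// or https://
  # 4. drop duplicates while keeping first-seen order
  tr -d '\r' < "$1" \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -E '^https?://' \
    | awk '!seen[$0]++'
}

# Usage:
#   clean_urls raw-urls.txt > urls.txt
```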

Re-run missing extractors on existing snapshots

ssh [email protected] docker exec archivebox archivebox update

This only runs extractors that haven't succeeded yet (smart incremental).

Trigger Backups Manually

PBS — LXC snapshot

Via Proxmox UI or CLI:

ssh [email protected] 'vzdump 128 --storage cajita-elite --compress zstd --mode snapshot --notes "manual backup"'

Databasement — SQLite dump

Trigger via the Databasement web UI at http://192.168.1.196:2226. Navigate to the ArchiveBox database entry and click "Backup Now".

Backrest — Archive files

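# The remote command is wrapped in double quotes so the header and JSON body
# keep their quoting when the remote shell parses them
ssh [email protected] "curl -s -X POST 'http://localhost:9898/v1.Backrest/Backup' \
  -H 'Content-Type: application/json' \
  -d '{\"value\":\"archivebox-archives\"}'"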

Fresh Redeploy

Trigger via Forgejo Actions with force_clean=true, or manually:

ssh [email protected]
cd /opt/archivebox
docker compose down
rm -rf /opt/archivebox/data/*
# Note: archive files on urahara are preserved

Then re-run the workflow, which re-initializes ArchiveBox and re-creates the admin user.

Memory / Performance

LXC resource allocation

| Resource | Value | Notes |
| --- | --- | --- |
| RAM | 3072 MB | Headless Chrome is the main consumer (~300-500 MB per tab) |
| Cores | 2 | |
| Disk | 16 GB rootfs | Archives on urahara, so rootfs stays small |

RAM guidance

| Archiving activity | Expected RAM |
| --- | --- |
| Idle (no archiving) | ~200 MB |
| Screenshot (1 URL) | ~500 MB - 1 GB |
| DOM + wget + WARC (1 URL) | ~300-500 MB |
| Media download (yt-dlp) | ~300-500 MB |
| Peak (Chrome + wget + yt-dlp) | ~1.5-2 GB |

ArchiveBox processes URLs sequentially by default. If you see OOM kills, increase the LXC's RAM from the Proxmox host:

pct set 128 --memory 4096
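Before raising the limit, it may be worth checking how much memory is actually available inside the LXC against the ~2 GB peak from the RAM guidance. A rough sketch follows, assuming `/proc/meminfo` inside the container reflects the LXC limit; `available_mb` and `has_headroom` are hypothetical helper names.

```shell
# available_mb [MEMINFO_FILE] -> MemAvailable in MB (defaults to /proc/meminfo)
# (hypothetical helper)
available_mb() {
  local meminfo=${1:-/proc/meminfo}
  # MemAvailable is reported in kB
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "$meminfo"
}

# has_headroom NEEDED_MB [MEMINFO_FILE] -> succeeds if enough memory is available
has_headroom() {
  local needed=$1 avail
  avail=$(available_mb "${2:-/proc/meminfo}")
  echo "available: ${avail} MB, needed: ${needed} MB"
  [ "$avail" -ge "$needed" ]
}

# Usage (inside the LXC, 2048 MB = peak from the RAM guidance table):
#   has_headroom 2048 || echo "consider raising the LXC memory limit"
```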

Disk usage on urahara

Reddit post archives average 5-10 MB each; general web pages average 1-5 MB. Media size varies widely.

| Bookmark count | Estimated archive size |
| --- | --- |
| 1,000 | 5-15 GB |
| 5,000 | 35-50 GB |
| 10,000 | 70-100 GB |
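For other bookmark counts, a back-of-the-envelope estimate can be made by multiplying the count by a per-item size range. The sketch below uses decimal units (1 GB = 1000 MB) and ignores media, so it will not match the table exactly; `estimate_gb` is a hypothetical helper name.

```shell
# estimate_gb COUNT MB_LOW MB_HIGH -> "LOW-HIGH GB" (1 GB = 1000 MB, integer math)
# (hypothetical helper; ignores media downloads)
estimate_gb() {
  local count=$1 low=$2 high=$3
  echo "$(( count * low / 1000 ))-$(( count * high / 1000 )) GB"
}

# Usage:
#   estimate_gb 1000 5 10    # 1,000 Reddit posts at 5-10 MB each
#   estimate_gb 10000 1 5    # 10,000 general web pages at 1-5 MB each
```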

Troubleshooting

API wrapper returns 401

Check the Bearer token matches the one in Vault at secret/data/archivebox -> api_key. Also verify the token matches what's in n8n's .env as ARCHIVEBOX_API_KEY.
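To compare the two copies of the token without printing the secret, one option is to compare short SHA-256 fingerprints. This is only a sketch: `fingerprint` and `tokens_match` are hypothetical helper names, and the `vault` command in the usage comment assumes a KV v2 engine mounted at `secret/`.

```shell
# fingerprint VALUE -> first 12 hex characters of its SHA-256
# (hypothetical helper; lets you compare tokens without showing them)
fingerprint() {
  printf '%s' "$1" | sha256sum | cut -c1-12
}

# tokens_match A B -> succeeds if both values are byte-for-byte identical
tokens_match() {
  [ "$(fingerprint "$1")" = "$(fingerprint "$2")" ]
}

# Usage (assumes KV v2 mounted at secret/; N8N_TOKEN holds the value of
# ARCHIVEBOX_API_KEY copied from n8n's .env):
#   VAULT_TOKEN_VALUE=$(vault kv get -field=api_key secret/archivebox)
#   tokens_match "$VAULT_TOKEN_VALUE" "$N8N_TOKEN" || echo "tokens differ"
```

A trailing space or newline in one copy is enough to make the fingerprints differ.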

API wrapper returns 500 or timeout

Check that the archivebox container is healthy and that the API wrapper can see the shared /data volume:

ssh [email protected] docker exec archivebox-api curl -s http://localhost:8001/health
ssh [email protected] docker exec archivebox-api ls /data/index.sqlite3

ArchiveBox returns 403 when accessing via browser

oauth2-proxy is blocking the request. Check:

  1. The PocketID OIDC client exists and has the correct callback URL
  2. The oauth2-proxy container is running on LXC 119 (Loki: {container_name="oauth2-proxy-archivebox"})
  3. Caddy is routing archive.eva-00.network to 192.168.1.119:8592

wget/WARC not running despite being enabled

In v0.7.3, wget produces the WARC, so WARC generation requires both SAVE_WGET=True and SAVE_WARC=True. Check:

ssh [email protected] docker exec archivebox archivebox config --get SAVE_WGET
ssh [email protected] docker exec archivebox archivebox config --get SAVE_WARC
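The two values can be evaluated together. The sketch below accepts either a bare `True` or a `KEY=True` line, since the exact output format of `config --get` is an assumption here; `is_enabled` and `warc_ready` are hypothetical helper names.

```shell
# is_enabled TEXT -> succeeds if TEXT is "True" or "KEY=True" (case-insensitive)
# (hypothetical helper)
is_enabled() {
  local value=${1##*=}   # drop an optional "KEY=" prefix
  value=$(printf '%s' "$value" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
  [ "$value" = "true" ]
}

# warc_ready WGET_OUTPUT WARC_OUTPUT -> succeeds only if both are enabled
warc_ready() {
  if is_enabled "$1" && is_enabled "$2"; then
    echo "WARC can run: SAVE_WGET and SAVE_WARC are both enabled"
  else
    echo "WARC will not run: enable both SAVE_WGET and SAVE_WARC"
    return 1
  fi
}

# Usage (on the LXC):
#   warc_ready "$(docker exec archivebox archivebox config --get SAVE_WGET)" \
#              "$(docker exec archivebox archivebox config --get SAVE_WARC)"
```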

"output.html does not exist" in web UI

This happens when --overwrite creates new snapshot directories with different timestamps while the database still points to the original. Fix:

# Re-run missing extractors on original snapshot
ssh [email protected] docker exec archivebox archivebox update

yt-dlp not downloading media

Check yt-dlp is installed and up to date:

ssh [email protected] docker exec archivebox yt-dlp --version

Archives not appearing in /mnt/archivebox/archive

Verify the bind mount is active:

ssh [email protected] df -h /mnt/archivebox/archive
ssh [email protected] ls -la /mnt/archivebox/archive/

If empty, check Proxmox bind mount config for LXC 128.
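A quick way to tell an unmounted directory from a mounted but empty one is to look the path up in the mount table. This is a minimal sketch that reads `/proc/mounts`; `is_mounted` is a hypothetical helper name.

```shell
# is_mounted PATH [MOUNTS_FILE] -> succeeds if PATH is listed as a mount point
# (hypothetical helper; MOUNTS_FILE defaults to /proc/mounts)
is_mounted() {
  local path=$1 mounts=${2:-/proc/mounts}
  # Field 2 of each line in /proc/mounts is the mount point.
  awk -v p="$path" '$2 == p { found = 1 } END { exit !found }' "$mounts"
}

# Usage (inside the LXC):
#   is_mounted /mnt/archivebox/archive || echo "bind mount is missing"
```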

If a browser 403 from oauth2-proxy persists, clear the _oauth2_archivebox cookie in your browser or retry in an incognito window.