ADR-016 — Storage Migration Plan
Generated: 2026-04-02 (from conversation 90d09d79)
Updated: 2026-04-04 — no mirror, 4-disk layout, docker-host decomposed, sdd as bulk data store
Current Storage Architecture
| Disk | Size | Model | Current Use | State |
|---|---|---|---|---|
| nvme0n1 | 250GB | Samsung 960 EVO | Proxmox OS + local-lvm thin pool | At capacity — 79.55% (~28GB free) |
| sda | 10.9TB | WD Gold | /mnt/filedump → LXC 102 | Keep |
| sdb | 1.8TB | Crucial MX500 | /mnt/seedbox → LXC 111 data | Keep |
| sdc | 3.6TB | Crucial MX500 | /mnt/all-might → LXC 116 (/unohana) app data | Keep |
| sdd | 1.8TB | Crucial MX500 | Wiped | → urahara (bulk data) |
| sde | 500GB | Samsung 860 EVO | ollama-disk → LXC 107 rootfs | WIPE → apps-pool |
| sdf | 1TB | Samsung 860 EVO | ZFS zpool → LXCs 109, 110, 111, 115, 116 rootfs | Decommission |
| sdg | 500GB | Samsung 860 EVO | Partition exists, not mounted | WIPE → infra-pool |
| sdh | 500GB | Samsung 860 EVO | Partition exists, not mounted | WIPE → heavy-pool |
Problem: All critical infra LXC rootfs live on a single NVMe disk with no redundancy and only 28GB free. Docker-host (LXC 103) is a 35GB monolith mixing critical auth services with heavy I/O apps — a failure takes everything down.
zpool/media: Cleared on 2026-04-04. Only LXC subvolumes remain (~9GB total).
Mirror vs No Mirror — Why No Mirror
- With PBS nightly whole-LXC backups, a disk failure = at most ~24hrs data loss and minutes to restore
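- A mirror mainly prevents downtime (and closes the up-to-24hr gap since the last backup); it does not protect against deletion or corruption, which it replicates, whereas PBS covers those cases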
- No mirror = 4 separate disks = better I/O isolation, full capacity, better SSD longevity
See backup-strategy.md §9 for full analysis.
Filesystem Choice — Why ZFS + ext4
| | ZFS | ext4 | btrfs | XFS |
|---|---|---|---|---|
| Proxmox integration | Native zfspool — pct move-volume, thin provisioning | dir storage only — manual mount, no thin provisioning | Partial subvolume support, Proxmox prefers ZFS | dir storage only |
| Snapshots | Instant, free until changed | No | Yes | No |
| Compression | lz4 (fast) or zstd (better ratio) | No | zstd | No |
| Copy-on-write | Yes — efficient pct move-volume | No — full copy | Yes | No |
| Data checksums | Detects silent corruption | No | Yes but historically buggy on single disk | No |
| RAM usage | Hungry — ARC wants 1GB+ per pool | Minimal | Moderate | Minimal |
| SSD wear | CoW = write amplification on random writes | Lower — in-place writes | CoW = same as ZFS | Lowest — in-place |
| Recovery tools | zpool scrub, import/export | fsck.ext4 — best in Linux | btrfs check — improving | xfs_repair — solid |
Decision:
- sdg, sdh, sde → ZFS (infra-pool, heavy-pool, apps-pool) — Proxmox native integration makes pct move-volume seamless, instant snapshots before risky changes, lz4 compression saves space on logs/databases/Docker layers, checksums catch silent SSD corruption
- sdd → ext4 (urahara) — bulk data via bind-mounts, not LXC rootfs management. Simpler, lower overhead, less write amplification for large sequential files (models, media, archives)
- RAM budget: the ARC is a single cache shared by all pools on the host; budget ≈ 2-3GB for the 3 ZFS pools. Cap with options zfs zfs_arc_max=2147483648 (2GiB) in /etc/modprobe.d/zfs.conf if needed
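The byte value in the ARC cap is easy to mistype, so a minimal sketch of deriving it from a GiB budget may help. The variable names are illustrative, and the 2 GiB figure is simply the one quoted above; the snippet only prints the modprobe line for review rather than writing it.

```shell
# Convert a GiB budget into the byte value that zfs_arc_max expects.
arc_max_gib=2   # assumption: the 2 GiB cap quoted in the decision above
arc_max_bytes=$((arc_max_gib * 1024 * 1024 * 1024))

# Print the line for /etc/modprobe.d/zfs.conf instead of writing it,
# so it can be reviewed before being put in place.
echo "options zfs zfs_arc_max=${arc_max_bytes}"
```

The modprobe option is read when the zfs module loads, so it normally takes effect after a reboot. The same value can also be written to /sys/module/zfs/parameters/zfs_arc_max to apply it to the running system.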
Directory Convention
ZFS pools (infra-pool, heavy-pool, apps-pool)
Proxmox manages ZFS subvolumes automatically (subvol-<id>-disk-0). We can't control that. What we standardize is the layout inside each LXC:
/opt/<app-name>/
├── docker-compose.yml
├── .env
├── config/ # app config files (mounted into container)
├── database/ # DB data volume (if applicable)
└── assets/ # local assets (if not on urahara)
Docker volumes use bind mounts to these directories instead of anonymous named volumes. This makes everything inspectable, discoverable, and backup-friendly.
Examples:
# auth LXC (sdg/infra-pool)
/opt/vaultwarden/
├── docker-compose.yml
├── .env
├── config/
└── database/ # SQLite (305KB)
/opt/pocketid/
├── docker-compose.yml
├── .env
├── config/
└── database/ # pocketid data (5MB)
# matrix LXC (sdh/heavy-pool)
/opt/synapse/
├── docker-compose.yml
├── .env
├── config/ # homeserver.yaml, signing key
├── database/ # synapse DB
└── assets/ # media_store (or bind-mount to urahara if grows)
# observability LXC (sdh/heavy-pool)
/opt/loki/
├── config/ # loki-config.yml
└── database/ # chunks, index
/opt/prometheus/
├── config/ # prometheus.yml, rules
└── database/ # TSDB
/opt/grafana/
├── config/ # grafana.ini, provisioning/
└── database/ # grafana.db
Benefits:
- ls /opt/ on any LXC instantly shows what apps live there
- du -sh /opt/*/database/ shows DB sizes across all apps on a host
- Tier 2 targeted backups can glob /opt/*/database/ for all DB dumps
- Consistent across all LXCs — no guessing where an app stores its data
- Docker-compose files always at /opt/<app-name>/docker-compose.yml
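As a rough illustration of the convention, the sketch below scaffolds the standard layout for one app and lists every database directory under a root. The function names scaffold_app and list_database_dirs are hypothetical helpers, not part of any existing playbook, and the root is a parameter so the sketch can be tried in a scratch directory before pointing it at /opt.

```shell
# scaffold_app ROOT APP: create the standard per-app layout under ROOT/APP.
scaffold_app() {
    app_dir="$1/$2"
    mkdir -p "$app_dir/config" "$app_dir/database" "$app_dir/assets"
    # Create empty placeholders only when the files do not exist yet.
    [ -e "$app_dir/docker-compose.yml" ] || : > "$app_dir/docker-compose.yml"
    [ -e "$app_dir/.env" ] || : > "$app_dir/.env"
}

# list_database_dirs ROOT: print every ROOT/<app>/database directory, one per line.
list_database_dirs() {
    for dir in "$1"/*/database; do
        [ -d "$dir" ] && printf '%s\n' "$dir"
    done
    return 0
}
```

On an LXC this would be called as scaffold_app /opt vaultwarden, and list_database_dirs /opt gives a Tier 2 backup job the same set of paths that the /opt/*/database/ glob describes.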
Urahara (sdd/ext4)
Bulk data store — top-level folders are app names, subfolders describe data type:
/mnt/pve/urahara/
├── ollama/models/
├── karakeep/assets/
├── immich/photos/
├── tubearchivist/videos/
└── archivebox/archives/
Accessed by LXCs via Proxmox bind-mounts (pct set <id> -mp0 /mnt/pve/urahara/<app>/<type>,mp=/mnt/<app>/<type>).
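The bind-mount pattern above can be sketched as a small command builder, which keeps the host path and the in-container path in step. bind_mount_cmd is a hypothetical helper name; it only prints the pct command so the result can be checked before it is run on the Proxmox host.

```shell
# bind_mount_cmd ID INDEX APP TYPE: print the pct command that bind-mounts
# /mnt/pve/urahara/APP/TYPE into container ID at /mnt/APP/TYPE as mount point mpINDEX.
bind_mount_cmd() {
    printf 'pct set %s -mp%s /mnt/pve/urahara/%s/%s,mp=/mnt/%s/%s\n' \
        "$1" "$2" "$3" "$4" "$3" "$4"
}

# Example: the karakeep assets mount on LXC 117.
bind_mount_cmd 117 0 karakeep assets
```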
Proposed New Storage (4-disk layout)
| Pool/Mount | Disk | Size | Purpose |
|---|---|---|---|
| infra-pool | sdg (ZFS single) | ~465GB | Critical infra — low I/O, must stay stable |
| heavy-pool | sdh (ZFS single) | ~465GB | Heavy I/O + growth-prone service configs/DBs |
| apps-pool | sde (ZFS single) | ~465GB | Light app LXC rootfs |
| urahara | sdd (ext4 dir) | ~1.8TB | Bulk data: ollama models, karakeep assets, future immich/TA/AB data |
Docker-Host Decomposition
LXC 103 currently runs 30 containers in one monolith. Splitting into purpose-built LXCs based on criticality, I/O profile, and failure blast radius.
Current docker-host inventory
| Service | Image size | Volume data | I/O | Criticality |
|---|---|---|---|---|
| vaultwarden | 254MB | 305KB | Negligible | Critical — all passwords |
| pocketid | 78MB | 5MB | Negligible | Critical — SSO for everything |
| gatus | 50MB | 4.7MB | Light | Infra — uptime monitoring |
| ntfy | 84MB | 61KB | Light | Infra — push notifications |
| glance | 23MB | tiny | Light | Infra — dashboard |
| synapse | 377MB | 6.5MB (will grow) | Heavy | Medium — chat |
| open-webui | 4.97GB | 1.1GB | Heavy | Medium — AI chat |
| ollama (Docker) | 6.02GB | 11GB | Heavy | Medium — redundant with LXC 107 |
| n8n | 1.78GB | 35MB | Moderate | Medium — automation |
| code-server | 754MB | 1MB | Light | Low — on-demand IDE |
| thelounge | 214MB | 66KB | Light | Low — IRC |
| qbitwebui | 479MB | 4.4MB | Light | Low — torrent UI |
| alloy | 425MB | 7.9MB | Light | Monitoring agent |
| ~~cadvisor~~ | 82MB | — | — | Removed — replaced by Alloy |
| ~~node-exporter~~ | 28MB | — | — | Removed — replaced by Alloy |
| ~~promtail~~ | 202MB | — | — | Removed — replaced by Alloy |
| 11x oauth2-proxy | 38MB each | 0 | None | Sidecar — moves with its service |
Docker ollama on LXC 103 is redundant — open-webui already connects to LXC 107 (192.168.1.107:11434) over the network. Eliminating the Docker ollama saves 17GB (6GB image + 11GB models).
New LXC groupings
| New LXC | Services | Why grouped | Disk | Est. size |
|---|---|---|---|---|
| auth | vaultwarden, pocketid, + their oauth2-proxies | Both critical auth; tiny I/O; shared failure = auth down (already true today) | sdg (infra) | 8GB alloc |
| infra-apps | gatus, ntfy, glance, + oauth2-proxies for glance/gatus | Infra monitoring/alerting/dashboard; all tiny | sdg (infra) | 8GB alloc |
| matrix | synapse, + oauth2-proxy (if needed) | Heavy DB growth, federation traffic; isolate from everything | sdh (heavy) | 16GB alloc |
| ai | open-webui, + oauth2-proxy | Points to LXC 107 ollama over network; chat history grows | sdh (heavy) | 8GB alloc |
| automation | n8n, + oauth2-proxy | Moderate I/O, workflow DB grows; triggers across services | sdh (heavy) | 8GB alloc |
| tools | code-server, thelounge, qbitwebui, + oauth2-proxies | All light, on-demand use; non-critical | sde (apps) | 8GB alloc |
Monitoring: Each new LXC gets Alloy only — it replaces promtail, node-exporter, and cAdvisor (migration completed March 2026). Legacy agents must not be deployed on any new LXC.
OAuth2-proxy for external services (filedump, homebridge, seedbox, shoko, grimmory, romm, normal-qbit): these instances front services that run on other LXCs. They go on infra-apps because they are stateless and infra-adjacent.
I/O and Growth Analysis
Heavy I/O
| Service | LXC | Rootfs | Why |
|---|---|---|---|
| Loki (observability) | 108 | sdh | Constant log ingestion writes 24/7 |
| Prometheus (observability) | 108 | sdh | Constant metrics TSDB writes 24/7 |
| Meilisearch (karakeep) | 117 | sdh | Bursty re-indexing, heavy writes during bookmark imports |
| headless Chrome (karakeep) | 117 | sdh | Full-page archiving (assets offloaded to sdd) |
| synapse | NEW | sdh | DB writes on every message, federation traffic, media store |
| open-webui | NEW | sdh | Chat history writes (1.1GB volume), model interaction |
| n8n | NEW | sdh | Workflow executions write to SQLite DB |
| mediabot PostgreSQL | 113 | sde | DB writes during searches/downloads, index rebuilds |
| mediabot qBittorrent | 113 | sde | Torrent state I/O during active downloads |
| Jackett/Prowlarr (mediabot) | 113 | sde | Indexer queries, cache writes |
| jellyfin | 114 | sde | Metadata scans, transcoding temp files, image cache |
| Immich (future) | TBD | sdh | PostgreSQL + Redis, ML embeddings, thumbnail gen |
| Tube Archivist (future) | TBD | sdh | Elasticsearch indexing, metadata extraction |
| ArchiveBox (future) | TBD | sdh | SQLite + full-text index, bursty during archiving |
Light I/O
| Service | LXC | Rootfs | Why |
|---|---|---|---|
| vault | 106 | sdg | Sealed key reads, infrequent secret writes |
| caddy | 105 | sdg | Config reads, rare cert renewals |
| forgejo | 100 | sdg | Small infrequent git pushes |
| forgejo-runner | 101 | sdg | Bursty during CI but short-lived, mostly idle |
| netbird | 115 | sdg | VPN state, minimal disk activity |
| vaultwarden | NEW (auth) | sdg | Tiny SQLite DB (305KB) |
| pocketid | NEW (auth) | sdg | 5MB data |
| gatus | NEW (infra-apps) | sdg | 4.7MB uptime DB |
| ntfy | NEW (infra-apps) | sdg | 61KB cache |
| glance | NEW (infra-apps) | sdg | Near-zero writes |
| oauth2-proxies (external) | NEW (infra-apps) | sdg | Stateless, zero disk writes |
| code-server | NEW (tools) | sde | 1MB config, on-demand |
| thelounge | NEW (tools) | sde | 66KB data |
| qbitwebui | NEW (tools) | sde | 4.4MB, read-only UI |
| homebridge | 104 | sde | Tiny config |
| mediamanager | 112 | sde | Lightweight API calls |
| gluetun | 110 | sde | VPN tunnel, almost no disk I/O |
| seedbox (2x qBittorrent) | 111 | sde | Torrent data on sdb |
| minecraft | 109 | sde | Only active during play |
| Shoko/ROMM/Grimmory (all-might) | 116 | sde | Media on sdc bind-mount |
| filedump | 102 | sde | Serves files from sda bind-mount, tiny rootfs |
Growth Potential
| Service | LXC | Rootfs | What grows | Risk | Current → Ceiling |
|---|---|---|---|---|---|
| Loki | 108 | sdh | Log chunks | High | → 50GB+ |
| Prometheus | 108 | sdh | Metrics TSDB | High | → fills 30GB |
| Karakeep DB + Meilisearch | 117 | sdh | DB rows, search index (assets on sdd) | Medium | ~16GB → 20-30GB |
| synapse | NEW | sdh | Chat history, media, federation | High | 6.5MB → 10-20GB |
| open-webui | NEW | sdh | Chat history, uploads | Medium | 1.1GB → 5-10GB |
| n8n | NEW | sdh | Workflow execution history | Medium | 35MB → 500MB-1GB |
| Immich DB (future) | TBD | sdh | PostgreSQL + ML embeddings | High | 0 → 20-50GB |
| Tube Archivist ES (future) | TBD | sdh | Elasticsearch index | High | 0 → 5-10GB |
| ArchiveBox index (future) | TBD | sdh | SQLite + full-text index | Medium | 0 → 1-5GB |
| mediabot PostgreSQL | 113 | sde | Download history, metadata | High | 7.3GB → 30-50GB+ |
| jellyfin | 114 | sde | Metadata DB, image cache | High | 9.1GB → 30-40GB |
| MariaDB (all-might) | 116 | sde | ROMM/Grimmory/Shoko | Medium | 3.1GB → 10-20GB |
| minecraft | 109 | sde | World file | Medium | 1.2GB → 5-15GB |
| forgejo | 100 | sdg | Git repos | Low | 1.4GB → 3-5GB |
| Karakeep assets | — | sdd | Page archives (~5MB each) | High | 12GB → 100GB+ |
| Immich photos (future) | — | sdd | Raw photos/videos | Very High | 0 → 500GB+ |
| TA videos (future) | — | sdd | Downloaded videos | Very High | 0 → 500GB+ |
| AB archives (future) | — | sdd | HTML/PDF/WARC | High | 0 → 50-200GB |
| Ollama models | 107 | sdd | Model weights | Medium | 88GB → 150GB+ |
LXC Allocation Plan
→ sdg / infra-pool (54GB allocated / ~10GB used):
| LXC | Service | Allocated | Used | Notes |
|---|---|---|---|---|
| 106 | vault | 8GB | 0.7GB | Existing |
| 105 | caddy | 4GB | 2.1GB | Existing |
| 100 | forgejo | 2GB | 1.4GB | Existing |
| 101 | forgejo-runner | 8GB | 2.1GB | Existing |
| 115 | netbird | 16GB | 1.6GB | Existing, from zpool |
| NEW | auth (vaultwarden + pocketid + oauth2-proxies) | 8GB | ~1GB | Docker: images ~430MB, data ~5MB |
| NEW | infra-apps (gatus + ntfy + glance + external oauth2-proxies) | 8GB | ~1GB | Docker: images ~300MB, data ~5MB |
| Total | | 54GB | ~10GB | |
→ sdh / heavy-pool (94GB allocated / ~23GB used, + future apps):
| LXC | Service | Allocated | Used | Notes |
|---|---|---|---|---|
| 108 | observability (Loki, Prometheus, Grafana) | 30GB | 9.7GB | Existing |
| 117 | karakeep (DB + Meilisearch, assets on sdd) | 32GB | ~16GB | Existing, assets offloaded |
| NEW | matrix (synapse) | 16GB | ~1GB now, grows fast | Docker: synapse 377MB + data |
| NEW | ai (open-webui → connects to LXC 107) | 8GB | ~6GB | Docker: open-webui 4.97GB + 1.1GB data |
| NEW | automation (n8n) | 8GB | ~2GB | Docker: n8n 1.78GB + 35MB data |
| TBD | immich (PostgreSQL, Redis, ML — photos on sdd) | ~30GB | future | |
| TBD | tube archivist (Elasticsearch — videos on sdd) | ~20GB | future | |
| TBD | archivebox (SQLite, index — archives on sdd) | ~10GB | future | |
| Current total | | 94GB | ~23GB | |
| With future apps | | ~154GB | — | |
→ sde / apps-pool (168GB allocated / ~31GB used):
| LXC | Service | Allocated | Used | Notes |
|---|---|---|---|---|
| NEW | tools (code-server + thelounge + qbitwebui + oauth2-proxies) | 8GB | ~2GB | Docker: images ~1.5GB, data ~5MB |
| 104 | homebridge | 4GB | 2.3GB | Existing |
| 112 | mediamanager | 4GB | 2.5GB | Existing |
| 113 | mediabot (MediaManager, PostgreSQL, qBittorrent, Jackett, Prowlarr) | 50GB | 7.3GB | Existing |
| 114 | jellyfin (GPU passthrough, media bind-mounts) | 50GB | 9.1GB | Existing |
| 109 | minecraft | 20GB | 1.2GB | From zpool |
| 110 | gluetun (NordVPN WireGuard) | 4GB | 1.1GB | From zpool |
| 111 | seedbox (2x qBittorrent, data on sdb) | 8GB | 1.6GB | From zpool |
| 116 | all-might (Shoko, ROMM, Grimmory, MariaDB, data on sdc) | 16GB | 3.1GB | From zpool |
| 102 | filedump (data on sda bind-mount) | 4GB | ~1.5GB | From local-lvm |
| Total | | 168GB | ~31GB | |
→ sdd / urahara (1.8TB, ext4 dir storage):
urahara/
├── ollama/models/ → LXC 107 rootfs (88GB, medium growth)
├── karakeep/assets/ → LXC 117 bind-mount (12GB, ~5MB/bookmark)
├── immich/photos/ → future bind-mount (very high growth)
├── tubearchivist/videos/ → future bind-mount (very high growth)
└── archivebox/archives/ → future bind-mount (high growth)
| Path | Accessed by | Current size | Growth |
|---|---|---|---|
| ollama/models/ | LXC 107 | 88GB | Medium |
| karakeep/assets/ | LXC 117 | 12GB | High — ~5MB/bookmark |
| immich/photos/ (future) | TBD | 0 | Very High |
| tubearchivist/videos/ (future) | TBD | 0 | Very High |
| archivebox/archives/ (future) | TBD | 0 | High |
| Current total | | ~100GB | |
| Estimated ceiling | | ~1TB+ | |
Capacity Summary
| Pool | Disk | Capacity | Allocated (current) | Allocated (with future) | Headroom |
|---|---|---|---|---|---|
| infra-pool | sdg | 465GB | 54GB (12%) | 54GB | 411GB free |
| heavy-pool | sdh | 465GB | 94GB (20%) | ~154GB (33%) | ~311GB free |
| apps-pool | sde | 465GB | 168GB (36%) | 168GB | 297GB free |
| urahara | sdd | 1.8TB | ~100GB (5%) | ~1TB+ (55%+) | 800GB+ free |
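The allocation totals and percentages in the summary can be re-derived from the per-LXC figures in the allocation plan. The sketch below does that with integer arithmetic; sum_gb and pct_of are illustrative helper names, and 465GB is the usable pool size used throughout this document.

```shell
# pct_of PART TOTAL: integer percentage of TOTAL, rounded to the nearest whole number.
pct_of() {
    echo $(( ($1 * 100 + $2 / 2) / $2 ))
}

# sum_gb N...: add up whole-GB allocations.
sum_gb() {
    total=0
    for n in "$@"; do total=$((total + n)); done
    echo "$total"
}

# Per-LXC allocations copied from the allocation plan tables.
infra=$(sum_gb 8 4 2 8 16 8 8)
heavy=$(sum_gb 30 32 16 8 8)
apps=$(sum_gb 8 4 4 50 50 20 4 8 16 4)

for entry in "infra-pool:$infra" "heavy-pool:$heavy" "apps-pool:$apps"; do
    name=${entry%%:*}
    alloc=${entry##*:}
    echo "$name: ${alloc}GB allocated, $(pct_of "$alloc" 465)% of 465GB, $((465 - alloc))GB free"
done
```

Re-running this after changing any allocation is a quick way to keep the summary table and the per-pool tables in agreement.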
Migration Phases
Phase 1 — Set up sdd as urahara
- Format sdd as ext4: mkfs.ext4 /dev/sdd
- Mount at /mnt/pve/urahara, add to /etc/fstab using /dev/disk/by-id/
- Add to Proxmox: pvesm add dir urahara --path /mnt/pve/urahara --content rootdir,images
- Create directories:
urahara/
├── karakeep/assets/
├── ollama/models/
├── immich/photos/
├── tubearchivist/videos/
└── archivebox/archives/
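The fstab and directory steps of Phase 1 could look roughly like the sketch below. fstab_line and make_urahara_dirs are hypothetical helper names, the mount options defaults,nofail are an assumption rather than something this plan specifies, and <sdd-id> is left as a placeholder to be looked up under /dev/disk/by-id/.

```shell
# fstab_line DEVICE MOUNTPOINT: print an ext4 fstab entry for DEVICE.
# "defaults,nofail" is an assumption: nofail lets the host boot if the disk is missing.
fstab_line() {
    printf '%s %s ext4 defaults,nofail 0 2\n' "$1" "$2"
}

# make_urahara_dirs ROOT: create the app/type folders under the mounted store.
make_urahara_dirs() {
    mkdir -p "$1/karakeep/assets" "$1/ollama/models" "$1/immich/photos" \
             "$1/tubearchivist/videos" "$1/archivebox/archives"
}

# <sdd-id> is a placeholder, not a real id; find it with: ls -l /dev/disk/by-id/
fstab_line "/dev/disk/by-id/<sdd-id>" /mnt/pve/urahara
```

The printed line is meant to be reviewed and appended to /etc/fstab by hand; make_urahara_dirs /mnt/pve/urahara would then be run once the mount is in place.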
Phase 2 — Move ollama to urahara
- Stop LXC 107 → pct move-volume 107 rootfs --storage urahara → start → verify
- Remove ollama-disk Proxmox storage → wipe sde
Phase 3 — Create new ZFS pools
zpool create infra-pool /dev/disk/by-id/<sdg-id>
zpool create heavy-pool /dev/disk/by-id/<sdh-id>
zpool create apps-pool /dev/disk/by-id/<sde-id>
pvesm add zfspool infra-pool --pool infra-pool --content rootdir
pvesm add zfspool heavy-pool --pool heavy-pool --content rootdir
pvesm add zfspool apps-pool --pool apps-pool --content rootdir
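Since the three pools are created the same way, the commands can be generated from one template to avoid copy-paste slips. This is a dry-run sketch only: pool_setup_cmds is a hypothetical helper that prints commands, the ashift=12 option is an assumption (4K-sector SSDs), and the explicit compression=lz4 line reflects the compression choice in the filesystem decision rather than a step listed in this phase.

```shell
# pool_setup_cmds POOL DISK_ID: print the commands for one single-disk pool.
pool_setup_cmds() {
    # ashift=12 is an assumption for 4K-sector SSDs; drop it to let ZFS auto-detect.
    echo "zpool create -o ashift=12 $1 /dev/disk/by-id/$2"
    # Set lz4 explicitly so the pool matches the compression decision.
    echo "zfs set compression=lz4 $1"
    echo "pvesm add zfspool $1 --pool $1 --content rootdir"
}

# Disk ids are placeholders; look them up under /dev/disk/by-id/ first.
pool_setup_cmds infra-pool "<sdg-id>"
pool_setup_cmds heavy-pool "<sdh-id>"
pool_setup_cmds apps-pool "<sde-id>"
```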
Phase 4 — Migrate existing infra to infra-pool (sdg)
- vault (106) → auto-unseal hook handles restart
- caddy (105) → brief downtime
- forgejo (100) + forgejo-runner (101)
- netbird (115) → from zpool, keep downtime short
Each: pct move-volume <id> rootfs --storage infra-pool --delete
Phase 5 — Migrate existing heavy I/O to heavy-pool (sdh)
- observability (108)
- karakeep (117)
Each: pct move-volume <id> rootfs --storage heavy-pool --delete
Phase 6 — Migrate existing light apps to apps-pool (sde)
From local-lvm: 102, 104, 112, 113, 114
From zpool: 109, 110, 111, 116
Each: pct move-volume <id> rootfs --storage apps-pool --delete
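Phases 4 to 6 repeat the same stop, move, start, verify cycle per container, which could be wrapped as in the sketch below. migrate_rootfs and the RUN variable are illustrative, not existing tooling; by default the helper only echoes the commands, and it assumes the container has to be stopped while its rootfs is moved.

```shell
# RUN=echo (the default here) prints each command as a dry run.
# Set RUN="" on the Proxmox host to execute the commands for real.
RUN=${RUN-echo}

# migrate_rootfs ID POOL: stop the container, move its rootfs to POOL,
# start it again, then print its status. Stops at the first failing step.
migrate_rootfs() {
    $RUN pct shutdown "$1" &&
    $RUN pct move-volume "$1" rootfs --storage "$2" --delete &&
    $RUN pct start "$1" &&
    $RUN pct status "$1"
}

# Example: dry run for the two Phase 5 containers.
for id in 108 117; do
    migrate_rootfs "$id" heavy-pool
done
```

Moving one container at a time and checking the service before the next keeps the blast radius of a failed move to a single LXC.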
Phase 7 — Delete unused LXCs
pct destroy 208 # tailscale — no longer needed
pct destroy 1022 # docker-smb — served empty zpool/media, no purpose
Phase 8 — Decommission zpool
pvesm remove zpool
zpool destroy zpool
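Because zpool destroy is irreversible, a guard that refuses to proceed while any dataset is still present may be worth having. The sketch below is one way to do it; only_root_dataset is a hypothetical helper that reads dataset names from standard input, so it can be checked without touching a real pool.

```shell
# only_root_dataset POOL: read dataset names on stdin (one per line, as printed
# by "zfs list -H -o name -r POOL") and succeed only if POOL itself is the sole entry.
only_root_dataset() {
    awk -v pool="$1" 'NF && $0 != pool { extra++ } END { exit (extra > 0) }'
}

# Intended use on the Proxmox host (not executed here):
#   zfs list -H -o name -r zpool | only_root_dataset zpool && zpool destroy zpool
```

Any leftover dataset, including an empty one, makes the guard fail, which forces a deliberate decision before the pool is destroyed.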
Phase 9 — Decompose docker-host (LXC 103)
Create new LXCs and migrate services one at a time. For each new LXC:
1. Create LXC on target pool with Docker + nesting enabled
2. Deploy service via Ansible playbook (create new playbook if needed)
3. Restore volume data from LXC 103 backup
4. Update Caddy config to point to new LXC IP
5. Verify service works
6. Remove service from LXC 103
Order (least disruptive first):
1. tools (sde/apps-pool) — code-server, thelounge, qbitwebui
   - Non-critical, easy to test, good practice run
2. infra-apps (sdg/infra-pool) — gatus, ntfy, glance, external oauth2-proxies
   - Low risk, but update ntfy endpoints across services
3. automation (sdh/heavy-pool) — n8n
   - Update webhook URLs, workflow connections
4. matrix (sdh/heavy-pool) — synapse
   - Federation config, update well-known URLs
5. ai (sdh/heavy-pool) — open-webui
   - Point to LXC 107 ollama (192.168.1.107:11434), eliminate Docker ollama entirely
6. auth (sdg/infra-pool) — vaultwarden, pocketid
   - LAST and most critical — all services depend on these
   - Plan for brief auth outage; update all oauth2-proxy configs to new pocketid IP
   - Verify every oauth2-proxy still works after migration
7. Delete LXC 103 — only after all services confirmed working on new LXCs
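The "restore volume data from LXC 103 backup" step needs a per-service archive to restore from. A minimal sketch using tar follows; archive_dir and restore_dir are hypothetical helper names, and the paths passed to them are left to the caller because this plan does not say where the current Docker volumes live on LXC 103.

```shell
# archive_dir SRC ARCHIVE: pack the contents of SRC into a gzip tarball.
archive_dir() {
    tar -C "$1" -czf "$2" .
}

# restore_dir ARCHIVE DEST: unpack the tarball into DEST, creating it if needed.
restore_dir() {
    mkdir -p "$2" && tar -C "$2" -xzf "$1"
}
```

Stopping the service's containers before archiving is the safer choice for anything backed by SQLite or another database, since a live copy may capture a half-written file. On the new LXC the archive would typically be restored into the matching /opt/<app-name>/database/ or config/ directory.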
Phase 10 — Set up karakeep assets bind-mount
- Stop karakeep containers on LXC 117
- Copy assets to /mnt/pve/urahara/karakeep/assets/
- Add bind-mount: pct set 117 -mp0 /mnt/pve/urahara/karakeep/assets,mp=/mnt/karakeep/assets
- Update docker-compose to mount /mnt/karakeep/assets as assets volume
- Verify → remove old assets from rootfs
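The copy and verify steps above can be combined so that the old assets are only removed once the two trees compare equal. The sketch below is one way to do that; copy_and_verify is a hypothetical helper, and the source path inside the LXC 117 rootfs is left as a parameter because it is not stated in this plan.

```shell
# copy_and_verify SRC DST: copy the contents of SRC into DST (preserving
# ownership, modes and timestamps), then compare both trees file by file.
# Returns non-zero if the copy fails or any difference is found.
copy_and_verify() {
    mkdir -p "$2" &&
    cp -a "$1/." "$2/" &&
    diff -r "$1" "$2" >/dev/null
}
```

A call such as copy_and_verify <old-assets-path> /mnt/pve/urahara/karakeep/assets succeeding is the signal that the old copy can be deleted; here <old-assets-path> is a placeholder for the current location.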
Phase 11 — Update Ansible + IaC
- Create new playbooks for each new LXC (auth, infra-apps, matrix, ai, automation, tools)
- Update Caddy config with new IPs
- Update all oauth2-proxy configs pointing to pocketid
- Update Forgejo Actions workflows
- Remove old docker-host playbook/workflow
- Update inventory.yml
Phase 12 — Verify nvme0n1 is clean
After all migrations and deletions complete, local-lvm should be empty:
pvesm list local-lvm # should return nothing
lvs pve -o lv_name,lv_size,pool_lv --noheadings | grep -v "data\|root\|swap" # no thin volumes left
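The "should return nothing" check can be turned into a pass/fail test suitable for a script. The sketch below assumes that pvesm list may print a header row beginning with "Volid" and ignores it; no_volumes_left is a hypothetical helper that reads the listing from standard input.

```shell
# no_volumes_left: read "pvesm list <storage>" output on stdin and succeed only
# if it contains no volume rows. A header line starting with "Volid" is ignored
# (assumption about the output format).
no_volumes_left() {
    awk 'NF && $1 != "Volid" { rows++ } END { exit (rows > 0) }'
}

# Intended use on the Proxmox host (not executed here):
#   pvesm list local-lvm | no_volumes_left && echo "local-lvm is empty"
```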
Future App Deployment (configs/DBs on sdh, data on sdd)
Immich
- LXC: New on heavy-pool (sdh), ~30GB rootfs
- Bind-mount: /mnt/pve/urahara/immich/photos/ → photos, videos, thumbnails
- I/O: Heavy — PostgreSQL, Redis, ML feature extraction
- DB growth: ~2-5GB per 100K photos
Tube Archivist
- LXC: New on heavy-pool (sdh), ~20GB rootfs
- Bind-mount: /mnt/pve/urahara/tubearchivist/videos/ → downloaded videos
- I/O: Heavy — Elasticsearch indexing, subtitle processing
- DB growth: ~5-10GB at 10K videos
ArchiveBox
- LXC: New on heavy-pool (sdh), ~10GB rootfs
- Bind-mount: /mnt/pve/urahara/archivebox/archives/ → HTML/PDF/WARC/screenshots
- I/O: Moderate — bursty during archiving, SQLite + full-text index
- DB growth: ~1-5GB for 10K entries
Key Risks
| Risk | Mitigation |
|---|---|
| Auth outage during pocketid migration | Migrate auth LAST; verify every oauth2-proxy after |
| Caddy config pointing to wrong IPs | Update Caddy atomically; test each service before moving next |
| Docker volume data loss during decomposition | Back up all docker volumes before starting Phase 9 |
| vault seal after restart | Auto-unseal hook retries for 30s |
| ZFS pool on wrong disk-id | Always use /dev/disk/by-id/ |
| zpool destroy on wrong pool | zpool status before destroy |
| netbird downtime = loss of VPN | Migrate early, verify immediately |
| sdd failure = all bulk data lost | PBS + targeted offsite backup |
| sdh fills up with future apps | 465GB; 154GB planned = 67% headroom |
| karakeep bind-mount breaks docker | Test mount path before removing old data |
| n8n webhooks break after IP change | Update all webhook URLs in workflows |
| Matrix federation issues after move | Check .well-known/matrix/server delegation |
Resolved Questions
- ~~sdd safe to wipe?~~ Yes — wiped 2026-04-04. Now designated as urahara for bulk data.
- ~~netbird on infra-pool?~~ Yes — critical service on sdg.
- ~~docker-host downtime window?~~ Replaced by phased decomposition — no single big downtime.
- ~~Mirror vs no mirror?~~ No mirror. PBS provides redundancy; 4 separate disks for isolation.
- ~~Where does ollama go?~~ LXC 107 rootfs on sdd (urahara). Docker ollama on 103 eliminated.
- ~~Future apps (immich, TA, AB)?~~ Configs/DBs on sdh, data on sdd via bind-mounts.
- ~~What about docker-host?~~ Decomposed into 6 purpose-built LXCs across all pools.
- ~~tailscale (208)?~~ Deleted — no longer needed.
- ~~docker-smb (1022)?~~ Deleted — served empty zpool/media, no purpose after zpool decommission.
- ~~filedump (102)?~~ Moved to apps-pool to fully clear NVMe.
- ~~VM 1000 (importante) / VM 1001 (elgrande)?~~ No longer exist — removed from docs.
Post-Migration: Freed/Deleted
| What | Size | Status |
|---|---|---|
| sdf | 1TB | Freed after zpool decommission — available for future use |
| nvme0n1 thin pool | ~264GB freed | All LXC rootfs migrated off — OS only remains |
| LXC 103 (docker-host) | 60GB | Deleted after decomposition into 6 LXCs |
| LXC 208 (tailscale) | 4GB | Deleted |
| LXC 1022 (docker-smb) | 4GB | Deleted |
Backup Strategy (complements this layout)
- Tier 1: PBS nightly whole-LXC snapshots — covers all LXC rootfs on all pools
- Tier 2: Targeted DB dumps + config exports (hourly/daily) — granular restore
- sdd urahara: Needs separate restic/borg jobs for bind-mounted data (karakeep assets, future immich/TA/AB data) — not covered by LXC snapshots
- See backup-strategy.md §9