ADR-023 — Remote access for the AI coding-agent dev environment
Date: 2026-05-28
Status: Accepted
Related: ADR-022 · ADR-011 · ADR-008 · ADR-019 · ADR-007
Context
ADR-022 adopted a terminal-native dev stack (Ghostty → tmux → AoE → Claude Code + Codex). It picked the what but assumed it ran on Milton's Mac. In practice Milton needs to keep working from anywhere: at home (Mac/iPhone), at work (work-issued laptop on a corp VPN), travelling (any device with a browser).
Constraints driving this decision:
- The agent host must be reliably always-on. The Mac sleeps, takes OS updates, has no UPS, and powers off with the lid closed — not a server.
- Nothing personal lands on the work laptop. Corporate EDR/MDM flags personal VPN binaries; Netbird's overlay would collide with the corp VPN; and most workplace AUPs forbid installing personal VPN / SSH tooling on company hardware.
- Phone access matters — "anywhere" includes a phone screen on a train.
- Self-hosted ethos: agents and their work stay on Milton's overlay wherever practical.
Existing prior art on the homelab (don't reinvent)
- Dev LXC 131 (ADR-011): provisioned by chizuru-v2/ansible/playbooks/dev.yml. Runs code-server, has Claude Code CLI, MCP servers (gitea-mcp, mcp-grafana), Docker, and pre-cloned holo/chizuru-v2, holo/homelab-docs, claude/homelab-notes. Editor + browse-helper use case.
- Codex Forgejo identity: chizuru-v2/ansible/playbooks/forgejo-codex.yml creates the codex Forgejo account (mirrors the claude bot pattern from ADR-019), token in Vault, collaborator on the relevant repos.
- Netbird overlay: LXC 115, netbird.yml playbook. The personal VPN.
- Browser-exposure pattern: Caddy (LXC 105) + oauth2-proxy (infra-apps LXC 119) + PocketID (LXC 123) gates internal services behind auth at *.eva-00.network URLs. code-server-dev already uses this.
- Backup pattern: PBS LXC-level snapshots (per ADR-007) cover whole LXCs as a blanket default. Backrest on cajita-elite handles service-specific stateful data; it is opt-in per service.
Options considered
- Mac as always-on host with Netbird. Set pmset to never sleep, plug it in, SSH from anywhere. Rejected: Macs are not designed for 24/7 server duty (sleep/wake bugs, forced reboots from OS updates, kernel panics, no UPS); and any work-laptop access still requires installing a personal VPN on the work laptop, which the constraints rule out.
- Cloud VM (Hetzner / DigitalOcean / Fly.io). Reliable and always-on, but cuts against the self-hosted ethos: Forgejo content and agent state leave Milton's network for a third-party cloud. Rejected.
- Extend the existing dev-LXC (131) with the remaining ADR-022 pieces. Reuses one LXC but mixes two distinct trust/operations boundaries (editor + long-running agent runtime, sandboxed Codex, Remote Control sessions) into a single failure domain. Rejected after second-opinion review: persistent agent processes, broader tool access, sandbox experiments, tmux state, and AoE worktrees deserve their own CPU/memory limits, snapshot policy, and blast radius.
- New dedicated agents LXC separate from dev-LXC, with browser-only access for untrusted hardware layered over the existing Caddy/oauth2-proxy pattern, sharing common Ansible roles with dev.yml. Chosen.
Decision
Host
A new unprivileged LXC, hostname agents, provisioned by
chizuru-v2/ansible/playbooks/agents.yml, separate from dev-LXC 131.
- Sized as a light app LXC; placed on apps-pool initially; defaults (2 cores / 2 GB RAM) per create-lxc.yml. Snapshot auth/home state separately from any bulky clone/build cache (the latter can grow to a churnier mount if needed).
- No shared live working trees with code-server-dev. Agent repos are cloned independently; AoE manages per-session worktrees inside the LXC. This keeps the blast-radius argument unambiguous.
- Common Ansible roles (shell tools, git, node, python, tmux, lazygit, git-delta, difftastic) are factored out of dev.yml and shared with agents.yml. Boring and thin; not a "developer platform" abstraction.
- Docker policy: if Codex's --sandbox is in scope, run dockerd nested inside the unprivileged LXC (same pattern dev-LXC already uses for oauth2-proxy + Alloy). Do not mount the Proxmox host's Docker socket. Fallback if nested Docker proves painful: drop --sandbox and treat the LXC itself as Codex's containment boundary.
- Vault: agents gets its own machine identity, role, and policy, scoped to agent secrets. No reuse of dev-LXC Vault material.
- Netbird: the LXC joins the overlay as one more peer. Per-peer cost is noise and explicitly not a deciding factor.
Naming
LXC hostname is agents. If/when Option B (the public aoe serve
dashboard, deferred to a follow-up ADR) lands, the public vhost is
aoe.eva-00.network. The URL names the app it opens; the machine names
the role it plays.
Daemon supervision
Both claude remote-control and (when Option B lands) aoe serve run as
systemd units under a non-root service user, After=network-online.target,
with OnFailure= set to ntfy alerting and standard unit hardening
(ProtectSystem, NoNewPrivileges, PrivateTmp, ...). Interactive agent
work runs in tmux (via AoE); daemons run under systemd.
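
The supervision rules above could be rendered into a unit file along these lines. This is a minimal sketch, not the playbook's actual template: the service user agent, the binary path /usr/local/bin/claude, the working directory, and the ntfy-alert@%n.service failure handler are all hypothetical names. Wants=network-online.target and the value ProtectSystem=full (which leaves the home directory writable for worktrees) are assumptions that go beyond what this ADR states.

```python
from textwrap import dedent


def render_unit(description: str, exec_start: str, user: str,
                on_failure: str, working_dir: str) -> str:
    """Render a hardened systemd service unit as text."""
    return dedent(f"""\
        [Unit]
        Description={description}
        # Assumption: After= only orders units, so Wants= pulls the target in.
        Wants=network-online.target
        After=network-online.target
        OnFailure={on_failure}

        [Service]
        User={user}
        WorkingDirectory={working_dir}
        ExecStart={exec_start}
        NoNewPrivileges=yes
        PrivateTmp=yes
        ProtectSystem=full

        [Install]
        WantedBy=multi-user.target
        """)


unit = render_unit(
    description="Claude Code Remote Control",
    # Command line as recorded in this ADR; the absolute path is hypothetical.
    exec_start="/usr/local/bin/claude remote-control --name ready --spawn=worktree",
    user="agent",                          # hypothetical non-root service user
    on_failure="ntfy-alert@%n.service",    # hypothetical alerting template unit
    working_dir="/home/agent/git/ready",   # hypothetical clone location
)
print(unit)
```

Keeping the renderer as one small function means a future aoe serve unit would reuse the same hardening block with a different ExecStart, so the two daemons cannot drift apart.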
Headless agent CLI auth — interactive device-code on the LXC itself
- Claude Code: SSH into agents over Netbird, attach a tmux pane, run claude /login (or claude auth login). It prints a URL; open it in a browser on the Mac, sign in to claude.ai (Max), paste the returned code back to the LXC. Linux Claude Code stores credentials in ~/.claude/.credentials.json. Do not copy ~/.claude.json from the Mac: macOS Claude Code uses the Keychain, the file is not the credential, and Remote Control specifically needs full claude.ai OAuth which CLAUDE_CODE_OAUTH_TOKEN / claude setup-token cannot establish.
- Codex: run codex login --device-auth on the LXC; open the code URL on the Mac, paste back. Copy ~/.codex/auth.json only as a fallback when device-auth is unavailable, and treat the file as a password, never as routine IaC state.
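
Because both credential files named above are password-equivalent, a post-login check that they are readable only by the service user may be worth having. The sketch below is an illustration under assumptions: the function name is hypothetical, and the ADR does not say what file modes the CLIs set on their own.

```python
import stat
from pathlib import Path

# Credential paths named in this ADR, relative to the service user's home.
CREDENTIAL_FILES = (".claude/.credentials.json", ".codex/auth.json")


def loose_credential_files(home: Path) -> list[str]:
    """Return credential files under `home` that group or others can access."""
    loose = []
    for rel in CREDENTIAL_FILES:
        path = home / rel
        if not path.is_file():
            continue  # not logged in yet, nothing to check
        mode = stat.S_IMODE(path.stat().st_mode)
        # Any group or other permission bit counts as too permissive.
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            loose.append(rel)
    return loose


# Run as the service user on the LXC; an empty list means nothing is exposed.
print(loose_credential_files(Path.home()))
```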
Backup principle
PBS LXC snapshots only. No Backrest job for agent credential dirs. Credentials are cheap to reacquire (30-second interactive device-code) and expensive to leak. Rebuild path = Ansible + interactive login, never restore-from-secret.
Access tiers — pick by trust level of the device
| Device | Network | Access path | Personal creds on device? |
|---|---|---|---|
| Personal Mac at home | Home LAN or Netbird | SSH → tmux → AoE on agents | yes |
| iPhone / iPad | Cellular or wifi | Blink Shell + Netbird + mosh → tmux → AoE; or claude.ai/code via Safari / the Claude mobile app | yes |
| Work-issued laptop | Corp VPN | Browser only at claude.ai/code driving a claude remote-control session running on agents. No Netbird, no SSH keys, no install. | NO |
| Borrowed device, anywhere | Any | Same as work laptop: browser to claude.ai/code | NO |
| Browser-IDE backup | Any | Existing code-server-dev at its *.eva-00.network URL behind Caddy + oauth2-proxy + PocketID | NO |
| Public AoE dashboard backup | Any | aoe serve via Caddy + oauth2-proxy + PocketID at aoe.eva-00.network (deferred to a follow-up ADR, not in initial agents.yml) | NO |
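
The trust rule in the table can be read as a lookup with a safe default. The sketch below encodes the four device rows only; the dictionary keys and the fallback of treating any unknown device as borrowed are assumptions made for illustration, not something this ADR specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessTier:
    device: str
    access_path: str
    personal_creds_on_device: bool


# Keys are hypothetical short names for the table's device rows.
TIERS = {
    "personal-mac": AccessTier(
        "Personal Mac at home", "SSH -> tmux -> AoE on agents", True),
    "iphone": AccessTier(
        "iPhone / iPad", "Blink Shell + Netbird + mosh -> tmux -> AoE", True),
    "work-laptop": AccessTier(
        "Work-issued laptop", "browser only at claude.ai/code", False),
    "borrowed": AccessTier(
        "Borrowed device, anywhere", "browser only at claude.ai/code", False),
}


def access_tier(device_key: str) -> AccessTier:
    """Look up a device; unknown devices get the untrusted browser-only tier."""
    return TIERS.get(device_key, TIERS["borrowed"])


tier = access_tier("hotel-kiosk")  # not in the table, so treated as borrowed
print(tier.access_path, tier.personal_creds_on_device)
```

Defaulting to the browser-only tier keeps the failure mode conservative: a device nobody classified never ends up with Netbird or SSH keys on it.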
Identity stays consistent
Agents push to Forgejo as the claude account per
ADR-019 (and codex for any
Codex-attributed pushes per forgejo-codex.yml), regardless of which
client Milton is driving from.
Consequences
Enables
- True "anywhere" continuity — the same Claude session can be observed from Mac, then phone on a train, then work-laptop browser, while the agent never restarts.
- Clean separation of trust and operations boundaries: editor / browse work on dev-LXC, agent runtime / sandbox / Remote Control on agents. Independent resource limits, snapshots, restart policy, and failure domain.
- Reuses every existing homelab pattern (Netbird, Caddy + oauth2-proxy + PocketID, Vault, Forgejo identities, PBS snapshots), so there is minimal new surface beyond the LXC and one playbook.
- Clean policy story: zero personal software on work hardware. The
work-laptop path is explicitly browser-only over outbound HTTPS to
claude.ai, defensible against any AUP audit.
Costs / risks
- One more LXC to provision and keep patched. Mitigated by sharing common Ansible roles with dev.yml.
- Codex's Docker sandbox requires nested dockerd inside the unprivileged LXC. Acceptable (dev-LXC already runs Docker the same way). Fallback if nested Docker becomes problematic: skip --sandbox and let the LXC itself be the boundary. This is explicitly recorded so the fallback is not a surprise.
- Claude Code Remote Control routes its control plane through Anthropic's relay (outbound HTTPS to claude.ai). The agent and its work stay on the homelab; the connection itself is via Anthropic. Same trust placement we already accept by using Claude Code at all, and explicitly avoids exposing inbound ports.
- Browser sessions on managed work laptops can still be screen-recorded by corporate DLP. Sensitive work belongs on personal devices regardless.
- Each ~/.codex/auth.json is a password-equivalent. Recorded in the runbook so it is treated as such, never as routine IaC state.
- Env-var trap (must not be set on the agents systemd unit's Environment=): ANTHROPIC_API_KEY forces API-key auth and breaks Remote Control; per current Claude docs, DISABLE_TELEMETRY and CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC can break Remote Control's eligibility checks. Pin this in the runbook: it is easy to add six months from now and silently break the work-laptop path.
- A claude remote-control restart issues a new environment= URL; Glance must therefore link to the env-list page (claude.ai/code), not a per-session URL.
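
The env-var trap above is the kind of regression a small pre-deploy check could catch before it silently breaks the work-laptop path. The following is a rough sketch under assumptions: the function name is hypothetical, shlex only approximates systemd's quoting rules, and the check ignores EnvironmentFile= and drop-in overrides.

```python
import shlex

# Variables this ADR says must not appear on the unit's Environment=.
FORBIDDEN_ENV = frozenset({
    "ANTHROPIC_API_KEY",
    "DISABLE_TELEMETRY",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC",
})


def forbidden_env_in_unit(unit_text: str) -> list[str]:
    """Return forbidden variable names set via Environment= lines, sorted."""
    found = set()
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("Environment="):
            continue
        # Environment= holds space-separated VAR=value pairs, optionally quoted.
        for assignment in shlex.split(line[len("Environment="):]):
            name = assignment.split("=", 1)[0]
            if name in FORBIDDEN_ENV:
                found.add(name)
    return sorted(found)


example_unit = """\
[Service]
Environment=HOME=/home/agent "ANTHROPIC_API_KEY=sk-example"
Environment=DISABLE_TELEMETRY=1
"""
print(forbidden_env_in_unit(example_unit))
# prints ['ANTHROPIC_API_KEY', 'DISABLE_TELEMETRY']
```

Run against the rendered unit in CI or as an Ansible pre-task, a non-empty result would be treated as a hard failure.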
Follow-ups
The full set of IaC changes that must land together for this ADR to be honoured:
- chizuru-v2/ansible/playbooks/agents.yml: provision LXC via create-lxc.yml, join Netbird, install the ADR-022 stack (tmux, AoE, Claude Code CLI, Codex CLI, lazygit, git-delta, difftastic), clone senior-discount/ready, set up the non-root service user, lay down a systemd unit for claude remote-control --name ready --spawn=worktree with After=network-online.target, OnFailure= ntfy alert, and standard unit hardening.
- chizuru-v2/ansible/inventory/hosts.yml: entry for agents under the apps-pool group; pinned lxc_id, ansible_host, lxc_storage.
- chizuru-v2/ansible/playbooks/dev.yml refactor: pull common deps (shell tools, git, node, python, tmux, lazygit, git-delta, difftastic) into a shared role consumed by both dev.yml and agents.yml. Boring and thin; not a platform.
- DNS: agents A record, unless covered by the existing *.eva-00.network wildcard.
- Alloy / Loki labels: alloy_job_name: agents, with per-unit log labels (claude-remote-control; future aoe-serve) so units land in Grafana separately.
- ntfy OnFailure=: each systemd unit alerts on failure (silent crashes are unacceptable for a "from anywhere" promise).
- PBS include/exclude policy: explicit allow/deny list for what the snapshots cover on agents.
- Nested-Docker constraints: document the LXC config bits beyond features=nesting=1 (lxc.apparmor.profile, lxc.cgroup2.devices.allow, etc.) using whichever existing nested-docker LXC is the closest reference.
- Vault role/policy for agents, scoped to agent secrets only.
- services/glance/glance.yml: one external link to https://claude.ai/code (Dev/Tools category, alongside git/code/docs). The Option B internal link is added when Option B lands.
- Runbook (docs/workstation/ai-agent-dev-environment.md, "homelab install" section): log in as the service user; run claude /login and codex login --device-auth; accept the Claude workspace-trust prompt for the cloned repo directory; explicitly do not set ANTHROPIC_API_KEY, DISABLE_TELEMETRY, or CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC on the unit's Environment=.
- Future ADR-024 (or next available number): land Option B, meaning the aoe serve systemd unit, Caddy vhost aoe.eva-00.network, oauth2-proxy upstream, PocketID provider + redirect URI, Vault entries for the oauth2-proxy client/cookie secrets, and a second Glance entry. Option B adds externally reachable surface and auth plumbing; it deserves its own small diff after Option A is proven in production.
- The Mac's local Ghostty/theming work becomes per-client polish; parked notes live in docs/workstation/ai-agent-dev-environment-theming-notes.md.
Validation plan
Option A (Claude Code Remote Control) was validated using the Mac as a
temporary host on 2026-05-28: claude remote-control ran in ~/git/ready
on the Mac; the environment was driven from the work-laptop browser at
claude.ai/code and from the iPhone, with verbatim file reads and live
git log output proving the round-trip.
Open work is the homelab move per the Follow-ups above; once
claude remote-control runs persistently in agents, the Mac is no longer
required to be on for remote work. Option B is then a follow-up ADR.