ADR-023 — Remote access for the AI coding-agent dev environment

Date: 2026-05-28
Status: Accepted
Related: ADR-022 · ADR-011 · ADR-008 · ADR-019 · ADR-007

Context

ADR-022 adopted a terminal-native dev stack (Ghostty → tmux → AoE → Claude Code + Codex). It picked the what but assumed it ran on Milton's Mac. In practice Milton needs to keep working from anywhere: at home (Mac/iPhone), at work (work-issued laptop on a corp VPN), travelling (any device with a browser).

Constraints driving this decision:

The agent host must be reliably always-on. The Mac sleeps (including whenever the lid is closed), takes OS updates, and has no UPS; it is not a server.
Nothing personal lands on the work laptop. Corporate EDR/MDM flags personal VPN binaries; Netbird's overlay would collide with the corp VPN; and most workplace AUPs forbid installing personal VPN / SSH tooling on company hardware.
Phone access matters — "anywhere" includes a phone screen on a train.
Self-hosted ethos: agents and their work stay on Milton's overlay wherever practical.

Existing prior art on the homelab (don't reinvent)

Dev LXC 131 (ADR-011) — provisioned by chizuru-v2/ansible/playbooks/dev.yml. Runs code-server, has Claude Code CLI, MCP servers (gitea-mcp, mcp-grafana), Docker, and pre-cloned holo/chizuru-v2, holo/homelab-docs, claude/homelab-notes. Editor + browse-helper use case.
Codex Forgejo identity — chizuru-v2/ansible/playbooks/forgejo-codex.yml creates the codex Forgejo account (mirrors the claude bot pattern from ADR-019), token in Vault, collaborator on the relevant repos.
Netbird overlay — LXC 115, netbird.yml playbook. The personal VPN.
Browser-exposure pattern — Caddy (LXC 105) + oauth2-proxy (infra-apps LXC 119) + PocketID (LXC 123) gates internal services behind auth at *.eva-00.network URLs. code-server-dev already uses this.
Backup pattern — PBS LXC-level snapshots (per ADR-007) cover whole LXCs blanket-default. Backrest on cajita-elite handles service-specific stateful data; it is opt-in per service.

Options considered

Mac as always-on host with Netbird. Set pmset to never sleep, plug it in, SSH from anywhere. Rejected: Macs are not designed for 24/7 server duty (sleep/wake bugs, forced reboots from OS updates, kernel panics, no UPS); and any work-laptop access still requires installing a personal VPN on the work laptop, which the constraints rule out.
Cloud VM (Hetzner / DigitalOcean / Fly.io). Reliable and always-on, but cuts against the self-hosted ethos — Forgejo content and agent state leave Milton's network for a third-party cloud. Rejected.
Extend the existing dev-LXC (131) with the remaining ADR-022 pieces. Reuses one LXC but mixes two distinct trust/operations boundaries (editor versus long-running agent runtime, sandboxed Codex, Remote Control sessions) into a single failure domain. Rejected after second-opinion review: persistent agent processes, broader tool access, sandbox experiments, tmux state, and AoE worktrees deserve their own CPU/memory limits, snapshot policy, and blast radius.
New dedicated agents LXC separate from dev-LXC, with browser-only access for untrusted hardware layered over the existing Caddy/oauth2-proxy pattern, sharing common Ansible roles with dev.yml. Chosen.

Decision

Host

A new unprivileged LXC, hostname agents, provisioned by chizuru-v2/ansible/playbooks/agents.yml, separate from dev-LXC 131.

Sized as a light app LXC; placed on apps-pool initially; defaults (2 cores / 2 GB RAM) per create-lxc.yml. Snapshot auth/home state separately from any bulky clone/build cache (the latter can move to its own, churnier mount if needed).
No shared live working trees with code-server-dev. Agent repos are cloned independently; AoE manages per-session worktrees inside the LXC. This keeps the blast-radius argument unambiguous.
Common Ansible roles (shell tools, git, node, python, tmux, lazygit, git-delta, difftastic) are factored out of dev.yml and shared with agents.yml. Boring and thin; not a "developer platform" abstraction.
Docker policy: if Codex's --sandbox is in scope, run dockerd nested inside the unprivileged LXC (same pattern dev-LXC already uses for oauth2-proxy + Alloy). Do not mount the Proxmox host's Docker socket. Fallback if nested Docker proves painful: drop --sandbox and treat the LXC itself as Codex's containment boundary.
Vault: agents gets its own machine identity, role, and policy, scoped to agent secrets. No reuse of dev-LXC Vault material.
Netbird: the LXC joins the overlay as one more peer. Per-peer cost is noise and explicitly not a deciding factor.

Naming

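LXC hostname is agents. If/when Option B (the aoe serve dashboard published through Caddy + oauth2-proxy + PocketID, deferred to a follow-up ADR) lands, the public vhost is aoe.eva-00.network. The URL names the app it opens; the machine names the role it plays.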

Daemon supervision

Both claude remote-control and (when Option B lands) aoe serve run as systemd units under a non-root service user, After=network-online.target, with OnFailure= set to ntfy alerting and standard unit hardening (ProtectSystem, NoNewPrivileges, PrivateTmp, ...). Interactive agent work runs in tmux (via AoE); daemons run under systemd.
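As a rough illustration of the supervision described above, the following Python sketch renders a unit file carrying those settings. It is not the template used by agents.yml. The service user `agent`, the binary path `/usr/local/bin/claude`, and the `ntfy-alert@%n.service` failure handler are hypothetical placeholders. `Wants=network-online.target` is added on the assumption that the unit should also pull that target in, since `After=` only orders. `ProtectSystem=full` is likewise an assumed value, chosen so the service user's home stays writable.

```python
from textwrap import dedent


def render_unit(description: str, exec_start: str, user: str, on_failure: str) -> str:
    """Render a systemd service unit as text.

    Every value comes from the caller; nothing is read from the real playbook.
    """
    return dedent(f"""\
        [Unit]
        Description={description}
        Wants=network-online.target
        After=network-online.target
        OnFailure={on_failure}

        [Service]
        User={user}
        ExecStart={exec_start}
        NoNewPrivileges=yes
        PrivateTmp=yes
        ProtectSystem=full

        [Install]
        WantedBy=multi-user.target
        """)


# Hypothetical values, for illustration only.
print(render_unit(
    description="Claude Code Remote Control",
    exec_start="/usr/local/bin/claude remote-control --name ready --spawn=worktree",
    user="agent",
    on_failure="ntfy-alert@%n.service",
))
```

Keeping both daemons on one renderer (or one Ansible template) means the hardening and alerting lines cannot drift apart between the claude remote-control unit and a later aoe serve unit.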

Headless agent CLI auth — interactive device-code on the LXC itself

Claude Code: SSH into agents over Netbird, attach a tmux pane, run claude /login (or claude auth login). It prints a URL; open it in a browser on the Mac, sign in to claude.ai (Max), paste the returned code back to the LXC. Linux Claude Code stores credentials in ~/.claude/.credentials.json. Do not copy ~/.claude.json from the Mac — macOS Claude Code uses the Keychain, the file is not the credential, and Remote Control specifically needs full claude.ai OAuth which CLAUDE_CODE_OAUTH_TOKEN / claude setup-token cannot establish.
Codex: run codex login --device-auth on the LXC; open the code URL on the Mac, paste back. Copy ~/.codex/auth.json only as a fallback when device-auth is unavailable, and treat the file as a password — never as routine IaC state.
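Because both credential files are password-equivalent, a post-login check that they are readable only by the service user may be worth having. The sketch below is one possible check, not part of either CLI. The function name `loose_credentials` is hypothetical, and the assumption that the files should carry no group or world permission bits is ours rather than something either tool documents here.

```python
import stat
from pathlib import Path

# Credential paths named in this ADR, resolved against the current user's home.
CREDENTIAL_FILES = ("~/.claude/.credentials.json", "~/.codex/auth.json")


def loose_credentials(paths=CREDENTIAL_FILES):
    """Return the existing credential files that group or others can access."""
    loose = []
    for raw in paths:
        path = Path(raw).expanduser()
        if not path.is_file():
            continue  # not logged in yet, so there is nothing to check
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            loose.append(str(path))
    return loose


if __name__ == "__main__":
    for path in loose_credentials():
        print(f"too permissive: {path}")
```

Run as the service user after the interactive logins; an empty result means neither file is exposed beyond its owner.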

Backup principle

PBS LXC snapshots only. No Backrest job for agent credential dirs. Credentials are cheap to reacquire (30-second interactive device-code) and expensive to leak. Rebuild path = Ansible + interactive login, never restore-from-secret.

Access tiers — pick by trust level of the device

Device	Network	Access path	Personal creds on device?
Personal Mac at home	Home LAN or Netbird	SSH → tmux → AoE on `agents`	yes
iPhone / iPad	Cellular or wifi	Blink Shell + Netbird + mosh → tmux → AoE; or `claude.ai/code` via Safari / the Claude mobile app	yes
Work-issued laptop	Corp VPN	Browser only at `claude.ai/code` driving a `claude remote-control` session running on `agents`. No Netbird, no SSH keys, no install.	NO
Borrowed device, anywhere	Any	Same as work laptop — browser to `claude.ai/code`	NO
Browser-IDE backup	Any	Existing `code-server-dev` at its `*.eva-00.network` URL behind Caddy + oauth2-proxy + PocketID	NO
Public AoE dashboard backup	Any	`aoe serve` via Caddy + oauth2-proxy + PocketID at `aoe.eva-00.network` — deferred to a follow-up ADR, not in initial `agents.yml`	NO

Identity stays consistent

Agents push to Forgejo as the claude account per ADR-019 (and codex for any Codex-attributed pushes per forgejo-codex.yml), regardless of which client Milton is driving from.

Consequences

Enables

True "anywhere" continuity — the same Claude session can be observed from Mac, then phone on a train, then work-laptop browser, while the agent never restarts.
Clean separation of trust and operations boundaries: editor / browse work on dev-LXC, agent runtime / sandbox / Remote Control on agents. Independent resource limits, snapshots, restart policy, and failure domain.
Reuses every existing homelab pattern (Netbird, Caddy + oauth2-proxy + PocketID, Vault, Forgejo identities, PBS snapshots) — minimal new surface beyond the LXC and one playbook.
Clean policy story: zero personal software on work hardware. The work-laptop path is explicitly browser-only over outbound HTTPS to claude.ai, defensible against any AUP audit.

Costs / risks

One more LXC to provision and keep patched. Mitigated by sharing common Ansible roles with dev.yml.
Codex's Docker sandbox requires nested dockerd inside the unprivileged LXC. Acceptable (dev-LXC already runs Docker the same way). Fallback if nested Docker becomes problematic: skip --sandbox and let the LXC itself be the boundary — explicitly recorded so the fallback is not a surprise.
Claude Code Remote Control routes its control plane through Anthropic's relay (outbound HTTPS to claude.ai). The agent and its work stay on the homelab; the connection itself is via Anthropic. Same trust placement we already accept by using Claude Code at all, and explicitly avoids exposing inbound ports.
Browser sessions on managed work laptops can still be screen-recorded by corporate DLP. Sensitive work belongs on personal devices regardless.
Each ~/.codex/auth.json is a password-equivalent. Recorded in the runbook so it is treated as such, never as routine IaC state.
Env-var trap (must not be set on the agents systemd unit's Environment=): ANTHROPIC_API_KEY forces API-key auth and breaks Remote Control; per current Claude docs, DISABLE_TELEMETRY and CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC can break Remote Control's eligibility checks. Pin this in the runbook — easy to add six months from now and silently break the work-laptop path.
A claude remote-control restart issues a new environment= URL; Glance must therefore link to the env-list page (claude.ai/code), not a per-session URL.
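The env-var trap above is easy to guard against mechanically. The following sketch scans unit-file text for the three variables this ADR forbids. It only inspects literal `Environment=` lines and is a hypothetical helper, not an existing tool; `EnvironmentFile=` and drop-in overrides are deliberately out of scope and would need separate handling.

```python
import shlex

# Variables this ADR says must not be set on the unit.
FORBIDDEN = {
    "ANTHROPIC_API_KEY",
    "DISABLE_TELEMETRY",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC",
}


def forbidden_env(unit_text: str) -> list[str]:
    """Return forbidden variable names set via Environment= lines, sorted."""
    found = set()
    for line in unit_text.splitlines():
        line = line.strip()
        if not line.startswith("Environment="):
            continue
        # One Environment= line may carry several space-separated,
        # optionally quoted VAR=value assignments.
        for assignment in shlex.split(line[len("Environment="):]):
            name = assignment.split("=", 1)[0]
            if name in FORBIDDEN:
                found.add(name)
    return sorted(found)


sample = """\
[Service]
User=agent
Environment=LANG=C.UTF-8 "DISABLE_TELEMETRY=1"
Environment=ANTHROPIC_API_KEY=sk-placeholder
"""
print(forbidden_env(sample))  # → ['ANTHROPIC_API_KEY', 'DISABLE_TELEMETRY']
```

A check like this could run in CI against the rendered unit so that a well-meaning future change fails loudly instead of silently breaking the work-laptop path.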

Follow-ups

The full set of IaC changes that must land together for this ADR to be honoured:

chizuru-v2/ansible/playbooks/agents.yml — provision LXC via create-lxc.yml, join Netbird, install the ADR-022 stack (tmux, AoE, Claude Code CLI, Codex CLI, lazygit, git-delta, difftastic), clone senior-discount/ready, set up the non-root service user, lay down a systemd unit for claude remote-control --name ready --spawn=worktree with After=network-online.target, OnFailure= ntfy alert, and standard unit hardening.
chizuru-v2/ansible/inventory/hosts.yml — entry for agents under the apps-pool group; pinned lxc_id, ansible_host, lxc_storage.
chizuru-v2/ansible/playbooks/dev.yml refactor — pull common deps (shell tools, git, node, python, tmux, lazygit, git-delta, difftastic) into a shared role consumed by both dev.yml and agents.yml. Boring and thin; not a platform.
DNS — agents A record, unless covered by the existing *.eva-00.network wildcard.
Alloy / Loki labels — alloy_job_name: agents, with per-unit log labels (claude-remote-control; future aoe-serve) so units land in Grafana separately.
ntfy OnFailure= — each systemd unit alerts on failure (silent crashes are unacceptable for a "from anywhere" promise).
PBS include/exclude policy — explicit allow/deny list for what the snapshots cover on agents.
Nested-Docker constraints — document the LXC config bits beyond features=nesting=1 (lxc.apparmor.profile, lxc.cgroup2.devices.allow, etc.) using whichever existing nested-docker LXC is the closest reference.
Vault role/policy for agents, scoped to agent secrets only.
services/glance/glance.yml — one external link to https://claude.ai/code (Dev/Tools category, alongside git/code/docs). The Option B internal link is added when Option B lands.
Runbook (docs/workstation/ai-agent-dev-environment.md, "homelab install" section): log in as the service user; run claude /login and codex login --device-auth; accept the Claude workspace-trust prompt for the cloned repo directory; explicitly do not set ANTHROPIC_API_KEY, DISABLE_TELEMETRY, or CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC on the unit's Environment=.
Future ADR-024 (or next available number): land Option B — aoe serve systemd unit, Caddy vhost aoe.eva-00.network, oauth2-proxy upstream, PocketID provider + redirect URI, Vault entries for the oauth2-proxy client/cookie secrets, and a second Glance entry. Option B adds externally reachable surface and auth plumbing; it deserves its own small diff after Option A is proven in production.
The Mac's local Ghostty/theming work becomes per-client polish; parked notes live in docs/workstation/ai-agent-dev-environment-theming-notes.md.

Validation plan

Option A (Claude Code Remote Control) was validated using the Mac as a temporary host on 2026-05-28: claude remote-control ran in ~/git/ready on the Mac; the environment was driven from the work-laptop browser at claude.ai/code and from the iPhone, with verbatim file reads and live git log output proving the round-trip.

Open work is the homelab move per the Follow-ups above; once claude remote-control runs persistently on agents, the Mac no longer needs to be on for remote work. Option B is then a follow-up ADR.