ADR-023 — Remote access for the AI coding-agent dev environment

Date: 2026-05-28
Status: Accepted
Related: ADR-022 · ADR-011 · ADR-008 · ADR-019 · ADR-007

Context

ADR-022 adopted a terminal-native dev stack (Ghostty → tmux → AoE → Claude Code + Codex). It picked the what but assumed it ran on Milton's Mac. In practice Milton needs to keep working from anywhere: at home (Mac/iPhone), at work (work-issued laptop on a corp VPN), travelling (any device with a browser).

Constraints driving this decision:

  • The agent host must be reliably always-on. The Mac sleeps, takes OS updates, has no UPS, and powers off with the lid closed — not a server.
  • Nothing personal lands on the work laptop. Corporate EDR/MDM flags personal VPN binaries; Netbird's overlay would collide with the corp VPN; and most workplace AUPs forbid installing personal VPN / SSH tooling on company hardware.
  • Phone access matters — "anywhere" includes a phone screen on a train.
  • Self-hosted ethos: agents and their work stay on Milton's overlay wherever practical.

Existing prior art on the homelab (don't reinvent)

  • Dev LXC 131 (ADR-011) — provisioned by chizuru-v2/ansible/playbooks/dev.yml. Runs code-server, has Claude Code CLI, MCP servers (gitea-mcp, mcp-grafana), Docker, and pre-cloned holo/chizuru-v2, holo/homelab-docs, claude/homelab-notes. Editor + browse-helper use case.
  • Codex Forgejo identity: chizuru-v2/ansible/playbooks/forgejo-codex.yml creates the codex Forgejo account (mirrors the claude bot pattern from ADR-019), token in Vault, collaborator on the relevant repos.
  • Netbird overlay — LXC 115, netbird.yml playbook. The personal VPN.
  • Browser-exposure pattern — Caddy (LXC 105) + oauth2-proxy (infra-apps LXC 119) + PocketID (LXC 123) gates internal services behind auth at *.eva-00.network URLs. code-server-dev already uses this.
  • Backup pattern — PBS LXC-level snapshots (per ADR-007) cover whole LXCs blanket-default. Backrest on cajita-elite handles service-specific stateful data; it is opt-in per service.

Options considered

  • Mac as always-on host with Netbird. Set pmset to never sleep, plug it in, SSH from anywhere. Rejected: Macs are not designed for 24/7 server duty (sleep/wake bugs, forced reboots from OS updates, kernel panics, no UPS); and any work-laptop access still requires installing a personal VPN on the work laptop, which the constraints rule out.
  • Cloud VM (Hetzner / DigitalOcean / Fly.io). Reliable and always-on, but cuts against the self-hosted ethos — Forgejo content and agent state leave Milton's network for a third-party cloud. Rejected.
  • Extend the existing dev-LXC (131) with the remaining ADR-022 pieces. Reuses one LXC but mixes two distinct trust/operations boundaries (editor + long-running agent runtime, sandboxed Codex, Remote Control sessions) into a single failure domain. Rejected after second-opinion review: persistent agent processes, broader tool access, sandbox experiments, tmux state, and AoE worktrees deserve their own CPU/memory limits, snapshot policy, and blast radius.
  • New dedicated agents LXC separate from dev-LXC, with browser-only access for untrusted hardware layered over the existing Caddy/oauth2-proxy pattern, sharing common Ansible roles with dev.yml. Chosen.

Decision

Host

A new unprivileged LXC, hostname agents, provisioned by chizuru-v2/ansible/playbooks/agents.yml, separate from dev-LXC 131.

  • Sized as a light app LXC; placed on apps-pool initially; defaults (2 cores / 2 GB RAM) per create-lxc.yml. Snapshot auth/home state separately from any bulky clone/build cache (the latter can move to its own, churnier mount if needed).
  • No shared live working trees with code-server-dev. Agent repos are cloned independently; AoE manages per-session worktrees inside the LXC. This keeps the blast-radius argument unambiguous.
  • Common Ansible roles (shell tools, git, node, python, tmux, lazygit, git-delta, difftastic) are factored out of dev.yml and shared with agents.yml. Boring and thin; not a "developer platform" abstraction.
  • Docker policy: if Codex's --sandbox is in scope, run dockerd nested inside the unprivileged LXC (same pattern dev-LXC already uses for oauth2-proxy + Alloy). Do not mount the Proxmox host's Docker socket. Fallback if nested Docker proves painful: drop --sandbox and treat the LXC itself as Codex's containment boundary.
  • Vault: agents gets its own machine identity, role, and policy, scoped to agent secrets. No reuse of dev-LXC Vault material.
  • Netbird: the LXC joins the overlay as one more peer. Per-peer cost is noise and explicitly not a deciding factor.

Naming

LXC hostname is agents. If/when Option B (the public AoE dashboard exposed by aoe serve) lands, the public vhost is aoe.eva-00.network. The URL names the app it opens; the machine names the role it plays.

Daemon supervision

Both claude remote-control and (when Option B lands) aoe serve run as systemd units under a non-root service user, After=network-online.target, with OnFailure= set to ntfy alerting and standard unit hardening (ProtectSystem, NoNewPrivileges, PrivateTmp, ...). Interactive agent work runs in tmux (via AoE); daemons run under systemd.
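As a rough illustration of the unit shape described above, the sketch below renders a systemd unit for claude remote-control from a small Python helper. Only the directives named in this ADR (After=network-online.target, OnFailure=, ProtectSystem, NoNewPrivileges, PrivateTmp) and the command-line flags listed in the Follow-ups come from the document. The service user agent, the clone path, the binary path /usr/local/bin/claude, and the ntfy-alert@.service template are assumptions made for the example, not existing pieces of chizuru-v2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentUnit:
    """Inputs for one supervised daemon. Every default below is a placeholder."""
    description: str
    exec_start: str
    user: str = "agent"                          # hypothetical non-root service user
    working_dir: str = "/home/agent/git/ready"   # hypothetical clone location
    on_failure: str = "ntfy-alert@%n.service"    # hypothetical alerting template unit


def render_unit(unit: AgentUnit) -> str:
    """Return unit text with the ordering, alerting, and hardening directives."""
    lines = [
        "[Unit]",
        f"Description={unit.description}",
        "Wants=network-online.target",
        "After=network-online.target",
        f"OnFailure={unit.on_failure}",
        "",
        "[Service]",
        f"User={unit.user}",
        f"WorkingDirectory={unit.working_dir}",
        f"ExecStart={unit.exec_start}",
        "NoNewPrivileges=true",
        "PrivateTmp=true",
        # "full" keeps /usr, /boot and /etc read-only; the home directory stays writable.
        "ProtectSystem=full",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


# Flags taken from the Follow-ups list; the binary path is an assumption.
REMOTE_CONTROL = AgentUnit(
    description="Claude Code Remote Control",
    exec_start="/usr/local/bin/claude remote-control --name ready --spawn=worktree",
)

if __name__ == "__main__":
    print(render_unit(REMOTE_CONTROL), end="")
```

A future aoe serve unit could reuse the same helper with a different exec_start. In practice the rendered text would more likely live in an Ansible template inside agents.yml than in a script; the helper only makes the intended directives explicit and easy to check.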

Headless agent CLI auth — interactive device-code on the LXC itself

  • Claude Code: SSH into agents over Netbird, attach a tmux pane, run claude /login (or claude auth login). It prints a URL; open it in a browser on the Mac, sign in to claude.ai (Max), paste the returned code back to the LXC. Linux Claude Code stores credentials in ~/.claude/.credentials.json. Do not copy ~/.claude.json from the Mac — macOS Claude Code uses the Keychain, the file is not the credential, and Remote Control specifically needs full claude.ai OAuth which CLAUDE_CODE_OAUTH_TOKEN / claude setup-token cannot establish.
  • Codex: run codex login --device-auth on the LXC; open the code URL on the Mac, paste back. Copy ~/.codex/auth.json only as a fallback when device-auth is unavailable, and treat the file as a password — never as routine IaC state.

Backup principle

PBS LXC snapshots only. No Backrest job for agent credential dirs. Credentials are cheap to reacquire (30-second interactive device-code) and expensive to leak. Rebuild path = Ansible + interactive login, never restore-from-secret.

Access tiers — pick by trust level of the device

| Device | Network | Access path | Personal creds on device? |
| --- | --- | --- | --- |
| Personal Mac at home | Home LAN or Netbird | SSH → tmux → AoE on agents | yes |
| iPhone / iPad | Cellular or wifi | Blink Shell + Netbird + mosh → tmux → AoE; or claude.ai/code via Safari / the Claude mobile app | yes |
| Work-issued laptop | Corp VPN | Browser only at claude.ai/code driving a claude remote-control session running on agents. No Netbird, no SSH keys, no install. | NO |
| Borrowed device, anywhere | Any | Same as work laptop: browser to claude.ai/code | NO |
| Browser-IDE backup | Any | Existing code-server-dev at its *.eva-00.network URL behind Caddy + oauth2-proxy + PocketID | NO |
| Public AoE dashboard backup | Any | aoe serve via Caddy + oauth2-proxy + PocketID at aoe.eva-00.network; deferred to a follow-up ADR, not in initial agents.yml | NO |

Identity stays consistent

Agents push to Forgejo as the claude account per ADR-019 (and codex for any Codex-attributed pushes per forgejo-codex.yml), regardless of which client Milton is driving from.

Consequences

Enables

  • True "anywhere" continuity — the same Claude session can be observed from Mac, then phone on a train, then work-laptop browser, while the agent never restarts.
  • Clean separation of trust and operations boundaries: editor / browse work on dev-LXC, agent runtime / sandbox / Remote Control on agents. Independent resource limits, snapshots, restart policy, and failure domain.
  • Reuses every existing homelab pattern (Netbird, Caddy + oauth2-proxy + PocketID, Vault, Forgejo identities, PBS snapshots) — minimal new surface beyond the LXC and one playbook.
  • Clean policy story: zero personal software on work hardware. The work-laptop path is explicitly browser-only over outbound HTTPS to claude.ai, defensible against any AUP audit.

Costs / risks

  • One more LXC to provision and keep patched. Mitigated by sharing common Ansible roles with dev.yml.
  • Codex's Docker sandbox requires nested dockerd inside the unprivileged LXC. Acceptable (dev-LXC already runs Docker the same way). Fallback if nested Docker becomes problematic: skip --sandbox and let the LXC itself be the boundary — explicitly recorded so the fallback is not a surprise.
  • Claude Code Remote Control routes its control plane through Anthropic's relay (outbound HTTPS to claude.ai). The agent and its work stay on the homelab; the connection itself is via Anthropic. Same trust placement we already accept by using Claude Code at all, and explicitly avoids exposing inbound ports.
  • Browser sessions on managed work laptops can still be screen-recorded by corporate DLP. Sensitive work belongs on personal devices regardless.
  • Each ~/.codex/auth.json is a password-equivalent. Recorded in the runbook so it is treated as such, never as routine IaC state.
  • Env-var trap (must not be set on the agents systemd unit's Environment=): ANTHROPIC_API_KEY forces API-key auth and breaks Remote Control; per current Claude docs, DISABLE_TELEMETRY and CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC can break Remote Control's eligibility checks. Pin this in the runbook — easy to add six months from now and silently break the work-laptop path.
  • A claude remote-control restart issues a new environment= URL; Glance must therefore link to the env-list page (claude.ai/code), not a per-session URL.
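The env-var trap above is easy to reintroduce by accident, so a small guard in CI or a pre-deploy check may be worth having. The following is a minimal sketch, not part of the current tooling: it scans unit-file text for the three variables named in this ADR on Environment= lines. The function name is hypothetical, and the check deliberately ignores EnvironmentFile= and drop-in overrides, which would need separate handling.

```python
import shlex

# Variables this ADR says must not be set on the agents unit's Environment=.
FORBIDDEN = frozenset({
    "ANTHROPIC_API_KEY",
    "DISABLE_TELEMETRY",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC",
})


def forbidden_env_vars(unit_text: str) -> list[str]:
    """Return the forbidden names set via Environment= in unit_text, sorted."""
    found = set()
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("Environment="):
            continue
        # One Environment= line may carry several space-separated assignments,
        # each optionally wrapped in quotes; shlex handles the quoting.
        for assignment in shlex.split(line[len("Environment="):]):
            name = assignment.split("=", 1)[0]
            if name in FORBIDDEN:
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    sample = (
        "[Service]\n"
        'Environment="ANTHROPIC_API_KEY=sk-test" DISABLE_TELEMETRY=1\n'
        "Environment=LANG=C.UTF-8\n"
    )
    print(forbidden_env_vars(sample))
```

Run against the rendered unit before deployment, a non-empty result would be a reason to fail the play or the pipeline instead of discovering a broken work-laptop path later.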

Follow-ups

The full set of IaC changes that must land together for this ADR to be honoured:

  1. chizuru-v2/ansible/playbooks/agents.yml — provision LXC via create-lxc.yml, join Netbird, install the ADR-022 stack (tmux, AoE, Claude Code CLI, Codex CLI, lazygit, git-delta, difftastic), clone senior-discount/ready, set up the non-root service user, lay down a systemd unit for claude remote-control --name ready --spawn=worktree with After=network-online.target, OnFailure= ntfy alert, and standard unit hardening.
  2. chizuru-v2/ansible/inventory/hosts.yml — entry for agents under the apps-pool group; pinned lxc_id, ansible_host, lxc_storage.
  3. chizuru-v2/ansible/playbooks/dev.yml refactor — pull common deps (shell tools, git, node, python, tmux, lazygit, git-delta, difftastic) into a shared role consumed by both dev.yml and agents.yml. Boring and thin; not a platform.
  4. DNS: agents A record, unless covered by the existing *.eva-00.network wildcard.
  5. Alloy / Loki labels: set alloy_job_name: agents, with per-unit log labels (claude-remote-control; future aoe-serve) so units land in Grafana separately.
  6. ntfy OnFailure= — each systemd unit alerts on failure (silent crashes are unacceptable for a "from anywhere" promise).
  7. PBS include/exclude policy — explicit allow/deny list for what the snapshots cover on agents.
  8. Nested-Docker constraints — document the LXC config bits beyond features=nesting=1 (lxc.apparmor.profile, lxc.cgroup2.devices.allow, etc.) using whichever existing nested-docker LXC is the closest reference.
  9. Vault role/policy for agents, scoped to agent secrets only.
  10. services/glance/glance.yml — one external link to https://claude.ai/code (Dev/Tools category, alongside git/code/docs). The Option B internal link is added when Option B lands.
  11. Runbook (docs/workstation/ai-agent-dev-environment.md, "homelab install" section): log in as the service user; run claude /login and codex login --device-auth; accept the Claude workspace-trust prompt for the cloned repo directory; explicitly do not set ANTHROPIC_API_KEY, DISABLE_TELEMETRY, or CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC on the unit's Environment=.
  12. Future ADR-024 (or next available number): land Option B — aoe serve systemd unit, Caddy vhost aoe.eva-00.network, oauth2-proxy upstream, PocketID provider + redirect URI, Vault entries for the oauth2-proxy client/cookie secrets, and a second Glance entry. Option B adds externally reachable surface and auth plumbing; it deserves its own small diff after Option A is proven in production.
  13. The Mac's local Ghostty/theming work becomes per-client polish; parked notes live in docs/workstation/ai-agent-dev-environment-theming-notes.md.

Validation plan

Option A (Claude Code Remote Control) was validated using the Mac as a temporary host on 2026-05-28: claude remote-control ran in ~/git/ready on the Mac; the environment was driven from the work-laptop browser at claude.ai/code and from the iPhone, with verbatim file reads and live git log output proving the round-trip.

Open work is the homelab move per the Follow-ups above; once claude remote-control runs persistently in agents, the Mac is no longer required to be on for remote work. Option B is then a follow-up ADR.