RFC — Grafana Dashboards, Warp CLI & Linting

Generated: 2026-03-23

1. Grafana Dashboard Proposals

Based on current observability data:

Already flowing into Prometheus:
- node-exporter on LXC 103 (192.168.1.22:9100) and LXC 108 (192.168.1.108:9100)
- cAdvisor on LXC 103 (192.168.1.22:9091)
- Ollama metrics (192.168.1.107:11434/api/metrics)
- Prometheus self-scrape

Already flowing into Loki:
- All Docker containers on LXC 103 via Promtail
- Forgejo runner (LXC 101) via systemd Promtail
- Forgejo (LXC 100) via cron Python log pusher

Dashboard A: Homelab Overview

Works today (partial). The single pane of glass — land here first.

Panel	Type	Query
Host CPU % used (LXC 103 + 108)	Stat/Gauge	`100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)`
Host memory % used	Gauge	`(1 - node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes) * 100`
Disk usage (103 + 108)	Bar gauge	`(node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100`
Docker container count	Stat	`count(container_last_seen{name!=""})`
Container restart storm (last 1h)	Stat	`sum(changes(container_start_time_seconds{name!=""}[1h]))`
Recent log errors (all services)	Logs panel	`{host="docker-host"} |= "error" | json | __error__=""`
Loki ingestion rate	Time series	`sum(rate(loki_ingester_samples_ingested_total[5m]))`
Prometheus targets up	Stat	`count(up == 1)` vs `count(up == 0)`

Dashboard B: Host Metrics (Node Exporter)

Works today.

Import Grafana community dashboard ID 1860 ("Node Exporter Full"), which covers everything out of the box. Add a template variable to switch between 192.168.1.22:9100 and 192.168.1.108:9100.

Key panels included: CPU steal/iowait, memory pressure, disk I/O saturation, network throughput, filesystem fill prediction, load average.

Gap: Only LXC 103 and 108 have node-exporter. Add to other LXCs when needed. Priority: LXC 101 (runner), LXC 106 (Vault).
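Once node-exporter is installed on another LXC, the Prometheus side is one extra target. A minimal sketch, assuming the existing hosts are scraped through a single static job: the job name `node`, the placeholder `<lxc-101-ip>`, and node-exporter answering on port 9100 of the Vault host are illustrative assumptions, not taken from the current config.

```yaml
# prometheus-config.yml (fragment under scrape_configs)
- job_name: node                    # assumed job name; keep whatever the existing job uses
  static_configs:
    - targets:
        - '192.168.1.22:9100'       # LXC 103 (already scraped)
        - '192.168.1.108:9100'      # LXC 108 (already scraped)
        - '<lxc-101-ip>:9100'       # LXC 101 runner, placeholder address
        - '192.168.1.106:9100'      # LXC 106 Vault, assumes node-exporter on its default port
```

New targets should then appear as additional values of the instance label, so the dashboard's host selector can pick them up without panel changes.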

Dashboard C: Docker Containers (cAdvisor)

Works today.

Import Grafana community dashboard ID 14282 or 193.

Custom panels to add on top:

Panel	Query
Container CPU top-5	`topk(5, sum by(name) (rate(container_cpu_usage_seconds_total{name!=""}[5m])))`
Container OOM kills	`increase(container_oom_events_total[1h])`
Container restarts (per container)	`changes(container_start_time_seconds{name!=""}[24h])`
Memory usage as % of limit	`container_memory_usage_bytes{name!=""} / (container_spec_memory_limit_bytes{name!=""} > 0) * 100`

Dashboard D: CI/CD — Forgejo Actions

Works today (Loki-based).

Adjust {job="forgejo"} to match the labels your Python log pusher applies.

Panel	Query
Workflow runs (last 24h)	`count_over_time({job="forgejo"} |= "workflow_run" [24h])`
Failed workflows	`count_over_time({job="forgejo"} |= "conclusion=failure" [24h])`
Workflow run log stream	`{job="forgejo"} |= "workflow_run"`
Runner errors	`{job="runner"} |= "error"`
Runner task pickups (last 5m)	`count_over_time({job="runner"} |= "pick up task" [5m])`
Recent log tail	Logs panel, `{job="forgejo"}`, last 50 lines

Stretch: Forgejo can expose /metrics (Prometheus format) natively once metrics are enabled in the [metrics] section of app.ini (ENABLED = true). Adding it as a scrape target gives richer data: push events, active runners, repo counts. After that it just needs a target entry in prometheus-config.yml.
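The target entry could look like the sketch below. It assumes metrics are already enabled in Forgejo's app.ini; `<forgejo-host>` is a placeholder for the address of LXC 100, port 3000 is Forgejo's default HTTP port and may not match this install, and the commented-out token block only applies if a metrics token has been configured.

```yaml
# prometheus-config.yml (fragment under scrape_configs)
- job_name: forgejo
  metrics_path: /metrics
  static_configs:
    - targets: ['<forgejo-host>:3000']   # placeholder host; 3000 is Forgejo's default HTTP port
  # Only needed if a TOKEN is set in Forgejo's [metrics] section:
  # authorization:
  #   credentials: <forgejo_metrics_token>
```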

Dashboard E: Ollama / LLM Inference

Works today — Prometheus already scrapes 192.168.1.107:11434/api/metrics.

Verify exact metric names by querying Prometheus Explore (metric_name{job="ollama"}) first, as Ollama's schema has evolved.

Panel	Query
Active inference requests	`ollama_requests_in_flight`
Metrics endpoint scrape rate (req/s)	`rate(promhttp_metric_handler_requests_total[5m])`
Model load duration	histogram from `ollama_model_load_duration_seconds`
Go GC pressure	`rate(go_gc_duration_seconds_count[5m])`
Memory (Go heap)	`go_memstats_heap_inuse_bytes`
GPU vRAM (if available)	`ollama_gpu_memory_used_bytes`

Dashboard F: Observability Stack Self-Monitoring

Works today.

Panel	Query
Prometheus ingestion rate	`rate(prometheus_tsdb_samples_appended_total[5m])`
Prometheus storage size	`prometheus_tsdb_storage_blocks_bytes`
Prometheus query duration p99	`prometheus_engine_query_duration_seconds{quantile="0.99"}`
Active Prometheus scrape targets	`count(up)`
Loki log lines/sec	`sum(rate(loki_ingester_samples_ingested_total[5m]))`
Loki active streams	`loki_ingester_memory_streams`
Promtail send rate	`rate(promtail_sent_entries_total[5m])`
Grafana active users (log-based)	`count_over_time({container_name="grafana"} |= "login" [5m])`

Dashboard G: Security & Auth

Needs Vault scrape target added to Prometheus.

Enable Vault metrics — add to services/loki-stack/prometheus-config.yml:

- job_name: vault
  static_configs:
    - targets: ['192.168.1.106:8200']
  metrics_path: /v1/sys/metrics
  params:
    format: ['prometheus']
  bearer_token: <vault_read_token>

Panel	Query
Vault seal status	`vault_core_unsealed` (1 = unsealed)
KV secret count	`vault_secret_kv_count` (a gauge, so no `rate()`)
Token lookup rate	`rate(vault_token_lookup_count[5m])`
PocketID login attempts (log)	`{container_name="pocketid"} |= "login"`
PocketID auth failures (log)	`{container_name="pocketid"} |= "unauthorized"`
Vaultwarden logins (log)	`{container_name="vaultwarden"} |~ "User .* logged in"`

Dashboard H: Application Services (Logs-based)

Works today — all Docker containers on LXC 103 ship logs via Promtail.

Adjust container_name labels to match what Promtail actually assigns (check with a Loki label browser).
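Where the container_name label comes from depends on the Promtail scrape config. The fragment below sketches one common setup, assuming Promtail on LXC 103 uses Docker service discovery; the job name and refresh interval are illustrative, and the real config may assign the label differently.

```yaml
# promtail-config.yml (fragment)
scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      # Docker reports names with a leading slash ("/n8n"); strip it and
      # expose the result as the container_name label this dashboard queries.
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: container_name
```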

Service	Panel	LogQL
n8n	Workflow executions	`{container_name="n8n"} |= "Execution finished"`
n8n	Workflow errors	`{container_name="n8n"} |= "error" | json`
Open WebUI	Active chat sessions	`{container_name="open-webui"} |= "chat"`
Matrix	Federation errors	`{container_name="synapse"} |= "ERROR"`
The Lounge	Connected users	`{container_name="thelounge"} |= "connected"`
Seedbox	Download completions	`{container_name="qbittorrent"} |= "Torrent finished"`
Gluetun	VPN reconnects	`{container_name="gluetun"} |= "Connected"`
Caddy	Upstream errors	Not yet — needs Promtail on LXC 105
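Closing the Caddy gap means running Promtail on LXC 105 and pointing it at Caddy's log files. A minimal sketch, assuming Caddy is configured to write access logs to files: the Loki address, the positions path, the log path, and the label values are all placeholders or assumptions.

```yaml
# /etc/promtail/config.yml on LXC 105 (sketch)
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/lib/promtail/positions.yaml   # assumed writable path

clients:
  - url: http://<loki-host>:3100/loki/api/v1/push   # placeholder Loki address

scrape_configs:
  - job_name: caddy
    static_configs:
      - targets: [localhost]
        labels:
          job: caddy
          host: caddy                       # assumed label value
          __path__: /var/log/caddy/*.log    # assumed log location
```

With these labels, a query such as `{job="caddy"} |= "error"` could back the Upstream errors panel.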

Dashboard I: Uptime / Gatus

Needs Gatus scrape target added to Prometheus.

Gatus exposes /metrics (Prometheus format) on port 8080 once metrics are enabled in its own config (metrics: true). Add to prometheus-config.yml:

- job_name: gatus
  static_configs:
    - targets: ['192.168.1.22:8080']
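The scrape job only returns data if Gatus has metrics switched on. A sketch of the relevant part of the Gatus config, where the endpoint entry is a hypothetical example included only to show that `metrics` sits at the top level:

```yaml
# Gatus config.yaml (fragment)
metrics: true          # exposes /metrics on the Gatus web port

endpoints:
  - name: example      # hypothetical endpoint; existing endpoints stay unchanged
    url: https://example.org
    interval: 60s
    conditions:
      - "[STATUS] == 200"
```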

Panel	Query
Service availability %	`avg(gatus_results_success) by (name) * 100`
Response time p95	`histogram_quantile(0.95, rate(gatus_results_duration_ms_bucket[5m]))`
Down services count	`count(gatus_results_success == 0)`
Per-service time series	`gatus_results_success{name=~".+"}`

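This gives Gatus data proper Grafana visualization, which is much richer than Gatus's built-in UI. Verify the metric names above against the live /metrics output before building panels, as they can differ between Gatus versions.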

Exporter Gap Summary

LXC / Service	Missing	Priority
Vault (106)	Prometheus scrape of `/v1/sys/metrics`	High
Forgejo (100)	Prometheus scrape of `/metrics`	High
Gatus	Prometheus scrape of `/metrics`	High
Runner (101)	node-exporter	Medium
Caddy (105)	Promtail + Caddy metrics	Medium
Gluetun (110)	node-exporter	Low
Ollama GPU	Verify metric names exist in Prometheus	Medium

Most of these are config changes only. The exceptions are the node-exporter and Promtail rows, which require installing an agent on the target LXC.

2. Warp CLI — Worth It?

There are two distinct products:

- Warp Terminal: full terminal app replacement (macOS/Linux/Windows), block-based output, AI assistant, MCP integration
- oz CLI: headless agent runner; runs Warp AI agents in scripts, CI/CD pipelines, or any environment without the terminal app

Terminal App

Worth it if:
- You want MCP-native AI assistance in the terminal (Grafana, Forgejo, Proxmox MCPs surface directly in your terminal session)
- Block-based output appeals (each command's output is a discrete block you can search, copy, or share, which is useful for long Ansible runs and Docker build logs)
- You use Warp Drive to store/share frequent commands (Vault write workflow, Proxmox queries, n8n API calls)
- You want WARP.md project context files per repo (analogous to CLAUDE.md)

Not worth it if:
- You use tmux: Warp kills tmux compatibility. Hard blocker.
- Privacy or closed source is a concern: the app is closed source, the free tier requires telemetry for AI features, and all AI traffic routes through GCP (US)
- Your current terminal (iTerm2/Ghostty/Kitty + zsh) is already well-configured

`oz` CLI (the more interesting piece)

Warp's headless agent runner — runs AI agents with MCP access inside Forgejo Actions pipelines. Drop oz agent run into a workflow job, give it your Grafana/Forgejo/Vault MCP servers, and it can reason about infra mid-pipeline (e.g., check Grafana for anomalies before deploying, open a Forgejo issue on failure).

This is a new capability that doesn't exist elsewhere for a self-hosted setup.

Privacy & Drawbacks

- Account required for AI features
- Free tier: telemetry must be enabled for AI (paid plans can opt out)
- All AI + Warp Drive sync routes through GCP US, so it is not air-gap friendly
- Closed source (vs iTerm2, Kitty, Ghostty, Alacritty which are all open source)
- Higher resource usage than lightweight terminals

Recommendation

Try oz CLI experimentally in one Forgejo workflow. The ability to run an AI agent with MCP context inside CI is novel and worth a spike on a non-critical workflow.

Terminal app: Skip for now if tmux is part of your workflow. Revisit if that changes.

3. Linting Strategy

Recommended Stack

Layer	Tool	What it catches
Pre-commit (local)	`pre-commit` framework	Fast feedback before commit
YAML	`yamllint`	All YAML syntax + style
Ansible	`ansible-lint` (profile: `moderate`)	Playbook correctness, FQCN, idempotence
Shell	`shellcheck`	Script bugs, quoting issues
IaC security	`trivy fs`	Dockerfile/compose misconfigs + hardcoded secrets
Docker Compose	`docker compose config`	Syntax + variable resolution
Python	`ruff`	Fast replacement for flake8+isort
PR comments	`reviewdog`	Turns lint output into inline Forgejo PR annotations

Config Files

.ansible-lint:

profile: moderate
warn_list:
  - yaml[line-length]
skip_list:
  - experimental
exclude_paths:
  - .git/

.yamllint.yml:

extends: default
rules:
  line-length:
    max: 120
    allow-non-breakable-inline-mappings: true
  truthy:
    allowed-values: ["true", "false"]
    check-keys: false

.pre-commit-config.yaml:

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-merge-conflict
      - id: detect-private-key

  - repo: https://github.com/adrienverge/yamllint
    rev: v1.35.1
    hooks:
      - id: yamllint
        args: [-c, .yamllint.yml]

  - repo: https://github.com/ansible/ansible-lint
    rev: v25.1.3
    hooks:
      - id: ansible-lint

  - repo: https://github.com/shellcheck-py/shellcheck-py
    rev: v0.10.0.1
    hooks:
      - id: shellcheck
        args: [--severity=warning]

  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.21.2
    hooks:
      - id: gitleaks
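The stack table lists ruff for Python, but the hook list above has no Python entry. One way to add it is sketched below. The `rev` is an illustrative pin rather than a recommendation, and running the formatter hook is an optional extra beyond the linting the table describes.

```yaml
  # Appended under the existing "repos:" key of .pre-commit-config.yaml
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9          # illustrative pin; use the current release
    hooks:
      - id: ruff         # lint (flake8/isort-style rules)
      - id: ruff-format  # formatter, optional
```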

Forgejo Workflows

.forgejo/workflows/lint.yml — runs on all pushes and PRs:

name: Lint

on: [push, pull_request]

jobs:
  yaml:
    runs-on: native
    container:
      image: pipelinecomponents/yamllint:latest
    steps:
      - uses: actions/checkout@v4
      - run: yamllint -c .yamllint.yml .

  ansible:
    runs-on: native
    container:
      image: ghcr.io/ansible/ansible-lint:latest
    steps:
      - uses: actions/checkout@v4
      - run: ansible-lint --profile moderate

  shellcheck:
    runs-on: native
    steps:
      - uses: actions/checkout@v4
      - run: |
          apt-get update -qq && apt-get install -y -qq shellcheck
          find . -name "*.sh" -print0 | xargs -0 shellcheck --severity=warning

  compose-validate:
    runs-on: native
    steps:
      - uses: actions/checkout@v4
      - run: |
          find . -name "docker-compose*.yml" -print0 | while IFS= read -r -d '' f; do
            echo "Validating $f"
            docker compose -f "$f" config --quiet
          done

.forgejo/workflows/security.yml — runs on push to main and PRs:

name: Security Scan

on:
  push:
    branches: [main]
  pull_request:

jobs:
  trivy:
    runs-on: native
    container:
      image: aquasec/trivy:latest
    steps:
      - uses: actions/checkout@v4
      - name: Scan IaC and secrets
        run: |
          trivy fs . \
            --scanners misconfig,secret \
            --severity HIGH,CRITICAL \
            --exit-code 1

On MegaLinter

MegaLinter bundles 100+ linters into one Docker image and can auto-fix and commit back. It works in Forgejo Actions via docker run (not via the marketplace action, which has resolution issues in Forgejo).

Tradeoff: Large image, requires Docker-in-Docker on your runner, harder to debug. The granular per-job approach above is easier to manage, parallelizes naturally, and caches better. Consider MegaLinter later if you want unified HTML reports.

MegaLinter flavor for this stack: oxsecurity/megalinter-ci_light:v8.

On AI Review Agents

For a self-hosted Forgejo setup, two options are viable today:

reviewdog — posts inline diff annotations on Forgejo PRs from deterministic linter output. No AI involved but dramatically better UX than CI pass/fail. Supports gitea reporter natively.
ai-review (Nikita-Filonov) — AI-powered PR review comments using Ollama (fully on-prem), Claude, or GPT-4. Posts inline Gitea/Forgejo PR comments. Experimental but working.

Both are worth setting up once baseline linting is clean, because AI review is most useful when it's not competing with dozens of style violations.

Adoption Order

1. Add .ansible-lint + .yamllint.yml to the repo (local config only, zero CI impact)
2. Install pre-commit locally, run pre-commit run --all-files to see the baseline
3. Fix highest-severity violations
4. Add the lint.yml workflow in soft-fail mode first (for example, append || true to each lint command so the exit code is ignored)
5. Remove soft-fail once the baseline is clean
6. Add security.yml
7. Add reviewdog for inline PR comments
8. Experiment with ai-review + Ollama
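The soft-fail stage can be sketched with the yaml job from lint.yml. This is one possible approach, not the only one: it relies on plain shell semantics rather than any linter-specific flag, and the same pattern would apply to the other jobs.

```yaml
# .forgejo/workflows/lint.yml (soft-fail variant of the yaml job)
jobs:
  yaml:
    runs-on: native
    container:
      image: pipelinecomponents/yamllint:latest
    steps:
      - uses: actions/checkout@v4
      # "|| true" discards yamllint's exit code, so findings are logged
      # but the job still passes. Remove it to make the check blocking.
      - run: yamllint -c .yamllint.yml . || true
```

Findings remain visible in the job log during this stage, which gives a baseline count to work down before the gate becomes enforcing.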

Pre-commit vs CI division of responsibility

Layer	Runs when	Speed	Purpose
Pre-commit	`git commit` (local)	Fast — staged files only	Immediate feedback, block bad commits early
CI (Forgejo Actions)	Push / PR	Full repo scan	Authoritative gate, environment-independent

Never rely solely on pre-commit — git commit --no-verify bypasses it. CI is the real enforcement layer.
