
Gatus — Runbook

Routine Tasks

Add a new monitor

Edit services/gatus/config.yaml, add an endpoint entry, and push to main.
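A minimal sketch of what a new entry might look like. It assumes the file follows Gatus's standard layout, where monitors are listed under a top-level `endpoints:` key. The helper name `gatus_endpoint_entry` and the example name and URL are hypothetical. The helper prints one YAML list item with a single "HTTP 200" condition. The interval falls back to 60s, which is Gatus's default check interval.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a Gatus endpoint entry as YAML.
# Assumes entries live under the top-level `endpoints:` list in
# services/gatus/config.yaml; adjust indentation if the file differs.
gatus_endpoint_entry() {
  local name="$1"             # display name shown in the Gatus UI
  local url="$2"              # URL to probe
  local interval="${3:-60s}"  # check interval (60s is Gatus's default)
  if [ -z "$name" ] || [ -z "$url" ]; then
    echo "usage: gatus_endpoint_entry NAME URL [INTERVAL]" >&2
    return 1
  fi
  # Unquoted EOF so ${...} expands; [STATUS] is passed through literally.
  cat <<EOF
  - name: ${name}
    url: "${url}"
    interval: ${interval}
    conditions:
      - "[STATUS] == 200"
EOF
}
```

For example, `gatus_endpoint_entry grafana https://grafana.example.internal` prints an entry you can review and paste under `endpoints:` before pushing. Appending the output directly with `>>` is only safe if `endpoints:` is the last top-level key in the file.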

Restart (to reload config without a full redeploy)

ssh -i ~/.ssh/homelab_claude [email protected] \
  "pct exec 119 -- docker restart gatus"
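You can wrap the restart in a small function that also confirms the container came back up. This is a sketch with several hypothetical names:

- `restart_gatus`, `run_remote`, `GATUS_SSH_TARGET` and `DRY_RUN` are all hypothetical.
- `GATUS_SSH_TARGET` stands in for the Proxmox host's user@host in the command above.
- Setting `DRY_RUN=1` only prints the remote commands instead of running them.

The check uses `docker inspect -f '{{.State.Running}}'`, a standard Docker CLI query that prints `true` for a running container.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the restart command above.
# GATUS_SSH_TARGET must hold the Proxmox host's user@host (elided in this
# runbook); DRY_RUN=1 prints the ssh commands instead of running them.
GATUS_KEY="${GATUS_KEY:-$HOME/.ssh/homelab_claude}"
GATUS_CTID="${GATUS_CTID:-119}"  # LXC container that runs Docker

run_remote() {
  if [ -z "${GATUS_SSH_TARGET:-}" ]; then
    echo "GATUS_SSH_TARGET is not set" >&2
    return 1
  fi
  local cmd="pct exec ${GATUS_CTID} -- $1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'ssh -i %s %s "%s"\n' "$GATUS_KEY" "$GATUS_SSH_TARGET" "$cmd"
  else
    ssh -i "$GATUS_KEY" "$GATUS_SSH_TARGET" "$cmd"
  fi
}

restart_gatus() {
  run_remote "docker restart gatus" || return 1
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run_remote "docker inspect -f '{{.State.Running}}' gatus"
    return
  fi
  # docker inspect prints "true" when the container is running.
  local state
  state="$(run_remote "docker inspect -f '{{.State.Running}}' gatus")" || return 1
  [ "$state" = "true" ]
}
```

Running with `DRY_RUN=1` first is a cheap way to check the exact command before touching the host.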

Logs

| Log | Contents | Location | Loki query | Format |
|---|---|---|---|---|
| Application | Health check results, endpoint monitoring, alert events | Docker (LXC 119) stdout | {job="infra-apps", container="gatus"} | Plain text |

Notes:

  • Successful checks log errors=0, so a plain "error" filter also matches every healthy check. To see only real errors, use: {container="gatus"} |= "error" != "errors=0"
  • SSH fallback: ssh [email protected] "pct exec 119 -- docker logs gatus"
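If you fall back to `docker logs` over SSH, you can reproduce the same two LogQL line filters locally. The function name `gatus_errors` is hypothetical. The filters map as follows:

- `grep -F` keeps lines containing the literal `error`, mirroring `|= "error"`.
- `grep -vF` then drops lines containing `errors=0`, mirroring `!= "errors=0"`.

```bash
#!/usr/bin/env bash
# Hypothetical filter mirroring the LogQL query
#   {container="gatus"} |= "error" != "errors=0"
# Reads log lines on stdin and prints only lines with real errors.
gatus_errors() {
  # grep -F 'error' keeps lines containing the literal string (like |=),
  # grep -vF 'errors=0' then drops lines containing errors=0 (like !=).
  grep -F 'error' | grep -vF 'errors=0'
  # grep exits 1 when nothing matches; treat "no errors found" as success.
  return 0
}
```

To use it, pipe the SSH fallback command's output into `gatus_errors`. Add `2>&1` before the pipe so lines Docker replays on stderr are filtered too. `docker logs --since 1h` narrows the window.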


Troubleshooting

Alert not firing for a down service

  1. Check Gatus logs — is it detecting the failure?
  2. Verify the alerting webhook URL in config.yaml points to the correct n8n endpoint
  3. Check n8n logs to see if the webhook is being received but failing to forward to Matrix
  4. Test the webhook manually: curl -X POST https://n8n.eva-00.network/webhook/uptime-kuma-alert -H "Content-Type: application/json" -H "X-Webhook-Token: <token>" -d '{"test": true}'
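You can script step 4 so it reports the HTTP status directly. This is a sketch that makes two assumptions:

- The token is exported as `N8N_WEBHOOK_TOKEN`, a hypothetical variable name.
- n8n answers an accepted webhook call with a 2xx status.

It uses standard curl options: `-w '%{http_code}'` with `-o /dev/null` prints only the status code, and curl reports `000` when it cannot connect at all. The function name `test_alert_webhook` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper for step 4: POST a test payload to the n8n webhook
# and report the HTTP status. The token is read from N8N_WEBHOOK_TOKEN.
WEBHOOK_URL="${WEBHOOK_URL:-https://n8n.eva-00.network/webhook/uptime-kuma-alert}"

test_alert_webhook() {
  if [ -z "${N8N_WEBHOOK_TOKEN:-}" ]; then
    echo "N8N_WEBHOOK_TOKEN is not set" >&2
    return 1
  fi
  local code
  # -w '%{http_code}' prints only the status; curl prints 000 if it
  # cannot connect, and "|| true" keeps set -e from aborting on that.
  code="$(curl -s -o /dev/null -w '%{http_code}' -X POST "$WEBHOOK_URL" \
    -H "Content-Type: application/json" \
    -H "X-Webhook-Token: ${N8N_WEBHOOK_TOKEN}" \
    -d '{"test": true}')" || true
  case "$code" in
    2??) echo "OK ($code)" ;;
    000) echo "FAIL: cannot reach $WEBHOOK_URL" >&2; return 1 ;;
    *)   echo "FAIL ($code): check the token and webhook path in n8n" >&2; return 1 ;;
  esac
}
```

A 2xx here narrows the problem to steps 1 to 3: either Gatus is not sending the alert, or the n8n workflow is failing after receipt.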

False positives / flapping alerts

  • Adjust the failure-threshold and success-threshold in config.yaml for the noisy endpoint
  • Current defaults: 3 consecutive failures to alert, 2 consecutive successes to resolve
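The sketch below is a simplified model of how the two thresholds interact, not Gatus's own code. The alert fires once the run of consecutive failures reaches `failure-threshold`. While it is firing, it resolves once the run of consecutive successes reaches `success-threshold`. The function name `simulate_alerts` and the `F`/`S` input encoding are hypothetical. The default arguments match the defaults listed above.

```bash
#!/usr/bin/env bash
# Simplified model (not Gatus source) of failure/success thresholds.
# Usage: simulate_alerts "F F S F F F S S" [FAILURE_THRESHOLD] [SUCCESS_THRESHOLD]
# Prints the check number and event each time the alert state changes.
simulate_alerts() {
  local results="$1"
  local failure_threshold="${2:-3}" success_threshold="${3:-2}"
  local fails=0 successes=0 firing=0 n=0 r
  for r in $results; do
    n=$((n + 1))
    if [ "$r" = "F" ]; then
      # A failure extends the failure run and breaks any success run.
      fails=$((fails + 1)); successes=0
      if [ "$firing" -eq 0 ] && [ "$fails" -ge "$failure_threshold" ]; then
        firing=1; echo "check $n: TRIGGERED"
      fi
    else
      # A success extends the success run and breaks any failure run.
      successes=$((successes + 1)); fails=0
      if [ "$firing" -eq 1 ] && [ "$successes" -ge "$success_threshold" ]; then
        firing=0; echo "check $n: RESOLVED"
      fi
    fi
  done
}
```

For example, `simulate_alerts "F F S F F F S S"` reports the alert firing at check 6 and resolving at check 8. The single success at check 3 resets the failure count, so the alert only fires after checks 4 to 6 fail in a row. Running the same sequence with a failure threshold of 5 produces no alert at all. That is the trade-off when raising `failure-threshold` on a flapping endpoint: fewer false positives, but slower detection of real outages.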