Gatus — Runbook
Routine Tasks
Add a new monitor
Edit services/gatus/config.yaml, add an endpoint entry, and push to main.
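A minimal sketch of what a new entry might look like, wrapped in a hypothetical `gatus_endpoint` helper that prints YAML for pasting under `endpoints:`. Several details are assumptions: the 60s interval, the `[STATUS] == 200` condition, the two-space list indentation, and the `custom` alert type (assumed to be the provider that points at n8n). If an existing entry in `config.yaml` looks different, copy its shape instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Gatus endpoint entry for the `endpoints:` list
# in services/gatus/config.yaml.
# Assumptions: 2-space list indentation, an HTTP 200 condition, and a
# `custom` alert provider (the n8n webhook); adjust to match existing entries.
gatus_endpoint() {
  local name="$1" url="$2" interval="${3:-60s}"
  cat <<EOF
  - name: ${name}
    url: "${url}"
    interval: ${interval}
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: custom
        failure-threshold: 3
        success-threshold: 2
        send-on-resolved: true
EOF
}
```

For example, `gatus_endpoint my-service https://my-service.example/health 30s` prints an entry with a 30s interval; `my-service` and its URL are placeholders.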
Restart (to reload config without a full redeploy)
```shell
ssh -i ~/.ssh/homelab_claude [email protected] \
  "pct exec 119 -- docker restart gatus"
```
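`docker restart` returns once the container has been stopped and started again, so a broken `config.yaml` may only show up a few seconds later, when Gatus fails to load it and exits (assumed behaviour). The sketch below restarts the container, waits, and checks that it is still running. `PVE_HOST` is a hypothetical stand-in for the Proxmox SSH target in the command above, and `lxc119` and `gatus_restart_check` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: restart Gatus in LXC 119, give it time to load config.yaml,
# then confirm the container is still running.
# PVE_HOST is a hypothetical placeholder for the Proxmox SSH target.
PVE_HOST="${PVE_HOST:-root@pve.example}"

# Run a command string inside LXC 119 via the Proxmox host.
lxc119() {
  ssh -i ~/.ssh/homelab_claude "$PVE_HOST" "pct exec 119 -- $1"
}

gatus_restart_check() {
  local settle="${1:-5}" state
  lxc119 "docker restart gatus" >/dev/null || return 1
  sleep "$settle"   # let Gatus parse config.yaml before checking
  state="$(lxc119 "docker inspect -f '{{.State.Running}}' gatus")"
  if [[ "$state" != "true" ]]; then
    echo "gatus is not running; recent logs:" >&2
    lxc119 "docker logs --tail 20 gatus" >&2
    return 1
  fi
  echo "gatus running"
}
```

Usage: `PVE_HOST=<user>@<host> gatus_restart_check 10`. If the container has a restart policy, a crash loop can still report `true` intermittently, so the logs remain the authoritative check.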
Logs
| Log | Contents | Location | Loki query | Format |
|---|---|---|---|---|
| Application | Health check results, endpoint monitoring, alert events | Docker (LXC 119) stdout | `{job="infra-apps", container="gatus"}` | Plain text |
Notes:
- Successful checks log `errors=0`, so a bare `|= "error"` filter matches them too. To see only real errors, use `{container="gatus"} |= "error" != "errors=0"`.
- SSH fallback: `ssh [email protected] "pct exec 119 -- docker logs gatus"`
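When Loki is unavailable, the same two line filters can be applied to the SSH fallback output with `grep`. A minimal sketch; `gatus_errors` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Local equivalent of the LogQL line filters: |= "error" != "errors=0"
gatus_errors() {
  # keep lines containing "error" (case-sensitive, like |=),
  # then drop successful checks that merely report errors=0
  grep -F 'error' | grep -vF 'errors=0'
}
```

Usage: `ssh <proxmox-host> "pct exec 119 -- docker logs --since 1h gatus" 2>&1 | gatus_errors`, where `<proxmox-host>` is a placeholder. The `2>&1` is there in case the container writes some lines to stderr, which `docker logs` keeps separate from stdout.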
Troubleshooting
Alert not firing for a down service
- Check the Gatus logs: is Gatus detecting the failure at all?
- Verify that the alerting webhook URL in `config.yaml` points to the correct n8n endpoint.
- Check the n8n logs to see whether the webhook is received but fails to forward to Matrix.
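- Test the webhook manually (the `Content-Type` header makes n8n parse the body as JSON; without it, `curl -d` sends form-encoded data):
  ```shell
  curl -X POST https://n8n.eva-00.network/webhook/uptime-kuma-alert \
    -H "X-Webhook-Token: <token>" \
    -H "Content-Type: application/json" \
    -d '{"test": true}'
  ```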
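For repeated checks, the manual test can be wrapped so it prints the HTTP status and fails on anything other than 2xx. A sketch, assuming the token is exported as `N8N_WEBHOOK_TOKEN` (a hypothetical variable name) and that the workflow answers with a 2xx when it accepts the request. In n8n, production `/webhook/...` URLs only respond while the workflow is active, so a 404 here usually points to an inactive workflow or a mismatched path.

```shell
#!/usr/bin/env bash
# Sketch: POST the test payload to the n8n webhook; succeed only on HTTP 2xx.
# N8N_WEBHOOK_TOKEN is a hypothetical variable holding the shared token.
n8n_webhook_ok() {
  local url="${1:-https://n8n.eva-00.network/webhook/uptime-kuma-alert}"
  local code
  # -o /dev/null discards the body; -w prints only the status code
  code="$(curl -sS -o /dev/null -w '%{http_code}' -X POST "$url" \
    -H "X-Webhook-Token: ${N8N_WEBHOOK_TOKEN:-}" \
    -H 'Content-Type: application/json' \
    -d '{"test": true}')" || return 1
  echo "HTTP ${code}"
  [[ "$code" == 2?? ]]
}
```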
False positives / flapping alerts
- Adjust `failure-threshold` and `success-threshold` in `config.yaml` for the noisy endpoint.
- Current defaults: 3 consecutive failures to alert, 2 consecutive successes to resolve.
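Raising the thresholds reduces flapping at the cost of slower alerts. As a rough guide, ignoring check duration, an outage triggers an alert after between `(failure-threshold - 1) * interval` and `failure-threshold * interval`, depending on when it starts relative to the check schedule; recovery works the same way with `success-threshold`. The hypothetical `alert_latency` helper below does that arithmetic. Gatus's default `interval` is 60s, but the noisy endpoint may set its own.

```shell
#!/usr/bin/env bash
# Rough alert/resolve latency for one endpoint, ignoring check duration.
# Usage: alert_latency <interval_seconds> [failure_threshold] [success_threshold]
alert_latency() {
  local interval="$1" fail="${2:-3}" ok="${3:-2}"
  # best case: the outage starts just before a check; worst case: just after one
  echo "alert after $(( (fail - 1) * interval ))-$(( fail * interval ))s of outage"
  echo "resolve after $(( (ok - 1) * interval ))-$(( ok * interval ))s of recovery"
}
```

With a 60s interval and the current defaults, `alert_latency 60` prints `alert after 120-180s of outage` and `resolve after 60-120s of recovery`.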