Ollama — Setup
Local LLM inference engine. Runs natively (systemd service) on a dedicated Debian LXC (107). Listens on
0.0.0.0:11434 so other hosts and Docker containers can reach it.
Infrastructure
| Host | LXC ID | Internal | CPU | RAM | Disk |
|---|---|---|---|---|---|
| Debian LXC | 107 | 192.168.1.107:11434 | 4 cores | 32 GiB | 450 GiB (sde → ollama-disk) |
No public URL — not exposed through Caddy.
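A quick way to confirm the service is reachable from another host on the LAN is to list the installed models through Ollama's `/api/tags` endpoint. The sketch below is a minimal example, not part of the playbook; the helper names `model_names` and `list_models` are illustrative, and the address is the one from the table above.

```python
import json
import urllib.request

# Internal address from the Infrastructure table.
OLLAMA_URL = "http://192.168.1.107:11434"


def model_names(tags_payload: dict) -> list:
    """Extract the sorted model names from an /api/tags response body."""
    return sorted(m["name"] for m in tags_payload.get("models", []))


def list_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list:
    """Ask Ollama which models are installed.

    Raises urllib.error.URLError if the host or port is unreachable.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Calling `list_models()` from a Docker container or another LXC should return the pulled tags; a connection error points to a networking or service problem rather than a model problem.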
Observability
Logs
Ollama logs are written to /var/log/ollama.log on LXC 107 (systemd service). Logs are collected via Grafana Alloy and shipped to Loki.
| Query | Purpose |
|---|---|
| {job="ollama"} | All Ollama logs |
| {job="ollama"} \|= "error" | Errors only |
| {job="ollama"} \|= "model" | Model loading/inference |
Access: Grafana → Explore → Loki → Enter query
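The same queries can be run outside Grafana against Loki's HTTP API. The sketch below assumes a Loki instance at a placeholder address (`LOKI_URL` is hypothetical; this document does not state the real one) and uses the `query_range` endpoint, which defaults to the last hour when no time range is given. Note that the `\|` in the table is only a markdown escape; the actual LogQL operator is `|=`.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: placeholder address, replace with the real Loki endpoint.
LOKI_URL = "http://loki.example.internal:3100"


def loki_query_url(logql: str, limit: int = 100, base_url: str = LOKI_URL) -> str:
    """Build a /loki/api/v1/query_range URL for a LogQL expression."""
    params = urllib.parse.urlencode({"query": logql, "limit": limit})
    return f"{base_url}/loki/api/v1/query_range?{params}"


def fetch_lines(logql: str) -> list:
    """Run the query and return the raw log lines from every matching stream."""
    with urllib.request.urlopen(loki_query_url(logql), timeout=10) as resp:
        body = json.load(resp)
    # Each stream carries a list of [timestamp, line] pairs.
    return [line for stream in body["data"]["result"] for _ts, line in stream["values"]]
```

For example, `fetch_lines('{job="ollama"} |= "error"')` corresponds to the "Errors only" row above.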
Metrics
Ollama exports Prometheus metrics at http://192.168.1.107:11434/api/metrics.
| Metric | PromQL | Purpose |
|---|---|---|
| Inference requests | rate(ollama_eval_count[5m]) | Request throughput |
| Tokens generated | rate(ollama_tokens_predicted[5m]) | Token generation rate |
| Model load time | ollama_load_duration_seconds | Model performance |
Access: Add scrape target to Prometheus config, then: Grafana → Explore → Prometheus → Enter query
IaC
| Artifact | Path |
|---|---|
| Playbook | ansible/playbooks/ollama.yml |
| Workflow | .forgejo/workflows/ollama.yml |
The playbook manages the full lifecycle:
1. Formats /dev/sde as ext4, mounts at /mnt/pve/ollama-disk, registers as Proxmox dir storage (ollama-disk)
2. Destroys the old LXC 107 (if rootfs is not on ollama-disk) and creates a fresh one with the correct specs
3. Injects the runner's SSH key via pct exec so Ansible can connect
4. Installs and configures Ollama, then pulls all models
Re-running the playbook is safe — each step is idempotent.
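The idempotency of step 2 comes down to a single check: the container is destroyed only if its rootfs lives on a storage other than ollama-disk. The sketch below illustrates that decision in Python by parsing the `rootfs:` line of `pct config 107` output. It is an illustration of the logic, not the playbook's actual task; the function names are hypothetical.

```python
import re
from typing import Optional

TARGET_STORAGE = "ollama-disk"


def rootfs_storage(pct_config: str) -> Optional[str]:
    """Return the storage ID from the 'rootfs:' line of `pct config` output."""
    match = re.search(r"^rootfs:\s*([^:,\s]+):", pct_config, flags=re.MULTILINE)
    return match.group(1) if match else None


def should_destroy(pct_config: str, target: str = TARGET_STORAGE) -> bool:
    """True only when a container exists and its rootfs is on another storage."""
    storage = rootfs_storage(pct_config)
    return storage is not None and storage != target


# Sample config text in the format `pct config` prints (values are made up).
old = "arch: amd64\ncores: 4\nrootfs: local-lvm:vm-107-disk-0,size=8G\n"
new = "arch: amd64\ncores: 4\nrootfs: ollama-disk:107/vm-107-disk-0.raw,size=450G\n"
print(should_destroy(old), should_destroy(new))  # True False
```

On a second run the rootfs is already on ollama-disk, so the check returns False and the container is left alone.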
Storage
| Device | Mount (host) | Proxmox storage | LXC rootfs size |
|---|---|---|---|
| /dev/sde (466 GiB) | /mnt/pve/ollama-disk | ollama-disk (dir) | 450 GiB |
Currently configured models
| Model | Tag | Size |
|---|---|---|
| Llama 2 34B Q4 | llama2:34b-chat-q4_0 | ~20 GiB |
| Mistral 7B Q4 | mistral:7b-instruct-v0.3-q4_0 | ~5 GiB |
| Mistral-NeMo 12B | mistral-nemo | ~7 GiB |
| Phi 3.5 | phi3.5 | ~3.8 GiB |
| Code Llama 34B Q4 | codellama:34b-instruct-q4_0 | ~20 GiB |
| DeepSeek R1 32B | deepseek-r1:32b | ~19 GiB |
| Qwen 2.5 Coder 32B | qwen2.5-coder:32b | ~19 GiB |
Total: ~94 GiB. Disk has ~350 GiB headroom for additional models.
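The totals above can be rechecked whenever a model is added or removed. This short sketch sums the approximate sizes from the table and subtracts them from the 450 GiB rootfs; the figures are the rounded values listed here, not measured on-disk sizes.

```python
# Approximate sizes in GiB, copied from the model table.
MODEL_SIZES_GIB = {
    "llama2:34b-chat-q4_0": 20,
    "mistral:7b-instruct-v0.3-q4_0": 5,
    "mistral-nemo": 7,
    "phi3.5": 3.8,
    "codellama:34b-instruct-q4_0": 20,
    "deepseek-r1:32b": 19,
    "qwen2.5-coder:32b": 19,
}
ROOTFS_GIB = 450  # LXC rootfs size from the Storage table

total = sum(MODEL_SIZES_GIB.values())
headroom = ROOTFS_GIB - total
print(f"models: ~{total:.0f} GiB, headroom: ~{headroom:.0f} GiB")
# prints "models: ~94 GiB, headroom: ~356 GiB"
```

The exact difference is about 356 GiB, which the text rounds down to ~350 GiB.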
Unresolved model requests
These were requested but have no standard Ollama library tag:
- Mistral 34B Q4: Mistral AI does not publish a 34B dense model. The MoE equivalent is mixtral:8x7b-instruct-v0.1-q4_K_M (~26 GiB). Add to the playbook once confirmed.
- Neural Chat 34B Q4: Intel's neural-chat on Ollama is 7B only. No 34B version exists. Add custom tag if sourced elsewhere.