Ollama — Setup

Local LLM inference engine. Runs natively as a systemd service on a dedicated Debian LXC (ID 107) and listens on 0.0.0.0:11434 so that other hosts and Docker containers can reach it.

Infrastructure

Host	LXC ID	Internal	CPU	RAM	Disk
Debian LXC	107	192.168.1.107:11434	4 cores	32 GiB	450 GiB (sde → `ollama-disk`)

No public URL — not exposed through Caddy.
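Because the API is only reachable on the internal address, a quick reachability check from another LAN host is to list the models the server has pulled. The following is a minimal sketch, assuming the standard Ollama `GET /api/tags` endpoint and the address from the table above; the function names are illustrative and not part of this repo.

```python
import json
import urllib.request

OLLAMA_URL = "http://192.168.1.107:11434"  # internal address from the table above


def parse_model_names(body: str) -> list:
    """Extract model names from an /api/tags JSON response body."""
    payload = json.loads(body)
    return [model["name"] for model in payload.get("models", [])]


def list_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list:
    """Ask the Ollama server which models it has pulled (requires LAN access)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(resp.read().decode("utf-8"))
```

Calling `list_models()` from a host that can reach 192.168.1.107 should return the installed tags; a connection error suggests a network or service problem rather than a missing model.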

Observability

Logs

Ollama runs as a systemd service on LXC 107 and writes its logs to /var/log/ollama.log. Grafana Alloy collects them and ships them to Loki.

Query	Purpose
`{job="ollama"}`	All Ollama logs
`{job="ollama"} |= "error"`	Errors only
`{job="ollama"} |= "model"`	Model loading/inference

Access: Grafana → Explore → Loki → Enter query
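The same queries can also be run outside Grafana through Loki's HTTP API. The sketch below only builds the request URL for the `query_range` endpoint; the Loki address is a hypothetical placeholder (this document does not state it), and the helper name is illustrative.

```python
import time
import urllib.parse
from typing import Optional

# Hypothetical address: replace with the real Loki endpoint.
LOKI_URL = "http://loki.example.internal:3100"


def loki_query_url(logql: str, minutes: int = 60, limit: int = 100,
                   base_url: str = LOKI_URL, now_s: Optional[int] = None) -> str:
    """Build a /loki/api/v1/query_range URL covering the last `minutes` minutes."""
    end_s = int(time.time()) if now_s is None else now_s
    end_ns = end_s * 10**9                   # Loki accepts Unix epoch nanoseconds
    start_ns = end_ns - minutes * 60 * 10**9
    params = urllib.parse.urlencode({
        "query": logql,
        "start": start_ns,
        "end": end_ns,
        "limit": limit,
    })
    return f"{base_url}/loki/api/v1/query_range?{params}"
```

For example, `loki_query_url('{job="ollama"} |= "error"', minutes=15)` yields a URL that can be fetched with `curl` or `urllib.request` to retrieve the last 15 minutes of error lines.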

Metrics

Ollama exports Prometheus metrics at http://192.168.1.107:11434/api/metrics.

Metric	PromQL	Purpose
Inference requests	`rate(ollama_eval_count[5m])`	Request throughput
Tokens generated	`rate(ollama_tokens_predicted[5m])`	Token generation rate
Model load time	`ollama_load_duration_seconds`	Model performance

Access: add 192.168.1.107:11434 (metrics path /api/metrics) as a scrape target in the Prometheus config, then: Grafana → Explore → Prometheus → Enter query

IaC

Artifact	Path
Playbook	`ansible/playbooks/ollama.yml`
Workflow	`.forgejo/workflows/ollama.yml`

The playbook manages the full lifecycle:

1. Formats /dev/sde as ext4, mounts it at /mnt/pve/ollama-disk, and registers it as Proxmox dir storage (ollama-disk)
2. Destroys the old LXC 107 (if rootfs is not on ollama-disk) and creates a fresh one with the correct specs
3. Injects the runner's SSH key via pct exec so Ansible can connect
4. Installs and configures Ollama, then pulls all models

Re-running the playbook is safe — each step is idempotent.
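The destroy-and-recreate condition in step 2 is what keeps a re-run from wiping a healthy container. The playbook itself is not reproduced here, so the following is only a sketch of how that check could be expressed, assuming the usual `rootfs: <storage>:<volume>,size=...` line in `pct config 107` output; both function names are hypothetical.

```python
import re
from typing import Optional

TARGET_STORAGE = "ollama-disk"  # Proxmox storage ID from this document


def rootfs_storage(pct_config: str) -> Optional[str]:
    """Return the storage ID from the rootfs line of `pct config <id>` output."""
    for line in pct_config.splitlines():
        match = re.match(r"^rootfs:\s*([^:,\s]+):", line)
        if match:
            return match.group(1)
    return None  # no rootfs line, e.g. the container does not exist


def needs_recreate(pct_config: str, target: str = TARGET_STORAGE) -> bool:
    """True only when a container exists and its rootfs is on another storage."""
    storage = rootfs_storage(pct_config)
    return storage is not None and storage != target
```

With a check of this shape, a second run finds the rootfs already on `ollama-disk` and skips the destroy step, which matches the idempotency described above.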

Storage

Device	Mount (host)	Proxmox storage	LXC rootfs size
`/dev/sde` (466 GiB)	`/mnt/pve/ollama-disk`	`ollama-disk` (dir)	450 GiB

Currently configured models

Model	Tag	Size
Llama 2 34B Q4	`llama2:34b-chat-q4_0`	~20 GiB
Mistral 7B Q4	`mistral:7b-instruct-v0.3-q4_0`	~5 GiB
Mistral-NeMo 12B	`mistral-nemo`	~7 GiB
Phi 3.5	`phi3.5`	~3.8 GiB
Code Llama 34B Q4	`codellama:34b-instruct-q4_0`	~20 GiB
DeepSeek R1 32B	`deepseek-r1:32b`	~19 GiB
Qwen 2.5 Coder 32B	`qwen2.5-coder:32b`	~19 GiB

Total: ~94 GiB. Disk has ~350 GiB headroom for additional models.
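The total and headroom figures follow from the sizes in the table and the 450 GiB rootfs. A small sketch that reproduces the arithmetic and can be reused when a model is added; the sizes are the approximate values listed above, and the calculation ignores space taken by the OS and other files.

```python
# Approximate model sizes in GiB, copied from the table above.
MODEL_SIZES_GIB = {
    "llama2:34b-chat-q4_0": 20,
    "mistral:7b-instruct-v0.3-q4_0": 5,
    "mistral-nemo": 7,
    "phi3.5": 3.8,
    "codellama:34b-instruct-q4_0": 20,
    "deepseek-r1:32b": 19,
    "qwen2.5-coder:32b": 19,
}
ROOTFS_GIB = 450  # LXC rootfs size from the Storage table


def disk_budget(sizes: dict, rootfs_gib: float = ROOTFS_GIB) -> tuple:
    """Return (used_gib, free_gib) for the given model sizes."""
    used = sum(sizes.values())
    return used, rootfs_gib - used


used_gib, free_gib = disk_budget(MODEL_SIZES_GIB)
```

Here `used_gib` comes to about 93.8 and `free_gib` to about 356.2, consistent with the rounded figures above. Adding the ~26 GiB Mixtral candidate would still leave roughly 330 GiB free.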

Unresolved model requests

These were requested but have no standard Ollama library tag:

Mistral 34B Q4 — Mistral AI does not publish a 34B dense model. The MoE equivalent is mixtral:8x7b-instruct-v0.1-q4_K_M (~26 GiB). Add to the playbook once confirmed.
Neural Chat 34B Q4 — Intel's neural-chat on Ollama is 7B only. No 34B version exists. Add custom tag if sourced elsewhere.