Ollama — Setup

Local LLM inference engine. Runs natively (systemd service) on a dedicated Debian LXC (107). Listens on 0.0.0.0:11434 so other hosts and Docker containers can reach it.

Infrastructure

| Host | LXC ID | Internal | CPU | RAM | Disk |
| --- | --- | --- | --- | --- | --- |
| Debian LXC | 107 | 192.168.1.107:11434 | 4 cores | 32 GiB | 450 GiB (sde → ollama-disk) |

No public URL — not exposed through Caddy.
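
Because the service is reachable only on the internal address, clients on the LAN talk to it directly. The following is a minimal sketch of such a client, assuming Ollama's standard `/api/generate` endpoint with streaming disabled; the function names are illustrative and not part of this repository.

```python
import json
import urllib.request

# Internal address from the Infrastructure table.
OLLAMA_URL = "http://192.168.1.107:11434"


def build_generate_request(model, prompt, base_url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, timeout=300.0):
    """Send the request and return the generated text.

    Requires network access to LXC 107; large models may take a while
    to load on first use, hence the generous default timeout.
    """
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

For example, `generate("phi3.5", "Say hello")` from any host that can route to 192.168.1.107.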

Observability

Logs

The Ollama systemd service on LXC 107 writes its logs to /var/log/ollama.log. Grafana Alloy collects them and ships them to Loki.

| Query | Purpose |
| --- | --- |
| {job="ollama"} | All Ollama logs |
| {job="ollama"} \|= "error" | Errors only |
| {job="ollama"} \|= "model" | Model loading/inference |

Access: Grafana → Explore → Loki → Enter query
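
The same queries can be run outside Grafana through Loki's HTTP API. The sketch below only builds the `query_range` URL; the Loki address is a hypothetical placeholder, since this page does not state where Loki runs. Note that the LogQL filter operator is `|=`; the backslash shown in the table is only table escaping.

```python
import time
import urllib.parse
from typing import Optional

# Hypothetical address; replace with the real Loki endpoint.
LOKI_URL = "http://loki.example.internal:3100"


def loki_query_range_url(
    logql: str,
    minutes: int = 60,
    limit: int = 100,
    base_url: str = LOKI_URL,
    now_ns: Optional[int] = None,
) -> str:
    """Return a /loki/api/v1/query_range URL covering the last `minutes`."""
    # Loki accepts start/end as Unix timestamps in nanoseconds.
    end = time.time_ns() if now_ns is None else now_ns
    start = end - minutes * 60 * 1_000_000_000
    params = {"query": logql, "start": start, "end": end, "limit": limit}
    return f"{base_url}/loki/api/v1/query_range?" + urllib.parse.urlencode(params)
```

Calling `loki_query_range_url('{job="ollama"} |= "error"', minutes=15)` yields a URL that can be fetched with `curl` or `urllib.request.urlopen`.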

Metrics

Ollama exports Prometheus metrics at http://192.168.1.107:11434/api/metrics.

| Metric | PromQL | Purpose |
| --- | --- | --- |
| Inference requests | rate(ollama_eval_count[5m]) | Request throughput |
| Tokens generated | rate(ollama_tokens_predicted[5m]) | Token generation rate |
| Model load time | ollama_load_duration_seconds | Model performance |

Access: add 192.168.1.107:11434 (metrics path /api/metrics) as a scrape target in the Prometheus config, then: Grafana → Explore → Prometheus → Enter query
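
Before wiring up dashboards, it is worth checking that the installed Ollama version actually serves the endpoint and the metric names listed above. The sketch below parses Prometheus text exposition output and reports which expected names are absent; the helper names are illustrative, and the expected set is simply copied from the table.

```python
import urllib.request

METRICS_URL = "http://192.168.1.107:11434/api/metrics"

# Metric names taken from the table above.
EXPECTED_METRICS = {
    "ollama_eval_count",
    "ollama_tokens_predicted",
    "ollama_load_duration_seconds",
}


def metric_names(exposition_text):
    """Extract metric names from Prometheus text exposition format."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Sample lines look like `name{labels} value` or `name value`.
        parts = line.split("{", 1)[0].split(None, 1)
        if parts:
            names.add(parts[0])
    return names


def missing_metrics(exposition_text, expected=EXPECTED_METRICS):
    """Return the expected metric names that the endpoint did not expose."""
    return set(expected) - metric_names(exposition_text)


def fetch_metrics(url=METRICS_URL, timeout=10.0):
    """Fetch the raw exposition text (requires access to LXC 107)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

An empty result from `missing_metrics(fetch_metrics())` means all three names are present; an HTTP error from `fetch_metrics()` suggests the endpoint is not available on that build.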

IaC

| Artifact | Path |
| --- | --- |
| Playbook | ansible/playbooks/ollama.yml |
| Workflow | .forgejo/workflows/ollama.yml |

The playbook manages the full lifecycle:

1. Formats /dev/sde as ext4, mounts it at /mnt/pve/ollama-disk, and registers it as Proxmox dir storage (ollama-disk).
2. Destroys the old LXC 107 (if its rootfs is not on ollama-disk) and creates a fresh one with the correct specs.
3. Injects the runner's SSH key via pct exec so Ansible can connect.
4. Installs and configures Ollama, then pulls all models.

Re-running the playbook is safe — each step is idempotent.
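
The guard on the destroy step is what keeps re-runs safe: the container is only rebuilt when its rootfs lives on the wrong storage. The sketch below illustrates that check against the output of `pct config 107`; it is not the playbook's actual implementation, and the function names are hypothetical.

```python
import re
from typing import Optional

TARGET_STORAGE = "ollama-disk"


def rootfs_storage(pct_config: str) -> Optional[str]:
    """Return the storage name from the `rootfs:` line of `pct config` output."""
    for line in pct_config.splitlines():
        # The line looks like `rootfs: <storage>:<volume>,size=<N>G`.
        match = re.match(r"rootfs:\s*([^:,\s]+):", line)
        if match:
            return match.group(1)
    return None


def should_destroy(pct_config: str, target: str = TARGET_STORAGE) -> bool:
    """True only when a rootfs exists and is not on the target storage."""
    storage = rootfs_storage(pct_config)
    return storage is not None and storage != target
```

On the Proxmox host, the input could come from `subprocess.run(["pct", "config", "107"], capture_output=True, text=True).stdout`. A second run after a successful rebuild sees `ollama-disk` and leaves the container alone.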

Storage

| Device | Mount (host) | Proxmox storage | LXC rootfs size |
| --- | --- | --- | --- |
| /dev/sde (466 GiB) | /mnt/pve/ollama-disk | ollama-disk (dir) | 450 GiB |

Currently configured models

| Model | Tag | Size |
| --- | --- | --- |
| Llama 2 34B Q4 | llama2:34b-chat-q4_0 | ~20 GiB |
| Mistral 7B Q4 | mistral:7b-instruct-v0.3-q4_0 | ~5 GiB |
| Mistral-NeMo 12B | mistral-nemo | ~7 GiB |
| Phi 3.5 | phi3.5 | ~3.8 GiB |
| Code Llama 34B Q4 | codellama:34b-instruct-q4_0 | ~20 GiB |
| DeepSeek R1 32B | deepseek-r1:32b | ~19 GiB |
| Qwen 2.5 Coder 32B | qwen2.5-coder:32b | ~19 GiB |

Total: ~94 GiB. Disk has ~350 GiB headroom for additional models.
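
The total and headroom follow directly from the approximate sizes in the table and the 450 GiB rootfs. A small sketch of the arithmetic, useful when deciding whether another model fits:

```python
# Approximate sizes in GiB, copied from the model table.
MODEL_SIZES_GIB = {
    "llama2:34b-chat-q4_0": 20,
    "mistral:7b-instruct-v0.3-q4_0": 5,
    "mistral-nemo": 7,
    "phi3.5": 3.8,
    "codellama:34b-instruct-q4_0": 20,
    "deepseek-r1:32b": 19,
    "qwen2.5-coder:32b": 19,
}

ROOTFS_GIB = 450  # LXC rootfs size from the Storage table


def total_gib(sizes=MODEL_SIZES_GIB):
    """Sum of all configured model sizes."""
    return sum(sizes.values())


def headroom_gib(sizes=MODEL_SIZES_GIB, rootfs=ROOTFS_GIB):
    """Space left on the rootfs, ignoring the OS and other files."""
    return rootfs - total_gib(sizes)


if __name__ == "__main__":
    print(f"total:    {total_gib():.1f} GiB")     # total:    93.8 GiB
    print(f"headroom: {headroom_gib():.1f} GiB")  # headroom: 356.2 GiB
```

The headroom figure is an upper bound, since the rootfs also holds the operating system and Ollama itself.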

Unresolved model requests

These were requested but have no standard Ollama library tag:

  • Mistral 34B Q4 — Mistral AI does not publish a 34B dense model. The MoE equivalent is mixtral:8x7b-instruct-v0.1-q4_K_M (~26 GiB). Add to the playbook once confirmed.
  • Neural Chat 34B Q4 — Intel's neural-chat on Ollama is 7B only. No 34B version exists. Add custom tag if sourced elsewhere.
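
Before adding a new tag to the playbook, one way to confirm that a pull really produced it is to compare the wanted tags against what the server reports. This is a minimal sketch assuming Ollama's standard `/api/tags` endpoint, which lists local models under a `models` array with a `name` field; the helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://192.168.1.107:11434"


def normalize_tag(tag):
    """Ollama reports untagged models as `<name>:latest`."""
    return tag if ":" in tag else f"{tag}:latest"


def missing_tags(wanted, tags_response):
    """Return wanted tags (normalized, sorted) absent from an /api/tags response."""
    present = {model["name"] for model in tags_response.get("models", [])}
    return sorted(
        normalize_tag(tag) for tag in wanted if normalize_tag(tag) not in present
    )


def fetch_tags(base_url=OLLAMA_URL, timeout=10.0):
    """Fetch the list of locally available models (requires access to LXC 107)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.loads(response.read())
```

For example, `missing_tags(["mixtral:8x7b-instruct-v0.1-q4_K_M"], fetch_tags())` returns an empty list once that tag has been pulled successfully.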