Ollama — Reference

Links

Website: https://ollama.com
API docs: https://github.com/ollama/ollama/blob/main/docs/api.md
OpenAI compatibility: https://docs.ollama.com/api/openai-compatibility
Model library: https://ollama.com/library
Python client: https://github.com/ollama/ollama-python
GitHub: https://github.com/ollama/ollama

API

Ollama exposes a REST API at http://192.168.1.107:11434. No authentication required (LAN-only access).

Generate (single-turn completion)

curl -s http://192.168.1.107:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Explain ZFS in one sentence",
  "stream": false
}' | python3 -c "import json,sys; print(json.load(sys.stdin)['response'])"
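
The same single-turn call can be made from Python with only the standard library. This is a minimal sketch rather than the official client: `BASE_URL` mirrors the address used in this document, and the helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: same host and port as the curl examples in this document.
BASE_URL = "http://192.168.1.107:11434"


def build_generate_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for /api/generate (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the 'response' field of the JSON reply."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]


# Example (needs a reachable server):
#   print(generate("llama3.1", "Explain ZFS in one sentence"))
```

Keeping the request builder separate from the network call makes the payload easy to inspect without a running server.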

Chat (multi-turn conversation)

curl -s http://192.168.1.107:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is restic?"}
  ],
  "stream": false
}' | python3 -c "import json,sys; print(json.load(sys.stdin)['message']['content'])"
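
The chat endpoint keeps no conversation state between calls, so a multi-turn exchange means resending the whole `messages` list each time. The sketch below shows one way to carry that history in Python; `chat_once`, `with_reply`, and `with_user_turn` are hypothetical helper names, and `BASE_URL` is assumed to match the address above.

```python
import json
import urllib.request

# Assumption: same host and port as the curl examples in this document.
BASE_URL = "http://192.168.1.107:11434"


def chat_once(model, messages, base_url=BASE_URL, timeout=120.0):
    """POST the full message history to /api/chat and return the parsed JSON reply."""
    body = json.dumps({"model": model, "messages": messages, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def with_reply(messages, reply):
    """Return a new history with the assistant message from `reply` appended."""
    msg = reply["message"]
    return messages + [{"role": msg["role"], "content": msg["content"]}]


def with_user_turn(messages, text):
    """Return a new history with a user message appended."""
    return messages + [{"role": "user", "content": text}]


# Example (needs a reachable server):
#   history = [{"role": "user", "content": "What is restic?"}]
#   history = with_reply(history, chat_once("llama3.1", history))
#   history = with_user_turn(history, "How does it compare to borg?")
#   print(chat_once("llama3.1", history)["message"]["content"])
```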

Generate embeddings

curl -s http://192.168.1.107:11434/api/embed -d '{
  "model": "llama3.1",
  "input": "homelab backup strategy"
}'
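
The reply to `/api/embed` carries the vectors in an `embeddings` list (one vector per input). A common next step is comparing two vectors by cosine similarity; the sketch below assumes that reply shape, and the function names are hypothetical.

```python
import math


def first_embedding(response: dict) -> list:
    """Return the first vector from an /api/embed reply ({"embeddings": [[...], ...]})."""
    return response["embeddings"][0]


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Example with two parsed replies (reply_a and reply_b are placeholders):
#   score = cosine_similarity(first_embedding(reply_a), first_embedding(reply_b))
```

Scores near 1 indicate similar texts under the chosen model; a dedicated embedding model from the library may give better results than a chat model.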

List local models

curl -s http://192.168.1.107:11434/api/tags | python3 -c \
  "import json,sys; [print(f'{m[\"name\"]:40} {m[\"size\"]/1e9:.1f}GB') for m in json.load(sys.stdin)['models']]"

Show model details

curl -s http://192.168.1.107:11434/api/show -d '{"model": "llama3.1"}'

Pull a model

curl -s http://192.168.1.107:11434/api/pull -d '{"model": "gemma2:9b", "stream": false}'
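
With streaming enabled, a pull returns one JSON object per line, and download lines carry `total` and `completed` byte counts. The sketch below turns those lines into readable progress. It assumes that line format and the `model` request field used by recent Ollama releases (older ones used `name`); `describe_progress` and `pull` are hypothetical names.

```python
import json
import urllib.request

# Assumption: same host and port as the curl examples in this document.
BASE_URL = "http://192.168.1.107:11434"


def describe_progress(line) -> str:
    """Turn one JSON status line from a streaming pull into readable text."""
    event = json.loads(line)
    status = event.get("status", "")
    total = event.get("total")
    completed = event.get("completed", 0)
    if total:
        # Download lines report byte counts; show them as a percentage.
        return f"{status}: {100 * completed / total:.1f}%"
    return status


def pull(model: str, base_url: str = BASE_URL) -> None:
    """Pull a model with streaming enabled and print each progress line."""
    body = json.dumps({"model": model, "stream": True}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        for line in resp:  # the response body is newline-delimited JSON
            if line.strip():
                print(describe_progress(line))


# Example (needs a reachable server):
#   pull("gemma2:9b")
```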

Delete a model

curl -s -X DELETE http://192.168.1.107:11434/api/delete -d '{"model": "gemma2:9b"}'

List running models (loaded in memory)

curl -s http://192.168.1.107:11434/api/ps | python3 -c \
  "import json,sys; [print(f'{m[\"name\"]} {m[\"size_vram\"]/1e9:.1f}GB VRAM') for m in json.load(sys.stdin)['models']]"

Health check

curl -s http://192.168.1.107:11434/   # returns "Ollama is running"
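
For monitoring scripts, the same check can be wrapped so that an unreachable host counts as unhealthy instead of raising. This is a sketch under the assumption that the root path answers with the text quoted above; the function names are hypothetical.

```python
import urllib.error
import urllib.request

# Assumption: same host and port as the curl examples in this document.
BASE_URL = "http://192.168.1.107:11434"


def looks_healthy(status: int, body: str) -> bool:
    """True when the root endpoint answered 200 with the expected banner text."""
    return status == 200 and "Ollama is running" in body


def is_ollama_running(base_url: str = BASE_URL, timeout: float = 3.0) -> bool:
    """Probe the root path; connection errors and timeouts count as 'not running'."""
    try:
        with urllib.request.urlopen(f"{base_url}/", timeout=timeout) as resp:
            return looks_healthy(resp.status, resp.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError):
        return False


# Example:
#   print("up" if is_ollama_running() else "down")
```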

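Prometheus metrics (this endpoint is not listed in the official API docs; confirm that the installed version serves it)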

curl -s http://192.168.1.107:11434/api/metrics

OpenAI-Compatible Endpoints

These endpoints let Ollama stand in for the OpenAI API, so existing OpenAI clients can be pointed at it.

# Chat completions (OpenAI format)
curl -s http://192.168.1.107:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Hello"}]
}'

# List models (OpenAI format)
curl -s http://192.168.1.107:11434/v1/models

# Embeddings (OpenAI format)
curl -s http://192.168.1.107:11434/v1/embeddings -H "Content-Type: application/json" -d '{
  "model": "llama3.1",
  "input": "some text"
}'
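
From Python, the OpenAI-format chat endpoint can be called the same way as any OpenAI-style server: a JSON body, a bearer token, and a reply whose text sits under `choices[0].message.content`. The sketch below uses only the standard library. The `API_KEY` value is a placeholder (OpenAI clients usually insist on a key, and this server does no authentication), and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the /v1 prefix on the same host and port used in this document.
BASE_URL = "http://192.168.1.107:11434/v1"
API_KEY = "ollama"  # placeholder; the server described here does not check it


def build_chat_completion_request(model: str, user_text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an OpenAI-format chat completion request (hypothetical helper)."""
    payload = {"model": model, "messages": [{"role": "user", "content": user_text}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def completion_text(reply: dict) -> str:
    """Extract the assistant text from an OpenAI-format chat completion reply."""
    return reply["choices"][0]["message"]["content"]


# Example (needs a reachable server):
#   with urllib.request.urlopen(build_chat_completion_request("llama3.1", "Hello")) as resp:
#       print(completion_text(json.load(resp)))
```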

CLI

Run from LXC 107 (over SSH).

# List models
ollama list

# Pull a model
ollama pull llama3.1

# Remove a model
ollama rm gemma2:9b

# Run interactive chat
ollama run llama3.1

# Show model info
ollama show llama3.1

# List running models
ollama ps

# Copy/alias a model
ollama cp llama3.1 my-custom-model

# Create a custom model from a Modelfile
ollama create my-model -f Modelfile

# Start the server manually (normally unnecessary here: systemd already runs it)
ollama serve

What the API/CLI Cannot Do

Gap	Workaround
No dedicated endpoint for per-model GPU/CPU limits	Pass `num_gpu` (layers offloaded to the GPU) and `num_thread` in the request `options`, or set them with `PARAMETER` lines in a Modelfile
Cannot schedule model preloading	Use `curl .../api/generate -d '{"model":"X","keep_alive":"24h"}'` to keep loaded
No built-in auth — anyone on LAN can use the API	Firewall rules or reverse proxy auth if needed
Cannot fine-tune models via API	Use external tools, then import via `ollama create`
No API for total disk usage of the model store	`/api/tags` and `ollama list` report per-model sizes; for the total on disk run `du -sh /root/.ollama/models/`
Cannot limit concurrent requests via API	Set `OLLAMA_MAX_LOADED_MODELS` and `OLLAMA_NUM_PARALLEL` env vars
Model download progress not available via non-streaming API	Use `"stream": true` when pulling to see progress
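
The preloading workaround in the table can be scripted and then run from cron or a systemd timer. The sketch below sends a prompt-less `/api/generate` request, which loads the model without generating text, and uses `keep_alive` to control how long it stays resident (`0` unloads it). `BASE_URL` is assumed to match the address used in this document, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: same host and port as the curl examples in this document.
BASE_URL = "http://192.168.1.107:11434"


def keep_alive_request(model: str, keep_alive, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a prompt-less /api/generate request that only loads or unloads a model."""
    payload = {"model": model, "keep_alive": keep_alive}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(models, keep_alive="24h") -> None:
    """Load each model and ask the server to keep it in memory for `keep_alive`."""
    for model in models:
        with urllib.request.urlopen(keep_alive_request(model, keep_alive)) as resp:
            resp.read()  # drain the reply; its content is not needed here


def unload(model: str) -> None:
    """Ask the server to drop the model from memory immediately."""
    with urllib.request.urlopen(keep_alive_request(model, 0)) as resp:
        resp.read()


# Example (needs a reachable server), e.g. from a nightly job:
#   preload(["llama3.1"])
```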