Ollama — Reference

  • Website: https://ollama.com
  • API docs: https://github.com/ollama/ollama/blob/main/docs/api.md
  • OpenAI compatibility: https://docs.ollama.com/api/openai-compatibility
  • Model library: https://ollama.com/library
  • Python client: https://github.com/ollama/ollama-python
  • GitHub: https://github.com/ollama/ollama

API

Ollama exposes a REST API at http://192.168.1.107:11434. No authentication is required; the endpoint is reachable from the LAN only.

Generate (single-turn completion)

curl -s http://192.168.1.107:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Explain ZFS in one sentence",
  "stream": false
}' | python3 -c "import json,sys; print(json.load(sys.stdin)['response'])"
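
For scripts that outgrow a shell one-liner, the same call can be sketched with the Python standard library. This is a minimal sketch, not the official Python client: the helper names build_generate_request and generate are hypothetical, and the host and port are assumed to match the curl example above.

```python
import json
import urllib.request

# Assumption: same host and port as the curl examples in this reference.
OLLAMA_URL = "http://192.168.1.107:11434"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for /api/generate (hypothetical helper)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return only the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # With "stream": false the server replies with a single JSON object.
        return json.load(resp)["response"]
```

Calling generate("llama3.1", "Explain ZFS in one sentence") should return the same text the curl pipeline prints, provided the server is reachable.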

Chat (multi-turn conversation)

curl -s http://192.168.1.107:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is restic?"}
  ],
  "stream": false
}' | python3 -c "import json,sys; print(json.load(sys.stdin)['message']['content'])"
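
The chat endpoint is stateless, so a multi-turn conversation means resending the full message history on every request. The sketch below illustrates that bookkeeping; the ChatSession class and its method names are hypothetical, and the URL is assumed to match the curl example above.

```python
import json
import urllib.request

# Assumption: same host and port as the curl examples in this reference.
OLLAMA_URL = "http://192.168.1.107:11434"


class ChatSession:
    """Keeps the message history that /api/chat expects on each call."""

    def __init__(self, model, system=None):
        self.model = model
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def payload(self, user_text):
        """Request body for the next turn; does not modify the history."""
        turn = {"role": "user", "content": user_text}
        return {"model": self.model, "messages": self.messages + [turn], "stream": False}

    def record(self, user_text, reply):
        """Store a completed exchange so later turns include it."""
        self.messages.append({"role": "user", "content": user_text})
        self.messages.append({"role": "assistant", "content": reply})

    def ask(self, user_text, timeout=120.0):
        """Send one turn to the server and remember the exchange."""
        body = json.dumps(self.payload(user_text)).encode("utf-8")
        request = urllib.request.Request(
            f"{OLLAMA_URL}/api/chat",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            reply = json.load(resp)["message"]["content"]
        self.record(user_text, reply)
        return reply
```

A session created with ChatSession("llama3.1", system="You are a helpful assistant.") can then call ask() repeatedly, and each call carries the earlier turns as context.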

Generate embeddings

curl -s http://192.168.1.107:11434/api/embed -d '{
  "model": "llama3.1",
  "input": "homelab backup strategy"
}'
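
Embeddings are usually consumed by comparing vectors. The following sketch assumes the /api/embed response carries an "embeddings" list with one vector per input, and that "input" may be a list of strings; the helper names embed and cosine_similarity are hypothetical.

```python
import json
import math
import urllib.request

# Assumption: same host and port as the curl examples in this reference.
OLLAMA_URL = "http://192.168.1.107:11434"


def embed(model, texts, timeout=60.0):
    """Return one embedding vector per input string."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["embeddings"]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, embedding two phrases in one call and passing the two resulting vectors to cosine_similarity gives a rough measure of how related they are.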

List local models

curl -s http://192.168.1.107:11434/api/tags | python3 -c \
  "import json,sys; [print(f'{m[\"name\"]:40} {m[\"size\"]/1e9:.1f}GB') for m in json.load(sys.stdin)['models']]"

Show model details

curl -s http://192.168.1.107:11434/api/show -d '{"model": "llama3.1"}'

Pull a model

curl -s http://192.168.1.107:11434/api/pull -d '{"model": "gemma2:9b", "stream": false}'
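
With "stream": true the pull endpoint returns one JSON object per line, which makes download progress visible. The sketch below parses those lines; it assumes progress events carry "status", "total", and "completed" fields and that failures arrive as an "error" field. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: same host and port as the curl examples in this reference.
OLLAMA_URL = "http://192.168.1.107:11434"


def describe_pull_event(line: bytes) -> str:
    """Turn one streamed JSON line into a short progress string."""
    event = json.loads(line)
    if "error" in event:
        raise RuntimeError(event["error"])
    status = event.get("status", "")
    total = event.get("total")
    completed = event.get("completed")
    if total and completed is not None:
        return f"{status} {100 * completed / total:.1f}%"
    return status


def pull(model: str) -> None:
    """Pull a model and print progress as the server reports it."""
    # Assumption: current API docs name this field "model"; older ones used "name".
    body = json.dumps({"model": model, "stream": True}).encode("utf-8")
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        for line in resp:  # the response body is newline-delimited JSON
            if line.strip():
                print(describe_pull_event(line))
```

Calling pull("gemma2:9b") prints a line per event until the server reports completion.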

Delete a model

curl -s -X DELETE http://192.168.1.107:11434/api/delete -d '{"model": "gemma2:9b"}'

List running models (loaded in memory)

curl -s http://192.168.1.107:11434/api/ps | python3 -c \
  "import json,sys; [print(f'{m[\"name\"]} {m[\"size_vram\"]/1e9:.1f}GB VRAM') for m in json.load(sys.stdin)['models']]"

Health check

curl -s http://192.168.1.107:11434/   # returns "Ollama is running"

Prometheus metrics

curl -s http://192.168.1.107:11434/api/metrics

OpenAI-Compatible Endpoints

These endpoints let existing OpenAI API clients use Ollama as a drop-in backend.

# Chat completions (OpenAI format)
curl -s http://192.168.1.107:11434/v1/chat/completions -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Hello"}]
}'

# List models (OpenAI format)
curl -s http://192.168.1.107:11434/v1/models

# Embeddings (OpenAI format)
curl -s http://192.168.1.107:11434/v1/embeddings -d '{
  "model": "llama3.1",
  "input": "some text"
}'
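
A client written against the OpenAI response format reads the reply from choices[0].message.content rather than from the native "message" field. The sketch below shows that difference using only the standard library; the helper names are hypothetical, and the URL is assumed to match the curl examples above.

```python
import json
import urllib.request

# Assumption: same host and port as the curl examples in this reference.
OLLAMA_URL = "http://192.168.1.107:11434"


def build_chat_completion_request(model: str, user_text: str) -> urllib.request.Request:
    """Build a POST request in the OpenAI chat-completions format."""
    body = {"model": model, "messages": [{"role": "user", "content": user_text}]}
    return urllib.request.Request(
        f"{OLLAMA_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(completion: dict) -> str:
    """Pull the assistant text out of an OpenAI-style completion object."""
    return completion["choices"][0]["message"]["content"]


def chat_completion(model: str, user_text: str, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_completion_request(model, user_text)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

Existing OpenAI SDKs can usually be pointed at the same /v1 base URL instead; this stdlib version only illustrates the request and response shapes.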

CLI

Run from LXC 107 (connect over SSH).

# List models
ollama list

# Pull a model
ollama pull llama3.1

# Remove a model
ollama rm gemma2:9b

# Run interactive chat
ollama run llama3.1

# Show model info
ollama show llama3.1

# List running models
ollama ps

# Copy/alias a model
ollama cp llama3.1 my-custom-model

# Create a custom model from a Modelfile
ollama create my-model -f Modelfile

# Serve (already running via systemd)
ollama serve
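
The ollama create command above reads a Modelfile. As a rough illustration of what such a file contains, this sketch renders one from Python using the FROM, PARAMETER, and SYSTEM instructions; the function name render_modelfile and the example values are hypothetical.

```python
def render_modelfile(base_model, system_prompt, parameters):
    """Return Modelfile text with a base model, parameters, and a system prompt."""
    lines = [f"FROM {base_model}"]
    for name, value in parameters.items():
        lines.append(f"PARAMETER {name} {value}")
    # Triple quotes let the system prompt span several lines.
    lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"


# Example values are illustrative only.
MODELFILE_TEXT = render_modelfile(
    "llama3.1",
    "You are a terse homelab assistant.",
    {"temperature": 0.2},
)
```

Writing MODELFILE_TEXT to a file named Modelfile and running ollama create my-model -f Modelfile would then build the custom model.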

What the API/CLI Cannot Do

| Gap | Workaround |
| --- | --- |
| Cannot set GPU/CPU limits per model via a dedicated endpoint | Pass num_gpu and num_thread in the "options" object of each request (applies per request, not server-wide) |
| Cannot schedule model preloading | Use curl .../api/generate -d '{"model":"X","keep_alive":"24h"}' to keep the model loaded |
| No built-in auth; anyone on the LAN can use the API | Firewall rules or reverse proxy auth if needed |
| Cannot fine-tune models via API | Use external tools, then import via ollama create |
| No dedicated API for disk usage of the model store | Use ollama list or /api/tags (both report per-model sizes), or du -sh /root/.ollama/models/ for the total |
| Cannot limit concurrent requests via API | Set the OLLAMA_MAX_LOADED_MODELS and OLLAMA_NUM_PARALLEL env vars |
| Model download progress not available via non-streaming API | Use "stream": true when pulling to see progress |
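
The keep_alive workaround for preloading can be scripted, for example from a cron job or systemd timer. This is a minimal sketch assuming that a generate request without a prompt only loads the model, and that keep_alive set to 0 unloads it; the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: same host and port as the curl examples in this reference.
OLLAMA_URL = "http://192.168.1.107:11434"


def preload_payload(model: str, keep_alive: str = "24h") -> dict:
    """Body that loads a model and keeps it in memory for keep_alive."""
    return {"model": model, "keep_alive": keep_alive}


def unload_payload(model: str) -> dict:
    """Body that asks the server to unload the model right away."""
    return {"model": model, "keep_alive": 0}


def post_generate(payload: dict, timeout: float = 300.0) -> dict:
    """POST a payload to /api/generate and return the last JSON line."""
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        lines = [line for line in resp if line.strip()]
    return json.loads(lines[-1])
```

Running post_generate(preload_payload("llama3.1")) at boot approximates scheduled preloading, and post_generate(unload_payload("llama3.1")) frees the memory again.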