Karakeep — AI Tagging Guide

Karakeep uses Ollama for AI-powered tagging and summarization of bookmarks. This guide covers configuration, model selection, troubleshooting, and bulk operations.

Architecture

Karakeep (LXC 117) ──HTTP──> Ollama (LXC 107)
   inference worker              llama3.1 / llava
   polls queue.db                CPU-only, 8 cores
   writes tags to db.db

Karakeep enqueues inference jobs in a SQLite queue (queue.db) when bookmarks are created or when triggered via the admin panel.
The inference worker processes jobs sequentially (1 worker by default).
Each bookmark gets up to two inference passes: text tagging (via INFERENCE_TEXT_MODEL) and, if a screenshot exists, image tagging (via INFERENCE_IMAGE_MODEL).

Environment Variables

Provider Configuration

Variable	Current Value	Notes
`OLLAMA_BASE_URL`	`http://192.168.1.107:11434`	Native Ollama API
`OLLAMA_KEEP_ALIVE`	`-1`	Keep model loaded permanently (avoids cold starts)

Alternative (OpenAI-compatible endpoint):

OPENAI_API_KEY=ollama
OPENAI_BASE_URL=http://192.168.1.107:11434/v1

The OpenAI-compatible endpoint is more reliable for structured output. If set, OPENAI_API_KEY takes precedence over OLLAMA_BASE_URL.
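
The precedence rule above can be checked before restarting the container. The following is a minimal sketch, not Karakeep's actual selection code: pick_provider is a hypothetical helper that only mirrors the rule as stated in this guide.

```shell
# Hypothetical helper: reports which provider the precedence rule in this
# guide would select for the variables currently set in the environment.
pick_provider() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    # OPENAI_API_KEY wins whenever it is set, even if OLLAMA_BASE_URL is too.
    echo "openai-compatible (${OPENAI_BASE_URL:-no OPENAI_BASE_URL set})"
  elif [ -n "${OLLAMA_BASE_URL:-}" ]; then
    echo "ollama-native (${OLLAMA_BASE_URL})"
  else
    echo "none"
  fi
}
```

Run it in a shell where the container's variables are exported. If both sets of variables are present, it reports the OpenAI-compatible endpoint.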

Model Selection

Variable	Current Value	Notes
`INFERENCE_TEXT_MODEL`	`llama3.1`	8B params, accurate but slow on CPU
`INFERENCE_IMAGE_MODEL`	`llava`	7B params, standard vision model
`INFERENCE_LANG`	`english`	Language for generated tags

Timeouts

Variable	Current Value	Notes
`INFERENCE_JOB_TIMEOUT_SEC`	`120`	Per-job timeout. Default is 30 — too short for CPU
`INFERENCE_FETCH_TIMEOUT_SEC`	`300`	HTTP request timeout to Ollama

Known bugs:
- A hardcoded 5-minute undici headers timeout in Node.js cannot be overridden (#1586)
- INFERENCE_JOB_TIMEOUT_SEC has a 10-minute hard cap regardless of value (#2127)
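
Taken together, the two timeout variables and the two known bugs mean the effective limit for one request is the smallest of four numbers. The sketch below works through that arithmetic. The 600 s and 300 s ceilings are taken from the bug descriptions in this guide, not from Karakeep's source, and effective_timeout is a hypothetical helper.

```shell
# Prints the smaller of two integers.
min() { if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi; }

# Prints the longest a single inference request can effectively run, in seconds.
# $1 = INFERENCE_JOB_TIMEOUT_SEC, $2 = INFERENCE_FETCH_TIMEOUT_SEC
effective_timeout() {
  job=$(min "$1" 600)     # assumed 10-minute hard cap on the job timeout (#2127)
  fetch=$(min "$2" 300)   # assumed 5-minute undici headers timeout (#1586)
  min "$job" "$fetch"
}

effective_timeout 120 300   # current values; prints 120
```

Under these assumptions, raising either variable past 300 has no effect on how long a single request can run.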

Behavior

Variable	Current Value	Notes
`INFERENCE_CONTEXT_LENGTH`	`2048` (default)	Max input tokens. Lower = faster but less context for tagging
`INFERENCE_NUM_WORKERS`	`1` (default)	Keep at 1 for CPU. Increase to 2-3 with GPU
`INFERENCE_OUTPUT_SCHEMA`	`structured` (default)	Use `plain` if model struggles with JSON output
`INFERENCE_ENABLE_AUTO_TAGGING`	`true` (default)	Auto-tag new bookmarks
`INFERENCE_ENABLE_AUTO_SUMMARIZATION`	`false` (default)	AI summaries (separate from tagging)

Model Recommendations

For CPU-only inference (current setup)

Model	Params	Speed	Quality	Notes
`gemma3:1b`	1B	Very fast	Good	Best speed/quality for CPU
`gemma3:4b`	4B	Fast	Better	Multimodal — handles both text and images
`llama3.2:3b`	3B	Fast	Good	Lightweight Llama variant
`llama3.1`	8B	Slow (~30s/bookmark)	Best	Current config — accurate but slow

Community consensus: Smaller models produce better normalized, reusable tags. Larger models (11B+) tend to generate overly specific tags that aren't useful as categories.

For image tagging

Model	Notes
`llava`	Standard choice, 7B, works well
`gemma3:4b`+	Multimodal — can do both text and image tagging with one model

Single-model option

gemma3:4b can serve as both INFERENCE_TEXT_MODEL and INFERENCE_IMAGE_MODEL, reducing memory usage and model-swap overhead.

Ollama Thread Tuning

Ollama inside an LXC may not auto-detect the available cores. To force the thread count, create a model variant with an explicit num_thread parameter:

# Create a model variant with explicit thread count
echo "FROM llama3.1
PARAMETER num_thread 8" > /tmp/Modelfile

ollama create llama3.1-8t -f /tmp/Modelfile

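Verify with top while a job is running: CPU usage should be ~N00%, where N is the thread count. Karakeep only uses the new variant once INFERENCE_TEXT_MODEL is set to its name (llama3.1-8t here).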
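
The same Modelfile can be generated for any base model and thread count. This is a minimal sketch: make_modelfile is a hypothetical helper, and the commented usage assumes nproc reports the cores actually assigned to the container (pass the number explicitly if it does not).

```shell
# Prints a Modelfile that pins num_thread for the given base model.
# $1 = base model name, $2 = thread count
make_modelfile() {
  printf 'FROM %s\nPARAMETER num_thread %s\n' "$1" "$2"
}

# Example (run inside the Ollama LXC; the variant name is only a suggestion):
#   threads=$(nproc)
#   make_modelfile llama3.1 "$threads" > /tmp/Modelfile
#   ollama create "llama3.1-${threads}t" -f /tmp/Modelfile
```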

Bulk Operations (Admin Panel)

Navigate to Settings > Admin > Background Jobs to access:

Action	When to use
Regenerate AI Tags for Pending Bookmarks	After fixing model/timeout issues
Regenerate AI Tags for Failed Bookmarks	Retry after transient errors
Regenerate AI Tags for All Bookmarks	After switching models
Regenerate AI Summaries for *	Same, but for summaries
Recrawl Pending/Failed/All Links	Re-fetch page content

Do not manipulate taggingStatus in db.db directly — the worker only picks up jobs from queue.db, and the admin panel is the correct way to enqueue them.
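
Because jobs are processed sequentially, it helps to estimate the run time before choosing "Regenerate AI Tags for All Bookmarks". The sketch below is plain arithmetic. The bookmark count of 1000 is a made-up example, the 30 s figure comes from the llama3.1 row in the model table, and retag_eta is a hypothetical helper. Image passes add time that is not counted here.

```shell
# Prints a rough wall-clock estimate for a bulk re-tag as "<hours>h <minutes>m".
# $1 = bookmark count, $2 = seconds per bookmark, $3 = number of workers
retag_eta() {
  total=$(( ($1 * $2 + $3 - 1) / $3 ))   # total seconds, rounded up across workers
  echo "$(( total / 3600 ))h $(( total % 3600 / 60 ))m"
}

retag_eta 1000 30 1   # prints "8h 20m"
```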

Monitoring

Check tagging progress

ssh USER@PROXMOX_HOST 'pct exec 117 -- sqlite3 \
  /var/lib/docker/volumes/karakeep_data/_data/db.db \
  "SELECT taggingStatus, COUNT(*) FROM bookmarks GROUP BY taggingStatus;"'

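The query prints one status|count line per status. A small filter can turn that into a single progress line. This is a sketch under two assumptions: sqlite3 is using its default pipe-separated output, and the status values are success, pending, and failure. tag_progress is a hypothetical helper.

```shell
# Reads "status|count" lines on stdin and prints a one-line summary.
# Assumes the status values are success, pending and failure.
tag_progress() {
  awk -F'|' '
    { total += $2; n[$1] += $2 }
    END {
      if (total == 0) { print "no bookmarks"; exit }
      printf "tagged %d/%d (%d%%), pending %d, failed %d\n",
             n["success"], total, n["success"] * 100 / total, n["pending"], n["failure"]
    }'
}
```

Pipe the output of the query above into tag_progress. The percentage is truncated to a whole number.
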
Check Ollama resource usage

ssh USER@PROXMOX_HOST "pct exec 107 -- top -bn1 | grep ollama"
# CPU should be ~N00% where N = num_thread

ssh USER@PROXMOX_HOST "pct exec 107 -- /usr/local/bin/ollama ps"
# Shows loaded models and memory usage

Check inference logs

ssh USER@PROXMOX_HOST 'pct exec 117 -- docker logs karakeep-karakeep-1 --tail 20 2>&1 | grep -iE "infer|error|fail"'

Troubleshooting

"model not found" errors

When pulling models through pct exec, use the full path to the ollama binary:

pct exec 107 -- /usr/local/bin/ollama pull llama3.1

pct exec does not inherit $PATH, so calling a bare ollama fails silently.

Inference timeouts

Check the current timeout: docker exec karakeep-karakeep-1 env | grep TIMEOUT
If bookmarks consistently time out, switch to a smaller model
The maximum practical timeout is ~5 minutes because of the Node.js undici headers timeout bug (#1586)

Tagging stuck (pending but not processing)

Check if Ollama has a model loaded: ollama ps
Check inference logs for errors
Use admin panel "Regenerate AI Tags for Pending Bookmarks" to re-enqueue
Restart the karakeep container if the worker has died

All bookmarks tagged as "failure"

This usually means Ollama is unreachable or the configured model does not exist:

# Test from karakeep LXC
pct exec 117 -- curl -s http://192.168.1.107:11434/api/tags
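
If the curl call returns JSON, Ollama is reachable, and the remaining question is whether the configured model is in the list. The sketch below checks that without jq. It assumes the response lists models as "name":"<model>:<tag>" entries, and has_model is a hypothetical helper.

```shell
# Reads an /api/tags response on stdin; succeeds if the given model is listed.
# $1 = model name, with or without a tag (e.g. llama3.1 or gemma3:4b)
has_model() {
  model=$1
  json=$(tr -d ' \t\n')   # drop whitespace so pretty-printed JSON also matches
  case $json in
    *"\"name\":\"$model:"*|*"\"name\":\"$model\""*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   pct exec 117 -- curl -s http://192.168.1.107:11434/api/tags | has_model llama3.1 \
#     && echo "model present" || echo "model missing"
```

An empty response from curl makes the check fail as well, so "model missing" can also mean Ollama is unreachable.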