Karakeep — AI Tagging Guide
Karakeep uses Ollama for AI-powered tagging and summarization of bookmarks. This guide covers configuration, model selection, troubleshooting, and bulk operations.
Architecture
Karakeep (LXC 117)  ──HTTP──>  Ollama (LXC 107)
  inference worker               llama3.1 / llava
  polls queue.db                 CPU-only, 8 cores
  writes tags to db.db
- Karakeep enqueues inference jobs in a SQLite queue (queue.db) when bookmarks are created or when triggered via the admin panel.
- The inference worker processes jobs sequentially (1 worker by default).
- Each bookmark gets two inference passes: text tagging (via INFERENCE_TEXT_MODEL) and image tagging (via INFERENCE_IMAGE_MODEL) if a screenshot exists.
Environment Variables
Provider Configuration
| Variable | Current Value | Notes |
|---|---|---|
| OLLAMA_BASE_URL | http://192.168.1.107:11434 | Native Ollama API |
| OLLAMA_KEEP_ALIVE | -1 | Keep model loaded permanently (avoids cold starts) |
Alternative (OpenAI-compatible endpoint):
OPENAI_API_KEY=ollama
OPENAI_BASE_URL=http://192.168.1.107:11434/v1
The OpenAI-compatible endpoint is more reliable for structured output. If set, OPENAI_API_KEY takes precedence over OLLAMA_BASE_URL.
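The two provider styles point at the same Ollama instance; only the path differs. A minimal sketch of that relationship follows. The helper name `openai_base_url` is hypothetical and not part of Karakeep or Ollama.

```shell
# openai_base_url NATIVE_URL
# Prints the OpenAI-compatible base URL for a native Ollama URL by
# dropping one trailing slash (if present) and appending /v1.
openai_base_url() {
  printf '%s/v1\n' "${1%/}"
}

# Native URL from the table above.
OLLAMA_BASE_URL="http://192.168.1.107:11434"

# Print the two variables for the OpenAI-compatible alternative.
echo "OPENAI_API_KEY=ollama"
echo "OPENAI_BASE_URL=$(openai_base_url "$OLLAMA_BASE_URL")"
```

Because OPENAI_API_KEY takes precedence when set, it should only be present when the OpenAI-compatible endpoint is the one intended.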
Model Selection
| Variable | Current Value | Notes |
|---|---|---|
| INFERENCE_TEXT_MODEL | llama3.1 | 8B params, accurate but slow on CPU |
| INFERENCE_IMAGE_MODEL | llava | 7B params, standard vision model |
| INFERENCE_LANG | english | Language for generated tags |
Timeouts
| Variable | Current Value | Notes |
|---|---|---|
| INFERENCE_JOB_TIMEOUT_SEC | 120 | Per-job timeout. Default is 30, too short for CPU |
| INFERENCE_FETCH_TIMEOUT_SEC | 300 | HTTP request timeout to Ollama |
Known bugs:
- A hardcoded 5-minute undici headers timeout in Node.js cannot be overridden (#1586)
- INFERENCE_JOB_TIMEOUT_SEC has a 10-minute hard cap regardless of value (#2127)
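Taken together, the two known bugs mean that a configured timeout above a certain value is unlikely to have any effect. The sketch below illustrates that arithmetic, assuming both ceilings apply to a single inference request as described in this guide. The function name `effective_timeout` is hypothetical.

```shell
# Ceilings taken from the known bugs listed above (in seconds).
UNDICI_HEADERS_TIMEOUT_SEC=300   # hardcoded 5-minute headers timeout (#1586)
JOB_TIMEOUT_HARD_CAP_SEC=600     # 10-minute cap on INFERENCE_JOB_TIMEOUT_SEC (#2127)

# effective_timeout REQUESTED_SEC
# Prints the smallest of the requested value and the two ceilings.
effective_timeout() {
  t="$1"
  if [ "$t" -gt "$JOB_TIMEOUT_HARD_CAP_SEC" ]; then
    t="$JOB_TIMEOUT_HARD_CAP_SEC"
  fi
  if [ "$t" -gt "$UNDICI_HEADERS_TIMEOUT_SEC" ]; then
    t="$UNDICI_HEADERS_TIMEOUT_SEC"
  fi
  echo "$t"
}

effective_timeout 120   # current setting, below both ceilings
effective_timeout 900   # clamped by the lower ceiling
```

If a bookmark still times out at the lower ceiling, raising the configured value further is unlikely to help; a smaller model is the more promising change.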
Behavior
| Variable | Current Value | Notes |
|---|---|---|
| INFERENCE_CONTEXT_LENGTH | 2048 (default) | Max input tokens. Lower = faster but less context for tagging |
| INFERENCE_NUM_WORKERS | 1 (default) | Keep at 1 for CPU. Increase to 2-3 with GPU |
| INFERENCE_OUTPUT_SCHEMA | structured (default) | Use plain if model struggles with JSON output |
| INFERENCE_ENABLE_AUTO_TAGGING | true (default) | Auto-tag new bookmarks |
| INFERENCE_ENABLE_AUTO_SUMMARIZATION | false (default) | AI summaries (separate from tagging) |
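For reference, the values from the tables in this section can be collected into a single env file. This is a sketch only: the values are copied from the "Current Value" columns, while the file name and how it is loaded (for example via Docker Compose) are assumptions about the deployment.

```shell
# Sketch of an env file for the CPU-only setup described in this guide.
# Values are copied from the tables above; the file name (.env) is an assumption.

# Provider
OLLAMA_BASE_URL=http://192.168.1.107:11434
OLLAMA_KEEP_ALIVE=-1

# Models
INFERENCE_TEXT_MODEL=llama3.1
INFERENCE_IMAGE_MODEL=llava
INFERENCE_LANG=english

# Timeouts (seconds)
INFERENCE_JOB_TIMEOUT_SEC=120
INFERENCE_FETCH_TIMEOUT_SEC=300

# Behavior (these four are the documented defaults and could be omitted)
INFERENCE_CONTEXT_LENGTH=2048
INFERENCE_NUM_WORKERS=1
INFERENCE_OUTPUT_SCHEMA=structured
INFERENCE_ENABLE_AUTO_TAGGING=true
INFERENCE_ENABLE_AUTO_SUMMARIZATION=false
```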
Model Recommendations
For CPU-only inference (current setup)
| Model | Params | Speed | Quality | Notes |
|---|---|---|---|---|
| gemma3:1b | 1B | Very fast | Good | Best speed/quality for CPU |
| gemma3:4b | 4B | Fast | Better | Multimodal: handles both text and images |
| llama3.2:3b | 3B | Fast | Good | Lightweight Llama variant |
| llama3.1 | 8B | Slow (~30s/bookmark) | Best | Current config: accurate but slow |
Community consensus: Smaller models produce better normalized, reusable tags. Larger models (11B+) tend to generate overly specific tags that aren't useful as categories.
For image tagging
| Model | Notes |
|---|---|
| llava | Standard choice, 7B, works well |
| gemma3:4b+ | Multimodal: can do both text and image tagging with one model |
Single-model option
gemma3:4b can serve as both INFERENCE_TEXT_MODEL and INFERENCE_IMAGE_MODEL, reducing memory usage and model-swap overhead.
Ollama Thread Tuning
Ollama inside an LXC may not auto-detect available cores. To force thread count:
# Create a model variant with explicit thread count
echo "FROM llama3.1
PARAMETER num_thread 8" > /tmp/Modelfile
ollama create llama3.1-8t -f /tmp/Modelfile
Verify with top — CPU usage should be ~N00% where N is the thread count.
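The same Modelfile can be generated for any base model and thread count. A minimal sketch follows; the helper name `make_modelfile` is hypothetical, and the `ollama create` call is left as a comment because it has to run on the Ollama host.

```shell
# make_modelfile BASE_MODEL THREADS
# Prints a Modelfile that pins num_thread for the given base model.
make_modelfile() {
  printf 'FROM %s\nPARAMETER num_thread %s\n' "$1" "$2"
}

# Example: print the same Modelfile as the llama3.1 variant above.
make_modelfile llama3.1 8

# On the Ollama host, the output would be saved and registered, e.g.:
#   make_modelfile llama3.1 8 > /tmp/Modelfile
#   ollama create llama3.1-8t -f /tmp/Modelfile
```

For Karakeep to use the variant, INFERENCE_TEXT_MODEL would need to be set to the new model name (llama3.1-8t in this example).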
Bulk Operations (Admin Panel)
Navigate to Settings > Admin > Background Jobs to access:
| Action | When to use |
|---|---|
| Regenerate AI Tags for Pending Bookmarks | After fixing model/timeout issues |
| Regenerate AI Tags for Failed Bookmarks | Retry after transient errors |
| Regenerate AI Tags for All Bookmarks | After switching models |
| Regenerate AI Summaries for * | Same, but for summaries |
| Recrawl Pending/Failed/All Links | Re-fetch page content |
Do not manipulate taggingStatus in db.db directly — the worker only picks up jobs from queue.db, and the admin panel is the correct way to enqueue them.
Monitoring
Check tagging progress
ssh [email protected] 'pct exec 117 -- sqlite3 \
/var/lib/docker/volumes/karakeep_data/_data/db.db \
"SELECT taggingStatus, COUNT(*) FROM bookmarks GROUP BY taggingStatus;"'
Check Ollama resource usage
ssh [email protected] "pct exec 107 -- top -bn1 | grep ollama"
# CPU should be ~N00% where N = num_thread
ssh [email protected] "pct exec 107 -- /usr/local/bin/ollama ps"
# Shows loaded models and memory usage
Check inference logs
ssh [email protected] 'pct exec 117 -- docker logs karakeep-karakeep-1 --tail 20 2>&1 | grep -iE "infer|error|fail"'
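The per-status counts from the tagging-progress query earlier in this section can be condensed into one line. The sketch below assumes the query output uses sqlite3's default list format (status|count, one row per line) and that the status values include pending and failure, as referenced elsewhere in this guide. The function name `summarize_tagging` is hypothetical.

```shell
# summarize_tagging
# Reads "status|count" rows on stdin and prints totals plus the share of
# bookmarks that are no longer pending.
summarize_tagging() {
  awk -F'|' '
    NF == 2 { count[$1] = $2; total += $2 }
    END {
      if (total == 0) { print "no bookmarks"; exit }
      processed = total - count["pending"]
      printf "total=%d pending=%d failure=%d processed=%d%%\n",
             total, count["pending"], count["failure"], processed * 100 / total
    }'
}

# Usage (not run here): pipe the tagging-progress query into the function.
#   <tagging-progress ssh command> | summarize_tagging
```

The percentage is truncated to a whole number, and statuses other than pending and failure only contribute to the total.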
Troubleshooting
"model not found" errors
When pulling models into the LXC with pct exec, use the full path to the ollama binary:
pct exec 107 -- /usr/local/bin/ollama pull llama3.1
pct exec does not inherit $PATH, so calling ollama without the full path fails silently.
Inference timeouts
- Check current timeout: docker exec karakeep-karakeep-1 env | grep TIMEOUT
- If bookmarks consistently time out, switch to a smaller model
- Maximum practical timeout is ~5 minutes due to the Node.js undici bug
Tagging stuck (pending but not processing)
- Check if Ollama has a model loaded: ollama ps
- Check inference logs for errors
- Use admin panel "Regenerate AI Tags for Pending Bookmarks" to re-enqueue
- Restart karakeep container if worker died
All bookmarks tagged as "failure"
Usually means Ollama is unreachable or model doesn't exist:
# Test from karakeep LXC
pct exec 117 -- curl -s http://192.168.1.107:11434/api/tags
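If the endpoint responds, the next check is whether the configured model is in the list. A minimal sketch follows, assuming the /api/tags response contains one "name" field per installed model (for example "llama3.1:latest"). The helper name `has_model` is hypothetical, and the match is a plain regex over the JSON text, not a real JSON parser.

```shell
# has_model MODEL
# Reads the /api/tags JSON on stdin and succeeds if a model whose name is
# MODEL or MODEL:<tag> is listed. Note: "." in MODEL is treated as a regex
# wildcard, which is acceptable for a quick check but not strict.
has_model() {
  grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"$1(:[^\"]*)?\""
}

# Usage (not run here; requires access to the Proxmox host):
#   pct exec 117 -- curl -s http://192.168.1.107:11434/api/tags \
#     | has_model llama3.1 && echo "model present" || echo "model missing"
```

An empty response points to a connectivity problem; a response without the model points to a missing pull.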