Karakeep — AI Tagging Guide

Karakeep uses Ollama for AI-powered tagging and summarization of bookmarks. This guide covers configuration, model selection, troubleshooting, and bulk operations.


Architecture

Karakeep (LXC 117) ──HTTP──> Ollama (LXC 107)
   inference worker              llama3.1 / llava
   polls queue.db                CPU-only, 8 cores
   writes tags to db.db
  • Karakeep enqueues inference jobs in a SQLite queue (queue.db) when bookmarks are created or when triggered via the admin panel.
  • The inference worker processes jobs sequentially (1 worker by default).
  • Each bookmark gets two inference passes: text tagging (via INFERENCE_TEXT_MODEL) and image tagging (via INFERENCE_IMAGE_MODEL) if a screenshot exists.

Environment Variables

Provider Configuration

| Variable | Current Value | Notes |
|---|---|---|
| OLLAMA_BASE_URL | http://192.168.1.107:11434 | Native Ollama API |
| OLLAMA_KEEP_ALIVE | -1 | Keep model loaded permanently (avoids cold starts) |

Alternative (OpenAI-compatible endpoint):

OPENAI_API_KEY=ollama
OPENAI_BASE_URL=http://192.168.1.107:11434/v1

The OpenAI-compatible endpoint is more reliable for structured output. If set, OPENAI_API_KEY takes precedence over OLLAMA_BASE_URL.
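
The precedence rule can be sketched as a small shell helper. This only illustrates the rule as stated in this guide and is not Karakeep's actual selection code; the function name active_provider is hypothetical.

```shell
# Hypothetical helper: prints which provider the current environment would
# select, following the precedence rule described above (OPENAI_API_KEY wins).
active_provider() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "openai-compatible ${OPENAI_BASE_URL:-unset}"
  elif [ -n "${OLLAMA_BASE_URL:-}" ]; then
    echo "ollama-native ${OLLAMA_BASE_URL}"
  else
    echo "none"
  fi
}
```

Running active_provider in a shell that has the container's variables exported is a quick way to spot a leftover OPENAI_API_KEY that silently overrides the native Ollama URL.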

Model Selection

| Variable | Current Value | Notes |
|---|---|---|
| INFERENCE_TEXT_MODEL | llama3.1 | 8B params, accurate but slow on CPU |
| INFERENCE_IMAGE_MODEL | llava | 7B params, standard vision model |
| INFERENCE_LANG | english | Language for generated tags |

Timeouts

| Variable | Current Value | Notes |
|---|---|---|
| INFERENCE_JOB_TIMEOUT_SEC | 120 | Per-job timeout. Default is 30, which is too short for CPU |
| INFERENCE_FETCH_TIMEOUT_SEC | 300 | HTTP request timeout to Ollama |

Known bugs:

  • A hardcoded 5-minute undici headers timeout in Node.js cannot be overridden (#1586).
  • INFERENCE_JOB_TIMEOUT_SEC has a 10-minute hard cap regardless of value (#2127).
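
To see how the configured values and the two caps interact, here is a minimal sketch. It assumes that whichever limit is reached first ends the job, which is a reading of the notes above rather than confirmed Karakeep behavior; the function name effective_timeout_sec is hypothetical.

```shell
# Sketch: the longest a single inference job can realistically run.
# $1 = INFERENCE_JOB_TIMEOUT_SEC, $2 = INFERENCE_FETCH_TIMEOUT_SEC
# Assumption: the smallest applicable limit wins.
effective_timeout_sec() {
  limit=$1
  # 600 = job timeout hard cap (#2127), 300 = undici headers timeout (#1586)
  for cap in "$2" 600 300; do
    if [ "$cap" -lt "$limit" ]; then
      limit=$cap
    fi
  done
  echo "$limit"
}
```

With the current settings, effective_timeout_sec 120 300 prints 120, so the job timeout is the binding limit. Raising both values to 900 would still yield 300 under this assumption.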

Behavior

| Variable | Current Value | Notes |
|---|---|---|
| INFERENCE_CONTEXT_LENGTH | 2048 (default) | Max input tokens. Lower = faster but less context for tagging |
| INFERENCE_NUM_WORKERS | 1 (default) | Keep at 1 for CPU. Increase to 2-3 with GPU |
| INFERENCE_OUTPUT_SCHEMA | structured (default) | Use plain if model struggles with JSON output |
| INFERENCE_ENABLE_AUTO_TAGGING | true (default) | Auto-tag new bookmarks |
| INFERENCE_ENABLE_AUTO_SUMMARIZATION | false (default) | AI summaries (separate from tagging) |

Model Recommendations

For CPU-only inference (current setup)

| Model | Params | Speed | Quality | Notes |
|---|---|---|---|---|
| gemma3:1b | 1B | Very fast | Good | Best speed/quality for CPU |
| gemma3:4b | 4B | Fast | Better | Multimodal; handles both text and images |
| llama3.2:3b | 3B | Fast | Good | Lightweight Llama variant |
| llama3.1 | 8B | Slow (~30s/bookmark) | Best | Current config; accurate but slow |

Community consensus: Smaller models produce better normalized, reusable tags. Larger models (11B+) tend to generate overly specific tags that aren't useful as categories.

For image tagging

| Model | Notes |
|---|---|
| llava | Standard choice, 7B, works well |
| gemma3:4b+ | Multimodal; can do both text and image tagging with one model |

Single-model option

gemma3:4b can serve as both INFERENCE_TEXT_MODEL and INFERENCE_IMAGE_MODEL, reducing memory usage and model-swap overhead.
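
As a sketch, the single-model setup amounts to pointing both variables at the same model. Where these lines live (for example a .env file next to the compose file) is an assumption about this deployment.

```shell
# Assumed env fragment: one multimodal model serves both inference passes.
INFERENCE_TEXT_MODEL=gemma3:4b
INFERENCE_IMAGE_MODEL=gemma3:4b
```

The model must already exist on the Ollama side, so pull gemma3:4b in LXC 107 before restarting Karakeep with these values.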


Ollama Thread Tuning

Ollama inside an LXC may not auto-detect the available cores. To force the thread count, create a model variant with an explicit num_thread parameter:

# Create a model variant with explicit thread count
echo "FROM llama3.1
PARAMETER num_thread 8" > /tmp/Modelfile

ollama create llama3.1-8t -f /tmp/Modelfile

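Set INFERENCE_TEXT_MODEL=llama3.1-8t so that Karakeep uses the new variant, then verify with top: CPU usage should be ~N00%, where N is the thread count (about 800% for 8 threads).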
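
If the same tuning is needed for other models, the Modelfile can be generated rather than typed by hand. This is a minimal sketch; the helper name threaded_modelfile is hypothetical, and it emits only the two directives used above.

```shell
# Hypothetical helper: prints a Modelfile that pins num_thread for a base model.
# $1 = base model name, $2 = thread count
threaded_modelfile() {
  printf 'FROM %s\nPARAMETER num_thread %s\n' "$1" "$2"
}
```

For example, redirecting the output of threaded_modelfile gemma3:4b 8 to /tmp/Modelfile and running ollama create gemma3-4b-8t -f /tmp/Modelfile would produce an 8-thread variant of gemma3:4b; the variant name gemma3-4b-8t is only an example.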


Bulk Operations (Admin Panel)

Navigate to Settings > Admin > Background Jobs to access:

| Action | When to use |
|---|---|
| Regenerate AI Tags for Pending Bookmarks | After fixing model/timeout issues |
| Regenerate AI Tags for Failed Bookmarks | Retry after transient errors |
| Regenerate AI Tags for All Bookmarks | After switching models |
| Regenerate AI Summaries for * | Same, but for summaries |
| Recrawl Pending/Failed/All Links | Re-fetch page content |

Do not manipulate taggingStatus in db.db directly — the worker only picks up jobs from queue.db, and the admin panel is the correct way to enqueue them.
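
Before regenerating tags for all bookmarks, it helps to estimate how long the queue will take. The sketch below assumes a single sequential worker and counts only the text pass; the ~30 s per bookmark figure is the one this guide quotes for llama3.1 on CPU, and the function name retag_minutes is hypothetical.

```shell
# Rough estimate of a bulk re-tag with one sequential worker.
# $1 = number of bookmarks, $2 = seconds per bookmark (text pass only)
retag_minutes() {
  echo $(( ($1 * $2 + 59) / 60 ))   # rounded up to whole minutes
}
```

For instance, retag_minutes 1000 30 prints 500, meaning a bit over eight hours for 1000 bookmarks. Image tagging passes would add to that.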


Monitoring

Check tagging progress

ssh [email protected] 'pct exec 117 -- sqlite3 \
  /var/lib/docker/volumes/karakeep_data/_data/db.db \
  "SELECT taggingStatus, COUNT(*) FROM bookmarks GROUP BY taggingStatus;"'

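The query above prints one status|count line per status in sqlite3's default output format. A small filter can condense that into a single progress line. This is a sketch: the status value success is an assumption (this guide only mentions pending and failure), and the function name tagging_progress is hypothetical.

```shell
# Hypothetical filter: reads "status|count" lines on stdin (sqlite3 default
# output) and prints a one-line summary. Statuses that are absent count as 0.
tagging_progress() {
  awk -F'|' '
    { count[$1] = $2; total += $2 }
    END {
      printf "success=%d pending=%d failure=%d total=%d\n",
             count["success"], count["pending"], count["failure"], total
    }'
}
```

Piping the output of the ssh command above into tagging_progress gives a line that is easy to watch in a loop while a bulk job runs.
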
Check Ollama resource usage

ssh [email protected] "pct exec 107 -- top -bn1 | grep ollama"
# CPU should be ~N00% where N = num_thread

ssh [email protected] "pct exec 107 -- /usr/local/bin/ollama ps"
# Shows loaded models and memory usage

Check inference logs

ssh [email protected] 'pct exec 117 -- docker logs karakeep-karakeep-1 --tail 20 2>&1 | grep -iE "infer|error|fail"'

Troubleshooting

"model not found" errors

Models must be pulled using the full path to the ollama binary inside the LXC:

pct exec 107 -- /usr/local/bin/ollama pull llama3.1

pct exec does not inherit $PATH, so calling plain ollama fails silently.

Inference timeouts

  1. Check current timeout: docker exec karakeep-karakeep-1 env | grep TIMEOUT
  2. If bookmarks consistently time out, switch to a smaller model
  3. The maximum practical timeout is ~5 minutes because of the hardcoded Node.js undici headers timeout (#1586)

Tagging stuck (pending but not processing)

  1. Check whether Ollama has a model loaded: pct exec 107 -- /usr/local/bin/ollama ps
  2. Check the inference logs for errors
  3. Use the admin panel action "Regenerate AI Tags for Pending Bookmarks" to re-enqueue
  4. Restart the Karakeep container if the worker has died

All bookmarks tagged as "failure"

This usually means Ollama is unreachable or the configured model doesn't exist:

# Test from karakeep LXC
pct exec 117 -- curl -s http://192.168.1.107:11434/api/tags
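
If the request succeeds, the response lists the installed models, so the same call can confirm that the configured model exists. The helper below is a sketch: it assumes the response is compact JSON with entries of the form "name":"model:tag", and the function name model_listed is hypothetical.

```shell
# Hypothetical helper: reads an /api/tags JSON response on stdin and succeeds
# if the given model is listed, either exactly or with a ":tag" suffix.
# Assumes compact JSON, i.e. no spaces around the colon after "name".
model_listed() {
  model=$1
  grep -qF -e "\"name\":\"${model}\"" -e "\"name\":\"${model}:"
}
```

Piping the curl output above into model_listed llama3.1 returns exit status 0 when the model is present. A non-zero status with a reachable endpoint points to a missing model rather than a network problem.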