Paperless-ngx AI Integration Guide
How to connect Paperless-ngx to Ollama (LXC 107) for AI-powered document tagging, classification, and enhanced OCR using Paperless-AI and Paperless-GPT.
Overview
Paperless-ngx has no built-in AI, but two community projects bridge it to Ollama:
| Tool | Purpose | GitHub |
|---|---|---|
| Paperless-AI | Auto-classification, smart tagging, semantic search, document chat (RAG) | clusterzx/paperless-ai |
| Paperless-GPT | Vision-LLM-enhanced OCR, better text extraction from scans | icereed/paperless-gpt |
Both run as separate Docker containers alongside Paperless-ngx and communicate via its REST API.
Prerequisites
- Paperless-ngx running on LXC 124 (see setup)
- Ollama running on LXC 107 (192.168.1.107:11434) with models pulled
- A Paperless-ngx API token (generate via Settings → API Tokens in the web UI)
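Before deploying either tool, it can help to confirm that both prerequisites actually respond. The following is a minimal sketch, not a definitive check. It assumes Paperless-ngx is published on port 8000 of 192.168.1.124 (the port is an assumption, adjust it to your setup) and that the token is exported as PAPERLESS_TOKEN (a hypothetical variable name). /api/tags is Ollama's model-listing endpoint, and Paperless-ngx accepts the token in an "Authorization: Token ..." header.

```shell
# Assumed addresses; override via the environment if yours differ.
OLLAMA_URL="${OLLAMA_URL:-http://192.168.1.107:11434}"
PAPERLESS_URL="${PAPERLESS_URL:-http://192.168.1.124:8000}"  # port is an assumption

# Succeeds if Ollama answers its model-listing endpoint.
check_ollama() {
    curl -fsS --max-time 10 "$OLLAMA_URL/api/tags" >/dev/null
}

# Succeeds if the token in $PAPERLESS_TOKEN (hypothetical name) is accepted
# by the Paperless-ngx REST API.
check_paperless_token() {
    curl -fsS --max-time 10 \
        -H "Authorization: Token $PAPERLESS_TOKEN" \
        "$PAPERLESS_URL/api/documents/?page_size=1" >/dev/null
}
```

Run it as: check_ollama && check_paperless_token && echo "prerequisites OK". Because curl is called with -f, an HTTP error such as a rejected token makes the function fail.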
Option 1: Paperless-AI (Recommended)
Paperless-AI provides automatic document analysis, tagging, and a chat interface for querying your documents using RAG.
What it does
- Watches Paperless-ngx for new documents
- Sends document text to Ollama for analysis
- Auto-assigns tags, correspondents, and document types
- Provides a web UI for chatting with your documents (RAG)
Setup
Add this to a new compose file at /opt/paperless-ai/docker-compose.yml on LXC 124:
services:
  paperless-ai:
    image: clusterzx/paperless-ai:latest
    container_name: paperless-ai
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - /opt/paperless-ai/data:/app/data
    environment:
      - PAPERLESS_API_URL=http://paperless:8000/api
      - PAPERLESS_API_TOKEN=<your-paperless-api-token>
      - AI_PROVIDER=ollama
      - OLLAMA_API_URL=http://192.168.1.107:11434
      - OLLAMA_MODEL=llama3.1
      - SCAN_INTERVAL=300
      - PROCESS_PREDEFINED_DOCUMENTS=yes
      - ADD_AI_PROCESSED_TAG=yes
      - AI_PROCESSED_TAG_NAME=ai-processed
    networks:
      - paperless_default
networks:
  paperless_default:
    external: true
    name: paperless_default
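Because the compose file declares paperless_default as an external network, Docker Compose will not create it and refuses to start the stack if it is missing. The sketch below checks for the network first and only then starts the container. It assumes the Docker Compose v2 plugin (docker compose); start_stack is a hypothetical helper name. The hostname paperless in PAPERLESS_API_URL likewise assumes that your Paperless-ngx container or service is reachable under that name on this network.

```shell
# Start a companion stack only if the external network it expects exists.
# $1: directory holding docker-compose.yml
# $2: network name (defaults to paperless_default)
start_stack() {
    dir="$1"
    net="${2:-paperless_default}"
    if ! docker network inspect "$net" >/dev/null 2>&1; then
        echo "network $net not found; is Paperless-ngx running?" >&2
        return 1
    fi
    docker compose -f "$dir/docker-compose.yml" up -d
}
```

For example: start_stack /opt/paperless-ai. The same helper works for any other compose directory that joins the same network.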
Recommended Ollama models
| Model | Size | Use case |
|---|---|---|
| llama3.1 | 4.7 GB | General classification and tagging |
| llama3.1:70b | 40 GB | Better accuracy (if RAM allows) |
| mistral | 4.1 GB | Lighter alternative |
Pull the model on LXC 107:
ssh <user>@192.168.1.107
ollama pull llama3.1
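To confirm from LXC 124 that the model is available over the network, Ollama's /api/tags endpoint can be queried. This is a rough sketch: has_model and model_available are hypothetical helper names, and the JSON is matched with grep rather than a real JSON parser, which is good enough for a quick check but not robust.

```shell
OLLAMA_URL="${OLLAMA_URL:-http://192.168.1.107:11434}"

# Reads /api/tags JSON on stdin; succeeds if a model named $1 is listed.
# "llama3.1" matches "llama3.1:latest"; "llama3.1:70b" matches only that tag.
# The name is used as a regex, so "." matches any character (fine for a sanity check).
has_model() {
    grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"$1[\":]"
}

# Fetches the model list from Ollama and looks for $1.
model_available() {
    curl -fsS --max-time 10 "$OLLAMA_URL/api/tags" | has_model "$1"
}
```

For example: model_available llama3.1 || echo "model missing, pull it on LXC 107".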
Verification
- Upload a document to Paperless-ngx
- Wait for SCAN_INTERVAL (default 5 minutes)
- Check if the document gets an ai-processed tag
- Check logs:
docker logs -f paperless-ai
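The tag check can also be done from the command line through the Paperless-ngx REST API. The sketch below counts documents that carry a given tag, using the tags__name__iexact filter on the documents endpoint and the count field of the paginated response. The port 8000, the PAPERLESS_TOKEN variable, and the helper names are assumptions for illustration.

```shell
PAPERLESS_URL="${PAPERLESS_URL:-http://192.168.1.124:8000}"  # port is an assumption

# Reads a paginated Paperless-ngx API response on stdin and prints
# the first top-level-looking "count" value it finds.
extract_count() {
    grep -o '"count"[[:space:]]*:[[:space:]]*[0-9]*' | head -n 1 | grep -o '[0-9]*$'
}

# Prints how many documents carry the tag named $1 (default: ai-processed).
# The tag name is placed in the URL unescaped, so it must be URL-safe.
count_tagged() {
    tag="${1:-ai-processed}"
    curl -fsS --max-time 10 \
        -H "Authorization: Token $PAPERLESS_TOKEN" \
        "$PAPERLESS_URL/api/documents/?tags__name__iexact=$tag&page_size=1" | extract_count
}
```

If count_tagged still prints 0 after one scan interval has passed, the container logs are the next place to look.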
Option 2: Paperless-GPT (Enhanced OCR)
Paperless-GPT uses vision-capable LLMs to re-OCR documents, which can produce noticeably more accurate text from difficult scans (handwriting, faded receipts, complex layouts).
What it does
- Monitors Paperless-ngx for documents with a specific tag
- Sends document images to a vision LLM via Ollama
- Replaces the OCR text with the LLM's output
- Can also generate titles and tags
Setup
Add this to a new compose file at /opt/paperless-gpt/docker-compose.yml on LXC 124:
services:
  paperless-gpt:
    image: icereed/paperless-gpt:latest
    container_name: paperless-gpt
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      - PAPERLESS_BASE_URL=http://paperless:8000
      - PAPERLESS_API_TOKEN=<your-paperless-api-token>
      - LLM_PROVIDER=ollama
      - LLM_MODEL=llama3.1
      - VISION_LLM_PROVIDER=ollama
      - VISION_LLM_MODEL=llava
      - OLLAMA_HOST=http://192.168.1.107:11434
      - LLM_LANGUAGE=English
      - PAPERLESS_GPT_OCR_AUTO_TAG=paperless-gpt-ocr-auto
    networks:
      - paperless_default
networks:
  paperless_default:
    external: true
    name: paperless_default
Recommended Ollama models for vision OCR
| Model | Size | Use case |
|---|---|---|
| llava | 4.7 GB | Basic vision OCR |
| llava:13b | 8 GB | Better accuracy on complex layouts |
| minicpm-v | 5.5 GB | Good balance of speed and quality |
Pull the vision model on LXC 107:
ssh <user>@192.168.1.107
ollama pull llava
Usage
- In Paperless-ngx, create a tag called paperless-gpt-ocr-auto
- Apply this tag to any document you want re-OCR'd
- Paperless-GPT picks it up and processes it via the vision LLM
- The document's text is replaced with the LLM's extraction
- Check the Paperless-GPT web UI at http://192.168.1.124:8080
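The tag is normally applied in the web UI, but it can also be set through the Paperless-ngx REST API, which is handy for queuing several scans. The sketch below uses the bulk_edit endpoint with the add_tag method. It assumes you already know the numeric IDs of the document and of the paperless-gpt-ocr-auto tag (the tag ID can be looked up via /api/tags/). The port, the PAPERLESS_TOKEN variable, and the helper names are hypothetical.

```shell
PAPERLESS_URL="${PAPERLESS_URL:-http://192.168.1.124:8000}"  # port is an assumption

# Builds the JSON body for the bulk_edit endpoint.
# $1: document ID, $2: numeric ID of the paperless-gpt-ocr-auto tag
add_tag_payload() {
    printf '{"documents":[%d],"method":"add_tag","parameters":{"tag":%d}}' "$1" "$2"
}

# Adds tag $2 to document $1, leaving the document's other tags in place.
tag_for_ocr() {
    add_tag_payload "$1" "$2" | curl -fsS --max-time 10 -X POST \
        -H "Authorization: Token $PAPERLESS_TOKEN" \
        -H "Content-Type: application/json" \
        --data @- \
        "$PAPERLESS_URL/api/documents/bulk_edit/"
}
```

For example, tag_for_ocr 42 7 would queue document 42, assuming 7 is the ID of the OCR tag in your instance (both numbers are made up here).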
Using Both Together
Paperless-AI and Paperless-GPT serve different purposes and work well together:
Document uploaded to Paperless-ngx
├── Paperless-AI: auto-tags and classifies based on text content
└── Paperless-GPT: re-OCRs difficult scans when tagged manually
IaC Considerations
These companion containers are not yet part of the main IaC playbook. To add them:
- Create a Vault secret at secret/paperless-ai with the Paperless API token
- Add the compose files to services/paperless-ai/ and services/paperless-gpt/
- Extend ansible/playbooks/paperless.yml to deploy them
- The containers join the paperless_default network to reach Paperless via container name
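For the first step, a possible way to store the token is sketched below. It assumes "Vault" here means HashiCorp Vault with a KV secrets engine mounted at secret/, and that the CLI is already authenticated. The field name api_token and the helper name are assumptions; use whatever your playbook expects.

```shell
# Stores the Paperless API token read from stdin.
# "api_token=-" tells the Vault CLI to read the value from stdin, so the
# token does not end up in shell history or the process list.
store_paperless_token() {
    vault kv put secret/paperless-ai api_token=-
}
```

For example: store_paperless_token < token.txt, where token.txt is a hypothetical file holding the token. It can be read back with vault kv get -field=api_token secret/paperless-ai.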
This is left as a follow-up task because:
- You need a running Paperless-ngx instance first to generate the API token
- Model selection depends on testing which works best for your documents
- Both tools are optional enhancements, not core functionality