Paperless-ngx AI Integration Guide

How to connect Paperless-ngx to Ollama (LXC 107) for AI-powered document tagging, classification, and enhanced OCR using Paperless-AI and Paperless-GPT.

Overview

Paperless-ngx has no built-in LLM integration, but two community projects bridge it to Ollama:

Tool	Purpose	GitHub
Paperless-AI	Auto-classification, smart tagging, semantic search, document chat (RAG)	clusterzx/paperless-ai
Paperless-GPT	Vision-LLM-enhanced OCR, better text extraction from scans	icereed/paperless-gpt

Both run as separate Docker containers alongside Paperless-ngx and communicate via its REST API.

Prerequisites

Paperless-ngx running on LXC 124 (see setup)
Ollama running on LXC 107 (192.168.1.107:11434) with models pulled
A Paperless-ngx API token (generate one in the web UI via the user menu → My Profile → API Auth Token)
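
Before deploying either tool, it can help to confirm that both endpoints answer. The following is a minimal sketch, assuming curl is installed and that Paperless-ngx is published on port 8000 of 192.168.1.124 (the port is an assumption; adjust it to your setup). The function names are illustrative and not part of either project.

```shell
# Assumed addresses; adjust to your environment.
OLLAMA_URL="http://192.168.1.107:11434"
PAPERLESS_URL="http://192.168.1.124:8000"   # assumption: Paperless-ngx on port 8000

# Build the header Paperless-ngx expects for token authentication.
paperless_auth_header() {
  printf 'Authorization: Token %s' "$1"
}

# List the models Ollama has pulled (GET /api/tags).
check_ollama() {
  curl -fsS "$OLLAMA_URL/api/tags"
}

# Fetch a single document to confirm that the token is accepted.
# Usage: check_paperless <token>
check_paperless() {
  curl -fsS -H "$(paperless_auth_header "$1")" \
    "$PAPERLESS_URL/api/documents/?page_size=1"
}
```

Run check_ollama first, then check_paperless with your token. Because of curl's -f flag, an HTTP error such as 401 (wrong token) makes the function fail instead of printing an error page.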

Option 1: Paperless-AI (Recommended)

Paperless-AI provides automatic document analysis, tagging, and a chat interface for querying your documents using RAG.

What it does

Watches Paperless-ngx for new documents
Sends document text to Ollama for analysis
Auto-assigns tags, correspondents, and document types
Provides a web UI for chatting with your documents (RAG)
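
To illustrate the second step, here is a rough sketch of sending document text to Ollama's /api/generate endpoint. It is not Paperless-AI's actual code or prompt; the prompt wording and function names are hypothetical, and the sketch assumes curl and jq are installed.

```shell
OLLAMA_URL="http://192.168.1.107:11434"
OLLAMA_MODEL="llama3.1"

# Hypothetical prompt; Paperless-AI ships its own, more elaborate one.
build_prompt() {
  printf 'Suggest a title, tags, a correspondent and a document type for this document. Answer as JSON.\n\n%s' "$1"
}

# Send the prompt to Ollama and print the model's answer.
# jq builds the request body so the document text is JSON-escaped correctly.
classify_text() {
  jq -n --arg model "$OLLAMA_MODEL" --arg prompt "$(build_prompt "$1")" \
    '{model: $model, prompt: $prompt, stream: false, format: "json"}' |
    curl -fsS "$OLLAMA_URL/api/generate" -d @- |
    jq -r '.response'
}
```

Calling classify_text with a short piece of text is a quick way to judge whether a model gives usable suggestions before wiring it into Paperless-AI.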

Setup

Create a new compose file at /opt/paperless-ai/docker-compose.yml on LXC 124 with the following content:

services:
  paperless-ai:
    image: clusterzx/paperless-ai:latest
    container_name: paperless-ai
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - /opt/paperless-ai/data:/app/data
    environment:
      - PAPERLESS_API_URL=http://paperless:8000/api
      - PAPERLESS_API_TOKEN=<your-paperless-api-token>
      - AI_PROVIDER=ollama
      - OLLAMA_API_URL=http://192.168.1.107:11434
      - OLLAMA_MODEL=llama3.1
      # Cron expression: scan every 5 minutes
      - SCAN_INTERVAL=*/5 * * * *
      - PROCESS_PREDEFINED_DOCUMENTS=yes
      - ADD_AI_PROCESSED_TAG=yes
      - AI_PROCESSED_TAG_NAME=ai-processed
    networks:
      - paperless_default

networks:
  paperless_default:
    external: true
    name: paperless_default
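
The compose file above only works if the external network already exists. The sketch below checks for it before starting the container. It assumes the Docker Compose v2 plugin (docker compose) and that the network created by your Paperless-ngx stack really is named paperless_default; the function names are illustrative.

```shell
COMPOSE_FILE="/opt/paperless-ai/docker-compose.yml"
NETWORK="paperless_default"   # assumption: depends on your Paperless-ngx project name

# Succeeds only if the named Docker network exists.
network_exists() {
  docker network inspect "$1" >/dev/null 2>&1
}

# Start Paperless-AI, refusing to run when the shared network is missing.
start_paperless_ai() {
  if ! network_exists "$NETWORK"; then
    echo "network $NETWORK not found; check 'docker network ls'" >&2
    return 1
  fi
  docker compose -f "$COMPOSE_FILE" up -d
}
```

If start_paperless_ai reports a missing network, docker network ls on LXC 124 shows the actual name to put in the compose file.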

Recommended Ollama models

Model	Size	Use case
`llama3.1`	4.7 GB	General classification and tagging
`llama3.1:70b`	40 GB	Better accuracy (if RAM allows)
`mistral`	4.1 GB	Lighter alternative

Pull the model on LXC 107:

ssh <user>@192.168.1.107
ollama pull llama3.1
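
After the pull, ollama list on LXC 107 shows the installed models. To check from another host instead, the sketch below inspects the JSON returned by Ollama's /api/tags endpoint. The helper name is hypothetical, and the grep pattern is a rough match rather than a real JSON parser.

```shell
# Reads the JSON body of GET /api/tags on stdin and succeeds if the model
# is listed. Dots in the model name act as regex wildcards, which is
# harmless for a quick check.
has_model() {
  grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"$1(:|\")"
}
```

For example: curl -fsS http://192.168.1.107:11434/api/tags | has_model llama3.1 && echo "model available"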

Verification

Upload a document to Paperless-ngx
Wait for the next scan (every 5 minutes with the SCAN_INTERVAL configured above)
Check that the document receives the ai-processed tag
Check logs: docker logs -f paperless-ai
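
The tag check can also be done through the Paperless-ngx REST API. This is a small sketch, assuming curl and jq are installed and Paperless-ngx is reachable on port 8000 of 192.168.1.124 (an assumption); the function names are illustrative.

```shell
PAPERLESS_URL="http://192.168.1.124:8000"   # assumption: Paperless-ngx on port 8000

# Build the document-list URL filtered by tag name (case-insensitive match).
processed_docs_url() {
  printf '%s/api/documents/?tags__name__iexact=%s' "$1" "$2"
}

# Print how many documents carry the given tag.
# Usage: count_processed <token> <tag-name>
count_processed() {
  curl -fsS -H "Authorization: Token $1" \
    "$(processed_docs_url "$PAPERLESS_URL" "$2")" |
    jq '.count'
}
```

A count that increases after an upload, for example from count_processed "<your-paperless-api-token>" ai-processed, indicates that Paperless-AI has processed the new document.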

Option 2: Paperless-GPT (Enhanced OCR)

Paperless-GPT uses vision-capable LLMs to re-OCR documents, which often yields much more accurate text from difficult scans (handwriting, faded receipts, complex layouts).

What it does

Monitors Paperless-ngx for documents with a specific tag
Sends document images to a vision LLM via Ollama
Replaces the OCR text with the LLM's output
Can also generate titles and tags
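
The core of the second step can be tried by hand against Ollama's /api/generate endpoint, which accepts base64-encoded images. The sketch below is not Paperless-GPT's implementation; the prompt text and function names are hypothetical, and it assumes curl, jq and base64 are installed.

```shell
OLLAMA_URL="http://192.168.1.107:11434"
VISION_MODEL="llava"

# Base64-encode a file onto a single line, as Ollama's "images" field expects.
encode_image() {
  base64 < "$1" | tr -d '\n'
}

# Ask the vision model to transcribe one page image (PNG or JPEG).
# jq reads the encoded image from stdin (-R, input) to avoid
# command-line length limits on large files.
ocr_page() {
  encode_image "$1" |
    jq -Rn --arg model "$VISION_MODEL" \
      '{model: $model, prompt: "Transcribe all text in this image.", images: [input], stream: false}' |
    curl -fsS "$OLLAMA_URL/api/generate" -d @- |
    jq -r '.response'
}
```

Running ocr_page on a few representative scans is a cheap way to compare vision models before enabling automatic re-OCR.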

Setup

Create a new compose file at /opt/paperless-gpt/docker-compose.yml on LXC 124 with the following content:

services:
  paperless-gpt:
    image: icereed/paperless-gpt:latest
    container_name: paperless-gpt
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      - PAPERLESS_BASE_URL=http://paperless:8000
      - PAPERLESS_API_TOKEN=<your-paperless-api-token>
      - LLM_PROVIDER=ollama
      - LLM_MODEL=llama3.1
      - VISION_LLM_PROVIDER=ollama
      - VISION_LLM_MODEL=llava
      - OLLAMA_HOST=http://192.168.1.107:11434
      - LLM_LANGUAGE=English
      - AUTO_OCR_TAG=paperless-gpt-ocr-auto
    networks:
      - paperless_default

networks:
  paperless_default:
    external: true
    name: paperless_default

Recommended Ollama models for vision OCR

Model	Size	Use case
`llava`	4.7 GB	Basic vision OCR
`llava:13b`	8 GB	Better accuracy on complex layouts
`minicpm-v`	5.5 GB	Good balance of speed and quality

Pull the vision model on LXC 107:

ssh <user>@192.168.1.107
ollama pull llava

Usage

In Paperless-ngx, create a tag called paperless-gpt-ocr-auto
Apply this tag to any document you want re-OCR'd
Paperless-GPT picks it up and processes it via the vision LLM
The document's text is replaced with the LLM's extraction
Check the Paperless-GPT web UI at http://192.168.1.124:8080
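
The first two steps can also be scripted through the Paperless-ngx REST API instead of the web UI. This is a sketch under the assumption that Paperless-ngx listens on port 8000 of 192.168.1.124; the function names are illustrative, and the document and tag ids must be looked up first.

```shell
PAPERLESS_URL="http://192.168.1.124:8000"   # assumption: Paperless-ngx on port 8000

# Body for POST /api/documents/bulk_edit/ that adds one tag to one document.
add_tag_payload() {
  printf '{"documents":[%d],"method":"add_tag","parameters":{"tag":%d}}' "$1" "$2"
}

# Create the trigger tag; the JSON response contains its numeric id.
# Usage: create_ocr_tag <token>
create_ocr_tag() {
  curl -fsS -X POST -H "Authorization: Token $1" \
    -H "Content-Type: application/json" \
    -d '{"name":"paperless-gpt-ocr-auto"}' "$PAPERLESS_URL/api/tags/"
}

# Usage: tag_document <token> <document-id> <tag-id>
tag_document() {
  curl -fsS -X POST -H "Authorization: Token $1" \
    -H "Content-Type: application/json" \
    -d "$(add_tag_payload "$2" "$3")" "$PAPERLESS_URL/api/documents/bulk_edit/"
}
```

The bulk_edit endpoint adds the tag without touching the document's existing tags, which a plain PATCH of the tags field would overwrite.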

Using Both Together

Paperless-AI and Paperless-GPT serve different purposes and work well together:

Document uploaded to Paperless-ngx
  ├── Paperless-AI: auto-tags and classifies based on text content
  └── Paperless-GPT: re-OCRs difficult scans when tagged manually

IaC Considerations

These companion containers are not yet part of the main IaC playbook. To add them:

Create a Vault secret at secret/paperless-ai with the Paperless API token
Add the compose files to services/paperless-ai/ and services/paperless-gpt/
Extend ansible/playbooks/paperless.yml to deploy them
The containers join the paperless_default network to reach Paperless-ngx via its container name
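
As a starting point for the first step, the token could be stored and read back with the Vault CLI. This sketch assumes a KV secrets engine mounted at secret/ and uses a hypothetical field name, api_token; the function names are illustrative as well.

```shell
VAULT_PATH="secret/paperless-ai"

# Store the Paperless API token (api_token is a hypothetical field name).
# Note: passing the token as an argument may leave it in shell history.
store_token() {
  vault kv put "$VAULT_PATH" api_token="$1"
}

# Print an env-file line suitable for the compose files.
render_env_line() {
  printf 'PAPERLESS_API_TOKEN=%s\n' "$1"
}

# Read the token from Vault and print the env-file line to stdout.
render_env_from_vault() {
  render_env_line "$(vault kv get -field=api_token "$VAULT_PATH")"
}
```

An Ansible task could capture the output of render_env_from_vault into an env file next to each compose file, keeping the token out of the repository.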

This is left as a follow-up task because:

You need a running Paperless-ngx instance first to generate the API token
Model selection depends on testing which works best for your documents
Both tools are optional enhancements, not core functionality