
Paperless-ngx AI Integration Guide

How to connect Paperless-ngx to Ollama (LXC 107) for AI-powered document tagging, classification, and enhanced OCR using Paperless-AI and Paperless-GPT.

Overview

Paperless-ngx has no built-in AI, but two community projects bridge it to Ollama:

| Tool | Purpose | GitHub |
|---|---|---|
| Paperless-AI | Auto-classification, smart tagging, semantic search, document chat (RAG) | clusterzx/paperless-ai |
| Paperless-GPT | Vision-LLM-enhanced OCR, better text extraction from scans | icereed/paperless-gpt |

Both run as separate Docker containers alongside Paperless-ngx and communicate via its REST API.

Prerequisites

  • Paperless-ngx running on LXC 124 (see setup)
  • Ollama running on LXC 107 (192.168.1.107:11434) with models pulled
  • A Paperless-ngx API token (generate via Settings → API Tokens in the web UI)
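Before installing either tool, it can help to confirm both prerequisites from LXC 124. The sketch below assumes bash and curl, and assumes Paperless-ngx is reachable from the host at http://192.168.1.124:8000 (host and port are assumptions; adjust them to your setup). It uses Ollama's GET /api/tags endpoint, which lists pulled models, and Paperless-ngx's POST /api/token/ endpoint, which exchanges a username and password for an API token. The function names are hypothetical helpers, not part of either project.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking the prerequisites.
# Hosts/ports are assumptions based on this guide; override via env vars.
OLLAMA_URL="${OLLAMA_URL:-http://192.168.1.107:11434}"
PAPERLESS_URL="${PAPERLESS_URL:-http://192.168.1.124:8000}"

# Print Ollama's model list (JSON) from GET /api/tags.
list_ollama_models() {
  curl -fsS "$OLLAMA_URL/api/tags"
}

# Succeed if the model named in $1 appears in /api/tags JSON read from stdin.
# Ollama treats an untagged name like "llama3.1" as "llama3.1:latest".
model_pulled() {
  local name="$1" pat
  [[ "$name" == *:* ]] || name="$name:latest"
  # Escape dots so "llama3.1" is matched literally by grep -E.
  pat=$(printf '%s' "$name" | sed 's/[.]/\\./g')
  grep -Eq "\"name\": ?\"${pat}\""
}

# Exchange Paperless-ngx credentials for an API token via POST /api/token/.
get_paperless_token() {
  curl -fsS -X POST "$PAPERLESS_URL/api/token/" \
    --data-urlencode "username=$1" \
    --data-urlencode "password=$2"
}

# Examples (not executed here):
#   list_ollama_models | model_pulled llama3.1 && echo "llama3.1 is available"
#   get_paperless_token admin 'your-password'   # prints {"token":"..."}
```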

Option 1: Paperless-AI (Classification and Tagging)

Paperless-AI provides automatic document analysis, tagging, and a chat interface for querying your documents using RAG.

What it does

  • Watches Paperless-ngx for new documents
  • Sends document text to Ollama for analysis
  • Auto-assigns tags, correspondents, and document types
  • Provides a web UI for chatting with your documents (RAG)

Setup

Add this to a new compose file at /opt/paperless-ai/docker-compose.yml on LXC 124:

services:
  paperless-ai:
    image: clusterzx/paperless-ai:latest
    container_name: paperless-ai
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - /opt/paperless-ai/data:/app/data
    environment:
      - PAPERLESS_API_URL=http://paperless:8000/api
      - PAPERLESS_API_TOKEN=<your-paperless-api-token>
      - AI_PROVIDER=ollama
      - OLLAMA_API_URL=http://192.168.1.107:11434
      - OLLAMA_MODEL=llama3.1
      - SCAN_INTERVAL=300
      - PROCESS_PREDEFINED_DOCUMENTS=yes
      - ADD_AI_PROCESSED_TAG=yes
      - AI_PROCESSED_TAG_NAME=ai-processed
    networks:
      - paperless_default

networks:
  paperless_default:
    external: true
    name: paperless_default

Suggested models:

| Model | Size | Use case |
|---|---|---|
| llama3.1 | 4.7 GB | General classification and tagging |
| llama3.1:70b | 40 GB | Better accuracy (if RAM allows) |
| mistral | 4.1 GB | Lighter alternative |

Pull the model on LXC 107:

ssh <user>@192.168.1.107
ollama pull llama3.1
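Once the compose file contains a real token and the model is pulled, the stack can be started. The guard below is a hypothetical convenience, not part of Paperless-AI: it refuses to run docker compose while the <your-paperless-api-token> placeholder is still in the file, and it checks that the external paperless_default network exists, because Compose does not create networks marked external.

```bash
#!/usr/bin/env bash
# Hypothetical deploy guard for the companion compose files in this guide.

# Succeed if the compose file still contains the placeholder token.
token_placeholder_present() {
  grep -qF '<your-paperless-api-token>' "$1"
}

# Validate, then start the stack defined in the given compose file.
deploy_companion() {
  local compose="$1"
  if token_placeholder_present "$compose"; then
    echo "error: set PAPERLESS_API_TOKEN in $compose first" >&2
    return 1
  fi
  # External networks must already exist (created by the Paperless-ngx stack).
  if ! docker network inspect paperless_default >/dev/null 2>&1; then
    echo "error: network paperless_default not found; is Paperless-ngx running?" >&2
    return 1
  fi
  docker compose -f "$compose" up -d
}

# Example (not executed here):
#   deploy_companion /opt/paperless-ai/docker-compose.yml
```

The same guard should work unchanged for /opt/paperless-gpt/docker-compose.yml, since both files use the same placeholder and network.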

Verification

  1. Upload a document to Paperless-ngx
  2. Wait for SCAN_INTERVAL (default 5 minutes)
  3. Check if the document gets an ai-processed tag
  4. Check logs: docker logs -f paperless-ai
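Step 3 can also be checked from the command line. This sketch assumes the Paperless-ngx REST API is reachable at http://192.168.1.124:8000 (an assumption), authenticates with an "Authorization: Token <token>" header, and filters the document list with the tags__name__iexact filter. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking which documents Paperless-AI has processed.
PAPERLESS_URL="${PAPERLESS_URL:-http://192.168.1.124:8000}"  # assumption

# Build the documents-list URL filtered by exact (case-insensitive) tag name.
# Tag names are not URL-encoded here; "ai-processed" needs no encoding.
tagged_docs_url() {
  printf '%s/api/documents/?tags__name__iexact=%s' "$1" "$2"
}

# Extract the top-level "count" field from a paginated API response on stdin.
extract_count() {
  grep -oE '"count": ?[0-9]+' | head -n 1 | grep -oE '[0-9]+$'
}

# Print how many documents carry the given tag (default: ai-processed).
# Expects PAPERLESS_API_TOKEN in the environment.
count_tagged_docs() {
  local tag="${1:-ai-processed}"
  curl -fsS -H "Authorization: Token $PAPERLESS_API_TOKEN" \
    "$(tagged_docs_url "$PAPERLESS_URL" "$tag")" | extract_count
}

# Example (not executed here):
#   PAPERLESS_API_TOKEN=... count_tagged_docs ai-processed
```

A count that grows after each scan suggests Paperless-AI is picking up new uploads; if it stays at zero, the container logs from step 4 are the next place to look.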

Option 2: Paperless-GPT (Enhanced OCR)

Paperless-GPT uses vision-capable LLMs to re-OCR documents, which can produce noticeably more accurate text than conventional OCR on difficult scans (handwriting, faded receipts, complex layouts).

What it does

  • Monitors Paperless-ngx for documents with a specific tag
  • Sends document images to a vision LLM via Ollama
  • Replaces the OCR text with the LLM's output
  • Can also generate titles and tags

Setup

Add this to a new compose file at /opt/paperless-gpt/docker-compose.yml on LXC 124:

services:
  paperless-gpt:
    image: icereed/paperless-gpt:latest
    container_name: paperless-gpt
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      - PAPERLESS_BASE_URL=http://paperless:8000
      - PAPERLESS_API_TOKEN=<your-paperless-api-token>
      - LLM_PROVIDER=ollama
      - LLM_MODEL=llama3.1
      - VISION_LLM_PROVIDER=ollama
      - VISION_LLM_MODEL=llava
      - OLLAMA_HOST=http://192.168.1.107:11434
      - LLM_LANGUAGE=English
      - PAPERLESS_GPT_OCR_AUTO_TAG=paperless-gpt-ocr-auto
    networks:
      - paperless_default

networks:
  paperless_default:
    external: true
    name: paperless_default

Suggested vision models:

| Model | Size | Use case |
|---|---|---|
| llava | 4.7 GB | Basic vision OCR |
| llava:13b | 8 GB | Better accuracy on complex layouts |
| minicpm-v | 5.5 GB | Good balance of speed and quality |

Pull the vision model on LXC 107:

ssh <user>@192.168.1.107
ollama pull llava
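Before wiring up Paperless-GPT, it may be worth confirming that the vision model can read an image at all. Ollama's POST /api/generate endpoint accepts base64-encoded images in an images array; the sketch below builds that request from a local scan and sends it to LXC 107. The helper names and the sample prompt are hypothetical, and the prompt must not contain double quotes or backslashes because the JSON is assembled with printf rather than a JSON tool.

```bash
#!/usr/bin/env bash
# Hypothetical smoke test for a vision model served by Ollama on LXC 107.
OLLAMA_URL="${OLLAMA_URL:-http://192.168.1.107:11434}"

# Build a non-streaming /api/generate request with one base64-encoded image.
# Assumes the prompt contains no double quotes or backslashes.
vision_payload() {
  local model="$1" prompt="$2" image="$3" b64
  # tr strips the line wrapping that some base64 implementations add.
  b64=$(base64 < "$image" | tr -d '\n')
  printf '{"model":"%s","prompt":"%s","images":["%s"],"stream":false}' \
    "$model" "$prompt" "$b64"
}

# Send the request; the model's transcription is in the "response" field.
vision_ocr() {
  vision_payload "$1" "Transcribe all text in this image." "$2" |
    curl -fsS -H 'Content-Type: application/json' -d @- "$OLLAMA_URL/api/generate"
}

# Example (not executed here):
#   vision_ocr llava ./receipt.png
```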

Usage

  1. In Paperless-ngx, create a tag called paperless-gpt-ocr-auto
  2. Apply this tag to any document you want re-OCR'd
  3. Paperless-GPT picks it up and processes it via the vision LLM
  4. The document's text is replaced with the LLM's extraction
  5. Check the Paperless-GPT web UI at http://192.168.1.124:8080
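Steps 1 and 2 can also be scripted against the Paperless-ngx REST API, which helps when queueing many documents at once. This sketch assumes the API at http://192.168.1.124:8000 with token auth; it uses POST /api/tags/ to create the tag and POST /api/documents/bulk_edit/ with the add_tag method to apply it. The helper names are hypothetical, and bulk_edit needs the tag's numeric ID, which is returned when the tag is created.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for queueing documents for Paperless-GPT OCR.
PAPERLESS_URL="${PAPERLESS_URL:-http://192.168.1.124:8000}"  # assumption

# JSON body for POST /api/tags/.
tag_payload() {
  printf '{"name":"%s"}' "$1"
}

# JSON body for POST /api/documents/bulk_edit/: add tag ID $1 to document IDs $2...
add_tag_payload() {
  local tag_id="$1" ids
  shift
  # Join the remaining arguments with commas.
  ids=$(IFS=,; printf '%s' "$*")
  printf '{"documents":[%s],"method":"add_tag","parameters":{"tag":%s}}' "$ids" "$tag_id"
}

# POST a JSON body ($2) to a Paperless-ngx API path ($1) using token auth.
# Expects PAPERLESS_API_TOKEN in the environment.
paperless_post() {
  curl -fsS -X POST -H "Authorization: Token $PAPERLESS_API_TOKEN" \
    -H 'Content-Type: application/json' -d "$2" "$PAPERLESS_URL$1"
}

# Examples (not executed here; 7 is a placeholder tag ID, 12 and 13 document IDs):
#   paperless_post /api/tags/ "$(tag_payload paperless-gpt-ocr-auto)"
#   paperless_post /api/documents/bulk_edit/ "$(add_tag_payload 7 12 13)"
```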

Using Both Together

Paperless-AI and Paperless-GPT serve different purposes and work well together:

Document uploaded to Paperless-ngx
  ├── Paperless-AI: auto-tags and classifies based on text content
  └── Paperless-GPT: re-OCRs difficult scans when tagged manually

IaC Considerations

These companion containers are not yet part of the main IaC playbook. To add them:

  1. Create a Vault secret at secret/paperless-ai with the Paperless API token
  2. Add the compose files to services/paperless-ai/ and services/paperless-gpt/
  3. Extend ansible/playbooks/paperless.yml to deploy them
  4. The containers join the paperless_default network so they can reach Paperless-ngx by its container name

This is left as a follow-up task because:

  • You need a running Paperless-ngx instance first to generate the API token
  • Model selection depends on testing which works best for your documents
  • Both tools are optional enhancements, not core functionality
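For step 1 of the IaC list, one way to keep the token out of the committed compose files is to store it in Vault once and render it into a .env file next to each compose file at deploy time. This sketch assumes a KV secret at secret/paperless-ai with a field named api_token (the field name is hypothetical) and uses vault kv put and vault kv get -field. It also assumes the compose files are changed to use - PAPERLESS_API_TOKEN=${PAPERLESS_API_TOKEN}, which Compose fills from the .env file in the project directory; that is a change from the files shown above.

```bash
#!/usr/bin/env bash
# Hypothetical: render env files for both companions from a Vault KV secret.
# Assumes secret/paperless-ai stores the token in a field named "api_token".

# Write PAPERLESS_API_TOKEN=<token> to the given file.
# umask 077 creates a new file as owner-only (0600); existing files keep their mode.
write_token_env() {
  local token="$1" out="$2"
  (umask 077 && printf 'PAPERLESS_API_TOKEN=%s\n' "$token" > "$out")
}

# Store the token once, after generating it in the Paperless-ngx web UI.
store_token() {
  vault kv put secret/paperless-ai api_token="$1"
}

# Fetch the token from Vault and write a .env file for each companion.
render_env_files() {
  local token
  token=$(vault kv get -field=api_token secret/paperless-ai) || return 1
  write_token_env "$token" /opt/paperless-ai/.env
  write_token_env "$token" /opt/paperless-gpt/.env
}
```

In the Ansible playbook, the same lookup would more naturally live in a task using a Vault lookup, with this shell version serving as a manual fallback.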