Important
This application is an early preview. It may not run stably and extracted results can be inaccurate. Always check outputs for validity before using them in practice.
A web application that turns unstructured medical/lab documents into structured JSON using LLMs. Upload PDFs, images, or spreadsheets — extract data with configurable schemas and prompts, then evaluate results against ground truth.
Works with any OpenAI-compatible API: use official services (OpenAI, Mistral OCR) for convenience, or run everything fully local with self-hosted models (DeepSeek-OCR-2 via KatDocExtract, vision LLMs like Gemma 4 via vLLM) for sensitive environments.
- Upload & organize — PDF, DOC/DOCX, images, CSV/XLSX, TXT files with column selection and previews.
- Preprocessing & OCR — three extraction engines to choose from (see Preprocessing Guide)
- Visual schema editor — tree-based JSON schema editor with support for nested objects, arrays, all JSON types, import/export, and validation.
- LLM trials — run extraction trials across different prompts, schemas, and models. Temperature control, token tracking, batch execution. Works with any OpenAI-compatible endpoint.
- Evaluation — upload ground truth CSVs, compare field-by-field, compute per-field and overall accuracy metrics.
- Privacy-first — run fully local or with self-hosted providers. No forced external calls.
- Admin dashboard — user management (invitations, roles), provider configuration, Celery monitoring.
Tech stack: Vue 3 + Vite + TailwindCSS (frontend), FastAPI (backend), SQLAlchemy, Celery, Pydantic for configuration.
Before extracting data with an LLM, you need to turn your files into plain text. This step is called preprocessing (or OCR/text extraction). The app offers three engines — pick whichever works best for your document type.
Uses Docling + Tesseract to extract text directly on the server. The engine detects whether a PDF already has embedded text (like a digitally created PDF) and uses it directly. For scanned pages, it runs Tesseract OCR locally.
- Best for: Any document; works offline, no extra cost
- Limitations: Slower for large batches; Tesseract accuracy varies with image quality
- Force OCR: Enable this to treat all PDF pages as images (bypasses native text)
Sends pages to a Mistral OCR-compatible API. This can be the official Mistral cloud service or a self-hosted DeepSeek-OCR-2 instance (KatDocExtract).
- Best for: Complex layouts, tables, forms — higher accuracy than local OCR
- Limitations: Requires an API key and network access (or GPU + compose overlay)
- Tip: The engine automatically checks for embedded PDF text first. If enough text is found, it uses Docling without OCR locally, saving the API call. Disable this with "Force OCR".
Sends pages as images to any OpenAI-compatible vision model (GPT-4o, Gemma 4 via vLLM, etc.).
- Best for: Documents requiring understanding of layout and visual context
- Limitations: Slower and more expensive than dedicated OCR; requires a vision-capable model
- Tip: Same embedded-text shortcut as Mistral OCR — only sends pages to the vision model when really needed.
When enabled, the embedded-text pre-check is skipped, and every page goes through the selected engine. Useful for PDFs with garbled or incomplete embedded text.
Self-hosted: Use
docker compose -f compose.yml -f compose.deepseek.yml up -dfor local Mistral OCR, ordocker compose -f compose.yml -f compose.vllm.yml up -dfor a local vision LLM endpoint. See Compose Files.
- Docker & Docker Compose (recommended for deployment)
- An OpenAI-compatible API — official OpenAI, self-hosted vLLM, Ollama, llama.cpp, or any compatible gateway
- Optional: NVIDIA GPU + Container Toolkit (only needed for self-hosted OCR/LLM via the optional compose overlays)
- For local development: Node.js 18+ + Python 3.11+
- Clone the repository and set up environment:
git clone https://github.com/KatherLab/llmaixweb
cd llmaixweb
cp .env.example .env
# Edit .env with at minimum: SECRET_KEY, OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_MODEL- Start the stack:
docker compose up -d-
Open http://localhost:5173 and create an admin account on first visit.
-
In the admin panel, configure your LLM provider, then upload documents and run extraction trials.
First run: Images may take a few minutes to download. Watch progress with
docker compose logs -f— once the backend health check passes and migrations finish, the app is ready at http://localhost:5173.
Pre-built images at
ghcr.io/katherlab/llmaixweb-backend:latestandghcr.io/katherlab/llmaixweb-frontend:latest. Add--buildto build from source.
| File | Purpose | GPU required? |
|---|---|---|
compose.yml |
Main config — CPU-only, works with Docker & Podman | No |
compose.dev.yml |
Optional overlay — hot-reload for local development | No |
compose.deepseek.yml |
Optional overlay — self-hosted Mistral OCR API via DeepSeek-OCR-2 + KatDocExtract | Yes (24+ GB VRAM) |
compose.vllm.yml |
Optional overlay — self-hosted OpenAI-compatible endpoint via vLLM (e.g., Gemma 4 for Vision OCR) | Yes (VRAM depends on model) |
Usage examples:
# Minimal setup (CPU, uses your configured API provider)
docker compose up -d
# Development with hot-reload
docker compose -f compose.dev.yml up -d
# Self-hosted Mistral OCR via DeepSeek-OCR-2 (GPU required)
docker compose -f compose.yml -f compose.deepseek.yml up -d
# Self-hosted vision LLM via vLLM (GPU required)
docker compose -f compose.yml -f compose.vllm.yml up -d
# Combine overlays: e.g., all services
docker compose -f compose.yml -f compose.deepseek.yml -f compose.vllm.yml up -dThe overlays (
compose.deepseek.yml,compose.vllm.yml) are optional GPU-requiring services for running OCR/LLM locally. Without them, the app simply connects to your configured remote API.
Edit .env for your deployment. At minimum, configure your LLM provider and a secret key.
| Variable | Description | Default |
|---|---|---|
OPENAI_API_KEY |
API key for LLM provider | (required) |
OPENAI_API_BASE |
Base URL for OpenAI-compatible API | (empty) |
OPENAI_API_MODEL |
Default model to use | (empty) |
SECRET_KEY |
Secret key for sessions (generate with python3 -c "import secrets; print(secrets.token_urlsafe(32))") |
(required) |
RustFS (default with docker-compose):
| Variable | Default |
|---|---|
AWS_ACCESS_KEY_ID |
rustfsadmin |
AWS_SECRET_ACCESS_KEY |
rustfsadmin |
S3_ENDPOINT_URL |
http://rustfs:9000 |
S3_BUCKET_NAME |
llmaixweb |
Local filesystem (alternative):
| Variable | Description |
|---|---|
LOCAL_DIRECTORY |
Path to local storage directory |
| Variable | Description | Default |
|---|---|---|
APP_URL |
Public app URL (for links in emails) | http://localhost:5173 |
BACKEND_CORS_ORIGINS |
Comma-separated allowed origins | http://localhost:5173 |
REQUIRE_INVITATION |
Require invitation for signup | false |
ALLOW_FIRST_ADMIN_SETUP |
Allow first user to become admin | true |
CELERY_PREPROCESS_POOL |
Pool type (auto, solo, prefork) |
auto (use solo on macOS) |
MISTRAL_API_BASE |
Mistral OCR API base URL | https://api.mistral.ai |
MISTRAL_API_KEY |
Mistral OCR API key (server default) | (empty) |
MISTRAL_OCR_ENABLED |
Enable Mistral OCR engine | false |
VISION_OCR_ENABLED |
Enable Vision LLM OCR engine | false |
VISION_OCR_API_KEY |
Vision OCR API key (server default) | (empty) |
VISION_OCR_API_BASE |
Vision OCR API base URL | (empty) |
VISION_OCR_MODEL |
Vision OCR default model | gpt-4o |
Self-hosted OCR: Use
docker compose -f compose.yml -f compose.deepseek.yml up -dfor a local Mistral OCR-compatible API (DeepSeek-OCR-2 via KatDocExtract). Usedocker compose -f compose.yml -f compose.vllm.yml up -dfor a local vision LLM endpoint (e.g. Gemma 4). Then set the correspondingMISTRAL_API_BASE/VISION_OCR_API_BASEenv vars. See.env.examplefor details.
Database & Advanced Settings (click to expand)
| Variable | Description | Default |
|---|---|---|
POSTGRES_SERVER |
Database host | postgres |
POSTGRES_USER |
Database user | postgres |
POSTGRES_PASSWORD |
Database password | postgres |
POSTGRES_DB |
Database name | llmaixweb |
CELERY_BROKER_URL |
Redis broker URL | redis://redis:6379/0 |
CELERY_RESULT_BACKEND |
Redis result backend | redis://redis:6379/0 |
DISABLE_CELERY |
Disable Celery workers | false |
INITIALIZE_CELERY |
Initialize Celery on startup | false |
ACCESS_TOKEN_EXPIRE_MINUTES |
Token expiry in minutes | 60*24*8 |
RUSTFS_ACCESS_KEY |
RustFS access key | rustfsadmin |
RUSTFS_SECRET_KEY |
RustFS secret key | rustfsadmin |
The frontend nginx proxies /api/ requests to the backend, so only one URL is needed.
The APP_URL env var controls invitation/password-reset link construction.
Testing in local network? Access from other devices requires:
- Set
APP_URLto your server IP (e.g.,http://192.168.1.100:5173)- Set
BACKEND_CORS_ORIGINSto include your server IP (e.g.,http://192.168.1.100:5173)- Restart the stack
Using a reverse proxy (nginx, Traefik, etc.)? Adjust:
APP_URLto your public domain (e.g.,https://app.example.com)BACKEND_CORS_ORIGINSto your public domain (e.g.,https://app.example.com)
The frontend binds to 5173:8080. Change the host port in compose.yml:
ports: ["5174:8080"]The backend waits for Postgres, Redis, and RustFS to be healthy. Ensure they're up:
docker compose ps # check all services are running
docker compose logs backend | tail # check backend logs
docker compose logs postgres | tail # check database logsYour .env has OPENAI_API_KEY / OPENAI_API_BASE / OPENAI_API_MODEL empty or unreachable. These are required for extraction. To skip the startup check (e.g. if you configure providers in the admin UI later), set:
OPENAI_NO_API_CHECK=trueIf the schema has changed between versions, migrations run automatically. Reset the database if needed (
docker compose down -v # removes volumes including pgdata
docker compose up -d # fresh startCelery workers may not be starting. Check:
docker compose logs worker_default
docker compose logs worker_preprocessOn macOS, multiprocessing issues can occur — set CELERY_PREPROCESS_POOL=solo in .env.
Open an issue at github.com/KatherLab/llmaixweb/issues.
See DEVELOPER.md for local development and testing instructions.
- Keep PHI strictly local unless you explicitly configure a remote provider.
- Prefer self-hosted, OpenAI-compatible endpoints for clinical data.
- Review your
.envsecrets and never commit them.
AGPL-3.0 — see LICENSE.
