A Mistral-API compatible document parsing server that converts PDFs, images, and Office documents into clean, formatted Markdown using DeepSeek-OCR.
This project is designed as a drop-in OCR replacement for LLM pipelines (RAG, chatbots, Open WebUI, etc.) that need high-quality extraction from complex documents.
Important: This repository does not ship the model runtime. The Docker Compose setup includes a vLLM container, but you can also point the wrapper at an external vLLM instance if you already run one.
vLLM recipe (DeepSeek-OCR): https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-OCR.html
- Mistral Compatibility
  The API structure mirrors Mistral OCR-style endpoints, making integration with existing tools easy.
- Smart Hybrid PDF Parsing
  - Digital PDFs: Extracts text from the PDF text layer (fast & accurate), while using Vision AI only for figures/charts.
  - Scanned PDFs / broken encoding: Automatically detects scanned pages or bad encoding (“mojibake”) and switches to full Vision OCR.
- Office Support (.docx, .pptx, …)
  Converts Office docs to PDF via LibreOffice, then processes the resulting PDF.
- High Performance (Async Map-Reduce)
  Pages and figure crops are processed concurrently, bounded by a configurable vLLM concurrency limit.
- Figure Understanding
  Detects charts/diagrams, crops them, and asks the Vision model to describe them, inserting the description into the Markdown flow.

- vLLM serves deepseek-ai/DeepSeek-OCR as an OpenAI-compatible /v1 endpoint.
- This server accepts documents and:
  - For PDFs: decides per page whether to use text-layer extraction or Vision OCR fallback.
  - For figures: crops and re-queries the model for figure descriptions (optional).
  - Returns a structured response with Markdown pages (and optional image/table objects).

- Docker + Docker Compose
- NVIDIA GPU + drivers (strongly recommended for vLLM)
- Python 3.13+ recommended
- uv (recommended) or pip/venv
- Optional but recommended:
  - LibreOffice (soffice) for Office docs (.doc/.docx/.ppt/.pptx/.odt/.odp)
- System libs:
  - Uses PyMuPDF (fitz) for PDF rendering

Note: The provided Dockerfile already includes LibreOffice and common fonts.

There are two ways to run this:
- Recommended: Docker Compose (includes vLLM + the wrapper)
- Manual: run vLLM + the API yourself (useful for dev or existing infra)
This starts both services:
- vllm (OpenAI-compatible /v1 endpoint serving deepseek-ai/DeepSeek-OCR)
- ocr-api (this wrapper, exposed on http://localhost:3001)

Copy and edit the environment file:
cp .env.example .env
# Edit .env (optionally set VLLM_MODEL, CONCURRENT_REQUEST_LIMIT, heuristic knobs, etc.)

Notes:
- The wrapper talks to vLLM inside the Compose network. compose.yml sets:
  VLLM_BASE_URL=http://vllm:8000/v1
- Hugging Face cache is persisted:
  ~/.cache/huggingface:/root/.cache/huggingface
- File storage is persisted:
  ./files_store:/data/files_store

Start the stack:
docker compose up -d --build

Open the Web UI:
http://localhost:3001

View logs:
docker compose logs -f

Stop:
docker compose down

Use this if you already run vLLM elsewhere, want to develop locally, or prefer a venv-based workflow.
This wrapper expects an OpenAI-compatible server at VLLM_BASE_URL (defaults to http://localhost:8000/v1).
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto

vllm serve deepseek-ai/DeepSeek-OCR \
  --logits_processors vllm.model_executor.models.deepseek_ocr:NGramPerReqLogitsProcessor \
  --no-enable-prefix-caching \
  --mm-processor-cache-gb 0

Why these flags?
- The custom logits processor is important for best OCR/Markdown behavior.
- Prefix caching and multimodal processor caching usually don’t help OCR-style single-turn calls and may add overhead.
We recommend using uv.
uv sync

Copy and edit the environment file:
cp .env.example .env
# Edit .env and set VLLM_BASE_URL (and optionally VLLM_MODEL)

.env.example (important fields)
- VLLM_BASE_URL – your vLLM OpenAI-compatible endpoint (e.g. http://localhost:8000/v1)
- VLLM_MODEL – model name (default deepseek-ai/DeepSeek-OCR)
- CONCURRENT_REQUEST_LIMIT – max concurrent requests to vLLM (tune for GPU memory)

uv run uvicorn app:app --host 0.0.0.0 --port 8001 --reload

Open the Web UI:
http://localhost:8001
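
POST /v1/ocr is the main endpoint. It accepts a document object plus optional parsing controls. The examples in this section target the Docker Compose port (3001); with the manual setup, use port 8001 instead.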
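
For callers that prefer Python over curl, or that need to send a local file, the request can be sketched as follows. This is an illustration rather than code from this repository: the helper names (to_data_url, build_payload, post_ocr) and the 600-second timeout are assumptions, and the payload shape simply mirrors the curl examples in this section, with a base64 data: URL in place of a remote URL.

```python
import base64
import json
import mimetypes
import urllib.request
from pathlib import Path


def to_data_url(path: str) -> str:
    """Encode a local file as a base64 data: URL."""
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(path: str) -> dict:
    """Build a /v1/ocr request body (hypothetical helper)."""
    url = to_data_url(path)
    if url.startswith("data:image/"):
        # Images use the image_url document type.
        return {"document": {"type": "image_url", "image_url": {"url": url}}}
    # Everything else (PDFs, Office files) is sent as document_url.
    return {"document": {"type": "document_url", "document_url": url}}


def post_ocr(path: str, endpoint: str = "http://localhost:3001/v1/ocr") -> dict:
    """POST the file to the OCR server and return the decoded JSON response."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(build_payload(path)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Assumed timeout: large documents can take a while to process.
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.load(response)
```

Only the standard library is used here, so the sketch runs without extra dependencies; swap in any HTTP client you already use.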
Minimal example (remote PDF URL):
curl -X POST "http://localhost:3001/v1/ocr" \
-H "Content-Type: application/json" \
-d '{
"document": {
"type": "document_url",
"document_url": "https://example.com/report.pdf"
},
"pdf_mode": "auto",
"inline_figure_text": true
}'

Minimal example (remote image URL):
curl -X POST "http://localhost:3001/v1/ocr" \
-H "Content-Type: application/json" \
-d '{
"document": {
"type": "image_url",
"image_url": {"url": "https://example.com/scan.png"}
}
}'

Tip: This server also supports data: URLs (base64-encoded) for document_url and image_url. The built-in web UI uses data: URLs under the hood.

pdf_mode options:
- "auto" (default): per-page decision using heuristics
- "text": always use the PDF text layer (fast; fails on scans/bad encoding)
- "ocr": always render pages and run Vision OCR (slower; best for scanned PDFs)

Heuristic knobs (set in .env):
- MIN_TEXT_DENSITY: If a page’s text covers less than this fraction of the page, it is treated as scanned.
- MAX_MOJIBAKE_RATIO: If too many replacement/control characters appear in the text layer, the page falls back to OCR.
- PDF_TEXT_RENDER_DPI: DPI used when rendering PDF pages for OCR fallback and figure crops.
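
To make the per-page decision concrete, here is a minimal sketch of how MIN_TEXT_DENSITY and MAX_MOJIBAKE_RATIO could drive the choice between the text layer and OCR. It is not the project's actual PageAnalyzer: the PageStats type, the function names, and the default values below are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical defaults for illustration only; the real values come from .env.
MIN_TEXT_DENSITY = 0.01
MAX_MOJIBAKE_RATIO = 0.05


@dataclass
class PageStats:
    """Assumed per-page inputs to the heuristic."""
    text: str          # text extracted from the PDF text layer
    text_area: float   # area covered by text blocks
    page_area: float   # total page area, in the same units


def mojibake_ratio(text: str) -> float:
    """Fraction of characters that are U+FFFD or non-whitespace control chars."""
    if not text:
        return 0.0
    bad = sum(
        1
        for ch in text
        if ch == "\ufffd" or (ord(ch) < 32 and ch not in "\t\n\r")
    )
    return bad / len(text)


def choose_mode(
    page: PageStats,
    min_density: float = MIN_TEXT_DENSITY,
    max_mojibake: float = MAX_MOJIBAKE_RATIO,
) -> str:
    """Return "text" to trust the text layer, or "ocr" to fall back to Vision OCR."""
    density = page.text_area / page.page_area if page.page_area else 0.0
    if density < min_density:
        return "ocr"  # too little text on the page: treat it as scanned
    if mojibake_ratio(page.text) > max_mojibake:
        return "ocr"  # text layer looks corrupted
    return "text"
```

Under these assumptions, raising MIN_TEXT_DENSITY sends more sparse pages to OCR, and lowering MAX_MOJIBAKE_RATIO makes the fallback more sensitive to damaged text layers.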
If figure extraction is enabled (default behavior in the UI flow), the pipeline will:
- Detect image regions (either from PDF blocks in text-layer mode, or from DeepSeek OCR RefDet tags in OCR mode)
- Crop those regions
- Optionally ask DeepSeek-OCR to describe them and insert the description inline in Markdown
Useful request fields:
- inline_figure_text (bool, default true): inline descriptions as blockquotes instead of image links
- figure_prompt (string): override the default prompt for figure description
- image_min_size (int): ignore very small crops
- image_limit (int): only keep the largest N images per page

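
The effect of image_min_size and image_limit can be sketched as a simple filter over candidate crops. The exact semantics are not spelled out here, so this sketch assumes that image_min_size is compared against the shorter side of a crop in pixels and that "largest" means largest by area; the Crop type and select_crops function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Crop:
    """Hypothetical bounding box of a detected figure, in pixels."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def width(self) -> int:
        return self.x1 - self.x0

    @property
    def height(self) -> int:
        return self.y1 - self.y0

    @property
    def area(self) -> int:
        return self.width * self.height


def select_crops(
    crops: list[Crop],
    image_min_size: int = 0,
    image_limit: Optional[int] = None,
) -> list[Crop]:
    """Drop crops whose shorter side is below image_min_size, then keep the largest N."""
    kept = [c for c in crops if min(c.width, c.height) >= image_min_size]
    kept.sort(key=lambda c: c.area, reverse=True)
    return kept if image_limit is None else kept[:image_limit]
```

Filtering before describing keeps icons and decorative fragments from consuming Vision requests.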
Office formats are supported by converting them to PDF using LibreOffice.
If you use Docker, LibreOffice is already included in the image.
Manual install example for Debian/Ubuntu:
sudo apt-get update && sudo apt-get install -y libreoffice

If LibreOffice is missing, Office conversion requests will return an error.

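
As a rough picture of the conversion step, a headless LibreOffice call from Python might look like the sketch below. It is not this project's implementation: the function names and the 120-second timeout are assumptions. The --headless, --convert-to, and --outdir options are standard soffice command-line flags.

```python
import shutil
import subprocess
from pathlib import Path


def find_soffice() -> str:
    """Locate the LibreOffice binary in PATH, or raise a clear error."""
    exe = shutil.which("soffice") or shutil.which("libreoffice")
    if exe is None:
        raise RuntimeError("LibreOffice (soffice) not found in PATH")
    return exe


def build_convert_command(exe: str, source: str, out_dir: str) -> list[str]:
    """Build the headless conversion command line."""
    return [exe, "--headless", "--convert-to", "pdf", "--outdir", str(out_dir), str(source)]


def convert_to_pdf(source: str, out_dir: str, timeout: float = 120.0) -> Path:
    """Convert an Office document to PDF and return the path of the result."""
    src, dst = Path(source), Path(out_dir)
    dst.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        build_convert_command(find_soffice(), str(src), str(dst)),
        capture_output=True,
        text=True,
        timeout=timeout,  # assumed limit; large decks may need more
    )
    pdf = dst / (src.stem + ".pdf")
    # soffice can exit with 0 without producing output, so check the file too.
    if result.returncode != 0 or not pdf.exists():
        raise RuntimeError(f"LibreOffice conversion failed: {result.stderr.strip()}")
    return pdf
```

Capturing stderr and including it in the error is what makes a failed conversion diagnosable from the server logs.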
CONCURRENT_REQUEST_LIMIT controls how many requests this server sends to vLLM concurrently.
- If you see GPU OOM, reduce it (e.g. 10–30).
- If your GPU has headroom, increase it for throughput.
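
A common way to enforce a limit like this is an asyncio.Semaphore around each request. The sketch below shows the general pattern under that assumption; it is not taken from pipeline.py, and bounded_gather, fake_ocr, and the limit of 4 are hypothetical stand-ins.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

CONCURRENT_REQUEST_LIMIT = 4  # hypothetical value; normally read from .env


async def bounded_gather(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int,
) -> list[R]:
    """Run worker over all items with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        async with semaphore:
            return await worker(item)

    # gather returns results in input order, regardless of completion order.
    return await asyncio.gather(*(run_one(item) for item in items))


async def fake_ocr(page_number: int) -> str:
    """Stand-in for a vLLM request; sleeps instead of calling the model."""
    await asyncio.sleep(0.01)
    return f"# Page {page_number}"


async def main() -> str:
    # Map: OCR every page concurrently. Reduce: join the pages in order.
    pages = await bounded_gather(range(1, 9), fake_ocr, CONCURRENT_REQUEST_LIMIT)
    return "\n\n".join(pages)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With this pattern, every page and figure crop is scheduled immediately, but only `limit` requests reach the GPU at once, which is why lowering the limit relieves memory pressure at the cost of throughput.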
Higher PDF_TEXT_RENDER_DPI improves OCR quality but increases:
- rendering time
- GPU compute
- memory usage (larger images)
150 DPI is a good starting point; 200–300 may help for small text in scans.
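
The cost of a higher DPI follows from simple arithmetic: PDF page sizes are measured in points (1/72 inch), so the rendered image has dimensions scaled by dpi / 72, and the pixel count grows with the square of the DPI. The helper below is a hypothetical illustration using an A4 page rounded to 595 x 842 points.

```python
def render_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI (72 points per inch)."""
    scale = dpi / 72.0
    return round(width_pt * scale), round(height_pt * scale)


A4_POINTS = (595, 842)  # A4 page size, rounded to whole points

for dpi in (150, 200, 300):
    width, height = render_size(*A4_POINTS, dpi)
    print(f"{dpi} DPI -> {width} x {height} px ({width * height / 1e6:.1f} MP)")

# 150 DPI -> 1240 x 1754 px
# 300 DPI -> 2479 x 3508 px, about four times as many pixels
```

Doubling the DPI therefore roughly quadruples the image data sent to the model for each page.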

- Verify VLLM_BASE_URL points to your running vLLM server and includes /v1
- Confirm the model name matches what you serve (VLLM_MODEL)
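
Both checks can be scripted against the model listing that OpenAI-compatible servers expose at /v1/models. The sketch below is a hypothetical diagnostic, not part of this project; the function names are made up for illustration.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Return the model-listing URL, insisting that the base URL ends in /v1."""
    base = base_url.rstrip("/")
    if not base.endswith("/v1"):
        raise ValueError(f"VLLM_BASE_URL should end with /v1, got: {base_url}")
    return base + "/models"


def served_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style list-models response."""
    return [model["id"] for model in payload.get("data", [])]


def check_vllm(base_url: str, expected_model: str, timeout: float = 10.0) -> bool:
    """Return True if the server is reachable and serves expected_model."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as response:
        payload = json.load(response)
    return expected_model in served_model_ids(payload)
```

For example, check_vllm("http://localhost:8000/v1", "deepseek-ai/DeepSeek-OCR") raises a connection error if the server is unreachable and returns False if it is serving a different model name.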

- Reduce CONCURRENT_REQUEST_LIMIT
- Consider lowering DPI
- Ensure your vLLM server timeout is high enough (this project uses a long client timeout)

- Ensure soffice / LibreOffice is installed and available in PATH
- Check server logs for LibreOffice stderr output

- FastAPI app entrypoint: app.py
- PDF strategy: PdfEngine + heuristics.PageAnalyzer
- OCR + figure tasks: pipeline.py
- vLLM OpenAI-compatible client: deepseek_client.py

TBD
