OCR v2 defaults, captioning link, B200 nemotron-parse (NVBug 6204537) #2104
Document local HuggingFace and Helm OCR NIM language defaults for Nemotron OCR v2.
Greptile Summary
This documentation-only PR updates the extraction docs on …
| Filename | Overview |
|---|---|
| docs/docs/extraction/prerequisites-support-matrix.md | Adds 'Nemotron OCR v2 language mode' subsection with MkDocs anchor and a note covering both local HF and Helm/NIM defaults; corrects nemotron-parse B200 cells from 'Not supported' to 1 GPU / ~16GB disk. |
| docs/docs/extraction/multimodal-extraction.md | Adds a one-paragraph OCR v2 multilingual-default explanation with a cross-link to the new anchor in the support matrix; trims redundant descriptor from the image-captioning related link. Anchor reference matches the newly added section. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[User runs OCR extraction] --> B{Deployment type?}
B -->|Local HuggingFace| C[Nemotron OCR v2\ndefault: multilingual 'multi']
B -->|Helm / NIM| D[nimOperator.ocr block\nin values.yaml]
B -->|Remote NIM endpoint| E[NIM own model & language\nbehavior, CLI selectors\nnot forwarded]
C --> F{Language override needed?}
F -->|English only v2| G[--ocr-lang english\nocr_lang API param]
F -->|Legacy engine| H[--ocr-version v1]
F -->|Keep multilingual| I[No flag needed]
D --> J{Chart targets nemotron-ocr-v2?}
J -->|Yes| K[NIM runs in\nmultilingual mode by default]
J -->|No / other tag| L[Confirm repository & tag\nbefore upgrading]
```
Reviews (4): Last reviewed commit: "Merge branch 'main' into kheiss/docs-ocr..."
**Local Hugging Face inference:** When you deploy locally with Hugging Face model weights (for example, `pip install "nemo-retriever[local]"` with GPU inference and no remote OCR NIM URLs), the default OCR engine is **Nemotron OCR v2**, which runs in **multilingual** mode (`multi`) by default. For English-only v2, pass `--ocr-lang english` on the [CLI](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/docs/cli) or set the equivalent `ocr_lang` parameter in the Python API. Use `--ocr-version v1` for the legacy English-only engine. Remote OCR NIM endpoints use their own model and language behavior; local OCR language selectors are not sent on remote requests.
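The selection rules in this note can be modeled as a small pure function, which makes the precedence explicit: a remote endpoint overrides everything, then the engine version, then the language. This is a minimal sketch for illustration only: `resolve_ocr_selection`, `OcrSelection`, and their parameters are hypothetical names, not part of the NeMo Retriever API. Two behaviors are assumptions rather than documented facts: `ocr_lang` is ignored when `v1` is selected, and v2 accepts only the two documented values, `multi` and `english`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OcrSelection:
    """Effective OCR engine and language under the documented defaults (hypothetical)."""
    engine: str          # "nemotron-ocr-v2", "nemotron-ocr-v1", or "remote-nim"
    lang: Optional[str]  # "multi", "english", or None when the remote NIM decides


def resolve_ocr_selection(
    remote_ocr_url: Optional[str] = None,
    ocr_version: str = "v2",
    ocr_lang: Optional[str] = None,
) -> OcrSelection:
    """Hypothetical helper mirroring the rules in the docs note."""
    if remote_ocr_url:
        # Local language selectors are not sent on remote requests,
        # so the remote NIM's own model and language behavior applies.
        return OcrSelection(engine="remote-nim", lang=None)
    if ocr_version == "v1":
        # Legacy engine is English-only; ocr_lang is assumed to be ignored here.
        return OcrSelection(engine="nemotron-ocr-v1", lang="english")
    if ocr_version != "v2":
        raise ValueError(f"unknown ocr_version: {ocr_version!r}")
    # Nemotron OCR v2 defaults to multilingual ("multi") when no language is given.
    lang = ocr_lang or "multi"
    if lang not in ("multi", "english"):
        # Assumption: only the two documented values are accepted.
        raise ValueError(f"unsupported ocr_lang for v2: {lang!r}")
    return OcrSelection(engine="nemotron-ocr-v2", lang=lang)
```

For example, `resolve_ocr_selection()` yields v2 in `multi` mode, while passing a remote URL yields `lang=None` even if `ocr_lang="english"` is set, matching the note that local selectors are not forwarded.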
**Helm / NIM:** The [NeMo Retriever Helm chart](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/helm/README.md) deploys the core OCR NIM under [`nimOperator.ocr`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/helm/values.yaml#L817-L852). When that block targets **nemotron-ocr-v2** for your release, the deployed NIM also runs in multilingual mode by default. Confirm the `repository` and `tag` in `values.yaml` before you upgrade.
**Hard-coded line anchor points to the v1 block, not v2**
The link `values.yaml#L817-L852` lands on the `image:` sub-key of the `ocr:` block, which on `main` is configured with `repository: nvcr.io/nim/nvidia/nemotron-ocr-v1` (the comment on line 814 explicitly says "Nemotron OCR v1"). The `ocr:` key itself starts at line 815, so the anchor also starts two lines too low. A reader following this link will see a v1 configuration, which directly contradicts the surrounding note about v2 multilingual defaults. The conditional hedge ("When that block targets nemotron-ocr-v2 for your release") does not prevent the confusion; it only shifts the blame. Either update `values.yaml` on `main` to reference v2 first, or remove the fragile `#L817-L852` fragment and link to the `nimOperator.ocr` section of the Helm README instead.
Path: `docs/docs/extraction/prerequisites-support-matrix.md`, line 79
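Because the note and the review both hinge on whether `nimOperator.ocr` points at v2, a small pre-upgrade check can make "confirm the `repository` and `tag`" concrete. This is a sketch under stated assumptions: the key path `nimOperator` → `ocr` → `image` → `repository`/`tag` is inferred from the review comment, not from the chart schema, and `check_ocr_image` is a hypothetical helper. It takes an already-parsed mapping (for example, the result of PyYAML's `yaml.safe_load` on `values.yaml`) so the sketch itself stays stdlib-only.

```python
from collections.abc import Mapping
from typing import Any


def check_ocr_image(values: Mapping[str, Any]) -> tuple[bool, str]:
    """Return (targets_v2, "repository:tag") for the nimOperator.ocr image.

    Assumed layout (from the review comment): nimOperator.ocr.image.{repository,tag}.
    """
    node: Any = values
    for key in ("nimOperator", "ocr", "image"):
        # Tolerate missing or null keys instead of raising.
        node = node.get(key) if isinstance(node, Mapping) else None
    image = node if isinstance(node, Mapping) else {}
    repository = str(image.get("repository") or "")
    tag = str(image.get("tag") or "")
    # Compare only the last path segment so the registry prefix does not matter.
    model = repository.rstrip("/").rsplit("/", 1)[-1]
    return model == "nemotron-ocr-v2", f"{repository}:{tag}"


if __name__ == "__main__":
    # Sample mirroring the v1 configuration the review found on main; the tag is a placeholder.
    sample = {"nimOperator": {"ocr": {"image": {
        "repository": "nvcr.io/nim/nvidia/nemotron-ocr-v1",
        "tag": "<tag>",
    }}}}
    ok, ref = check_ocr_image(sample)
    print(ok, ref)  # False nvcr.io/nim/nvidia/nemotron-ocr-v1:<tag>
```

Returning the resolved `repository:tag` string alongside the boolean lets a release script log exactly which image it inspected, which is the detail the review says readers could otherwise misread.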
Correct the support matrix GPU and disk columns for B200; Helm can deploy `nemotron-parse-v1.2` on B200. Doc-only; separate from SDK workflow tracking in NVBug 6198661.
Summary
Doc-only updates for extraction documentation on `main`.

- … `1` GPU, `~16GB` disk), matching a successful Helm deployment of `nemotron-parse-v1.2` (NIMCache Ready, NIMService/Pod Running/Ready). RTX Pro 6000 and H200 NVL remain Not supported; footnote ² still applies only to 32GB (RTX PRO 4500). Documentation correctness only; separate from end-to-end SDK workflow issues in NVBug 6198661.
- … `multi`), with `--ocr-lang english` / `--ocr-version v1` documented, and a cross-link from OCR and scanned documents.
- … `nimOperator.ocr`; when the chart targets nemotron-ocr-v2, the deployed NIM also defaults to multilingual (confirm `repository`/`tag` before upgrade).
- … `multimodal-extraction.md`.

Test plan

- … `#nemotron-ocr-v2-language-mode` resolves
- … `1` / `~16GB` in the support matrix table
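One way to sanity-check the anchor item in the test plan offline is to reproduce the heading-to-anchor mapping. This sketch assumes the anchor is generated from the heading text "Nemotron OCR v2 language mode" by Python-Markdown's default `toc` slugify (the MkDocs default unless `slugify` is customized in `mkdocs.yml`); if the page sets an explicit anchor instead, this check does not apply. `toc_slugify` is a local reimplementation modeled on `markdown.extensions.toc.slugify`, written out here to avoid the dependency.

```python
import re
import unicodedata


def toc_slugify(value: str, separator: str = "-") -> str:
    """Approximation of Python-Markdown's default toc slugify (ASCII mode)."""
    # Decompose accented characters, then drop anything that is not ASCII.
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, trim, lowercase, then collapse whitespace/separators.
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(rf"[{separator}\s]+", separator, value)


heading = "Nemotron OCR v2 language mode"
print(toc_slugify(heading))  # nemotron-ocr-v2-language-mode
```

If the printed slug matches the fragment used by the cross-link in `multimodal-extraction.md`, the link should resolve under default settings; a rendered-site check remains the authoritative test.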