OCR v2 defaults, captioning link, B200 nemotron-parse (NVBug 6204537) by kheiss-uwzoo · Pull Request #2104 · NVIDIA/NeMo-Retriever

kheiss-uwzoo · 2026-05-22T20:01:49Z

Summary

Doc-only updates for extraction documentation on main.

NVBug 6204537: Correct the support matrix so B200 shows nemotron-parse as deployable (1 GPU, ~16GB disk), matching successful Helm deployment of nemotron-parse-v1.2 (NIMCache Ready, NIMService/Pod Running/Ready). RTX Pro 6000 and H200 NVL remain Not supported; footnote ² still applies only to 32GB (RTX PRO 4500). Documentation correctness only; separate from end-to-end SDK workflow issues in NVBug 6198661.
OCR v2 multilingual defaults: Adds a Nemotron OCR v2 language mode section. Local Hugging Face inference defaults to multilingual (`multi`), and the `--ocr-lang english` and `--ocr-version v1` overrides are documented. Also adds a cross-link from the OCR and scanned documents section.
Helm / NIM: Points to `nimOperator.ocr`; when the chart targets nemotron-ocr-v2, the deployed NIM also defaults to multilingual mode (confirm `repository` and `tag` before upgrading).
Captioning Related link: Trims redundant hardware prose from the image-captioning cross-link in multimodal-extraction.md.
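The Helm / NIM item above asks readers to confirm `repository` and `tag` before upgrading. The following is a minimal sketch of that check. It assumes `values.yaml` has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that the image sits under `nimOperator.ocr.image`, the layout the review comment on `values.yaml` describes. The helpers `ocr_image_targets_v2` and `describe_ocr_image` are hypothetical and are not part of the chart or SDK.

```python
from typing import Any, Mapping


def _ocr_image(values: Mapping[str, Any]) -> Mapping[str, Any]:
    """Walk nimOperator -> ocr -> image (assumed layout); missing or null keys yield {}."""
    node: Any = values
    for key in ("nimOperator", "ocr", "image"):
        if not isinstance(node, Mapping):
            return {}
        node = node.get(key)
    return node if isinstance(node, Mapping) else {}


def ocr_image_targets_v2(values: Mapping[str, Any]) -> bool:
    """Return True if the OCR image repository's last path segment is nemotron-ocr-v2."""
    repository = str(_ocr_image(values).get("repository", ""))
    # Compare only the final path segment so the registry prefix does not matter.
    return repository.rsplit("/", 1)[-1] == "nemotron-ocr-v2"


def describe_ocr_image(values: Mapping[str, Any]) -> str:
    """Summarize the configured OCR image as 'repository:tag' for a pre-upgrade check."""
    image = _ocr_image(values)
    return f"{image.get('repository', '<unset>')}:{image.get('tag', '<unset>')}"


if __name__ == "__main__":
    # Illustrative values only: the v1 repository string comes from the review
    # comment, and "<your-tag>" is a placeholder, not a real release tag.
    values = {
        "nimOperator": {
            "ocr": {
                "image": {
                    "repository": "nvcr.io/nim/nvidia/nemotron-ocr-v1",
                    "tag": "<your-tag>",
                }
            }
        }
    }
    print(describe_ocr_image(values))
    print("targets v2:", ocr_image_targets_v2(values))
```

If your chart version nests the image differently, adjust the key path in `_ocr_image`.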

Test plan

MkDocs build for extraction docs
Verify anchor #nemotron-ocr-v2-language-mode resolves
Confirm nemotron-parse B200 cells render as 1 / ~16GB in the support matrix table
Spot-check OCR section and captioning Related link in rendered HTML
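For the anchor item in the test plan, here is a rough offline sketch. It re-implements the default heading slug rule of the Python-Markdown `toc` extension, which MkDocs uses, so you can predict the anchor a heading produces. It assumes the section heading text is `Nemotron OCR v2 language mode` and that no explicit `{#...}` attribute or custom `slugify` in `mkdocs.yml` overrides the default. `default_toc_slug` and `anchors_in` are hypothetical helpers. A real `mkdocs build` remains the authoritative check.

```python
import re
import unicodedata


def default_toc_slug(heading: str, separator: str = "-") -> str:
    """Approximate Python-Markdown's default toc slugify (ASCII mode)."""
    # Decompose accented characters, then drop anything that is not ASCII.
    value = unicodedata.normalize("NFKD", heading)
    value = value.encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, then trim and lowercase.
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    # Collapse runs of whitespace or separators into one separator.
    return re.sub(r"[{}\s]+".format(re.escape(separator)), separator, value)


def anchors_in(markdown_text: str) -> set:
    """Collect default slugs for ATX headings ('#' through '######') in a Markdown string."""
    pattern = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", flags=re.MULTILINE)
    return {default_toc_slug(m.group(1)) for m in pattern.finditer(markdown_text)}


if __name__ == "__main__":
    doc = "## Prerequisites\n\n### Nemotron OCR v2 language mode\n"
    print("nemotron-ocr-v2-language-mode" in anchors_in(doc))
```

This sketch ignores duplicate-heading suffixes (`_1`, `_2`) and closing hashes. It only predicts the slug and does not replace rendering the site.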


greptile-apps · 2026-05-22T20:04:24Z

Greptile Summary

This documentation-only PR updates the extraction docs on main with three changes: it adds a new "Nemotron OCR v2 language mode" section to the support matrix explaining multilingual defaults for local HuggingFace inference and Helm/NIM deployments, adds a corresponding cross-link paragraph in the OCR section of multimodal-extraction.md, and updates the nemotron-parse hardware matrix to mark B200 as supported (1 GPU, ~16GB disk).

OCR v2 defaults: New #nemotron-ocr-v2-language-mode subsection documents multilingual default behavior for local HF inference and points to nimOperator.ocr in the Helm chart; a matching anchor link is added in multimodal-extraction.md.
B200 support matrix correction: nemotron-parse rows now correctly show 1 GPU / ~16GB disk for B200, consistent with a verified Helm deployment of nemotron-parse-v1.2; RTX Pro 6000 and H200 NVL remain "Not supported".
Captioning related link: Trailing descriptor text trimmed from the image-captioning cross-link.

Confidence Score: 5/5

Documentation-only change; no code paths altered. Safe to merge.

All changes are confined to two Markdown files. The new #nemotron-ocr-v2-language-mode anchor matches its cross-reference exactly, the B200 table cells are updated consistently across both nemotron-parse rows, and the trimmed captioning link remains valid.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/docs/extraction/prerequisites-support-matrix.md | Adds 'Nemotron OCR v2 language mode' subsection with MkDocs anchor and a note covering both local HF and Helm/NIM defaults; corrects nemotron-parse B200 cells from 'Not supported' to 1 GPU / ~16GB disk. |
| docs/docs/extraction/multimodal-extraction.md | Adds a one-paragraph OCR v2 multilingual-default explanation with a cross-link to the new anchor in the support matrix; trims redundant descriptor from the image-captioning related link. Anchor reference matches the newly added section. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User runs OCR extraction] --> B{Deployment type?}
    B -->|Local HuggingFace| C[Nemotron OCR v2\ndefault: multilingual 'multi']
    B -->|Helm / NIM| D[nimOperator.ocr block\nin values.yaml]
    B -->|Remote NIM endpoint| E[NIM own model & language\nbehavior, CLI selectors\nnot forwarded]
    C --> F{Language override needed?}
    F -->|English only v2| G[--ocr-lang english\nocr_lang API param]
    F -->|Legacy engine| H[--ocr-version v1]
    F -->|Keep multilingual| I[No flag needed]
    D --> J{Chart targets nemotron-ocr-v2?}
    J -->|Yes| K[NIM runs in\nmultilingual mode by default]
    J -->|No / other tag| L[Confirm repository & tag\nbefore upgrading]
```
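The flowchart's branches can be restated as a small function. This can help when checking which OCR engine and language a given configuration ends up with. It is an illustrative sketch only. `resolve_ocr_behavior`, its deployment labels, and its returned strings are hypothetical and are not part of the NeMo Retriever API. Only the `--ocr-lang english` and `--ocr-version v1` selectors come from the PR text. Letting `v1` take precedence when both selectors are set is an assumption.

```python
from typing import Optional


def resolve_ocr_behavior(
    deployment: str,
    ocr_lang: Optional[str] = None,
    ocr_version: Optional[str] = None,
    helm_repository: str = "",
) -> str:
    """Map a deployment choice to the OCR behavior described in the flowchart.

    deployment: "local", "helm", or "remote" (hypothetical labels).
    ocr_lang / ocr_version: mirror --ocr-lang and --ocr-version for local runs.
    helm_repository: the nimOperator.ocr image repository for Helm deployments.
    """
    if deployment == "remote":
        # Local language selectors are not forwarded to remote OCR NIM endpoints.
        return "remote NIM: model and language decided by the endpoint"
    if deployment == "helm":
        if helm_repository.rsplit("/", 1)[-1] == "nemotron-ocr-v2":
            return "helm: nemotron-ocr-v2, multilingual by default"
        return "helm: confirm repository and tag before upgrading"
    if deployment == "local":
        # Assumption: the legacy engine wins if both selectors are given.
        if ocr_version == "v1":
            return "local: Nemotron OCR v1 (legacy, English-only)"
        if ocr_lang == "english":
            return "local: Nemotron OCR v2, English-only"
        return "local: Nemotron OCR v2, multilingual (multi)"
    raise ValueError(f"unknown deployment type: {deployment!r}")


if __name__ == "__main__":
    print(resolve_ocr_behavior("local"))
    print(resolve_ocr_behavior("local", ocr_lang="english"))
    print(resolve_ocr_behavior("remote", ocr_lang="english"))
```

The remote branch ignores `ocr_lang` on purpose. That mirrors the note that local OCR language selectors are not sent on remote requests.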

Reviews (4): Last reviewed commit: "Merge branch 'main' into kheiss/docs-ocr..."

greptile-apps · 2026-05-22T20:04:28Z

```diff
+
+    **Local Hugging Face inference:** When you deploy locally with HuggingFace model weights (for example `pip install "nemo-retriever[local]"` and GPU inference without remote OCR NIM URLs), the default OCR engine is **Nemotron OCR v2**, which runs in **multilingual** mode by default (`multi`). For English-only v2, pass `--ocr-lang english` on the [CLI](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/docs/cli) or set the equivalent `ocr_lang` parameter in the Python API. Use `--ocr-version v1` for the legacy English-only engine. Remote OCR NIM endpoints use their own model and language behavior; local OCR language selectors are not sent on remote requests.
+
+    **Helm / NIM:** The [NeMo Retriever Helm chart](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/helm/README.md) deploys the core OCR NIM under [`nimOperator.ocr`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/helm/values.yaml#L817-L852). When that block targets **nemotron-ocr-v2** for your release, the deployed NIM also runs in multilingual mode by default. Confirm the `repository` and `tag` in `values.yaml` before you upgrade.
```


Hard-coded line anchor points to the v1 block, not v2

`docs/docs/extraction/prerequisites-support-matrix.md`, line 79

The link `values.yaml#L817-L852` lands on the `image:` sub-key of the `ocr:` block, which on `main` is configured with `repository: nvcr.io/nim/nvidia/nemotron-ocr-v1` (the comment on line 814 explicitly says "Nemotron OCR v1"). The `ocr:` key itself starts at line 815, so the anchor also starts two lines too low. A reader following this link will see a v1 configuration, which directly contradicts the surrounding note about v2 multilingual defaults. The conditional hedge ("When that block targets nemotron-ocr-v2 for your release") does not prevent the confusion; it only shifts the blame to the reader. Either update `values.yaml` on `main` to reference v2 first, or remove the fragile `#L817-L852` fragment and link to the `nimOperator.ocr` section of the Helm README instead.
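One way to catch the fragile-anchor problem this comment describes is to flag GitHub `blob` links whose fragment pins a line range such as `#L817-L852`. Those ranges drift whenever the target file changes. The following is a minimal standard-library sketch. `find_line_anchored_links` and `strip_line_fragment` are hypothetical helpers, and the regex assumes plain inline Markdown links of the form `[text](url)`.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Inline Markdown links: [text](url). Reference-style links are not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")
# GitHub line fragments: #L12 or #L12-L34.
LINE_FRAGMENT_RE = re.compile(r"^L\d+(-L\d+)?$")


def find_line_anchored_links(markdown_text: str) -> list:
    """Return GitHub blob URLs whose fragment pins specific line numbers."""
    hits = []
    for match in LINK_RE.finditer(markdown_text):
        url = match.group(1)
        parts = urlsplit(url)
        if (
            parts.netloc == "github.com"
            and "/blob/" in parts.path
            and LINE_FRAGMENT_RE.match(parts.fragment)
        ):
            hits.append(url)
    return hits


def strip_line_fragment(url: str) -> str:
    """Drop a #Lnn or #Lnn-Lmm fragment, leaving any other fragment intact."""
    parts = urlsplit(url)
    if LINE_FRAGMENT_RE.match(parts.fragment):
        parts = parts._replace(fragment="")
    return urlunsplit(parts)


if __name__ == "__main__":
    sample = (
        "deploys the core OCR NIM under "
        "[`nimOperator.ocr`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/"
        "nemo_retriever/helm/values.yaml#L817-L852)."
    )
    for url in find_line_anchored_links(sample):
        print("fragile:", url, "->", strip_line_fragment(url))
```

Stripping the fragment only removes the drift. Pointing readers at the `nimOperator.ocr` section of the Helm README, as the comment suggests, is still a separate editorial choice.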


docs(extraction): note OCR v2 multilingual defaults

e35c7b3

Document local HuggingFace and Helm OCR NIM language defaults for Nemotron OCR v2.

kheiss-uwzoo requested review from a team as code owners May 22, 2026 20:01

kheiss-uwzoo requested a review from drobison00 May 22, 2026 20:01

greptile-apps Bot reviewed May 22, 2026


docs(extraction): drop NIM hardware prose from captioning Related link

86f5a2d

kheiss-uwzoo changed the title ~~docs(extraction): note OCR v2 multilingual defaults (main)~~ note OCR v2 multilingual defaults (main) May 22, 2026

docs(extraction): mark B200 supported for nemotron-parse (NVBug 6204537)

556b610

Correct support matrix GPU and disk columns for B200; Helm can deploy nemotron-parse-v1.2 on B200. Doc-only; separate from SDK workflow tracking in NVBug 6198661.

kheiss-uwzoo changed the title ~~note OCR v2 multilingual defaults (main)~~ docs(extraction): OCR v2 defaults, captioning link, B200 nemotron-parse (NVBug 6204537) May 22, 2026

Merge branch 'main' into kheiss/docs-ocr-v2-multilingual-main

133f8b2

kheiss-uwzoo changed the title ~~docs(extraction): OCR v2 defaults, captioning link, B200 nemotron-parse (NVBug 6204537)~~ OCR v2 defaults, captioning link, B200 nemotron-parse (NVBug 6204537) May 22, 2026

sosahi approved these changes May 22, 2026


kheiss-uwzoo merged commit b1eaff9 into NVIDIA:main May 22, 2026
9 checks passed

kheiss-uwzoo merged 4 commits into NVIDIA:main from kheiss-uwzoo:kheiss/docs-ocr-v2-multilingual-main
