OCR v2 defaults, captioning link, B200 nemotron-parse (NVBug 6204537) #2104

Merged
kheiss-uwzoo merged 4 commits into NVIDIA:main from kheiss-uwzoo:kheiss/docs-ocr-v2-multilingual-main on May 22, 2026

@kheiss-uwzoo
Collaborator

@kheiss-uwzoo kheiss-uwzoo commented May 22, 2026

Summary

Doc-only updates for extraction documentation on main.

  • NVBug 6204537: Correct the support matrix so B200 shows nemotron-parse as deployable (1 GPU, ~16GB disk), matching successful Helm deployment of nemotron-parse-v1.2 (NIMCache Ready, NIMService/Pod Running/Ready). RTX Pro 6000 and H200 NVL remain Not supported; footnote ² still applies only to 32GB (RTX PRO 4500). Documentation correctness only; separate from end-to-end SDK workflow issues in NVBug 6198661.
  • OCR v2 multilingual defaults: Adds a Nemotron OCR v2 language mode section. Local Hugging Face inference defaults to multilingual (multi); --ocr-lang english and --ocr-version v1 are documented as overrides. Also adds a cross-link from OCR and scanned documents.
  • Helm / NIM: Points to nimOperator.ocr; when the chart targets nemotron-ocr-v2, the deployed NIM also defaults to multilingual (confirm repository / tag before upgrade).
  • Captioning Related link: Trims redundant hardware prose from the image-captioning cross-link in multimodal-extraction.md.

Test plan

  • MkDocs build for extraction docs
  • Verify anchor #nemotron-ocr-v2-language-mode resolves
  • Confirm nemotron-parse B200 cells render as 1 / ~16GB in the support matrix table
  • Spot-check OCR section and captioning Related link in rendered HTML
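
The anchor check in the test plan can be approximated before a full MkDocs build. The sketch below is a hypothetical helper, not part of the repository: `slugify_heading` imitates the default heading-to-ID conversion of Python-Markdown's `toc` extension (fold to ASCII, drop punctuation, lowercase, collapse whitespace to hyphens), and `anchor_resolves` assumes the target section is a plain ATX heading with no explicit `{ #id }` attribute. If the docs configure a custom slugify function or an explicit ID, the rendered HTML remains the authority.

```python
import re
import unicodedata


def slugify_heading(text: str, separator: str = "-") -> str:
    """Approximate the default Python-Markdown `toc` slug for a heading."""
    # Fold to ASCII, as the default slugifier does in non-Unicode mode.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop everything except word characters, whitespace, and hyphens.
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    # Collapse runs of whitespace and separators into a single separator.
    return re.sub(r"[\s%s]+" % re.escape(separator), separator, text)


def anchor_resolves(markdown_text: str, anchor: str) -> bool:
    """Return True if any ATX heading in the text slugifies to `anchor`."""
    wanted = anchor.lstrip("#")
    for line in markdown_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*?)\s*#*\s*$", line)
        if match and slugify_heading(match.group(1)) == wanted:
            return True
    return False


# Hypothetical excerpt standing in for prerequisites-support-matrix.md.
sample = "## Hardware\n\n### Nemotron OCR v2 language mode\n\nText.\n"
print(anchor_resolves(sample, "#nemotron-ocr-v2-language-mode"))
```

A check like this only catches a renamed or missing heading; it does not replace opening the built page and following the cross-link from multimodal-extraction.md.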

Document local HuggingFace and Helm OCR NIM language defaults for Nemotron OCR v2.
@kheiss-uwzoo kheiss-uwzoo requested review from a team as code owners May 22, 2026 20:01
@kheiss-uwzoo kheiss-uwzoo requested a review from drobison00 May 22, 2026 20:01
@greptile-apps
Contributor

greptile-apps Bot commented May 22, 2026

Greptile Summary

This documentation-only PR updates the extraction docs on main with three changes: it adds a new "Nemotron OCR v2 language mode" section to the support matrix explaining multilingual defaults for local HuggingFace inference and Helm/NIM deployments, adds a corresponding cross-link paragraph in the OCR section of multimodal-extraction.md, and updates the nemotron-parse hardware matrix to mark B200 as supported (1 GPU, ~16GB disk).

  • OCR v2 defaults: New #nemotron-ocr-v2-language-mode subsection documents multilingual default behavior for local HF inference and points to nimOperator.ocr in the Helm chart; a matching anchor link is added in multimodal-extraction.md.
  • B200 support matrix correction: nemotron-parse rows now correctly show 1 GPU / ~16GB disk for B200, consistent with a verified Helm deployment of nemotron-parse-v1.2; RTX Pro 6000 and H200 NVL remain "Not supported".
  • Captioning related link: Trailing descriptor text trimmed from the image-captioning cross-link.

Confidence Score: 5/5

Documentation-only change; no code paths altered. Safe to merge.

All changes are confined to two Markdown files. The new #nemotron-ocr-v2-language-mode anchor matches its cross-reference exactly, the B200 table cells are updated consistently across both nemotron-parse rows, and the trimmed captioning link remains valid.

No files require special attention.

Important Files Changed

Filename Overview
docs/docs/extraction/prerequisites-support-matrix.md Adds 'Nemotron OCR v2 language mode' subsection with MkDocs anchor and a note covering both local HF and Helm/NIM defaults; corrects nemotron-parse B200 cells from 'Not supported' to 1 GPU / ~16GB disk.
docs/docs/extraction/multimodal-extraction.md Adds a one-paragraph OCR v2 multilingual-default explanation with a cross-link to the new anchor in the support matrix; trims redundant descriptor from the image-captioning related link. Anchor reference matches the newly added section.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User runs OCR extraction] --> B{Deployment type?}
    B -->|Local HuggingFace| C[Nemotron OCR v2\ndefault: multilingual 'multi']
    B -->|Helm / NIM| D[nimOperator.ocr block\nin values.yaml]
    B -->|Remote NIM endpoint| E[NIM own model & language\nbehavior — CLI selectors\nnot forwarded]
    C --> F{Language override needed?}
    F -->|English only v2| G[--ocr-lang english\nocr_lang API param]
    F -->|Legacy engine| H[--ocr-version v1]
    F -->|Keep multilingual| I[No flag needed]
    D --> J{Chart targets nemotron-ocr-v2?}
    J -->|Yes| K[NIM runs in\nmultilingual mode by default]
    J -->|No / other tag| L[Confirm repository & tag\nbefore upgrading]
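
The local branch of the flowchart above can also be read as a small decision function. The following is a minimal sketch that models only the documented defaults; `OcrSelection`, `resolve_local_ocr`, and `effective_language` are hypothetical names for illustration and are not the NeMo Retriever API. The argument names mirror the `--ocr-version` and `--ocr-lang` flags mentioned in the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OcrSelection:
    engine: str    # "v1" or "v2"
    language: str  # "english" or "multi"


def resolve_local_ocr(ocr_version: Optional[str] = None,
                      ocr_lang: Optional[str] = None) -> OcrSelection:
    """Model the documented local defaults (illustrative, not library code)."""
    if ocr_version == "v1":
        # The legacy engine is described as English-only.
        return OcrSelection(engine="v1", language="english")
    # v2 is the default engine and runs multilingual unless overridden.
    return OcrSelection(engine="v2", language=ocr_lang or "multi")


def effective_language(deployment: str, ocr_lang: Optional[str] = None) -> Optional[str]:
    """Return the local language selector in effect, or None for remote NIMs."""
    if deployment == "remote":
        # Remote endpoints keep their own behavior; local selectors are not sent.
        return None
    return resolve_local_ocr(ocr_lang=ocr_lang).language


print(resolve_local_ocr())
print(resolve_local_ocr(ocr_lang="english"))
print(effective_language("remote", ocr_lang="english"))
```

Returning `None` for the remote case is a modeling choice: it makes explicit that a locally chosen language has no effect on a remote endpoint rather than implying a value the endpoint might not honor.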

Reviews (4): Last reviewed commit: "Merge branch 'main' into kheiss/docs-ocr..."


**Local Hugging Face inference:** When you deploy locally with HuggingFace model weights (for example `pip install "nemo-retriever[local]"` and GPU inference without remote OCR NIM URLs), the default OCR engine is **Nemotron OCR v2**, which runs in **multilingual** mode by default (`multi`). For English-only v2, pass `--ocr-lang english` on the [CLI](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/docs/cli) or set the equivalent `ocr_lang` parameter in the Python API. Use `--ocr-version v1` for the legacy English-only engine. Remote OCR NIM endpoints use their own model and language behavior; local OCR language selectors are not sent on remote requests.

**Helm / NIM:** The [NeMo Retriever Helm chart](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/helm/README.md) deploys the core OCR NIM under [`nimOperator.ocr`](https://github.com/NVIDIA/NeMo-Retriever/blob/main/nemo_retriever/helm/values.yaml#L817-L852). When that block targets **nemotron-ocr-v2** for your release, the deployed NIM also runs in multilingual mode by default. Confirm the `repository` and `tag` in `values.yaml` before you upgrade.

P1 Hard-coded line anchor points to the v1 block, not v2

The link values.yaml#L817-L852 lands on the image: sub-key of the ocr: block — which on main is configured with repository: nvcr.io/nim/nvidia/nemotron-ocr-v1 (comment on line 814 explicitly says "Nemotron OCR v1"). The ocr: key itself starts at line 815, so the anchor is also two lines low. A reader following this link will see a v1 configuration, directly contradicting the surrounding note about v2 multilingual defaults. The conditional hedge ("When that block targets nemotron-ocr-v2 for your release") does not prevent the confusion — it just shifts blame. Either update values.yaml on main to reference v2 first, or remove the fragile #L817-L852 fragment and link to the nimOperator.ocr section of the Helm README instead.
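
The mismatch described in this comment is the kind of thing a short pre-upgrade check can surface. The sketch below is hypothetical and operates on an already-parsed `values.yaml` mapping (for example, one loaded with a YAML library). It assumes the layout implied by this thread, `nimOperator.ocr.image.repository`, and additionally assumes that `tag` sits beside `repository` under `image` and that a v2 image name ends in `nemotron-ocr-v2`. The function names and the `example-tag` value are placeholders, not taken from the chart.

```python
from typing import Any, Mapping, Tuple


def ocr_image(values: Mapping[str, Any]) -> Tuple[str, str]:
    """Return (repository, tag) from the assumed nimOperator.ocr.image block."""
    image = values.get("nimOperator", {}).get("ocr", {}).get("image", {})
    return str(image.get("repository", "")), str(image.get("tag", ""))


def targets_ocr_v2(values: Mapping[str, Any]) -> bool:
    """True when the image name (last path segment) is nemotron-ocr-v2."""
    repository, _tag = ocr_image(values)
    return repository.rsplit("/", 1)[-1] == "nemotron-ocr-v2"


# Mirrors the repository the review comment reports on main; the tag is a placeholder.
values_on_main = {
    "nimOperator": {
        "ocr": {
            "image": {
                "repository": "nvcr.io/nim/nvidia/nemotron-ocr-v1",
                "tag": "example-tag",
            }
        }
    }
}

repository, tag = ocr_image(values_on_main)
print(repository, tag, targets_ocr_v2(values_on_main))
```

Matching on the last path segment, rather than a substring search, avoids a false positive if some other part of the registry path happens to contain the v2 name.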

Path: docs/docs/extraction/prerequisites-support-matrix.md
Line: 79

@kheiss-uwzoo kheiss-uwzoo changed the title from "docs(extraction): note OCR v2 multilingual defaults (main)" to "note OCR v2 multilingual defaults (main)" May 22, 2026
Correct support matrix GPU and disk columns for B200; Helm can deploy nemotron-parse-v1.2 on B200. Doc-only; separate from SDK workflow tracking in NVBug 6198661.
@kheiss-uwzoo kheiss-uwzoo changed the title from "note OCR v2 multilingual defaults (main)" to "docs(extraction): OCR v2 defaults, captioning link, B200 nemotron-parse (NVBug 6204537)" May 22, 2026
@kheiss-uwzoo kheiss-uwzoo changed the title from "docs(extraction): OCR v2 defaults, captioning link, B200 nemotron-parse (NVBug 6204537)" to "OCR v2 defaults, captioning link, B200 nemotron-parse (NVBug 6204537)" May 22, 2026
@kheiss-uwzoo kheiss-uwzoo merged commit b1eaff9 into NVIDIA:main May 22, 2026
9 checks passed

2 participants