skill: deeper rerank pool + duplicate-aware ranked_retrieved #2102

Open
mahikaw wants to merge 1 commit into main from dev/wasonmahika/skill-topk-20-rerank-pool

Collaborator

@mahikaw mahikaw commented May 22, 2026

Description

Summary

Single 4-line change to .claude/skills/nemo-retriever/SKILL.md:
bump the agent's retriever query invocation from --top-k 10 to
--top-k 20, and update the ranked_retrieved guidance so the agent
emits up to 10 distinct (doc, page) entries from the wider reranked
pool. No other behaviour changes — query budget, cost discipline,
fallbacks all untouched.

With --rerank enabled the retriever's internal refine_factor=4
pulls top_k × 4 candidates for the cross-encoder. Bumping top-k from
10 to 20 widens that pool from 40 → 80, lets the cross-encoder rerank
a deeper set, and exposes the agent to 20 reranked candidates
(vs 10) when it picks its top-10 page list.
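
The pool arithmetic above can be sketched in a few lines. This is only an illustration: `rerank_pool_size` is a hypothetical helper, not part of the retriever, and it assumes the candidate pool is exactly `top_k × refine_factor`, as the description states.

```python
def rerank_pool_size(top_k: int, refine_factor: int = 4) -> int:
    """Candidates handed to the cross-encoder (hypothetical helper).

    Assumes the pool is exactly top_k * refine_factor, per the PR description.
    """
    return top_k * refine_factor


for top_k in (10, 20):
    # top_k is also the number of reranked hits the agent gets to see.
    print(f"--top-k {top_k}: pool={rerank_pool_size(top_k)}, visible to agent={top_k}")
```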

Measured gains (vidore_v3 batch 1, 46 queries: 15 finance + 15 HR + 16 pharma)

Same agent (claude-sonnet-4-6), same judge (nvidia/llama-3.3-nemotron-super-49b-v1.5), same eval harness — only the SKILL.md changed:

| Metric | Baseline (top-k 10) | This PR (top-k 20) | Δ |
|---|---|---|---|
| Recall@1 | 0.224 | 0.300 | +34% |
| Recall@5 | 0.505 | 0.565 | +12% |
| Recall@10 | 0.605 | 0.639 | +5.6% |
| Judge (0–5) | 4.70 | 4.74 | +0.9% |
| Cost / query | $0.315 | $0.344 | +9% |
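
The Δ column reads as a relative change against the baseline. The short check below reproduces it from the reported values. `relative_change` is a hypothetical helper written for this illustration, not part of the eval harness.

```python
def relative_change(baseline: float, new: float) -> float:
    """Relative change of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# (baseline, this PR) pairs copied from the results above.
results = {
    "Recall@1": (0.224, 0.300),
    "Recall@5": (0.505, 0.565),
    "Recall@10": (0.605, 0.639),
    "Judge (0-5)": (4.70, 4.74),
    "Cost / query": (0.315, 0.344),
}

for metric, (baseline, new) in results.items():
    print(f"{metric}: {relative_change(baseline, new):+.1f}%")
```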

R@10 lift is concentrated on HR (+23%) — long-PDF, multi-relevant-page
queries where the extra reranked headroom helps most.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

@mahikaw mahikaw requested review from a team as code owners May 22, 2026 19:43
@mahikaw mahikaw requested a review from drobison00 May 22, 2026 19:43
@mahikaw mahikaw changed the title from "retriever 20 with dedup" to "skill: deeper rerank pool + duplicate-aware ranked_retrieved" May 22, 2026
Contributor

greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR makes two targeted changes to the Claude agent skill definition for the NeMo Retriever benchmark harness: it widens the retriever query pool from --top-k 10 to --top-k 20, and replaces the previous ambiguous ranked_retrieved guidance (which allowed duplicates freely) with a clear, deterministic dedup algorithm.

  • Pool expansion: --top-k 20 --rerank causes the retriever to internally fetch 80 candidates (refine_factor=4) and rerank them, exposing 20 top-ranked results to the agent instead of 10 — giving the cross-encoder a broader surface to work with, particularly helpful for long-PDF multi-page queries.
  • Deterministic dedup rule: the agent is now instructed to walk the 20 hits in reranked order, emitting each unique (doc_id, page_number) pair with a sequential emit-rank (1…10), and only fall back to emitting a duplicate if the full 20-candidate pool is exhausted before 10 distinct pairs are found — resolving the previously flagged ambiguity.
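
A minimal sketch of the walk-and-deduplicate rule described above, under stated assumptions. The rule itself lives in SKILL.md as prose for the agent, not as code. The function name `build_ranked_retrieved`, the dict shape of a hit, and the choice to pad with duplicates in reranked order are all illustrative assumptions. Only the `(doc_id, page_number)` key, the 20-hit pool, and the 10-entry limit come from the summary.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]


def build_ranked_retrieved(hits: List[Dict], limit: int = 10) -> List[Dict]:
    """Emit up to `limit` entries from hits given in reranked order.

    Distinct (doc_id, page_number) pairs come first; duplicates are used
    only if the pool runs out before `limit` distinct pairs are found.
    """
    seen = set()
    ranked: List[Key] = []
    duplicates: List[Key] = []

    for hit in hits:
        key = (hit["doc_id"], hit["page_number"])
        if key in seen:
            duplicates.append(key)  # held back for the fallback below
            continue
        seen.add(key)
        ranked.append(key)
        if len(ranked) == limit:
            break

    # Fallback (assumed ordering): pad with duplicates in reranked order.
    for key in duplicates:
        if len(ranked) >= limit:
            break
        ranked.append(key)

    return [
        {"rank": rank, "doc_id": doc_id, "page_number": page}
        for rank, (doc_id, page) in enumerate(ranked, start=1)
    ]


hits = [
    {"doc_id": "hr_policy", "page_number": 3},
    {"doc_id": "hr_policy", "page_number": 3},  # second chunk from the same page
    {"doc_id": "hr_policy", "page_number": 7},
]
print(build_ranked_retrieved(hits))
```

With only three hits the pool is exhausted early, so the duplicate page is appended last. With a full pool containing at least 10 distinct pairs, no duplicate is emitted.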

Confidence Score: 5/5

Documentation-only change to an agent skill prompt; no production code paths are affected and the updated dedup rule is deterministic and unambiguous.

The change touches only a Markdown skill-definition file used by the Claude agent harness. The top-k bump is a single integer change, and the new ranked_retrieved rule is a clearly stated deterministic walk-and-deduplicate algorithm that directly addresses the previously noted ambiguity. There are no code, schema, or configuration changes that could introduce regressions.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| .claude/skills/nemo-retriever/SKILL.md | Two-line change: bumps retriever query pool from top-k 10 to 20 and replaces the old permissive duplicate-allowed ranked_retrieved rule with a clear deterministic dedup algorithm; logic is internally consistent and addresses previous review feedback. |

Reviews (2): Last reviewed commit: "retriever 20 with dedup"

Comment thread .claude/skills/nemo-retriever/SKILL.md Outdated
@mahikaw mahikaw force-pushed the dev/wasonmahika/skill-topk-20-rerank-pool branch from e12ad1e to 1bceee1 Compare May 22, 2026 19:51