skill: deeper rerank pool + duplicate-aware ranked_retrieved #2102
mahikaw wants to merge 1 commit into
Greptile Summary
This PR makes two targeted changes to the Claude agent skill definition for the NeMo Retriever benchmark harness: it widens the retriever query pool from
| Filename | Overview |
|---|---|
| .claude/skills/nemo-retriever/SKILL.md | Two-line change: bumps retriever query pool from top-k 10 to 20 and replaces the old permissive duplicate-allowed ranked_retrieved rule with a clear deterministic dedup algorithm; logic is internally consistent and addresses previous review feedback. |
Reviews (2): Last reviewed commit: "retriever 20 with dedup"
e12ad1e to 1bceee1
Description
Summary
Single 4-line change to `.claude/skills/nemo-retriever/SKILL.md`: bump the agent's `retriever query` invocation from `--top-k 10` to `--top-k 20`, and update the `ranked_retrieved` guidance so the agent emits up to 10 distinct `(doc, page)` entries from the wider reranked pool. No other behaviour changes: query budget, cost discipline, and fallbacks are all untouched.
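The exact wording of the new `ranked_retrieved` rule lives in SKILL.md and is not reproduced here. As a rough illustration only, a first-occurrence dedup over the reranked pool might look like the sketch below. The function name `dedup_ranked`, the tuple representation of a `(doc, page)` entry, and the `limit` parameter are all assumptions made for this example, not part of the skill or the harness.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation of one retrieved entry: (document id, page number).
Entry = Tuple[str, int]


def dedup_ranked(candidates: Iterable[Entry], limit: int = 10) -> List[Entry]:
    """Keep the first occurrence of each (doc, page) pair, preserving rank order.

    `candidates` is assumed to be ordered best-first (for example, the 20
    reranked hits). At most `limit` distinct entries are returned.
    """
    seen = set()
    ranked: List[Entry] = []
    for entry in candidates:
        if len(ranked) >= limit:
            break  # already have enough distinct pages
        if entry in seen:
            continue  # a lower-ranked hit from a page that was already emitted
        seen.add(entry)
        ranked.append(entry)
    return ranked
```

With a pool such as `[("a.pdf", 3), ("a.pdf", 3), ("b.pdf", 1)]`, the second hit on page 3 of `a.pdf` is skipped, so a duplicate no longer occupies a slot that a distinct page could fill.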
With `--rerank` enabled, the retriever's internal `refine_factor=4` pulls `top_k × 4` candidates for the cross-encoder. Bumping top-k from 10 to 20 widens that pool from 40 → 80, lets the cross-encoder rerank a deeper set, and exposes the agent to 20 reranked candidates (vs 10) when it picks its top-10 page list.
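The pool arithmetic above can be checked with a few lines. This is only a sketch of the stated relationship (candidates = top-k times refine factor); `rerank_pool_size` is a hypothetical helper written for this example and is not the retriever's actual code.

```python
def rerank_pool_size(top_k: int, refine_factor: int = 4) -> int:
    """Number of candidates handed to the cross-encoder, assuming the
    pool is simply top_k * refine_factor as described in this PR."""
    return top_k * refine_factor


# Before and after the change described in this PR.
before = rerank_pool_size(10)
after = rerank_pool_size(20)
print(f"rerank pool: {before} -> {after}")
```

The agent still reports at most 10 pages; only the number of candidates the cross-encoder scores, and the number of reranked hits the agent sees, doubles.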
Measured gains (vidore_v3 batch 1, 46 queries: 15 finance + 15 HR + 16 pharma)
Same agent (claude-sonnet-4-6), same judge (nvidia/llama-3.3-nemotron-super-49b-v1.5), same eval harness — only the SKILL.md changed:
R@10 lift is concentrated on HR (+23%) — long-PDF, multi-relevant-page
queries where the extra reranked headroom helps most.
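For readers unfamiliar with the metric, the sketch below shows one common definition of recall@k over `(doc, page)` entries. The harness's own R@10 implementation is not shown in this PR, so the function name `recall_at_k` and the page numbers in the example are illustrative assumptions rather than measured data.

```python
from typing import Iterable, Sequence, Tuple

Entry = Tuple[str, int]  # hypothetical (doc, page) representation


def recall_at_k(ranked: Sequence[Entry], relevant: Iterable[Entry], k: int = 10) -> float:
    """Fraction of relevant pages that appear in the first k ranked entries."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0  # no relevant pages labelled for this query
    hits = relevant_set & set(ranked[:k])
    return len(hits) / len(relevant_set)


# Made-up example: a query with four relevant pages in one long PDF.
ranked = [("hr.pdf", 5), ("hr.pdf", 9), ("hr.pdf", 12)]
relevant = [("hr.pdf", 5), ("hr.pdf", 12), ("hr.pdf", 40), ("hr.pdf", 41)]
score = recall_at_k(ranked, relevant)
```

Under this definition, a query with many relevant pages needs many distinct correct pages in the top 10 to score well, which is consistent with the lift being largest on multi-relevant-page queries.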
Checklist