Skip to content

feat: enforce retrieval relevance thresholds#2

Open
PipDscvr wants to merge 8 commits intomainfrom
codex/gtm-1103-retrieval-relevance-score-semantics
Open

feat: enforce retrieval relevance thresholds#2
PipDscvr wants to merge 8 commits intomainfrom
codex/gtm-1103-retrieval-relevance-score-semantics

Conversation

@PipDscvr
Copy link
Copy Markdown

@PipDscvr PipDscvr commented Apr 28, 2026

Summary

  • add normalized relevance threshold support for full, fast, and workspace search
  • expose separate search result semantics for semantic similarity, ranking score, and injection relevance while keeping score backward-compatible
  • add relevance filter trace diagnostics with source/namespace decisions and deterministic noisy retrieval regression coverage
  • add an explicit direct-fact precision gate and preserve default recall for scoped, temporal/current-state, complex, multi-hop, and aggregation queries
  • add profile-level ranking floors so importance/recency boosts cannot rescue weak direct-query matches
  • regenerate OpenAPI artifacts for the search threshold and response fields

Migration / Compatibility

  • The new threshold request field is opt-in and always takes precedence when supplied.
  • When no caller threshold is supplied, unscoped simple/medium non-temporal queries use runtime similarityThreshold as the default normalized relevance floor.
  • Source-site filtered reads, as-of reads, current/historical state queries, complex queries, multi-hop queries, and aggregation queries bypass that default floor to preserve recall unless the caller supplies an explicit threshold.
  • Retrieval profiles now define rankingMinSimilarity; simple/medium direct-query candidates below that semantic floor are filtered before injection candidate selection, and importance/recency ranking boosts only apply above the same floor. SQL and in-memory scorers clamp this value to [0,1].
  • This means similarityThreshold is not a universal post-retrieval floor; temporal/current-state phrasing intentionally keeps recency/current-state recall intact.
  • ranking_score is a composite ranking/debug value and is not normalized; relevance is normalized to [0,1] for filtering.

Threshold Semantics

  • Request threshold / service relevanceThreshold: per-request normalized injection relevance floor. It wins over config defaults and applies unless omitted.
  • Config similarityThreshold: fallback normalized relevance floor for unscoped simple/medium non-temporal searches when no request threshold is supplied. It is bypassed for recall-preserving query/source/as-of cases.
  • Profile rankingMinSimilarity: retrieval-profile semantic floor for ranking eligibility. It prevents importance/recency boosts below the floor in SQL/in-memory scoring and filters sub-floor simple/medium direct-query candidates before final selection.

Validation

  • npx tsc --noEmit
  • npm run build
  • npm run generate:openapi
  • git diff --check
  • npm test
  • npm test -- dual-write-representations
  • npm test -- retrieval-policy retrieval-relevance-regression pgvector-smoke dual-write-representations
  • npm test -- memory-search-runtime-config retrieval-policy retrieval-relevance-regression pgvector-smoke dual-write-representations
  • npm test -- retrieval-policy retrieval-relevance-regression current-state-ranking search-pipeline-runtime-config
  • npx fallow audit --health-baseline=.fallow/health-baseline.json --dupes-baseline=.fallow/dupes-baseline.json --base=origin/main
  • npm test -- retrieval-relevance-regression retrieval-trace memory-search-runtime-config scoped-dispatch search-pipeline-runtime-config namespace-retrieval response-schema-coverage memory-route-config-override

Linear Scope

Delivered or ready for review in this repo:

  • GTM-1105 / GTM-1111 / GTM-1112: deterministic noisy-retrieval fixture and trace diagnostics
  • GTM-1114: core threshold forwarding/enforcement across full, fast, and workspace search
  • GTM-1115: profile-defined semantic ranking floor plus direct-query ranking eligibility gate; implemented through the relevance gate and ranking/profile settings instead of only reshuffling weights
  • GTM-1117: explicit precision gate for direct-fact noisy retrieval regressions (precisionAtK >= 0.8) plus exact fixture exclusion checks
  • Core-side portion of GTM-1113: core response exposes semantic_similarity, ranking_score, and relevance
  • Core-side portion of GTM-1116: integration/source trace decisions and source-heavy regression coverage

Paired GTM-1113 SDK PRs:

Not closed by this PR:

  • Remaining source/namespace policy breadth for GTM-1116 beyond the core regression coverage here

Add explicit score semantics and pre-packaging relevance filtering for
memory search, including deterministic regression coverage for noisy
direct-fact retrieval with integration-heavy memory sets.
@PipDscvr PipDscvr requested a review from ethanj as a code owner April 28, 2026 18:28
@PipDscvr PipDscvr marked this pull request as draft April 28, 2026 19:12
@PipDscvr PipDscvr marked this pull request as ready for review April 28, 2026 22:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant