Endpoint for stats about verified occurrences #1307
Pure-Python LCA over (taxon_id, rank, parents_json) tuples. Returns the deepest shared TaxonRank or None. Used by the upcoming human-model-agreement stat to bucket agreement at-or-finer-than ORDER. Plan: docs/claude/planning/2026-05-14-human-model-agreement-endpoint.md Side-research: docs/claude/planning/occurrence-filter-driven-exports.md Co-Authored-By: Claude <noreply@anthropic.com>
… queryset Pure aggregation; caller wires apply_default_filters + OccurrenceFilter. Annotates best machine prediction, prefetches non-withdrawn identifications, batches Taxon fetch for parents_json, buckets exact / under-order / above-order. Co-Authored-By: Claude <noreply@anthropic.com>
Adds HumanModelAgreementSerializer and the human_model_agreement action on OccurrenceStatsViewSet. Extracts OccurrenceViewSet's filter backends + filterset_fields into a module-level tuple so OccurrenceStatsViewSet can reuse the same OccurrenceFilter pass-through (deployment, event, taxa lists, verified, score thresholds, apply_defaults=false, etc). The top_identifiers action keeps its current behavior — filter_queryset is only invoked by actions that opt in. Co-Authored-By: Claude <noreply@anthropic.com>
Adds 6 HTTP-level tests: missing project_id 400, draft 404, empty zeros, happy-path exact match, deployment filter pass-through, apply_defaults=false score-threshold bypass. Also adds DjangoFilterBackend to OccurrenceStatsViewSet.filter_backends so filterset_fields (event, deployment, determination__rank, ...) actually take effect. Without DjangoFilterBackend, filterset_fields are silently ignored and ?deployment=N returns the unfiltered set. Co-Authored-By: Claude <noreply@anthropic.com>
Mirrors useTopIdentifiers's useAuthorizedQuery pattern. Accepts an arbitrary filter map so the occurrence list page can thread its filter state through unchanged (deployment, event, taxon, score thresholds, apply_defaults). Co-Authored-By: Claude <noreply@anthropic.com>
Pull request overview
Adds a new scalar stats endpoint GET /occurrences/stats/human-model-agreement/ that reports verified-occurrence and human-vs-model agreement rates (exact and "under-order") for an occurrence queryset, reusing the existing /occurrences/ filter stack and apply_default_filters.
Changes:
- New aggregation helper `human_model_agreement_for_project` plus pure-Python `lca_rank_between` over `Taxon.parents_json` in `ami/main/models_future/occurrence.py`.
- New action on `OccurrenceStatsViewSet` plus `HumanModelAgreementSerializer`; extracts `OccurrenceViewSet` filter backends/fields into module-level tuples (`OCCURRENCE_FILTER_BACKENDS`, `OCCURRENCE_FILTERSET_FIELDS`) shared by both viewsets.
- React Query hook `useHumanModelAgreement` and supporting planning/scoping docs.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| ami/main/models_future/occurrence.py | New LCA helper + Python-side aggregation function over a pre-filtered Occurrence queryset. |
| ami/main/api/views.py | Extracts shared occurrence filter config and adds human_model_agreement action on OccurrenceStatsViewSet. |
| ami/main/api/serializers.py | New HumanModelAgreementSerializer describing the response shape. |
| ami/main/tests.py | Unit tests for lca_rank_between, aggregation tests, and HTTP-level tests for the new action. |
| ui/src/data-services/hooks/occurrences/stats/useHumanModelAgreement.ts | New typed React Query hook for the endpoint. |
| docs/claude/planning/2026-05-14-human-model-agreement-endpoint.md | Implementation plan document for the feature. |
| docs/claude/planning/occurrence-filter-driven-exports.md | Side-research scoping stub for filter-driven exports (out of scope of this PR). |
Actionable comments posted: 2
🧹 Nitpick comments (3)

ami/main/api/serializers.py (1)

1765-1769: ⚡ Quick win
Constrain percentage fields to the documented 0.0..1.0 range

These fields are contractually bounded; adding serializer bounds gives fast failure on accidental regressions and keeps response validation self-documenting.

Proposed diff

```diff
-    verified_pct = serializers.FloatField(help_text="verified_count / total_occurrences")
+    verified_pct = serializers.FloatField(
+        min_value=0.0,
+        max_value=1.0,
+        help_text="verified_count / total_occurrences",
+    )
@@
-    agreed_exact_pct = serializers.FloatField(help_text="agreed_exact_count / verified_count")
+    agreed_exact_pct = serializers.FloatField(
+        min_value=0.0,
+        max_value=1.0,
+        help_text="agreed_exact_count / verified_count",
+    )
@@
-    agreed_under_order_pct = serializers.FloatField(help_text="agreed_under_order_count / verified_count")
+    agreed_under_order_pct = serializers.FloatField(
+        min_value=0.0,
+        max_value=1.0,
+        help_text="agreed_under_order_count / verified_count",
+    )
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ami/main/api/serializers.py` around lines 1765 - 1769, The percentage fields verified_pct, agreed_exact_pct, and agreed_under_order_pct are currently unbounded; update their declarations to add validation bounds (min_value=0.0, max_value=1.0) on the serializers.FloatField instances so the serializer enforces the documented 0.0..1.0 contract and fails fast on invalid values.

ui/src/data-services/hooks/occurrences/stats/useHumanModelAgreement.ts (2)

4-13: 💤 Low value
Consider renaming `Response` to avoid shadowing the global DOM type.

`Response` is the name of the global `fetch` response type. Shadowing it at module scope is harmless today but creates a foot-gun if anyone later references the DOM `Response` in this file. A domain-prefixed name (e.g., `HumanModelAgreementResponse`) is clearer at call sites too.

♻️ Proposed rename

```diff
-interface Response {
+interface HumanModelAgreementResponse {
   project_id: number
   total_occurrences: number
   verified_count: number
   verified_pct: number
   agreed_exact_count: number
   agreed_exact_pct: number
   agreed_under_order_count: number
   agreed_under_order_pct: number
 }
@@
-  const { data, isLoading, isFetching, error } = useAuthorizedQuery<Response>({
+  const { data, isLoading, isFetching, error } = useAuthorizedQuery<HumanModelAgreementResponse>({
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ui/src/data-services/hooks/occurrences/stats/useHumanModelAgreement.ts` around lines 4 - 13, Rename the module-scoped interface named Response to a domain-specific name (e.g., HumanModelAgreementResponse) to avoid shadowing the global DOM Response type; update the interface declaration and all references to it within useHumanModelAgreement.ts (and any exported types/imports) so code that needs the DOM Response can still reference it unambiguously and call sites use the new HumanModelAgreementResponse identifier.

20-32: ⚡ Quick win
Single-value filter map drops multi-value query params (e.g., `algorithm`).

`OccurrenceAlgorithmFilter` reads `algorithm` and `not_algorithm` via `request.query_params.getlist(...)` on the backend, so callers can legitimately pass multiple algorithm IDs. The current `Record<string, string | number | boolean | undefined>` plus `params.set(...)` collapses any such filter to a single value, so this hook can't fully reproduce the `/occurrences/` filter set the PR objectives describe.

Consider widening the value type and switching to `append` per item:

♻️ Proposed change

```diff
 export const useHumanModelAgreement = (
   projectId?: string,
-  filters?: Record<string, string | number | boolean | undefined>
+  filters?: Record<
+    string,
+    string | number | boolean | Array<string | number> | undefined
+  >
 ) => {
   const url = `${API_URL}/${API_ROUTES.OCCURRENCES}/stats/human-model-agreement/`
   const params = new URLSearchParams()
   if (projectId) params.set('project_id', projectId)
   if (filters) {
     Object.entries(filters).forEach(([key, value]) => {
-      if (value !== undefined && value !== '' && value !== null) {
-        params.set(key, String(value))
-      }
+      if (value === undefined || value === null || value === '') return
+      if (Array.isArray(value)) {
+        value.forEach((v) => {
+          if (v !== undefined && v !== null && v !== '') {
+            params.append(key, String(v))
+          }
+        })
+      } else {
+        params.set(key, String(value))
+      }
     })
   }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ui/src/data-services/hooks/occurrences/stats/useHumanModelAgreement.ts` around lines 20 - 32, The hook useHumanModelAgreement currently types filters as Record<string, string | number | boolean | undefined> and calls params.set(...), which collapses multi-value query params; update the filters param type to allow string[] (e.g., Record<string, string | number | boolean | string[] | undefined>) and when iterating Object.entries(filters) detect arrays and call params.append(key, String(item)) for each element (fall back to params.set for single values), ensuring multi-value keys like "algorithm" and "not_algorithm" are preserved in the generated URL.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ami/main/models_future/occurrence.py`:
- Line 187: The code eagerly materializes the entire QuerySet into memory by
doing occurrences = list(qs); change this to a memory-safe iteration or paging
approach: replace the full list() with chunked processing using
qs.iterator(chunk_size=1000) or loop over qs in paginated batches (e.g.,
Paginator or manual offset/limit) and aggregate/write results per-chunk, and
avoid prefetching everything at once (adjust or remove the prefetch_related on
identifications or use values()/only()/defer() to limit fetched fields) so
memory usage stays bounded; update any downstream logic that expects a full list
to work with incremental processing or collect results into a streaming response
instead.
In `@docs/claude/planning/2026-05-14-human-model-agreement-endpoint.md`:
- Around line 26-43: The fenced code block listing project files lacks a
language tag (triggering markdownlint MD040); update the opening fence for the
block that contains entries like "ami/ ... occurrence.py # ADD:
human_model_agreement_for_project(), _lca_rank_of() helper", "serializers.py #
ADD: HumanModelAgreementSerializer", and "useHumanModelAgreement.ts # ADD: typed
React Query hook" to include a language identifier (e.g., ```text) so the block
is explicitly labeled; keep the same block content and closing fence unchanged.
Oh yes!!
… review fixes Captures: review findings from Copilot + CodeRabbit, perf bench evidence (43k rows → 159s timeout on apply_defaults=false), and the planned changes for the next session (rename to model-agreement, push aggregation into SQL/ORM, fix UNKNOWN rank LCA + denominator + verified_by_me anon gap + test gaps). Co-Authored-By: Claude <noreply@anthropic.com>
…ion to SQL

Addresses review feedback on PR #1307:

Rename (drop "human"):
- URL: /occurrences/stats/human-model-agreement/ -> /model-agreement/
- Function: human_model_agreement_for_project -> model_agreement_for_project
- Serializer: HumanModelAgreementSerializer -> ModelAgreementSerializer
- Viewset action + url_path: human_model_agreement -> model_agreement
- FE hook: useHumanModelAgreement -> useModelAgreement (file + symbol)
- FE type: Response -> ModelAgreementResponse (fixes DOM Response shadow)
- Test class: TestHumanModelAgreementForProject -> TestModelAgreementForProject

SQL push-down (Copilot+CodeRabbit perf flag):
- Replace list(qs) full-row materialization with annotated aggregate().
- Annotate best_user_taxon_id via Subquery over Identification (BEST_IDENTIFICATION_ORDER). Drop the prefetch + select_related("taxon") on identifications since only taxon_id is read.
- aggregate() Count(filter=Q(...)) for total/verified/exact/no-prediction.
- For under-order disagreement: group disagreement set by distinct (user_taxon, machine_taxon) pair before LCA. Each pair's LCA runs once.
- Bench against project 18 (43,149 occurrences): pre-rework apply_defaults=false curl timed out at 159s; post-rework 1.96s unfiltered / 3.4s with bypass (93,019 occurrences post-filter).

Denominator fix (Copilot):
- agreed_*_pct now divides by verified_with_prediction_count instead of verified_count. A verified occurrence with no machine prediction can't agree or disagree; including it in the denominator drags the rate down without representing actual model disagreement.
- Surface no_prediction_count + verified_with_prediction_count as sibling fields so consumers can see how many such occurrences exist.

UNKNOWN rank bug (Copilot):
- TaxonRank.UNKNOWN sorts after SPECIES in OrderedEnum definition order, so without explicit exclusion UNKNOWN >= ORDER is True and a shared UNKNOWN ancestor would wrongly count as under-order agreement. Filter UNKNOWN out of lca_rank_between's candidate ranks. Add regression test.

Tests:
- New: test_unknown_rank_excluded_from_lca (LCA regression)
- New: test_agreement_under_order_bucket (HTTP coverage for sister-species case, previously only exact-match shortcut was exercised)
- Updated: happy-path asserts verified_with_prediction_count and no_prediction_count.

22/22 backend tests green:
docker compose exec django python manage.py test ami.main.tests.TestLcaRankBetween ami.main.tests.TestModelAgreementForProject ami.main.tests.TestOccurrenceStatsViewSet

Co-Authored-By: Claude <noreply@anthropic.com>
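The UNKNOWN-rank fix described in this commit can be illustrated with a small, self-contained sketch. This is not the code in `ami/main/models_future/occurrence.py`: the `Rank` enum, its integer values, and the `(taxon_id, rank)` lineage shape are assumptions made for illustration. Only the ordering (coarser ranks sort first, `UNKNOWN` sorts after `SPECIES`) and the exclusion rule are taken from the commit message.

```python
from enum import IntEnum
from typing import List, Optional, Tuple


class Rank(IntEnum):
    """Hypothetical stand-in for TaxonRank; larger value = deeper rank."""
    ORDER = 1
    FAMILY = 2
    GENUS = 3
    SPECIES = 4
    UNKNOWN = 5  # sorts after SPECIES, which is what caused the bug


# A lineage is the taxon's ancestors plus the taxon itself (assumed shape).
Lineage = List[Tuple[int, Rank]]


def lca_rank(a: Lineage, b: Lineage) -> Optional[Rank]:
    """Return the deepest rank shared by both lineages, or None."""
    ids_in_b = {taxon_id for taxon_id, _ in b}
    shared = [
        rank
        for taxon_id, rank in a
        # Skip UNKNOWN: otherwise a shared UNKNOWN node would compare as
        # deeper than SPECIES and pass any "rank >= threshold" check.
        if taxon_id in ids_in_b and rank is not Rank.UNKNOWN
    ]
    return max(shared, default=None)
```

With two sister species under one genus, the sketch returns the genus rank; with lineages that share only an `UNKNOWN` node, it returns `None` instead of treating that node as agreement.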
Co-Authored-By: Claude <noreply@anthropic.com>
Replace the .aggregate() over the full filtered queryset with a two-step
approach:
1. SQL Count('pk') for total_occurrences (no joins, no subqueries).
2. Fetch the verified set (occurrences with at least one non-withdrawn
ident) with both best_user_taxon_id and best_machine_prediction_taxon_id
annotated, then bucket counts + LCA in Python.
Why: the previous version evaluated two correlated subqueries (best user
identification + best machine prediction) on every row of the filtered
queryset. For typical projects, >95% of occurrences have no identification
— those rows ran the user-ident subquery only to discover NULL, then ran
the (much more expensive) machine-prediction subquery on detections that
won't contribute to any agreement bucket. Scoping the subqueries to the
verified set avoids that waste.
Bench (cold, cache invalidated):
| Project | Total | Verified | Pre | Post |
|---|---|---|---|---|
| P#85 SEC-SEQ | 36,253 | 13,140 | — | 1.18s |
| P#20 BCI | 40,958 | 1,351 | — | 0.92s |
| P#84 Pennsylvania | 18,407 | 251 | — | 0.56s |
| P#24 Atlantic Forestry | 2,797 | 274 | — | 0.50s |
| P#18 Vermont | 43,149 | 45 | ~928ms | 0.35s |
| P#23 Insectarium Montreal | 20,393 | 74 | — | 0.43s |
Warm via django-cachalot: 122–343ms across all projects.
For P#85 (highest absolute identification count in the system), the cost
is dominated by apply_default_filters' score-threshold join, not the
subqueries. apply_defaults=false actually runs faster (0.69s cold,
179,466 total / 13,140 verified) because the classification join is
skipped.
Co-Authored-By: Claude <noreply@anthropic.com>
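The Python-side bucketing step from this commit might look roughly like the sketch below. It assumes the verified rows have already been fetched as `(best_user_taxon_id, best_machine_prediction_taxon_id)` pairs; the function name, the `shares_real_rank` callback, and the four-decimal rounding are illustrative assumptions (the rounding was chosen only because it matches the example response in the PR description). Field names follow the documented response shape.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Tuple

# (best_user_taxon_id, best_machine_prediction_taxon_id or None)
Row = Tuple[int, Optional[int]]


def _pct(numerator: int, denominator: int) -> float:
    # Guard the empty case so an empty project reports 0.0, not an error.
    return round(numerator / denominator, 4) if denominator else 0.0


def bucket_agreement(
    total_occurrences: int,
    verified_rows: Iterable[Row],
    shares_real_rank: Callable[[int, int], bool],
) -> Dict[str, float]:
    rows = list(verified_rows)
    with_prediction = [(u, m) for u, m in rows if m is not None]
    exact = sum(1 for u, m in with_prediction if u == m)

    # Group disagreements by distinct pair so the (comparatively costly)
    # LCA check runs once per pair rather than once per occurrence.
    pair_counts = Counter((u, m) for u, m in with_prediction if u != m)
    related = sum(
        n for (u, m), n in pair_counts.items() if shares_real_rank(u, m)
    )

    denominator = len(with_prediction)  # not verified_count
    return {
        "total_occurrences": total_occurrences,
        "verified_count": len(rows),
        "verified_pct": _pct(len(rows), total_occurrences),
        "verified_with_prediction_count": denominator,
        "no_prediction_count": len(rows) - denominator,
        "agreed_exact_count": exact,
        "agreed_exact_pct": _pct(exact, denominator),
        "agreed_any_rank_count": exact + related,
        "agreed_any_rank_pct": _pct(exact + related, denominator),
    }
```

Keeping the rank comparison behind a callback means the counting logic can be exercised without a database or a taxonomy; in the real function that role is presumably played by `lca_rank_between`.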
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ui/src/data-services/hooks/occurrences/stats/useModelAgreement.ts (1)
22-33: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Support repeated query params for multi-select filters.

`Record<string, primitive>` + `params.set(...)` drops repeated keys, so multi-value filters (e.g., repeated `algorithm`/`not_algorithm`) can't be forwarded faithfully from occurrence filters.

💡 Proposed fix

```diff
-export const useModelAgreement = (
-  projectId?: string,
-  filters?: Record<string, string | number | boolean | undefined>
-) => {
+type FilterPrimitive = string | number | boolean
+type FilterValue = FilterPrimitive | FilterPrimitive[] | null | undefined
+
+export const useModelAgreement = (
+  projectId?: string,
+  filters?: Record<string, FilterValue>
+) => {
   const url = `${API_URL}/${API_ROUTES.OCCURRENCES}/stats/model-agreement/`
   const params = new URLSearchParams()
   if (projectId) params.set('project_id', projectId)
   if (filters) {
     Object.entries(filters).forEach(([key, value]) => {
-      if (value !== undefined && value !== '' && value !== null) {
-        params.set(key, String(value))
-      }
+      if (Array.isArray(value)) {
+        value.forEach((item) => {
+          if (item !== undefined && item !== '' && item !== null) {
+            params.append(key, String(item))
+          }
+        })
+        return
+      }
+      if (value !== undefined && value !== '' && value !== null) {
+        params.set(key, String(value))
+      }
     })
   }
+  const queryString = params.toString()
   const { data, isLoading, isFetching, error } = useAuthorizedQuery<ModelAgreementResponse>({
     queryKey: [
       API_ROUTES.OCCURRENCES,
       'stats',
       'model-agreement',
       projectId,
-      filters,
+      queryString,
     ],
-    url: `${url}?${params.toString()}`,
+    url: `${url}?${queryString}`,
   })
```

Also applies to: 38-46

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ui/src/data-services/hooks/occurrences/stats/useModelAgreement.ts` around lines 22 - 33, The current implementation converts multi-value filters into a Record<string, primitive> and uses params.set(...), which overwrites duplicate query keys and loses multi-select filters; update the filters type to allow arrays (e.g., Record<string, string | number | boolean | string[] | undefined>), and when building URLSearchParams switch to using params.append(...) for repeated values: if the filter value is an array loop and params.append(key, String(v)) for each entry, otherwise call params.append(key, String(value)); ensure the same change is applied in the other occurrence mentioned (the block around the second params handling at lines ~38-46). Use the existing params and filters identifiers so the change is localized.
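The hook under review is TypeScript, but the set-versus-append distinction is language-independent. As a rough illustration only, the same behavior can be shown with Python's standard `urllib.parse`; the `build_query` helper below is hypothetical and not part of the PR. `parse_qs` stands in for what the backend sees when it calls `request.query_params.getlist(...)`.

```python
from urllib.parse import parse_qs, urlencode


def build_query(project_id, filters):
    """Build a query string that keeps every value of multi-value filters."""
    pairs = []
    if project_id:
        pairs.append(("project_id", str(project_id)))
    for key, value in filters.items():
        # Treat scalars as one-element lists so both cases share one path.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            if item is None or item == "":
                continue  # skip unset filters, as the hook does
            if isinstance(item, bool):
                item = "true" if item else "false"  # match the API's spelling
            pairs.append((key, str(item)))  # one pair per value = "append"
    return urlencode(pairs)


query = build_query("18", {"algorithm": [3, 7], "deployment": 5, "event": None})
# Repeated keys survive the round trip as a list of values.
algorithms = parse_qs(query)["algorithm"]
```

Building a list of pairs rather than a dict is the design point: a dict (like `params.set`) can hold only one value per key, while a pair list (like `params.append`) can repeat the key.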
🧹 Nitpick comments (1)
docs/claude/prompts/NEXT_SESSION_PROMPT.md (1)
1-86: ⚡ Quick winPlanning document appears stale and may confuse future readers.
This file is titled "Next session" and describes tasks "for this session" (lines 7-68), but according to the PR objectives summary, the work described here has already been completed:
- Renaming from "human-model-agreement" to "model-agreement" ✓
- SQL aggregation push ✓
- UNKNOWN rank bug fix ✓
- Denominator fix (verified_with_prediction_count) ✓
Including a "NEXT_SESSION_PROMPT" document that describes already-completed work as if it's pending creates confusion for future developers who might try to execute these tasks again or wonder what state the codebase is in.
Additionally, line 5 references the old endpoint URL that will 404 after the renaming.
Consider one of:
- Archive/rename this to `docs/claude/planning/2026-05-14-session-notes-pr-1307.md` (historical record, past tense)
- Remove it if the other planning doc at line 18 already serves as the historical record
- Add a header clearly stating "Historical planning document - work completed in commits X-Y"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/claude/prompts/NEXT_SESSION_PROMPT.md` around lines 1 - 86, The "NEXT_SESSION_PROMPT.md" planning doc is stale and misleading; update it by either (a) renaming/archiving it (e.g., to docs/claude/planning/2026-05-14-session-notes-pr-1307.md) and leaving as historical record, (b) deleting it if redundant, or (c) editing the top of NEXT_SESSION_PROMPT.md to a clear "Historical planning document — work completed in commits <sha-range>" header and update/remove the old endpoint URL reference; ensure you touch the file named NEXT_SESSION_PROMPT.md and fix the line that references the old endpoint URL (http://localhost:8000/api/v2/occurrences/stats/human-model-agreement/?) so it no longer points to a non-existent route and include the completed-commits SHAs or a pointer to the merged PR in the header.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ami/main/models_future/occurrence.py`:
- Around line 201-213: The count is taken from the raw queryset (total =
queryset.count()) but the verified branch uses a deduped queryset (.distinct()),
so duplicates in the incoming queryset can inflate total; change to operate on a
deduplicated base queryset (e.g., replace/count using queryset.distinct() or
assign deduped = queryset.distinct() and use deduped for total and downstream
operations like the block that builds verified_rows and any other aggregations)
so that total, verified_rows and agreement numerators use the same deduplicated
set (refer to total, verified_rows and the use of .distinct() in this
file/function).
In `@docs/claude/prompts/NEXT_SESSION_PROMPT.md`:
- Line 86: The TODO about updating MEMORY.md is incomplete—either perform the
update or remove/clarify the note: add a new entry named
project_pr_1307_human_model_agreement.md into MEMORY.md summarizing the current
PR state (references to PR `#1307`, the plan doc at
docs/claude/planning/occurrence-filter-driven-exports.md, and the exported
stub), or delete the parenthetical “(TODO this session start)” and replace it
with a clear status line (e.g., “updated” or “needs follow-up”) so the commit
message and NEXT_SESSION_PROMPT.md reflect an accurate, actionable state.
… param Replaces hardcoded `lca >= TaxonRank.ORDER` agreement gate with two layers: - Always returned: `agreed_any_rank_*` — exact matches plus any non-null LCA at a real rank (UNKNOWN excluded). The upstream filter (e.g. a Lepidoptera include list) is what bounds the meaningful scope, not a hardcoded threshold in this function. - Optional `?agreement_coarsest_rank=FAMILY`: when supplied, response also includes `agreed_coarser_rank_*` (exact + LCAs at or below the threshold). The applied rank is echoed in `agreement_coarsest_rank`; null when absent. Also addresses CodeRabbit feedback on the existing branch: - Dedupe base queryset before counting (joins from default-filter chain can inflate Occurrence rows). - Bound `*_pct` FloatFields to [0.0, 1.0] in the serializer. Param validation: invalid rank → 400; UNKNOWN rejected as not meaningful. Tests cover any-rank fallback, threshold filtering, invalid + UNKNOWN rejection, and threshold echo. Co-Authored-By: Claude <noreply@anthropic.com>
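The parameter validation described in this commit could be sketched as follows. This is an assumption-laden illustration, not the view code: the `Rank` enum and both function names are hypothetical, and raising `ValueError` stands in for whatever the view does to produce the 400 response. The rules themselves (case-insensitive rank name, `UNKNOWN` rejected, absent param yields no threshold, LCAs counted when at or deeper than the threshold) come from the commit message.

```python
from enum import IntEnum
from typing import Optional


class Rank(IntEnum):
    """Hypothetical stand-in for TaxonRank; larger value = deeper rank."""
    ORDER = 1
    FAMILY = 2
    GENUS = 3
    SPECIES = 4
    UNKNOWN = 5


def parse_coarsest_rank(raw: Optional[str]) -> Optional[Rank]:
    """Parse ?agreement_coarsest_rank=...; None means no threshold applied."""
    if raw is None:
        return None
    try:
        rank = Rank[raw.strip().upper()]  # case-insensitive name lookup
    except KeyError:
        raise ValueError(f"invalid rank: {raw!r}") from None
    if rank is Rank.UNKNOWN:
        # UNKNOWN is a valid enum member but not a meaningful threshold.
        raise ValueError("UNKNOWN is not a meaningful threshold")
    return rank


def counts_toward_threshold(lca: Optional[Rank], threshold: Rank) -> bool:
    """True when the shared ancestor is at the threshold rank or deeper."""
    return lca is not None and lca is not Rank.UNKNOWN and lca >= threshold
```

For example, with a `FAMILY` threshold a disagreement whose LCA is `GENUS` would count toward `agreed_coarser_rank_*`, while one whose LCA is `ORDER` would count only toward `agreed_any_rank_*`.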
…pport Renames `agreed_under_order_*` → `agreed_any_rank_*` to match the backend's dropped ORDER threshold. Adds optional `agreement_coarsest_rank` + `agreed_coarser_rank_*` fields to the response type (not consumed by the UI yet — the stats panel still renders `verified_pct` + `agreed_any_rank_pct`). Also widens `filters` to accept arrays and appends repeated query params so multi-value filters (e.g. `algorithm`, `not_algorithm` — backend reads via `request.query_params.getlist(...)`) survive. Same fix CodeRabbit flagged in PR #1307 review. Co-Authored-By: Claude <noreply@anthropic.com>
…ry params - Rename `agreed_under_order_*` → `agreed_any_rank_*` to match the endpoint's dropped ORDER threshold (0565f06). - Add optional `agreement_coarsest_rank` + `agreed_coarser_rank_*` fields to the response type (not consumed yet — UI follows in #1308). - Widen `filters` to accept arrays and append repeated query params so multi-value filters (e.g. `algorithm`, `not_algorithm` — backend reads via `request.query_params.getlist(...)`) survive. Per CodeRabbit review. Co-Authored-By: Claude <noreply@anthropic.com>
Claude says: Pushed two commits addressing this round of CodeRabbit feedback + a follow-on design extension the user signed off on. Code changes (0565f06) —
FE follow-up (4a92c0b) —
Tests: 19/19 in Stale 🤖 Generated with Claude Code |
Session-scratchpad doc — belongs in local notes, not the merged branch. Co-Authored-By: Claude <noreply@anthropic.com>
- 2026-05-14-human-model-agreement-endpoint.md — design narrative; superseded by code + PR description. - occurrence-filter-driven-exports.md — side-research stub Copilot flagged as out-of-scope. Promoted to a PR-description follow-up item. Co-Authored-By: Claude <noreply@anthropic.com>
create_detections assigns the classification taxon via .order_by("?"),
so the previous test picked a random machine taxon and then required a
sister species under the same genus. Random non-species picks (ORDER /
FAMILY / GENUS) have no sister, flaking ~50% of runs.
Pin both the machine prediction and the human ID to two fixed Vanessa
species, so the LCA is always GENUS (any-rank bucket, not exact) and the
test is deterministic.
Co-Authored-By: Claude <noreply@anthropic.com>

Summary
Adds
GET /api/v2/occurrences/stats/model-agreement/— verified-occurrence rate + human↔model agreement rates over the same filter set the/occurrences/list view accepts. Designed for the project-overview dashboard widget and occurrence-list sidebar panel (consumed by #1308).Stats viewset convention established in #1296 (see
docs/claude/reference/api-stats-pattern.md): scalar response under the entity it's computed over, namespaced under/stats/.Filter parity
The stats endpoint accepts every query param the `/occurrences/` list endpoint accepts, minus ordering/search (which don't apply to scalars).

| Param | Filter / backend | Notes |
|---|---|---|
| `project_id=<int>` | | |
| `apply_defaults=true/false` | `apply_default_filters` | Defaults to `true`. Bypass = ignore project default taxa lists + score thresholds. |
| `taxon=<id>` or `determination=<id>` | `CustomOccurrenceDeterminationFilter` | Uses `parents_json`: matches the taxon and all descendants. |
| `event=<id>` | | |
| `deployment=<id>` | | |
| `determination__rank=<RANK>` | | e.g. `SPECIES`, `GENUS`, `FAMILY`. |
| `detections__source_image=<id>` | | |
| `collection=<id>` or `collection_id=<id>` | `OccurrenceCollectionFilter` | |
| `algorithm=<id>` (repeatable) | `OccurrenceAlgorithmFilter` | |
| `not_algorithm=<id>` (repeatable) | `OccurrenceAlgorithmFilter` | |
| `date_start=<YYYY-MM-DD>` | `OccurrenceDateFilter` | |
| `date_end=<YYYY-MM-DD>` | `OccurrenceDateFilter` | |
| `verified=true/false` | `OccurrenceVerified` | |
| `verified_by_me=true/false` | `OccurrenceVerifiedByMeFilter` | |
| `taxa_list_id=<id>` | `OccurrenceTaxaListFilter` | |
| `not_taxa_list_id=<id>` | `OccurrenceTaxaListFilter` | |

Backed by the same `OCCURRENCE_FILTER_BACKENDS` + `OCCURRENCE_FILTERSET_FIELDS` tuples wired into `OccurrenceViewSet`, so the two endpoints stay in lock-step.

Endpoint-specific param
`agreement_coarsest_rank=<RANK>`: populates `agreed_coarser_rank_*`, counting only LCAs at or deeper than the given rank. Accepts any `TaxonRank` name (case-insensitive); `UNKNOWN` and unrecognized strings return 400.

Response shape

    {
      "project_id": 18,
      "total_occurrences": 43149,
      "verified_count": 45,
      "verified_pct": 0.001,
      "verified_with_prediction_count": 24,
      "no_prediction_count": 21,
      "agreed_exact_count": 12,
      "agreed_exact_pct": 0.5,
      "agreed_any_rank_count": 17,
      "agreed_any_rank_pct": 0.7083,
      "agreement_coarsest_rank": null,
      "agreed_coarser_rank_count": null,
      "agreed_coarser_rank_pct": null
    }

With `?agreement_coarsest_rank=FAMILY`, the bottom three fields populate:

    {
      "agreement_coarsest_rank": "FAMILY",
      "agreed_coarser_rank_count": 14,
      "agreed_coarser_rank_pct": 0.5833
    }

Field semantics
- `verified_*` = at least one non-withdrawn `Identification`.
- `verified_with_prediction_count` = verified AND has a machine prediction; used as the denominator for `agreed_*_pct`, since occurrences with no prediction can't agree or disagree.
- `no_prediction_count` = verified but no machine prediction (surfaced so consumers can see why the agreement denominator differs from `verified_count`).
- `agreed_exact_*` = user's best identification taxon equals the model's best prediction.
- `agreed_any_rank_*` = exact matches plus disagreements whose LCA is at any real taxonomic rank (`UNKNOWN` excluded, since it sorts after `SPECIES` in `TaxonRank.OrderedEnum`). The upstream filter (e.g. a Lepidoptera include list) is what bounds the meaningful scope, not a hardcoded threshold in this function.
- `agreed_coarser_rank_*` = exact matches plus disagreements whose LCA is at the supplied `agreement_coarsest_rank` or deeper. `null` when no threshold is supplied.
- `agreement_coarsest_rank` = the threshold rank that was applied (echoed back to the caller). `null` when the param was absent.

Disagreement counts are not surfaced explicitly; they are derivable as `verified_with_prediction_count - agreed_*_count`.

Usage examples
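A minimal sketch of consuming the endpoint from a script, assuming only the path, params, and field names given in this description. `build_stats_url` and `disagreement_counts` are hypothetical helper names, not part of the codebase. The sketch builds the query string (repeating keys for repeatable params) and derives the disagreement counts that the response leaves implicit.

```python
from typing import Any, Dict, Mapping, Optional
from urllib.parse import urlencode

# Path taken from the Summary; host and auth are left to the caller.
STATS_PATH = "/api/v2/occurrences/stats/model-agreement/"


def build_stats_url(project_id: int, filters: Optional[Mapping[str, Any]] = None) -> str:
    """Hypothetical helper: serialize a filter map into the endpoint URL.

    List values become repeated keys (algorithm=3&algorithm=7).
    Pass booleans as the strings "true"/"false", since Python's True
    would be encoded as "True".
    """
    params: Dict[str, Any] = {"project_id": project_id, **(filters or {})}
    return f"{STATS_PATH}?{urlencode(params, doseq=True)}"


def disagreement_counts(stats: Mapping[str, Any]) -> Dict[str, int]:
    """Hypothetical helper: derive disagreements from the agreement counts."""
    denominator = stats["verified_with_prediction_count"]
    counts = {
        "exact": denominator - stats["agreed_exact_count"],
        "any_rank": denominator - stats["agreed_any_rank_count"],
    }
    # The coarser-rank fields are null unless agreement_coarsest_rank was sent.
    if stats.get("agreed_coarser_rank_count") is not None:
        counts["coarser_rank"] = denominator - stats["agreed_coarser_rank_count"]
    return counts


# Values copied from the sample response in this description.
sample = {
    "verified_with_prediction_count": 24,
    "agreed_exact_count": 12,
    "agreed_any_rank_count": 17,
    "agreed_coarser_rank_count": None,
}
url = build_stats_url(18, {"algorithm": [3, 7], "agreement_coarsest_rank": "FAMILY"})
derived = disagreement_counts(sample)
```

With the sample values, 12 of the 24 comparable occurrences disagree exactly and 7 share no real-rank ancestor with the prediction.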
The frontend consumer (#1308) wraps this in `useModelAgreement(projectId, filters)`, which accepts an arbitrary filter map (including arrays for repeated params), so the occurrence list page's filter state can be threaded through unchanged.

Implementation notes
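The bucketing described under Field semantics can be sketched in pure Python. This is an illustration under stated assumptions, not the code in this PR: `Rank` is a hypothetical subset of `TaxonRank`, a lineage is assumed to be a root-to-taxon list of `(taxon_id, rank)` entries (a stand-in for what `parents_json` provides), and `parse_rank`, `lca_rank`, and `count_agreement` are made-up names.

```python
from enum import IntEnum
from typing import Dict, Optional, Sequence, Tuple


class Rank(IntEnum):
    """Hypothetical subset of ranks, coarse -> fine; UNKNOWN sorts after SPECIES."""

    ORDER = 1
    FAMILY = 2
    GENUS = 3
    SPECIES = 4
    UNKNOWN = 5


# Assumed shape: (taxon_id, rank) entries from the root down to the taxon itself.
Lineage = Sequence[Tuple[int, Rank]]


def parse_rank(name: str) -> Rank:
    """Case-insensitive lookup; UNKNOWN and unrecognized names raise ValueError."""
    rank = Rank.__members__.get(name.strip().upper())
    if rank is None or rank is Rank.UNKNOWN:
        raise ValueError(f"invalid rank: {name!r}")  # a view would map this to a 400
    return rank


def lca_rank(a: Lineage, b: Lineage) -> Optional[Rank]:
    """Rank of the deepest taxon present in both lineages, ignoring UNKNOWN."""
    ids_in_b = {taxon_id for taxon_id, _ in b}
    shared = [r for taxon_id, r in a if taxon_id in ids_in_b and r is not Rank.UNKNOWN]
    return max(shared) if shared else None


def count_agreement(
    pairs: Sequence[Tuple[Lineage, Lineage]], coarsest: Optional[Rank] = None
) -> Dict[str, Optional[int]]:
    """Count exact, any-rank, and (optionally) coarser-rank agreement."""
    exact = any_rank = coarser = 0
    for user, machine in pairs:
        is_exact = user[-1][0] == machine[-1][0]  # same leaf taxon id
        rank = None if is_exact else lca_rank(user, machine)
        if is_exact:
            exact += 1
        if is_exact or rank is not None:
            any_rank += 1
        if coarsest is not None and (is_exact or (rank is not None and rank >= coarsest)):
            coarser += 1
    return {
        "exact": exact,
        "any_rank": any_rank,
        "coarser_rank": coarser if coarsest is not None else None,
    }


# Toy lineages; the taxon ids are made up for the example.
lepidoptera, nymphalidae, vanessa = (1, Rank.ORDER), (2, Rank.FAMILY), (3, Rank.GENUS)
atalanta = [lepidoptera, nymphalidae, vanessa, (4, Rank.SPECIES)]
cardui = [lepidoptera, nymphalidae, vanessa, (5, Rank.SPECIES)]
rapae = [lepidoptera, (6, Rank.FAMILY), (7, Rank.GENUS), (8, Rank.SPECIES)]
beetle = [(9, Rank.ORDER), (10, Rank.FAMILY), (11, Rank.GENUS), (12, Rank.SPECIES)]

pairs = [(atalanta, atalanta), (atalanta, cardui), (atalanta, rapae), (atalanta, beetle)]
result = count_agreement(pairs, coarsest=parse_rank("family"))
```

In this toy data the two Vanessa species meet at GENUS, so they count toward both the any-rank and the FAMILY-threshold buckets, while the pair that only shares an ORDER counts toward any-rank alone.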
- Deduplicates (`queryset.distinct()`) before counting so the join chain from `apply_default_filters` (e.g. `verified_by_me` → `Identification`, `taxa_list_id` → `parents_json`) can't inflate `total_occurrences` vs the verified branch.
- Best user identification (over `Identification`) and best machine prediction (over `Classification`) evaluate only on verified rows, not on the full filtered queryset.
- `(user_taxon, machine_taxon)` pairs before computation.
- Uses `apply_default_filters`, so `apply_defaults=false` bypasses project default taxa lists + score thresholds.
- `*_pct` fields are bounded to `[0.0, 1.0]` in the serializer.

Bench
Project 18 (43,149 occurrences, 45 verified): 928ms → 350ms cold / 146ms warm after scoping subqueries to the verified set.
Across all production projects with non-zero identifications:
Pre-rework state on project 18 with `apply_defaults=false`: 159s curl timeout.

Test plan
- `ami.main.tests.TestLcaRankBetween`: 7 unit tests including UNKNOWN-rank regression.
- `ami.main.tests.TestModelAgreementForProject`: empty-project + 4-bucket canonical case + coarsest_rank threshold filtering.
- `ami.main.tests.TestOccurrenceStatsViewSet`: HTTP coverage for envelope shape, draft-project 404, filter passthrough, `apply_defaults=false` bypass, exact-match happy path, sister-species any-rank bucket, invalid rank → 400, UNKNOWN rejection, threshold echo.
- (`TestLcaRankBetween` + `TestModelAgreementForProject` + `TestOccurrenceStatsViewSet`). The any-rank bucket test was made deterministic: it previously flaked ~50% on the random fixture taxon and is now pinned to two fixed Vanessa species.

This PR is backend-only. The frontend consumer (the `useModelAgreement` hook + the occurrence-list stats panel) lives in #1308.

Follow-ups (out of scope, calling out for next rounds)
`apply_default_filters` is the dominant cost on hot stats paths

For the heaviest project (P#85, 36k post-filter occurrences) the agreement subqueries on the verified set run in <50ms. The rest of the response time is the `apply_default_filters` + `valid()` filter stack on `Occurrence`. `EXPLAIN ANALYZE` on P#85 reveals:

- No index on `(project_id, determination_score)`: Postgres does a Parallel Seq Scan on `main_occurrence` and discards 195,549 rows by filter to find 36,253 matching the project-default score threshold. Hot path: ~60ms.
- No GIN index on `Taxon.parents_json`: for projects with default include/exclude taxa lists, the `parents_json__contains` JSONB containment check is a row-by-row evaluation. P#85 has no taxa lists so this didn't show up here, but it would dominate for projects that do (e.g. via `OccurrenceFilter`'s recursive taxa filter).
- `valid()`'s anti-join to `main_detection` is fine (index scan, 36k loops on hot cache, <60ms).

This affects every endpoint that calls `apply_default_filters()` or `Occurrence.objects.valid()`: `/occurrences/`, `/captures/`, `/events/`, and the other stats actions on this viewset. Anywhere a project default threshold is non-zero, the same seq-scan is happening.

Note: the entire bench table above was measured without these indexes (they don't exist yet), so those numbers are the no-index baseline, with a worst case of 1.18s cold / 343ms warm. That's acceptable for a cached dashboard widget, so the indexes are a follow-up, not a blocker for this PR.
Likely cheap wins:
- `CREATE INDEX CONCURRENTLY main_occurrence_project_score_idx ON main_occurrence (project_id, determination_score)`: index range scan instead of seq-scan-then-filter.
- `CREATE INDEX CONCURRENTLY main_taxon_parents_json_gin_idx ON main_taxon USING gin (parents_json jsonb_path_ops)`: index lookup for `parents_json__contains` instead of full-row JSONB eval.

Filter-driven occurrence exports
This PR's filter parity wiring (`OCCURRENCE_FILTER_BACKENDS` + `OCCURRENCE_FILTERSET_FIELDS`) sets up a natural follow-up: let users click "Export" on `/occurrences/` with the current filters applied and get a job whose output matches that filtered set, without first materializing a `SourceImageCollection`. The export infra already has a "filters JSON → re-run backends in worker" pattern (`ami/exports/utils.py:13-72`, `generate_fake_request()` + `apply_filters()`) but is hardwired to `OccurrenceCollectionFilter`. Wiring the same shared backend tuple into the exporter would close the gap.

Explicit auth gate on stats viewset
`OccurrenceStatsViewSet` uses `IsActiveStaffOrReadOnly`. `verified_by_me=true` from an anon caller is safe today only because `OccurrenceVerifiedByMeFilter.filter_queryset` short-circuits on `is_authenticated`. Worth an explicit gate at the viewset level rather than relying on the filter's internal short-circuit.

🤖 Generated with Claude Code