feat(ui): live stats panel in occurrence list sidebar #1308
…ry params
- Rename `agreed_under_order_*` → `agreed_any_rank_*` to match the endpoint's dropped ORDER threshold (0565f06).
- Add optional `agreement_coarsest_rank` + `agreed_coarser_rank_*` fields to the response type (not consumed yet; UI follows in #1308).
- Widen `filters` to accept arrays and append repeated query params so multi-value filters (e.g. `algorithm`, `not_algorithm`; the backend reads them via `request.query_params.getlist(...)`) survive.
Per CodeRabbit review.
Co-Authored-By: Claude <noreply@anthropic.com>
Adds an `OccurrenceStats` panel above the filter sections on the occurrence list page. Consumes the `/occurrences/stats/model-agreement/` endpoint, threading the same active filter array the list view sends so the numbers always reflect the current result set. Shows two metrics: verified occurrences % and human-model agreement rate % (rank-level / under-order agreement).
Co-Authored-By: Claude <noreply@anthropic.com>
`StatBar` takes an optional `count` rendered as "0% (121)". Wired into the Verified occurrences bar so a small-but-nonzero verified set that rounds to 0% still surfaces the underlying count.
Co-Authored-By: Claude <noreply@anthropic.com>

Summary
Frontend consumer for the `/occurrences/stats/model-agreement/` endpoint added in #1307. Adds a Stats panel at the top of the occurrence list sidebar, above the filter sections.
- `OccurrenceStats` component (`ui/src/pages/occurrences/occurrence-stats.tsx`)
- `occurrences.tsx`, threading the same active filter array the list view sends to `useOccurrences`, so the stats always match the current result set (taxon, deployment, date, verification status, default filters, etc.)
- `verified_pct`, with the raw `verified_count` shown alongside (e.g. `0% (121)`) so a small-but-nonzero set that rounds to 0% still surfaces the count.
- `agreed_any_rank_pct` (exact matches plus any disagreement whose LCA is at a real taxonomic rank; the upstream filter scope bounds what counts as meaningful)
Stacked on the backend branch: base is `feat/human-model-agreement-endpoint` (#1307), not `main`. Rebase/retarget to `main` once #1307 merges.
Filter parity
The panel reuses the list view's `filters` array verbatim and converts it to query params with the same active/error rules as `getFetchUrl` (`value?.length && !error`). The endpoint accepts the full occurrence-list filter set (#1307), so the numbers stay consistent with the visible results.
Test plan
- `tsc --noEmit`: no errors in touched files
- `eslint` + `prettier` clean on new/modified files
- `0% (121)`, HUMAN-MODEL AGREEMENT RATE `94%`.
- `?apply_defaults=false` and the Stats panel re-queried with the same param. Same filter array drives both list and stats.
Toolchain note for reviewers
The worktree `ui/` has no `node_modules`. Installing under the host's Node 22 breaks the dev server (nova-ui-kit dereferences a React-18 internal removed in React 19 at tailwind-config eval). Use the repo-pinned Node 18 (`.nvmrc` → 18.12.0): `nvm use 18.12.0 && yarn install && yarn start`. Under Node 18 it boots cleanly.
Design discussion (open, feedback wanted)
The "agreement rate" is the share of human-verified occurrences where the human pick matched the model's pick. The catch: only a handful of occurrences are usually verified, so the rate can swing wildly. If 1 person verified 4 occurrences and agreed on 3, the panel says "75%" — which feels solid but is really just 3 out of 4. Decisions to make before this is more than a rough indicator:
1. Show the raw counts, not just the percentage (done for verified).
A percentage hides how much data is behind it. "94%" could be 94-out-of-100 or 47-out-of-50. Verified occurrences now shows `0% (121)`: the count makes "0%" readable (it's 121 out of ~24k, not literally zero). Open question: do the same for the agreement rate so it reads `94% (94 of 100)`, letting the reader instantly see how many verifications the number is built on.
2. Should we hide the agreement rate when too few occurrences are verified?
A rate built on 3 verifications isn't trustworthy. Options:
3. Show a margin of error instead of a hard cutoff.
Rather than a yes/no "enough data" line, we can show how shaky the number is. A confidence interval (specifically a Wilson score interval, which behaves well for small samples) turns "94%" into something like "94%, somewhere between 87% and 97%". When few occurrences are verified the range is wide; as more get verified it tightens. This is more honest than a binary cutoff and needs only the count + total we already return.
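To make option 3 concrete, here is a minimal sketch of a Wilson score interval computed from a count and a total. The helper name `wilsonInterval`, its return shape, and the default `z = 1.96` (roughly 95% confidence) are assumptions for illustration; nothing like it exists in this PR.

```typescript
// Hypothetical helper, not part of this PR: Wilson score interval for a
// binomial proportion, given `successes` out of `total` observations.
interface Interval {
  lower: number
  upper: number
}

function wilsonInterval(
  successes: number,
  total: number,
  z = 1.96 // assumed default: ~95% two-sided confidence
): Interval | null {
  // No verified occurrences means there is no rate to put an interval on.
  if (total <= 0) return null

  const p = successes / total
  const z2 = z * z
  const denom = 1 + z2 / total
  const center = (p + z2 / (2 * total)) / denom
  const halfWidth =
    (z * Math.sqrt((p * (1 - p)) / total + z2 / (4 * total * total))) / denom

  // Clamp to [0, 1] to guard against floating-point overshoot at the edges.
  return {
    lower: Math.max(0, center - halfWidth),
    upper: Math.min(1, center + halfWidth),
  }
}
```

With the "3 out of 4" case from above, `wilsonInterval(3, 4)` spans roughly 30% to 95%, which shows how little a "75%" built on four verifications pins down.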
4. A fairer agreement score that accounts for luck (follow-up).
Plain agreement % has a blind spot: if 95% of moths in a project are one common species, the model and human will "agree" most of the time just by both guessing the common one. That's luck, not skill. Cohen's kappa (κ) is the standard fix: it measures how much they agree beyond what you'd expect by chance. κ of 1.0 = perfect, 0 = no better than chance. It's a more defensible "how good is the model, really" number than raw %. We can compute it from the exact same human/model pairs the endpoint already collects, with no extra database work. Same caveat as item 2: it still only describes the occurrences people chose to verify, not the whole project. Worth doing as a follow-up if the team wants a real quality metric rather than a rough indicator.
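As a rough illustration of option 4, the sketch below computes Cohen's kappa from a list of human/model label pairs. The `Pair` type and the `cohensKappa` function are hypothetical names, and the sketch assumes exact label matches only; it does not reproduce the any-rank (LCA-based) agreement the endpoint reports.

```typescript
// Hypothetical types and helper, not part of this PR.
// Assumes each pair holds the human's and the model's taxon label as strings
// and that agreement means an exact label match.
type Pair = { human: string; model: string }

function cohensKappa(pairs: Pair[]): number | null {
  const n = pairs.length
  if (n === 0) return null

  let agreements = 0
  const humanCounts = new Map<string, number>()
  const modelCounts = new Map<string, number>()

  for (const { human, model } of pairs) {
    if (human === model) agreements += 1
    humanCounts.set(human, (humanCounts.get(human) ?? 0) + 1)
    modelCounts.set(model, (modelCounts.get(model) ?? 0) + 1)
  }

  // Observed agreement: share of pairs where both picked the same label.
  const observed = agreements / n

  // Expected agreement by chance: for each label, the product of how often
  // the human and the model each chose it, summed over labels.
  let expected = 0
  humanCounts.forEach((humanCount, label) => {
    const modelCount = modelCounts.get(label) ?? 0
    expected += (humanCount / n) * (modelCount / n)
  })

  // Kappa is undefined when chance agreement is already total
  // (both raters used a single identical label for every pair).
  if (1 - expected < 1e-12) return null

  return (observed - expected) / (1 - expected)
}
```

Returning `null` for the degenerate cases leaves it to the caller to decide whether to hide the metric or show a placeholder, which ties back to the "too few verifications" question in item 2.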
None of these block the panel landing as a quick live indicator — they're about how much statistical weight to let users put on the number.
🤖 Generated with Claude Code