fix(occurrences): remove all n+1 queries from occurrence API #1274
Conversation
… remove N+1 Issue: #1271 OccurrenceViewSet.list issued one query per occurrence to fetch detection paths, the best machine prediction, and the best identification, scaling roughly linearly with page size and producing the 34-40s cold-cache responses tracked in Sentry as AMI-PLATFORM-API-BMY. This change adds reusable Prefetch factories in ami/main/models_future/occurrence.py and wires them into the list path: - prefetch_detection_images() populates a `prefetched_detection_images` to_attr list of detections that have a path - prefetch_classifications_for_best_prediction() populates detections__classifications with select_related taxon and algorithm - best_prediction_from_prefetch / best_identification_from_prefetch pick the winners in Python without re-querying OccurrenceQuerySet.with_list_prefetches() bundles these prefetches and is called from OccurrenceViewSet.get_queryset only when action == 'list'. OccurrenceListSerializer reads from the prefetch caches when present and falls back to the existing model methods for non-list callers (admin, exports, signals). Local measurement on a project with 25 occurrences: page size | before | after | reduction limit=5 | 24 | 13 | 1.8x limit=25 | 64 | 7 | 9.1x A new TestOccurrenceListQueryCount guards against regression by asserting the query count does not grow with page size. Follow-ups (separate PR): - django-zen-queries QueriesDisabledViewMixin to fail loudly on lazy queries during list rendering - audit other large-list serializers (SourceImage, Taxon) for the same pattern Co-Authored-By: Claude <noreply@anthropic.com>
✅ Deploy Preview for antenna-preview canceled.
|
✅ Deploy Preview for antenna-ssec canceled.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughList/detail queryset prefetches are split and centralized into new helpers; serializers consume prefetched detections/classifications/identifications via helper functions for best-identification, best-prediction, and detection image URLs. Tests, exporter usage, and a benchmark script were added/updated to exercise and validate the prefetch contract. ChangesOccurrence list/detail prefetch + serializer helpers
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client as Client
participant View as OccurrenceViewSet\n(get_queryset)
participant QS as OccurrenceQuerySet\n(with_*_prefetches)
participant DB as Database
participant Serializer as OccurrenceList/Detail\nSerializer
participant Helpers as models_future/occurrence.py
Client->>View: GET /api/v2/occurrences/ or /api/v2/occurrences/{id}/
View->>QS: build queryset -> with_list_prefetches() / with_detail_prefetches()
QS->>DB: query occurrences + prefetch(detections, classifications, identifications...)
DB-->>QS: rows + prefetched related objects
View->>Serializer: pass queryset / instance
Serializer->>Helpers: best_identification_from_prefetch / best_prediction_from_prefetch / detection_image_urls_from_prefetch
Helpers-->>Serializer: chosen identification/prediction and image URLs
Serializer-->>Client: JSON response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR targets the /api/v2/occurrences/ list endpoint’s dominant N+1 sources by adding reusable Prefetch factories, wiring them into the list queryset, and teaching the list serializer to consume prefetched caches (with fallbacks), plus adding a regression query-count test.
Changes:
- Added
ami/main/models_future/occurrence.pywith reusablePrefetchfactories and Python helpers for selecting “best” records from prefetched data. - Added
OccurrenceQuerySet.with_list_prefetches()and applied it only forOccurrenceViewSetlist actions. - Updated
OccurrenceListSerializerto read detection images / best prediction / best identification from prefetch caches when present, and added a query-count regression test.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| ami/main/tests.py | Adds TestOccurrenceListQueryCount to guard against query-count scaling with page size. |
| ami/main/models_future/occurrence.py | Introduces prefetch factories and helper functions to pick best prediction/identification and derive detection image URLs from caches. |
| ami/main/models.py | Adds OccurrenceQuerySet.with_list_prefetches() to apply the list serializer’s required prefetches. |
| ami/main/api/views.py | Applies list-only prefetching via with_list_prefetches(); keeps existing detail prefetch behavior. |
| ami/main/api/serializers.py | Routes list fields through prefetched data when available, with fallbacks to existing model methods. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ami/main/models_future/occurrence.py`:
- Around line 69-91: best_prediction_from_prefetch currently aggregates all
det.classifications and ranks them, which diverges from
Occurrence.best_prediction because that method first applies per-algorithm
max-score filtering via Occurrence.predictions(); update
best_prediction_from_prefetch to call/use Occurrence.predictions() (or replicate
its per-algorithm max-score grouping) to produce the filtered set of
Classification objects, then apply the same sort key (terminal, -score, -pk) and
return the first element; ensure you reference and reuse
Occurrence.predictions() so the prefetch path preserves identical semantics to
best_prediction.
In `@ami/main/tests.py`:
- Around line 2956-2959: The try/except around caches["default"].clear() is
swallowing failures—remove the broad except or replace it with a specific catch
that fails the test; call caches["default"].clear() directly (so exceptions
propagate) or if you must catch, re-raise or call pytest.fail() with the caught
exception text so a cache-clear failure causes the test to fail and exposes the
error rather than hiding it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 829c8693-706f-40db-89be-84b6b55a965a
📒 Files selected for processing (5)
ami/main/api/serializers.pyami/main/api/views.pyami/main/models.pyami/main/models_future/occurrence.pyami/main/tests.py
- Prevent detail-path N+1: `_best_prediction` now requires both
`detections` AND nested `classifications` to be in the prefetch cache
before calling `best_prediction_from_prefetch`. Detail viewset only
prefetches `detections`, so the previous gate still walked
`det.classifications.all()` lazily (one query per detection).
- Mirror `Occurrence.best_prediction` semantics in the prefetch helper:
`Occurrence.predictions()` filters classifications to the per-algorithm
max-score row before ordering by `-terminal, -score`. The helper now
applies the same per-algorithm grouping in Python so list and detail
return the same winner for occurrences whose top-scoring classification
is non-terminal.
- Add deterministic tiebreaker to `best_identification_from_prefetch`:
tuple-keyed comparison (`created_at_is_set, created_at, pk`) treats
`None` timestamps as lowest and breaks ties by `pk`, matching
`order_by('-created_at', '-pk')`.
- Collapse the duplicate `detections` prefetch: prior PR loaded
detections twice (once via a `to_attr` filtered list, once via
`detections__classifications`). Replace both with a single
`prefetch_detections_for_list()` that nests classifications, and
filter for non-null paths in Python. Saves one query per request.
- Don't swallow cache-clear failures in the regression guard. A silent
cache-clear failure would let the warm cache hide query-count
regressions.
Refs: #1271
PR: #1274
Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ami/main/api/serializers.py`:
- Around line 1377-1379: The fast-path should only run for list responses, not
any object with a cached 'detections' relation. Change the condition in the
serializer method (the branch using detection_image_urls_from_prefetch and
obj.detection_images) to require an explicit list-only marker (e.g. check
getattr(obj, "_prefetched_for_list", False) && "detections" in getattr(obj,
"_prefetched_objects_cache", {})); then set that marker when you build the list
queryset that prefetches detections (for example, after the prefetch in the list
view or in the queryset iteration set obj._prefetched_for_list = True) so
OccurrenceSerializer and other detail callers still call detection_images() and
preserve ordering/filtering.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 274a54ab-5ddf-4af8-9c00-c63dab62b87b
📒 Files selected for processing (3)
ami/main/api/serializers.pyami/main/models_future/occurrence.pyami/main/tests.py
Mirror BEST_MACHINE_PREDICTION_ORDER: define the identification ordering
once and use it everywhere a "best identification" is selected.
- New constant `BEST_IDENTIFICATION_ORDER = ("-created_at", "-pk")`
alongside `BEST_MACHINE_PREDICTION_ORDER`.
- `Occurrence.best_identification` and `OccurrenceQuerySet.with_verification_info`
now order by the constant. Adds a `-pk` deterministic tiebreaker; created_at
is auto_now_add so equal timestamps are vanishingly rare in practice.
- `best_identification_from_prefetch` references the constant in its
docstring (the helper's tuple key already encodes the same ordering).
Co-Authored-By: Claude <noreply@anthropic.com>
…paths Per PR review: the cache-key-gated fallbacks in `OccurrenceListSerializer` exercised the same N+1 path the PR closed, and silently triggered on the detail endpoint via shared serializer inheritance. - Drop `_best_prediction`, `_best_identification`, `get_detection_images` fallbacks. Serializer trusts the prefetch contract. - Add `prefetch_detections_for_detail()` factory + `with_detail_prefetches()` queryset method. Detail viewset now applies it instead of an inline Prefetch. - Drop `has_prefetched_classifications()` helper (no longer needed). - Add `TestOccurrenceDetailQueryCount` regression guard. Detail endpoint measures 12 queries; list (warm test client) measures 6. Closes review threads on PR #1274 (mihow comments 3164296309/3164306519, CodeRabbit comment 3163836361). Co-Authored-By: Claude <noreply@anthropic.com>
…on_images
Per PR review (mihow): adopt the 'disable model methods' safety pattern as
strict prefetch contract. Helpers now raise RuntimeError when required
relations are missing from `_prefetched_objects_cache` instead of silently
slow-pathing through Django's lazy lookups.
- `_require_prefetch()` gate on best_prediction / best_identification /
detection_image_urls helpers.
- Restore old behavior: list responses cap detection_images at 3
(`OccurrenceListSerializer.detection_images_limit`). Detail subclass
overrides to None (unbounded) — TODO: bound when occurrence tracking
enables thousands of detections per occurrence.
- New tests:
- TestOccurrenceResponseShape pins list + detail JSON keys + the list cap.
- TestOccurrencePrefetchHelpersEdgeCases asserts safe defaults on empty
prefetched relations, and RuntimeError when prefetch is missing.
Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ami/main/models_future/occurrence.py`:
- Around line 83-85: The strict-path prefetch check currently only asserts
detections were prefetched but then iterates det.classifications.all(), which
can trigger per-detection queries; update the check to require the nested
classifications prefetch (use _require_prefetch(occurrence,
"detections__classifications") or otherwise validate that each Detection object
has a prefetched 'classifications' attribute) before iterating, so the code
fails fast when nested prefetching is missing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dff16c79-6ad9-4049-9270-883a29b5cf76
📒 Files selected for processing (5)
ami/main/api/serializers.pyami/main/api/views.pyami/main/models.pyami/main/models_future/occurrence.pyami/main/tests.py
🚧 Files skipped from review as they are similar to previous changes (2)
- ami/main/api/views.py
- ami/main/tests.py
Drop TestOccurrenceResponseShape — most assertions are key-presence checks already enforced by serializer Meta. Fold the only valuable assertion (the detection_images per-row cap on list responses) into _list_occurrences in TestOccurrenceListQueryCount where the response data is already inspected. Co-Authored-By: Claude <noreply@anthropic.com>
…cover image to 1
Exports invoke OccurrenceExportSerializer (subclasses OccurrenceSerializer),
whose inherited get_determination_details / _best_prediction call the strict
prefetch helpers in models_future/occurrence.py. JSONExporter.get_queryset()
did not apply the prefetch, so two CI tests in ami.exports.tests.DataExportTest
raised RuntimeError ("Occurrence X is missing prefetched relations
['detections']"). Add .with_detail_prefetches() to the exporter queryset.
Cap OccurrenceListSerializer.detection_images_limit from 3 to 1 — list cards
render a single cover image; payload stays tight without changing response
shape.
Document the nested-prefetch caveat in _require_prefetch: it only checks
top-level relations, so a caller prefetching 'detections' without nested
'classifications' silently re-introduces per-detection queries. Tighter
enforcement is deferred to the django-zen-queries follow-up (issue #1271),
which catches per-row queries at the view boundary regardless of depth.
Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
best_prediction_from_prefetch:
- Skip classifications with score=None (mirrors SQL NULL semantics:
Max(NULL) = NULL, score IN (NULL) never matches).
- Removes the asymmetry between -inf in max calc and 0.0 in sort key.
- Document Python-vs-SQL semantic difference around per-algorithm
max grouping (Python is the stricter interpretation).
TestOccurrenceDetailQueryCount:
- Add test_detail_query_count_does_not_scale_with_detections,
inflating one occurrence to 9 detections x 4 classifications.
The single-detection test alone could not catch a multi-detection
N+1; this guards the actual property the test name asserts.
Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
ami/main/api/serializers.py (1)
1384-1426: ⚡ Quick winMemoize
_best_predictionper occurrence to avoid duplicate scans.
_best_prediction(obj)is used in bothget_determination_details(Line 1406) andget_best_machine_prediction(Line 1425). Caching it once on the object avoids repeated Python traversal of prefetched detections/classifications.Suggested refactor
def _best_prediction(self, obj: Occurrence): from ami.main.models_future.occurrence import best_prediction_from_prefetch - return best_prediction_from_prefetch(obj) + cache_attr = "_cached_best_prediction_from_prefetch" + if hasattr(obj, cache_attr): + return getattr(obj, cache_attr) + value = best_prediction_from_prefetch(obj) + setattr(obj, cache_attr, value) + return value🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ami/main/api/serializers.py` around lines 1384 - 1426, The _best_prediction traversal is called twice; modify _best_prediction to memoize its result on the Occurrence instance (e.g., check for and return obj._cached_best_prediction if present, otherwise compute via best_prediction_from_prefetch(obj), assign it to obj._cached_best_prediction and return it) so both get_determination_details and get_best_machine_prediction reuse the cached value and avoid duplicate scans; update only the _best_prediction function (reference symbols: _best_prediction, best_prediction_from_prefetch, get_determination_details, get_best_machine_prediction) and ensure you cache None results as well.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ami/main/tests.py`:
- Around line 3049-3054: The test creates a Detection with bbox set to
normalized floats but the codebase expects absolute pixel coordinates; update
the Detection.objects.create call to supply integer pixel bbox values instead of
normalized floats—for example compute with the fixture image dimensions (e.g.,
[int(source_image.width*0.1), int(source_image.height*0.1),
int(source_image.width*0.2), int(source_image.height*0.2)]) or use explicit
pixel values like [10, 10, 20, 20] for the bbox field so the Detection.bbox
matches production assumptions.
---
Nitpick comments:
In `@ami/main/api/serializers.py`:
- Around line 1384-1426: The _best_prediction traversal is called twice; modify
_best_prediction to memoize its result on the Occurrence instance (e.g., check
for and return obj._cached_best_prediction if present, otherwise compute via
best_prediction_from_prefetch(obj), assign it to obj._cached_best_prediction and
return it) so both get_determination_details and get_best_machine_prediction
reuse the cached value and avoid duplicate scans; update only the
_best_prediction function (reference symbols: _best_prediction,
best_prediction_from_prefetch, get_determination_details,
get_best_machine_prediction) and ensure you cache None results as well.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2a4cdc38-93a6-40e4-883d-075bc1815960
📒 Files selected for processing (4)
ami/exports/format_types.pyami/main/api/serializers.pyami/main/models_future/occurrence.pyami/main/tests.py
🚧 Files skipped from review as they are similar to previous changes (1)
- ami/main/models_future/occurrence.py
Staging A/B benchmarkSide-by-side run against Single-request latency (5 runs each, varied offset to bypass cachalot)
The wall-clock speedup is smaller than the 9.1× query-count reduction at limit=25 — per-request fixed costs (TLS, auth, project filter, presigned URL generation) don't scale with query count, so they dominate the residual. Detail endpoint (3 runs)
Concurrent load (10 in flight, 30 total,
|
| metric | arctia (PR) | serbia (baseline) | speedup |
|---|---|---|---|
| wall | 17.2s | 47.8s | 2.78× |
| p50 | 5.92s | 14.80s | 2.50× |
| p95 | 6.51s | 21.29s | 3.27× |
| p99 | 6.54s | 21.68s | 3.31× |
Tail widens under load — the PR holds steady, the baseline blows out at p95/p99.
Worker-pool unblocking — /captures/?project_id=5&limit=10 while 8 /occurrences/?limit=25 are in flight
| env | captures baseline | captures under occurrence load |
|---|---|---|
| arctia (PR) | 0.32–0.50s | 1.46–2.37s |
| serbia (baseline) | 0.91–1.66s | 3.50–5.72s |
Captures-under-occurrence-load on the PR (~2s) is roughly equivalent to captures-no-load on the baseline (~1.2s). Workers free up ~3× faster, so the "every page feels frozen" symptom under multi-reviewer use should largely go away.
Other view A/B (PR doesn't change these — sanity check that nothing regressed)
| view | arctia (PR) | serbia (baseline) |
|---|---|---|
| captures | 0.40s | 0.89s |
| taxa | 0.27s | 0.44s |
| events | 0.45s | 0.90s |
| deployments | 0.32s | 0.48s |
| jobs | 0.64s | 1.10s |
| sourceimages | 0.27s | 0.26s |
sourceimages is identical on both → confirms baseline path. Other deltas are infra/cache noise (sequential probes, no shared load); the PR doesn't touch these viewsets.
Production sizing reference (no benchmark run on prod, just for scale)
api.antenna.insectai.org Vermont Atlas of Life (project 18, 31,741 occurrences):
| limit | latency (no PR) |
|---|---|
| 5 | 1.25s avg |
| 25 | 2.24s avg |
| 100 | 6.07s avg |
Post-deploy: based on the staging multipliers, expect roughly halved latency at limit=25 (~1.0s) and ~2× at limit=5 (~0.6s). Larger pages benefit less in absolute terms because per-row serializer cost dominates above the N+1 ceiling.
Correctness
- Top-level keys identical on PR vs baseline ✓
detection_imageslength = 1 in list response on both (this dataset has 1 detection per occurrence; cap not visible)- Detail content diff shows only env-specific URLs and a re-run classification (different
id/score/created_at) — expected, since arctia ran ML again - Identification create flow exercised in UI on arctia (occurrences 50781, 50804 identified live, no 500) — confirms strict prefetch contract holds under POST
- Occurrences-by-collection JSON export triggered for 7,090 records — validates
JSONExporter.get_queryset()strict-prefetch wiring
Still pending
- Soak window on
AMI-PLATFORM-API-BMY(the original 34–40s cold-response Sentry issue) — needs ~30 min of real reviewer load on arctia - Production benchmark on
api.antenna.insectai.orgafter merge
Production load test (api.antenna.insectai.org, project 18 = 31,741 occurrences)Light concurrency, no concurrent users at probe time.
Prod baseline holds up better than Export verification status
|
Reusable curl + xargs runner that probes the project's occurrence count, then issues N parallel requests at varied offsets to bypass cachalot. Reports wall, avg, min, p50, p95, p99, max, and per-status error counts. Used to A/B compare staging deployments (arctia vs serbia) and to characterize prod baseline for PR #1274. Re-run after deploy to verify the speedup against prod's pre-merge numbers in the PR thread. Co-Authored-By: Claude <noreply@anthropic.com>
Production load test — scaledRuns against
Findings:
Inferred post-merge expectation, based on the staging multipliers (~3.3× p95 reduction at limit=25 under concurrent load):
Reproducer# Pre-merge prod baseline
./scripts/benchmark_occurrences_list.sh https://api.antenna.insectai.org 18 10 30 25
./scripts/benchmark_occurrences_list.sh https://api.antenna.insectai.org 18 20 60 25
./scripts/benchmark_occurrences_list.sh https://api.antenna.insectai.org 18 10 30 100
# Post-merge prod re-run (same exact commands)
# Compare against the table above.The script:
|
…rings Simplify the prefetch contract surface introduced in this branch: - Drop _best_prediction / _best_identification thin wrappers in OccurrenceListSerializer; inline best_*_from_prefetch calls. - Drop prefetches_for_list_serializer / prefetches_for_detail_serializer 1-line list wrappers; with_list_prefetches / with_detail_prefetches now call the Prefetch factories directly. - Compress _require_prefetch and best_*_from_prefetch docstrings; keep the strict-prefetch behavior and the SQL/Python null-handling note as a single line each. - OccurrenceSerializer.detection_images_limit = 100 (was None) — bound detail responses from the start; pagination is the long-term answer. Net -47 lines, no behavior change beyond the detail detection_images cap. Co-Authored-By: Claude <noreply@anthropic.com>
Audit follow-up to PR #1274 (Occurrence list N+1 fix). Both viewsets already scale flat (4q at limit=25 with cachalot disabled and warm pool). Tests pin the current behaviour so future serializer changes can't silently reintroduce per-row queries. Co-Authored-By: Claude <noreply@anthropic.com>
Field was declared as TaxonNestedSerializer(read_only=True) with no source, but Taxon has no `parents` attribute (only `parents_json` JSONB cache and the `parent` FK). DRF resolved getattr(taxon, "parents") to nothing, so the field rendered null for every taxon in the list response. Match the detail serializer at TaxonSerializer.parents (line 865) by reading from parents_json with TaxonParentSerializer. UI in ui/src/data-services/models/taxa.ts already consumes this shape and falls back when missing — fix lets the list path populate ranks directly. Found during audit follow-up to PR #1274. Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
ami/main/tests.py (1)
3056-3063: 💤 Low valueUse a timezone-aware timestamp here for consistency with the surrounding fixtures.
datetime.datetime.now()is naive, while every other timestamp in this test (and the seed detection on Line 3052) comes fromsource_image.timestamp, which is tz-aware. Mixing a naive value can trigger Django'sRuntimeWarning: DateTimeField received a naive datetime(and is order-fragile underUSE_TZ=True). Reusingsource_image.timestampkeeps the test self-consistent and avoids the warning.Suggested fix
- timestamp=datetime.datetime.now(), + timestamp=source_image.timestamp,The unused
import datetimeat Line 3034 can then also be dropped.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ami/main/tests.py` around lines 3056 - 3063, The test creates Classification rows with a naive timestamp via datetime.datetime.now(); update the Classification.objects.create calls in the loop to use the existing tz-aware source_image.timestamp (or seed_classification.timestamp if more appropriate) instead of datetime.datetime.now() to ensure consistency with surrounding fixtures and avoid Django's naive datetime warning, and remove the now-unused import datetime at the top of the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/benchmark_occurrences_list.sh`:
- Around line 39-44: The script fails when MAX_OFFSET is 0 because offset is
computed with RANDOM % ${MAX_OFFSET}; update the offset logic in the benchmark
loop (the block using variables START, TOTAL, CONCURRENCY, MAX_OFFSET, BASE_URL,
PROJECT_ID, LIMIT, offset, OUT) to guard against division by zero—compute offset
only when MAX_OFFSET > 0 (e.g., use RANDOM % MAX_OFFSET) and otherwise set
offset to 0 (or a fixed safe value); ensure this conditional is applied inside
the xargs/bash -c invocation so curl uses a valid offset for small datasets.
---
Nitpick comments:
In `@ami/main/tests.py`:
- Around line 3056-3063: The test creates Classification rows with a naive
timestamp via datetime.datetime.now(); update the Classification.objects.create
calls in the loop to use the existing tz-aware source_image.timestamp (or
seed_classification.timestamp if more appropriate) instead of
datetime.datetime.now() to ensure consistency with surrounding fixtures and
avoid Django's naive datetime warning, and remove the now-unused import datetime
at the top of the file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 933dc148-f716-4ac5-8055-f00c3fe63cdb
📒 Files selected for processing (5)
ami/main/api/serializers.pyami/main/models.pyami/main/models_future/occurrence.pyami/main/tests.pyscripts/benchmark_occurrences_list.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- ami/main/models.py
- ami/main/tests.py: switch Detection.bbox in the inflated-detections fixture to pixel-space ints, matching the rest of the codebase's bbox conventions. - scripts/benchmark_occurrences_list.sh: guard against RANDOM % 0 when the project has fewer than LIMIT occurrences (MAX_OFFSET = 0). Co-Authored-By: Claude <noreply@anthropic.com>
Summary
Speeds up the occurrences list endpoint by ~9x and frees up gunicorn workers that were previously pinned for 30+ seconds per request, which under multi-user load was stalling every other API endpoint sharing the pool (captures, jobs, deployments — "every page feels frozen"). Fewer queries per request also drops Postgres load and lets the same hardware support more concurrent reviewers. Closes #1271 (Sentry
AMI-PLATFORM-API-BMY, 34–40s cold responses).Mechanically: introduces reusable
Prefetchfactories and wires them intoOccurrenceViewSetlist and detail paths. Strict prefetch contract — serializer raisesRuntimeErrorif a caller skips the prefetch instead of silently re-introducing per-row queries.Measurement
Query count (in-test,
CaptureQueriesContext)In-test project with 25 occurrences (each: one detection, one classification). Cachalot cache cleared between runs.
Baseline grew ~1.6 queries per occurrence (clear N+1). Post-fix is flat regardless of page size.
Concurrent latency A/B (arctia=PR vs serbia=baseline, project 5, ~5k occurrences)
Prod broke at 20 concurrent (4× 502 errors at limit=25) and saturated at 10P × limit=100 with 59.6s max. Arctia (PR-deployed) holds 40 concurrent without errors; breaks at 60. ~3× capacity vs prod baseline. Reproducible via
scripts/benchmark_occurrences_list.sh.Changes
ami/main/models_future/occurrence.py(new module)prefetch_detections_for_list()— detections + nested classifications, ordered for stable galleriesprefetch_detections_for_detail()— detections + nested classifications +source_image, ordered most-recent-firstbest_prediction_from_prefetch()— mirrorsOccurrence.predictions()per-algorithm max-score grouping in Pythonbest_identification_from_prefetch()— mirrorsBEST_IDENTIFICATION_ORDER+withdrawn=Falsedetection_image_urls_from_prefetch(occurrence, limit=...)— bounded URL list_require_prefetch()— strict gate; raises if relations missing from_prefetched_objects_cacheOccurrenceQuerySet.with_list_prefetches()/with_detail_prefetches()call the Prefetch factories directly.OccurrenceListSerializerdetection_images,determination_details,best_machine_predictionroute through the strict helpers — no fallback to model methods.detection_images_limit: int | None = 1— list cards render a single cover image.OccurrenceSerializer(detail) overridesdetection_images_limit = 100.OccurrenceViewSet.get_querysetcallswith_list_prefetches()on list,with_detail_prefetches()on detail.JSONExporter.get_queryset(ami/exports/format_types.py) addswith_detail_prefetches(). The export serializer subclassesOccurrenceSerializerand inherits the strict helpers, so it needs the same prefetch contract.scripts/benchmark_occurrences_list.sh— reproducible concurrent load script with randomised offsets to bypass cachalot, p50/p95/p99/max + per-status error counts.TaxonListSerializer.parentsbug fix — field was declared asTaxonNestedSerializer(read_only=True)with nosource, butTaxonhas noparentsattribute (onlyparents_jsonJSONB +parentFK). Field rendered null for every taxon. Fixed to read fromparents_jsonviaTaxonParentSerializer(many=True, source="parents_json")— matches the detail serializer pattern atTaxonSerializer.parents. Found during the audit follow-up below.ami/main/tests.py)TestSourceImageListQueryCount— 3 variants (no_detections, with_detections, with_counts+detections)TestTaxonListQueryCount— base case@override_settings(CACHALOT_ENABLED=False)to measure cold query count.Behavior changes (read carefully)
detection_imagescapped at 1 (was unbounded via the model method, but in practice fed from the same query and rarely larger).detection_imagescapped at 100 (was unbounded; per-occurrence detection counts can reach thousands once tracking lands, pagination is the long-term answer).OccurrenceListSerializerorOccurrenceSerializermust applywith_list_prefetches()orwith_detail_prefetches()on the queryset. CI surfaced one missing site (JSONExporter) — fixed here. Audited all six occurrence-related serializers; only the three that hit the strict helpers (OccurrenceListSerializer,OccurrenceSerializer,OccurrenceExportSerializer) need prefetches, and all three are wired.TaxonListSerializer.parentsnow returns a populated[{id, name, rank}]array instead of null. UI inui/src/data-services/models/taxa.tsalready expected this shape and falls back when missing — fix lets the list path populate ranks directly.SourceImage + Taxon list audit (completed)
Audited both viewsets against the same N+1 pattern. Result: both list endpoints already scale flat — no fix needed.
Empirical cold-query counts (cachalot disabled, double-warmup):
/captures/?with_detections=false/captures/?with_detections=true/captures/?with_detections=true&with_counts=true/taxa/SourceImageViewSetis annotation- and prefetch-driven from the start (select_related("event","deployment"),Prefetch("detections", to_attr="filtered_detections")with nestedselect_related("occurrence","occurrence__determination"), subquery-basedwith_counts, cached count columns).TaxonViewSet.get_taxa_observeduses correlated subqueries foroccurrences_count/last_detected/best_determination_scoreplus aPrefetch("tags", to_attr="prefetched_tags").TaxonListSerializerrenders only annotations + JSONB + prefetched tags — no FK dereferences per row.No unified abstraction warranted right now.
_require_prefetchwas needed for Occurrence because that serializer aggregates overdetections.classificationsandidentificationsin Python. SourceImage / Taxon are annotation-driven, not aggregation-driven; lifting_require_prefetchinto a base mixin would be solution-without-problem.Caveats
_require_prefetchonly checks top-level relations. A caller prefetchingdetectionswithout nestedclassificationswill silently re-introduce per-detection queries. Mitigated by always going through the list/detail factories. Tighter depth-enforcement deferred to django-zen-queries follow-up.Deferred to follow-up
django-zen-queriesQueriesDisabledViewMixinas a guardrail (cachalot interaction needs a quick spike — cache-key lookups may register as queries)best_identification+best_detectionfields onTaxonListSerializeras a feature add — cheap subquery annotation per the audit memo (not an N+1 fix)?detection_images_limit=Nquery param onOccurrenceViewSet. The cap (list=1, detail=100) is currently a serializer class attribute; bundle into theTaxonListSerializerfeature PR when tracking-merged occurrences are common enough that a frontend gallery card actually needs to request more than 1 image. Today's data hasmax_det=1per occurrence everywhere, so exposing the knob is premature.Post-deploy validation (2026-05-11)
Merged + deployed to prod as
ad0826d5at ~22:53 UTC. Posted a production APM deployment marker for timeline correlation.Ran a synthetic UI traffic sweep against production both before deploy and after (4 separate sweeps each side, 5 iterations × 5 pages × 3 projects per sweep, 300 page-loads per side, 600 total). Each iteration runs in a fresh isolated browser context (headless Chromium, Playwright) so cache state is consistent between runs. Token-authenticated against three projects covering different occurrence-count regimes (~37k, ~2.3k, ~1.4k).
target_api_msmedians on/api/v2/occurrences/familyEvery PR-target page dropped on every project. Project 120 (smaller, detection-heavy ratio) gained the most — consistent with the fix removing per-row detection re-queries.
Control pages (audited but not modified by this PR)
No regressions.
captures_listandtaxa_list(SourceImage + Taxon list endpoints from the audit) behave as predicted.Variance tightened on the touched pages
Coefficient-of-variation (CV%) of the median target_api_ms across the 4 sweeps:
The N+1 removal eliminates a variable amount of per-row Python work, so the post-fix timing is more deterministic. Secondary win.
End-to-end page-load (DOM + assets + React hydration)
User-visible page load on the occurrences list dropped roughly 10% — directly attributable to the API speedup (TLS handshake, asset load, hydration cost is roughly constant).
Caveats
AMI-PLATFORM-API-BMYsoak (the cold 34–40s tail) is the separate signal tracked in the test plan and is left unchecked here. The APM averages above show typical traffic, not the long-tail cold cases.Test plan
python manage.py test ami.main.tests.TestOccurrenceListQueryCountpassespython manage.py test ami.main.tests.TestOccurrenceDetailQueryCountpassespython manage.py test ami.main.tests.TestOccurrencePrefetchHelpersEdgeCasespasses (zero-detection no 500, missing prefetch raises)python manage.py test ami.main.tests.TestSourceImageListQueryCount ami.main.tests.TestTaxonListQueryCountpasses (audit regression guards)python manage.py test ami.exports.tests.DataExportTest.test_export_record_count ami.exports.tests.DataExportTest.test_json_export_record_countpasses (regression caught the missing exporter prefetch)occurrences_simple_csvformat)occurrences_api_json, ~24s wall time, every record populateddetection_images+determination_details+best_machine_predictionwith zeroRuntimeErrorfrom the strict-prefetch contractAMI-PLATFORM-API-BMYsoak post-merge — confirm 34–40s cold responses go awaySummary by CodeRabbit
Release Notes
New Features
Performance Improvements