docs: dev-box live testing pattern + reprocess flag interaction [abandoned]#1314
docs: dev-box live testing pattern + reprocess flag interaction [abandoned]#1314mihow wants to merge 2 commits into
Conversation
Issue #1310: null detections (empty-bbox sentinels marking "image processed, nothing found") were created before create_detections / create_classifications / create_and_update_occurrences_for_detections ran. Two consequences: 1. If any of those downstream steps failed, the image was already flagged as processed via the null marker — filter_processed_images would skip it on the next run, leaving the image permanently in a "processed but no detections" state. Observed on project 171 (400 captures with only null detections). 2. create_and_update_occurrences_for_detections iterated every detection including nulls, so each null marker spawned a phantom Occurrence with determination=NULL. Fix in ami/ml/models/pipeline.py save_results: - Run create_detections / create_classifications / create_and_update_occurrences on the real DetectionResponses only. - After those succeed, build null DetectionResponses for images that ended up without any detections and persist them via a second create_detections call. - Null responses never enter the classification / occurrence loops, so no phantom Occurrence is created even in the happy path. Tests in ami/ml/tests.py TestPipeline: - test_null_detection_does_not_create_phantom_occurrence: asserts the happy path "pipeline found nothing" creates the null marker but no Occurrence. - test_captures_not_marked_processed_after_failure: asserts that when a downstream step (create_classifications) raises, the image without a real detection is left unmarked and filter_processed_images re-yields it. Co-Authored-By: Claude <noreply@anthropic.com>
… flag interaction - live-dev-testing-atomic-rollback.md: transaction.atomic() + rollback sentinel pattern for exercising mutating code paths against shared dev DB without leaving state. Includes the docker-cp + manage.py shell stdin recipe and the AppRegistryNotReady gotcha. - reprocess-flags.md: reprocess_all_images vs reprocess_existing_detections interaction, why a job can succeed with zero DB delta, and why test_ml_job_e2e "Detections: N" is a pipeline-response count not a database delta. Both surfaced during PR #1312 e2e validation on the serbia dev box. Co-Authored-By: Claude <noreply@anthropic.com>
✅ Deploy Preview for antenna-ssec canceled.
|
✅ Deploy Preview for antenna-preview canceled.
|
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthroughThe pipeline's ChangesML Pipeline Null Detection Deferral
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR adds internal reference documentation for ML-job live testing and flag behavior, and also updates save_results behavior around null-detection sentinels to avoid creating phantom Occurrence rows (with accompanying regression tests).
Changes:
- Add docs describing a safe
transaction.atomic()rollback pattern for live testing on shared dev DBs. - Add docs describing the interaction between
reprocess_all_imagesandreprocess_existing_detectionsacrosscollect_images/process_images/save_results. - Change
save_resultsto create null-bbox sentinel detections only after real detections/classifications/occurrences are processed; add tests for Issue #1310 behaviors.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/claude/reference/reprocess-flags.md | New reference doc on reprocess flag interactions and common “DB delta” debugging pitfalls. |
| docs/claude/reference/live-dev-testing-atomic-rollback.md | New reference doc for rollback-protected live testing patterns and container script execution recipes. |
| ami/ml/tests.py | Adds regression tests ensuring null markers don’t create phantom occurrences and aren’t persisted on downstream failure. |
| ami/ml/models/pipeline.py | Reorders null-detection sentinel creation to happen after downstream save steps. |
Comments suppressed due to low confidence (1)
docs/claude/reference/reprocess-flags.md:56
- This bullet is internally inconsistent with the earlier section: in
process_images,reprocess_existing_detectionsis always set toreprocess_all_images(and only ever forced to True). So whenfeature_flags.reprocess_all_imagesis on, existing detections are always reprocessed regardless of the other flag. Consider rewording to refer explicitly to the project feature flag (vs the computed variable) or removing the impossible “and … is off” branch.
- **For regression / behavior tests on `save_results`:** the unit tests already use synthetic `PipelineResultsResponse` objects, so flag state doesn't matter there.
- **For e2e tests on a dev box:** if you want to see `Detection` row count grow, you need `reprocess_all_images=False` AND images that are genuinely unprocessed by the target pipeline. Easiest way: pick a pipeline whose detector algorithm has not run on the images yet (different `detection_algorithm_id`), OR use a fresh `SourceImageCollection` whose images have never been processed.
- **For "I ran the job and nothing changed in the DB" debugging:** check the project's `feature_flags.reprocess_all_images` first. If it's on and `reprocess_existing_detections` is off, your job ran fine but performed updates in place. Use celeryworker logs (`Saved pipeline results to database with N detections`) for evidence of activity.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - All collected images are sent to the pipeline (filter bypassed). | ||
| - BUT `reprocess_existing_detections` becomes `True` too (line 232), so existing detection rows are sent with them. | ||
| - The pipeline returns those same detections (often with updated classifications). | ||
| - `create_detections` in `save_results` matches incoming detections against existing rows by `(source_image, bbox, detection_algorithm)` and **updates in place** — no new `Detection` rows. | ||
| - Net effect: detection count and PKs unchanged after the job. **Do not interpret this as "save_results didn't run."** It did; it just didn't need to create new rows. |
|
|
||
| ## Example: validating PR #1312 `save_results` fix | ||
|
|
||
| Full scripts in `docs/claude/sessions/2026-05-19-pr-1312-premptive-processed-marker.md`. The pattern lets you: |
| @@ -981,6 +972,22 @@ def save_results( | |||
| logger=job_logger, | |||
| ) | |||
|
|
|||
| # Mark images with no real detections as processed by creating null-bbox sentinels. | |||
| # Issue #1310: must run AFTER the real-detection / classification / occurrence steps | |||
| # so a failure earlier in the pipeline leaves the image unmarked (and therefore | |||
| # re-processed by filter_processed_images on the next run). Null DetectionResponses | |||
| # are kept out of the real-detection list so they bypass occurrence creation entirely. | |||
| null_detection_responses = create_null_detections_for_undetected_images( | |||
| results=results, | |||
| detection_algorithm=detection_algorithm, | |||
| logger=job_logger, | |||
| ) | |||
| create_detections( | |||
| detections=null_detection_responses, | |||
| algorithms_known=algorithms_known, | |||
| logger=job_logger, | |||
| ) | |||
|
|
|||
[PR abandoned — closed by author. See PR #1312 for the underlying fix.]