docs: dev-box live testing pattern + reprocess flag interaction [abandoned] by mihow · Pull Request #1314 · RolnickLab/antenna

mihow · 2026-05-20T21:40:37Z

[PR abandoned — closed by author. See PR #1312 for the underlying fix.]

Issue #1310: null detections (empty-bbox sentinels marking "image processed, nothing found") were created before create_detections / create_classifications / create_and_update_occurrences_for_detections ran. Two consequences: 1. If any of those downstream steps failed, the image was already flagged as processed via the null marker — filter_processed_images would skip it on the next run, leaving the image permanently in a "processed but no detections" state. Observed on project 171 (400 captures with only null detections). 2. create_and_update_occurrences_for_detections iterated every detection including nulls, so each null marker spawned a phantom Occurrence with determination=NULL. Fix in ami/ml/models/pipeline.py save_results: - Run create_detections / create_classifications / create_and_update_occurrences on the real DetectionResponses only. - After those succeed, build null DetectionResponses for images that ended up without any detections and persist them via a second create_detections call. - Null responses never enter the classification / occurrence loops, so no phantom Occurrence is created even in the happy path. Tests in ami/ml/tests.py TestPipeline: - test_null_detection_does_not_create_phantom_occurrence: asserts the happy path "pipeline found nothing" creates the null marker but no Occurrence. - test_captures_not_marked_processed_after_failure: asserts that when a downstream step (create_classifications) raises, the image without a real detection is left unmarked and filter_processed_images re-yields it. Co-Authored-By: Claude <noreply@anthropic.com>

… flag interaction - live-dev-testing-atomic-rollback.md: transaction.atomic() + rollback sentinel pattern for exercising mutating code paths against shared dev DB without leaving state. Includes the docker-cp + manage.py shell stdin recipe and the AppRegistryNotReady gotcha. - reprocess-flags.md: reprocess_all_images vs reprocess_existing_detections interaction, why a job can succeed with zero DB delta, and why test_ml_job_e2e "Detections: N" is a pipeline-response count not a database delta. Both surfaced during PR #1312 e2e validation on the serbia dev box. Co-Authored-By: Claude <noreply@anthropic.com>

netlify · 2026-05-20T21:40:41Z

✅ Deploy Preview for antenna-ssec canceled.

Name	Link
🔨 Latest commit	`c5b1df4`
🔍 Latest deploy log	https://app.netlify.com/projects/antenna-ssec/deploys/6a0e2a5737a3e20009d39825

netlify · 2026-05-20T21:40:41Z

✅ Deploy Preview for antenna-preview canceled.

Name	Link
🔨 Latest commit	`c5b1df4`
🔍 Latest deploy log	https://app.netlify.com/projects/antenna-preview/deploys/6a0e2a57a34dcc0007e306f3

coderabbitai · 2026-05-20T21:40:48Z

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

The pipeline's save_results method relocates null-bbox detection creation from the early pipeline stage to after all real detections have been classified and occurrences determined, with two regression tests verifying correct null-detection behavior and operational documentation explaining testing patterns and reprocessing flag interactions.

Changes

ML Pipeline Null Detection Deferral

Layer / File(s)	Summary
Deferred null detection creation in save_results `ami/ml/models/pipeline.py`	The early injection of null-bbox sentinel detections into `results.detections` is removed; after classification and occurrence determination complete, null detections for unmarked images are created and persisted via `create_detections`, ensuring sentinels don't participate in downstream processing.
Null detection regression test coverage `ami/ml/tests.py`	Import of `Occurrence` and two new regression tests: one asserts null detections don't create phantom `Occurrence` rows with `determination=None`; the other verifies that null marker creation is deferred and skipped if `save_results()` fails after detection bulk creation, leaving images available for reprocessing.
Operational and testing documentation `docs/claude/reference/live-dev-testing-atomic-rollback.md`, `docs/claude/reference/reprocess-flags.md`	New documentation for a safe Django atomic/rollback testing pattern against shared dev databases, and a guide to `reprocess_all_images` and `reprocess_existing_detections` flag interactions, including testing implications and DB change verification strategies.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit hops through code with care,
Null sentinels deferred with flair,
No phantom rows or ghost Detection,
Just ordered truth and right direction. 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description covers the summary, list of changes, and test plan. However, it lacks several template sections including Related Issues, Detailed Description, How to Test, Screenshots, Deployment Notes, and the required Checklist.	Add the missing template sections: explicitly reference Related Issues (e.g., `#1312`), expand Detailed Description explaining the impact, add How to Test instructions, and complete the Checklist with checkboxes to confirm testing and documentation review.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: adding two reference documentation files about dev-box live testing patterns and reprocess flag interactions.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch docs/dev-testing-references

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

This PR adds internal reference documentation for ML-job live testing and flag behavior, and also updates save_results behavior around null-detection sentinels to avoid creating phantom Occurrence rows (with accompanying regression tests).

Changes:

Add docs describing a safe transaction.atomic() rollback pattern for live testing on shared dev DBs.
Add docs describing the interaction between reprocess_all_images and reprocess_existing_detections across collect_images/process_images/save_results.
Change save_results to create null-bbox sentinel detections only after real detections/classifications/occurrences are processed; add tests for Issue #1310 behaviors.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
docs/claude/reference/reprocess-flags.md	New reference doc on reprocess flag interactions and common “DB delta” debugging pitfalls.
docs/claude/reference/live-dev-testing-atomic-rollback.md	New reference doc for rollback-protected live testing patterns and container script execution recipes.
ami/ml/tests.py	Adds regression tests ensuring null markers don’t create phantom occurrences and aren’t persisted on downstream failure.
ami/ml/models/pipeline.py	Reorders null-detection sentinel creation to happen after downstream save steps.

Comments suppressed due to low confidence (1)

docs/claude/reference/reprocess-flags.md:56

This bullet is internally inconsistent with the earlier section: in process_images, reprocess_existing_detections is always set to reprocess_all_images (and only ever forced to True). So when feature_flags.reprocess_all_images is on, existing detections are always reprocessed regardless of the other flag. Consider rewording to refer explicitly to the project feature flag (vs the computed variable) or removing the impossible “and … is off” branch.

- **For regression / behavior tests on `save_results`:** the unit tests already use synthetic `PipelineResultsResponse` objects, so flag state doesn't matter there.
- **For e2e tests on a dev box:** if you want to see `Detection` row count grow, you need `reprocess_all_images=False` AND images that are genuinely unprocessed by the target pipeline. Easiest way: pick a pipeline whose detector algorithm has not run on the images yet (different `detection_algorithm_id`), OR use a fresh `SourceImageCollection` whose images have never been processed.
- **For "I ran the job and nothing changed in the DB" debugging:** check the project's `feature_flags.reprocess_all_images` first. If it's on and `reprocess_existing_detections` is off, your job ran fine but performed updates in place. Use celeryworker logs (`Saved pipeline results to database with N detections`) for evidence of activity.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+- All collected images are sent to the pipeline (filter bypassed).
+- BUT `reprocess_existing_detections` becomes `True` too (line 232), so existing detection rows are sent with them.
+- The pipeline returns those same detections (often with updated classifications).
+- `create_detections` in `save_results` matches incoming detections against existing rows by `(source_image, bbox, detection_algorithm)` and **updates in place** — no new `Detection` rows.
+- Net effect: detection count and PKs unchanged after the job. **Do not interpret this as "save_results didn't run."** It did; it just didn't need to create new rows.


+
+## Example: validating PR #1312 `save_results` fix
+
+Full scripts in `docs/claude/sessions/2026-05-19-pr-1312-premptive-processed-marker.md`. The pattern lets you:


@@ -981,6 +972,22 @@ def save_results(
        logger=job_logger,
    )

+    # Mark images with no real detections as processed by creating null-bbox sentinels.
+    # Issue #1310: must run AFTER the real-detection / classification / occurrence steps
+    # so a failure earlier in the pipeline leaves the image unmarked (and therefore
+    # re-processed by filter_processed_images on the next run). Null DetectionResponses
+    # are kept out of the real-detection list so they bypass occurrence creation entirely.
+    null_detection_responses = create_null_detections_for_undetected_images(
+        results=results,
+        detection_algorithm=detection_algorithm,
+        logger=job_logger,
+    )
+    create_detections(
+        detections=null_detection_responses,
+        algorithms_known=algorithms_known,
+        logger=job_logger,
+    )
+


mihow and others added 2 commits May 19, 2026 17:41

Copilot AI review requested due to automatic review settings May 20, 2026 21:40

Copilot started reviewing on behalf of mihow May 20, 2026 21:41 View session

mihow closed this May 20, 2026

mihow deleted the docs/dev-testing-references branch May 20, 2026 21:43

Copilot AI reviewed May 20, 2026

View reviewed changes

mihow changed the title ~~docs: dev-box live testing pattern + reprocess flag interaction~~ docs: dev-box live testing pattern + reprocess flag interaction [abandoned] May 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: dev-box live testing pattern + reprocess flag interaction [abandoned]#1314

docs: dev-box live testing pattern + reprocess flag interaction [abandoned]#1314
mihow wants to merge 2 commits into
mainfrom
docs/dev-testing-references

mihow commented May 20, 2026 •

edited

Loading

Uh oh!

netlify Bot commented May 20, 2026 •

edited

Loading

Uh oh!

netlify Bot commented May 20, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 20, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		## Example: validating PR #1312 `save_results` fix

		Full scripts in `docs/claude/sessions/2026-05-19-pr-1312-premptive-processed-marker.md`. The pattern lets you:

Conversation

mihow commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

netlify Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for antenna-ssec canceled.

Uh oh!

netlify Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for antenna-preview canceled.

Uh oh!

coderabbitai Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mihow commented May 20, 2026 •

edited

Loading

netlify Bot commented May 20, 2026 •

edited

Loading

netlify Bot commented May 20, 2026 •

edited

Loading

coderabbitai Bot commented May 20, 2026 •

edited

Loading