
fix(compress-report-section): tolerate trailing characters from small-model structured output#727

Merged
neoneye merged 1 commit into main from fix/compress-section-lenient-json-parse on May 17, 2026


neoneye (Member) commented May 17, 2026

Summary

compress_premortem.md was consistently failing for plans whose premortem section is dense with multi-clause failure-mode rows (most recently 20251110_4DWW_India). The root cause lies in the validator, not the model: the default LLM (openrouter-gemini-2.5-flash-lite-preview-09-2025) emits a syntactically valid JSON object and then continues with extra tokens, and Pydantic's model_validate_json rejects the entire payload with json_invalid ("trailing characters at line N column M"). The retry loop then wastes its budget on responses whose JSON prefix is already correct.
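The failure can be reproduced in isolation. The sketch below assumes Pydantic v2 and uses a hypothetical one-field `Demo` schema (not one of the PR's schemas). It shows the strict parser rejecting a valid object followed by prose, while `json.JSONDecoder().raw_decode` parses the same prefix and reports where it stopped.

```python
import json

from pydantic import BaseModel, ValidationError


class Demo(BaseModel):
    # Hypothetical single-field schema, for illustration only.
    summary: str


# A valid JSON object followed by extra tokens, as a small model might emit.
raw = '{"summary": "Premortem risks are concentrated in staffing."} Hope this helps!'

try:
    Demo.model_validate_json(raw)
    strict_error_type = None
except ValidationError as exc:
    # Pydantic rejects the whole payload because of the trailing text.
    strict_error_type = exc.errors()[0]["type"]

# raw_decode parses exactly one JSON value and returns the index where it ended,
# leaving the trailing text unconsumed instead of treating it as an error.
value, end = json.JSONDecoder().raw_decode(raw)
recovered = Demo.model_validate(value)
```

Here `strict_error_type` is `"json_invalid"`, `recovered.summary` holds the model's sentence, and `raw[end:]` is just the trailing prose.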

This PR adds a LenientJsonModel mixin in compress_report_section.py that catches that specific failure, extracts the first balanced JSON value via json.JSONDecoder().raw_decode, and validates that against the same schema. If the extracted value still fails (genuine schema problem), the original ValidationError is re-raised, so the recovery never masks structural issues.

All six per-bucket schemas (SectionSummaryOnly, NumericValuesOnly, LoadBearingAssumptionsOnly, GatesAndThresholdsOnly, RisksAndShocksOnly, MissingDataOnly) inherit the mixin.
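A minimal sketch of the recovery path described above, assuming Pydantic v2 and that callers go through `model_validate_json`. This is an illustration, not the code merged in this PR: it is written as a `BaseModel` subclass so schemas can simply inherit from it, and the `section_summary` field on `SectionSummaryOnly` is hypothetical.

```python
import json
from typing import Any, Union

from pydantic import BaseModel, ValidationError


class LenientJsonModel(BaseModel):
    """Sketch: accept a valid JSON value even when extra tokens follow it."""

    @classmethod
    def model_validate_json(cls, json_data: Union[str, bytes, bytearray], **kwargs: Any):
        try:
            # Normal path: strict parse plus schema validation.
            return super().model_validate_json(json_data, **kwargs)
        except ValidationError as original:
            # Only malformed-JSON failures are candidates for recovery;
            # schema errors (missing fields, wrong types) propagate unchanged.
            if not any(err["type"] == "json_invalid" for err in original.errors()):
                raise
            text = json_data.decode("utf-8") if isinstance(json_data, (bytes, bytearray)) else json_data
            stripped = text.lstrip()  # raw_decode does not skip leading whitespace
            try:
                # Find where the first complete JSON value ends.
                _value, end = json.JSONDecoder().raw_decode(stripped)
            except json.JSONDecodeError:
                raise original from None
            try:
                # Re-validate the exact prefix in JSON mode so type handling
                # matches the strict path.
                return super().model_validate_json(stripped[:end], **kwargs)
            except ValidationError:
                # The prefix parsed but does not fit the schema: report the original error.
                raise original from None


class SectionSummaryOnly(LenientJsonModel):
    # Hypothetical field; the real schema's fields are not shown in this PR.
    section_summary: str
```

Re-validating the extracted substring with `model_validate_json`, rather than passing the decoded dict to `model_validate`, keeps JSON-mode coercion identical to the strict path, so the only behavioral change is tolerance of trailing tokens.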

Failure analysis

The failure analysis for the 4DWW India case is captured in experiments/napkin_math/docs/20260517_compress_premortem_failure.md. It covers the symptoms, the failure pattern across 3 outer × 3 inner retries, the root-cause hypothesis, and three escalation tiers (the validator-trim fix in this PR is tier 1).

Verification

  • 17/17 unit tests pass (4 new): trailing-object, trailing-prose, well-formed input, preserved-schema-error.
  • Re-ran prepare_extract_input.py against 20251110_4DWW_India with the default LLM unchanged. compress_premortem.md and compress_premortem_raw.json were both produced on the first outer attempt, with no [premortem] Attempt N failed lines. The premortem section is now included in the bundled digest.

Test plan

  • pytest worker_plan_internal/parameter_extraction/tests/test_compress_report_section.py — 17/17 green
  • Re-run compress on the previously-failing plan; verify premortem artifact is produced and digest contains the # Premortem section heading
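The second test-plan step can be checked mechanically. A sketch, assuming the re-run's log output and the bundled digest are available as text; `premortem_rerun_ok` is a hypothetical helper, and both patterns are derived from the strings quoted in this PR.

```python
import re

# Failure log line quoted in the PR; N is any attempt number.
FAILED_ATTEMPT = re.compile(r"\[premortem\] Attempt \d+ failed")
# Top-level markdown heading for the premortem section in the digest.
PREMORTEM_HEADING = re.compile(r"^# Premortem\b", re.MULTILINE)


def premortem_rerun_ok(log_text: str, digest_text: str) -> bool:
    """Hypothetical check: no failed premortem attempts and the heading is present."""
    no_failures = FAILED_ATTEMPT.search(log_text) is None
    has_heading = PREMORTEM_HEADING.search(digest_text) is not None
    return no_failures and has_heading
```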

🤖 Generated with Claude Code

…-model structured output

Adds LenientJsonModel mixin that catches Pydantic's 'json_invalid' trailing-characters failure and extracts the first balanced JSON object via json.JSONDecoder().raw_decode before validating.

Small structured-output LLMs (notably the default openrouter-gemini-2.5-flash-lite-preview-09-2025) occasionally emit a valid JSON object followed by extra tokens; the strict validator was rejecting the whole payload despite a correct prefix, causing compress_premortem to fail repeatedly for plans with multi-clause failure-mode tables. Recovery here is local: the extracted prefix is validated against the same schema; if it still fails, the original error is re-raised so genuine schema problems are not masked.

Includes 4 unit tests covering the trailing-object, trailing-prose, well-formed, and schema-error paths. Also includes the failure-analysis doc captured against 20251110_4DWW_India.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@neoneye neoneye merged commit af51d81 into main May 17, 2026
3 checks passed
@neoneye neoneye deleted the fix/compress-section-lenient-json-parse branch May 17, 2026 20:19