fix(audit): distinguish README comparison failure from content mismatch by sadlilas · Pull Request #282 · microsoft/amplifier

sadlilas · 2026-05-05T03:14:51Z

Symptom

During the Microsoft ecosystem audit run on 2026-05-04, the check-readme-sections step in recipes/repo-audit.yaml reported a false positive on microsoft/amplifier-collections:

Finding: README.md sections exist but content differs from template (Trademarks)
Actual: The Trademarks section was verbatim identical to the reference template

Re-running the recipe's extraction logic against the same files and reference template produced TRADE_MATCH=true, confirming the content matched perfectly and the "false" result was a comparison failure being mis-reported as a content mismatch.

Root Cause

The match flags in check-readme-sections use a binary false-default pattern:

TRADE_MATCH="false"
if [ "$HAS_TRADE" -gt 0 ] && [ -n "$TRADE_START" ]; then
   # ... extraction and diff ...
   if diff -q ... ; then TRADE_MATCH="true"; fi
fi

This collapses three distinct conditions into one outcome:

Section does not exist (HAS_TRADE=0) → TRADE_MATCH=false ✓ (correct finding)
Section exists and matches template (diff succeeds, no output) → TRADE_MATCH=true ✓ (correct)
Section exists but differs from template (diff finds differences) → TRADE_MATCH=false ✓ (correct)
Comparison failed (reference fetch empty, extraction malformed, sed/grep error, partial file read) → TRADE_MATCH=false ✗ (misleading — this is "I couldn't compare", not "they differ")

Conditions 3 and 4 both produce TRADE_MATCH=false and surface to the user as:

"README.md sections exist but content differs from template"

But condition 4 is a transient failure requiring retry, while condition 3 is a real content mismatch requiring a README edit. Users cannot distinguish the two from the current finding text.

The Fix

Three-state match flags replace binary defaults:

true = comparison completed, content matched
false = comparison completed, content differed
unknown = comparison could not be completed (missing/empty files, extraction failure, sanity checks failed)

Sanity-check the reference README before comparing:

Verify reference is non-empty
Verify it contains both ## Contributing and ## Trademarks headings
Set REF_README_OK=false if not; skip comparisons that rely on it

Skip comparisons when extraction yields empty content:

If either extracted section is empty, mark match as unknown rather than silently failing the diff
This prevents malformed extraction (missing heading, sed error) from silently appearing as "content differs"

Distinct findings per condition:

missing-section → ERROR severity, "section not found" (user action required: add section)
unknown-match → WARNING severity, "comparison could not complete; reference unavailable or extraction failed; retry recommended" (transient, requires retry)
false-match → WARNING severity, "content differs from template" (real mismatch, user action required: update section)

The JSON contract still emits boolean *_verbatim fields (for machine consumers), but they now derive correctly from the three-state flag: only true maps to true; both false and unknown map to false. The distinct signal for "comparison failed" is now carried by severity and finding text, allowing users and downstream tooling to respond appropriately.

Scope Deliberately Limited

Not included (deferred pending evidence):

Per-repo reference directories: I hypothesized a race condition in the shared reference/ directory (one dir written by all parallel repo-audits in step 3). Investigation showed this is plausible but not proven — one false positive over 112 repos is consistent with a transient blip, not a structural race. Restructuring the working directory is speculative. Deferring until the same symptom reproduces or multiple repos show false positives.
Canonicalization logic changes: The current whitespace-normalized awk diff (awk '...' | sort | diff) is sufficient when extraction succeeds. No change needed.

Verification

YAML syntax: Validated via python3 -c "import yaml; yaml.safe_load(...)"
Bash syntax: Validated via bash -n on extracted command blocks
Uniqueness: All 79 replacements verified unique before applying
Diff stats: 79 insertions, 19 deletions across recipes/repo-audit.yaml

Generated with Amplifier

The check-readme-sections step initialized CONTRIB_MATCH and TRADE_MATCH to "false", then only flipped them to "true" on a successful diff comparison. Any failure inside the extraction pipeline (empty reference, missing heading, sed/grep error, partial fetch) silently left the flags at "false" and the recipe reported "README.md sections exist but content differs from template" even when the actual content was verbatim correct. This produced a false positive on amplifier-collections in the 2026-05-04 Microsoft ecosystem audit. Re-running the recipe's exact extraction logic against the same files produced TRADE_MATCH=true, confirming the content was identical and the prior "false" was a comparison failure being mis-reported as a content mismatch. Changes: - Three-state match flags ("true" / "false" / "unknown") replace the binary default-false flags. "unknown" means "comparison could not be performed." - Sanity-check the reference README before relying on it: must be non-empty and must contain both ## Contributing and ## Trademarks headings, otherwise flag REF_README_OK=false. - Skip a comparison if either extracted file is empty (treat as "unknown" rather than silently mismatching). - Severity mapping now distinguishes: * missing-section -> error (real finding, fix the README) * unknown match -> warning, "comparison could not complete; retry recommended" (transient; further differentiates "reference unavailable" from "extraction failed" so the report points to the right remediation) * false match -> warning, "content differs from template" (real content mismatch, the original finding) - The JSON contract still emits boolean *_verbatim fields, but they now derive from the three-state flag (only "true" maps to true). The distinct signal for "comparison failed" is carried by severity and finding text. This is the smallest fix that addresses the user-visible bug. It does not restructure the working directory layout (deferred until evidence shows parallel writes are causing extraction failures) and does not change the canonicalization logic (the existing whitespace-normalized awk diff is sufficient when extraction succeeds). Future audit runs that hit the same transient condition will now report "comparison could not complete; retry recommended" instead of falsely asserting that the README content differs from the template. Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

…repos The canonical Microsoft amplifier branch-protection ruleset is not applied to private repos in this org by policy. Auditing private repos against public-repo ruleset expectations produced 100% false-positive findings on every private repo in the 2026-05-04 audit (15/15), drowning out actual issues in the report. This commit makes the two ruleset-related steps short-circuit early when the target repo is private: - check-branch-protection: emits a "pass" verdict noting the skip - check-bypass-actors: emits a "pass" verdict noting the skip Implementation: - fetch-repo-info now requests the isPrivate field from gh repo view --json - Both check steps parse isPrivate from repo_info_raw and exit 0 with a "Skipped: private repo (canonical ruleset is not applied to private repos per policy)" finding when isPrivate=true - Severity "pass" ensures these no longer count as errors or warnings in the aggregator. The finding text explains the skip so reviewers reading per-repo output can see why no real check ran Scope deliberately limited to ruleset-related checks (branch protection and bypass actors). Other private-repo checks remain in scope: - Document compliance (LICENSE, SECURITY.md, SUPPORT.md, CODE_OF_CONDUCT.md, README sections) -- these are still expected to match templates - Feature-toggle checks (Wiki, Projects, Issues) -- not ruleset-related, governed by separate policy Net effect on a re-run: the 15/15 private-repo branch-protection findings disappear from the report, and remaining private-repo findings (Wiki, Projects, document drift) become visible without ruleset noise. Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

microsoft-amplifier added 2 commits May 4, 2026 20:13

sadlilas merged commit 4f8741b into microsoft:main May 5, 2026
1 check passed

sadlilas deleted the fix/repo-audit-readme-comparison-distinguish-failure branch May 5, 2026 13:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(audit): distinguish README comparison failure from content mismatch#282

fix(audit): distinguish README comparison failure from content mismatch#282
sadlilas merged 2 commits intomicrosoft:mainfrom
sadlilas:fix/repo-audit-readme-comparison-distinguish-failure

sadlilas commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sadlilas commented May 5, 2026

Symptom

Root Cause

The Fix

Scope Deliberately Limited

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants