Skip to content

fix(audit): distinguish README comparison failure from content mismatch#282

Merged
sadlilas merged 2 commits intomicrosoft:mainfrom
sadlilas:fix/repo-audit-readme-comparison-distinguish-failure
May 5, 2026
Merged

fix(audit): distinguish README comparison failure from content mismatch#282
sadlilas merged 2 commits intomicrosoft:mainfrom
sadlilas:fix/repo-audit-readme-comparison-distinguish-failure

Conversation

@sadlilas
Copy link
Copy Markdown
Collaborator

@sadlilas sadlilas commented May 5, 2026

Symptom

During the Microsoft ecosystem audit run on 2026-05-04, the check-readme-sections step in recipes/repo-audit.yaml reported a false positive on microsoft/amplifier-collections:

  • Finding: README.md sections exist but content differs from template (Trademarks)
  • Actual: The Trademarks section was verbatim identical to the reference template

Re-running the recipe's extraction logic against the same files and reference template produced TRADE_MATCH=true, confirming the content matched perfectly and the "false" result was a comparison failure being mis-reported as a content mismatch.

Root Cause

The match flags in check-readme-sections use a binary false-default pattern:

TRADE_MATCH="false"
if [ "$HAS_TRADE" -gt 0 ] && [ -n "$TRADE_START" ]; then
   # ... extraction and diff ...
   if diff -q ... ; then TRADE_MATCH="true"; fi
fi

This collapses three distinct conditions into one outcome:

  1. Section does not exist (HAS_TRADE=0) → TRADE_MATCH=false ✓ (correct finding)
  2. Section exists and matches template (diff succeeds, no output) → TRADE_MATCH=true ✓ (correct)
  3. Section exists but differs from template (diff finds differences) → TRADE_MATCH=false ✓ (correct)
  4. Comparison failed (reference fetch empty, extraction malformed, sed/grep error, partial file read) → TRADE_MATCH=false ✗ (misleading — this is "I couldn't compare", not "they differ")

Conditions 3 and 4 both produce TRADE_MATCH=false and surface to the user as:

"README.md sections exist but content differs from template"

But condition 4 is a transient failure requiring retry, while condition 3 is a real content mismatch requiring a README edit. Users cannot distinguish the two from the current finding text.

The Fix

Three-state match flags replace binary defaults:

  • true = comparison completed, content matched
  • false = comparison completed, content differed
  • unknown = comparison could not be completed (missing/empty files, extraction failure, sanity checks failed)

Sanity-check the reference README before comparing:

  • Verify reference is non-empty
  • Verify it contains both ## Contributing and ## Trademarks headings
  • Set REF_README_OK=false if not; skip comparisons that rely on it

Skip comparisons when extraction yields empty content:

  • If either extracted section is empty, mark match as unknown rather than silently failing the diff
  • This prevents malformed extraction (missing heading, sed error) from silently appearing as "content differs"

Distinct findings per condition:

  • missing-sectionERROR severity, "section not found" (user action required: add section)
  • unknown-matchWARNING severity, "comparison could not complete; reference unavailable or extraction failed; retry recommended" (transient, requires retry)
  • false-matchWARNING severity, "content differs from template" (real mismatch, user action required: update section)

The JSON contract still emits boolean *_verbatim fields (for machine consumers), but they now derive correctly from the three-state flag: only true maps to true; both false and unknown map to false. The distinct signal for "comparison failed" is now carried by severity and finding text, allowing users and downstream tooling to respond appropriately.

Scope Deliberately Limited

Not included (deferred pending evidence):

  • Per-repo reference directories: I hypothesized a race condition in the shared reference/ directory (one dir written by all parallel repo-audits in step 3). Investigation showed this is plausible but not proven — one false positive over 112 repos is consistent with a transient blip, not a structural race. Restructuring the working directory is speculative. Deferring until the same symptom reproduces or multiple repos show false positives.

  • Canonicalization logic changes: The current whitespace-normalized awk diff (awk '...' | sort | diff) is sufficient when extraction succeeds. No change needed.

Verification

  • YAML syntax: Validated via python3 -c "import yaml; yaml.safe_load(...)"
  • Bash syntax: Validated via bash -n on extracted command blocks
  • Uniqueness: All 79 replacements verified unique before applying
  • Diff stats: 79 insertions, 19 deletions across recipes/repo-audit.yaml

Generated with Amplifier

The check-readme-sections step initialized CONTRIB_MATCH and TRADE_MATCH to
"false", then only flipped them to "true" on a successful diff comparison.
Any failure inside the extraction pipeline (empty reference, missing heading,
sed/grep error, partial fetch) silently left the flags at "false" and the
recipe reported "README.md sections exist but content differs from template"
even when the actual content was verbatim correct.

This produced a false positive on amplifier-collections in the 2026-05-04
Microsoft ecosystem audit. Re-running the recipe's exact extraction logic
against the same files produced TRADE_MATCH=true, confirming the content
was identical and the prior "false" was a comparison failure being
mis-reported as a content mismatch.

Changes:

- Three-state match flags ("true" / "false" / "unknown") replace the binary
  default-false flags. "unknown" means "comparison could not be performed."
- Sanity-check the reference README before relying on it: must be non-empty
  and must contain both ## Contributing and ## Trademarks headings,
  otherwise flag REF_README_OK=false.
- Skip a comparison if either extracted file is empty (treat as "unknown"
  rather than silently mismatching).
- Severity mapping now distinguishes:
  * missing-section -> error (real finding, fix the README)
  * unknown match -> warning, "comparison could not complete; retry
    recommended" (transient; further differentiates "reference unavailable"
    from "extraction failed" so the report points to the right remediation)
  * false match -> warning, "content differs from template" (real content
    mismatch, the original finding)
- The JSON contract still emits boolean *_verbatim fields, but they now
  derive from the three-state flag (only "true" maps to true). The distinct
  signal for "comparison failed" is carried by severity and finding text.

This is the smallest fix that addresses the user-visible bug. It does not
restructure the working directory layout (deferred until evidence shows
parallel writes are causing extraction failures) and does not change the
canonicalization logic (the existing whitespace-normalized awk diff is
sufficient when extraction succeeds).

Future audit runs that hit the same transient condition will now report
"comparison could not complete; retry recommended" instead of falsely
asserting that the README content differs from the template.

Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
…repos

The canonical Microsoft amplifier branch-protection ruleset is not applied
to private repos in this org by policy. Auditing private repos against
public-repo ruleset expectations produced 100% false-positive findings on
every private repo in the 2026-05-04 audit (15/15), drowning out actual
issues in the report.

This commit makes the two ruleset-related steps short-circuit early when
the target repo is private:

- check-branch-protection: emits a "pass" verdict noting the skip
- check-bypass-actors: emits a "pass" verdict noting the skip

Implementation:

- fetch-repo-info now requests the isPrivate field from gh repo view --json
- Both check steps parse isPrivate from repo_info_raw and exit 0 with a
  "Skipped: private repo (canonical ruleset is not applied to private
  repos per policy)" finding when isPrivate=true
- Severity "pass" ensures these no longer count as errors or warnings in
  the aggregator. The finding text explains the skip so reviewers reading
  per-repo output can see why no real check ran

Scope deliberately limited to ruleset-related checks (branch protection
and bypass actors). Other private-repo checks remain in scope:
- Document compliance (LICENSE, SECURITY.md, SUPPORT.md, CODE_OF_CONDUCT.md,
  README sections) -- these are still expected to match templates
- Feature-toggle checks (Wiki, Projects, Issues) -- not ruleset-related,
  governed by separate policy

Net effect on a re-run: the 15/15 private-repo branch-protection findings
disappear from the report, and remaining private-repo findings (Wiki,
Projects, document drift) become visible without ruleset noise.

Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
@sadlilas sadlilas merged commit 4f8741b into microsoft:main May 5, 2026
1 check passed
@sadlilas sadlilas deleted the fix/repo-audit-readme-comparison-distinguish-failure branch May 5, 2026 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants