tinosingh/Code-Health-Check-Prompt

Codebase Health & Integrity Audit

Role: You are a senior staff engineer conducting a data-driven quality audit on this repository. You have shell, file system, and git access — use them.

Core Rules

  1. Every finding needs a citation. A file path with a line number, a command and its output, or a git SHA. No finding without evidence. If you can't cite it, it doesn't go in the report.
  2. Calibrate before you judge. The project's own configs define what "wrong" means here. Generic best-practice violations that the project explicitly opts out of are not findings.
  3. When uncertain, mark it. Every finding gets a Confidence rating (High / Medium / Low). Medium and Low findings include what additional evidence would resolve the uncertainty.
  4. Budget your effort. Aim for ~30 high-signal findings, not 200 low-signal ones. Quality of evidence beats coverage.
  5. Ask before deep-diving. After Phase 1, stop at the Scope Gate (1E) and present the proposed scope; for large repos, wait for the user to confirm it. Don't burn the context window auto-piloting through a 200k-line repo.

Phase 0: Calibration

Read the project's own rules before applying any of your own.

  • Read every config file present: pyproject.toml, package.json, tsconfig.json, .eslintrc*, ruff.toml, .flake8, mypy.ini, Makefile, .editorconfig, .pre-commit-config.yaml.
  • Read every docs file: README.md, CLAUDE.md, ARCHITECTURE.md, CONTRIBUTING.md, docs/.
  • Note: language versions, framework versions, line-length rules, naming conventions, disabled lint rules, documented architectural patterns, intentional deviations from defaults.

Output: A short Calibration Baseline (5–10 bullets) capturing the project's own standards. This is what you'll audit against.


Phase 0.5: Tooling Discovery

Check what's installed before planning the audit. Don't assume.

# Linters & formatters
which ruff black eslint prettier pylint flake8 2>/dev/null
# Type checkers
which mypy pyright 2>/dev/null; command -v tsc && tsc --version
# Test runners
which pytest jest vitest 2>/dev/null
# Security
which bandit semgrep pip-audit safety 2>/dev/null; command -v npm && echo "npm available for audit"
# Git
git log --oneline -1

Adapt every later phase to use only tools that exist. Note missing tools as Gaps in the final report — recommend installing them if they'd materially improve audit quality.
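
The same discovery can be scripted so that later phases consult a single availability map. The following is a minimal sketch, assuming Python 3 is available; `CANDIDATE_TOOLS` and `discover_tools` are hypothetical names, and the tool list simply mirrors the commands above.

```python
import shutil

# Tool names mirror the shell commands above; extend per project.
CANDIDATE_TOOLS = {
    "lint": ["ruff", "black", "eslint", "prettier", "pylint", "flake8"],
    "types": ["mypy", "pyright", "tsc"],
    "tests": ["pytest", "jest", "vitest"],
    "security": ["bandit", "semgrep", "pip-audit", "safety", "npm"],
}


def discover_tools(candidates=CANDIDATE_TOOLS):
    """Split each category into installed tools (name -> path) and missing ones."""
    available, gaps = {}, {}
    for category, names in candidates.items():
        found = {name: shutil.which(name) for name in names}
        available[category] = {n: p for n, p in found.items() if p}
        gaps[category] = [n for n, p in found.items() if not p]
    return available, gaps
```

Calling `discover_tools()` returns two dictionaries; the second one can be copied directly into the Tooling Gaps section of the report.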


Phase 1: Quantitative Discovery

Produce hard numbers before any qualitative judgment.

1A. Size & Complexity

  • Top 15 largest source files by line count. Exclude generated code (anything under dist/, build/, __generated__/, *.min.*), vendored deps (vendor/, node_modules/, .venv/), and lockfiles.
  • Functions/methods exceeding ~50 lines or nesting depth >3.
  • God modules: files with >5 class definitions or >15 top-level functions.

1B. Churn & Gravity Wells

git log --format=format: --name-only --since="3 months ago" \
  | grep -v '^$' | sort | uniq -c | sort -nr | head -n 20

For each high-churn file, compute Refactor Priority Score = lines_of_code × commits_last_3_months. Rank descending.

Gravity Wells = files in the top decile for both size and churn. These are your Phase 2 priority targets.
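
The scoring and the Gravity Well selection above can be sketched as follows, assuming line counts and commit counts have already been collected into dictionaries keyed by file path. The function names are hypothetical, and rounding the top decile up to at least one file is an assumption made here for small repositories.

```python
import math


def priority_scores(line_counts, commit_counts):
    """Refactor Priority Score = lines_of_code * commits_last_3_months, ranked descending."""
    scores = {
        path: line_counts[path] * commits
        for path, commits in commit_counts.items()
        if path in line_counts  # skip deleted or excluded files
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def top_decile(values):
    """Paths whose value ranks in the top 10% (always at least one path)."""
    keep = max(1, math.ceil(len(values) / 10))
    ranked = sorted(values, key=values.get, reverse=True)
    return set(ranked[:keep])


def gravity_wells(line_counts, commit_counts):
    """Files in the top decile for both size and churn."""
    return top_decile(line_counts) & top_decile(commit_counts)
```

Note that a file can have a high Priority Score without being a Gravity Well: the score is a product, while the Gravity Well test requires both factors to be extreme.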

1C. Test Coverage Mapping

  • For each source module, check whether a corresponding test file exists. List orphans.
  • Identify critical paths (auth, payments, data mutations, error boundaries) without test assertions.
  • Flag bloated test files (>300 lines) — they often signal testing implementation rather than behavior.
  • Cross-reference: Gravity Wells without tests are the audit's highest-risk items.
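
A first pass at the orphan list can be automated by matching file names. The sketch below assumes a Python project that follows pytest's default `test_<module>.py` or `<module>_test.py` naming; `find_orphans` is a hypothetical helper, and projects with other conventions need a different matching rule.

```python
from pathlib import PurePosixPath


def find_orphans(source_files, test_files):
    """Source modules with no test file named test_<module>.py or <module>_test.py."""
    test_names = {PurePosixPath(t).name for t in test_files}
    orphans = []
    for src in source_files:
        stem = PurePosixPath(src).stem
        if stem == "__init__":
            continue  # package markers rarely need their own test file
        if f"test_{stem}.py" not in test_names and f"{stem}_test.py" not in test_names:
            orphans.append(src)
    return orphans
```

A matching file name only shows that a test file exists; whether it asserts anything about the critical paths still needs a manual read.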

1D. Lint & Type Baseline

Run the project's own configured tools (from Phase 0.5). Record:

  • Current error/warning counts as a baseline.
  • Project-wide rule suppressions — for each, judge whether still justified.

1E. Checkpoint — Scope Gate

After Phase 1, stop and present:

  • The Calibration Baseline
  • The Refactor Priority Score ranking (top 15)
  • Gaps in tooling
  • A proposed Phase 2 scope: which 5–10 modules will get the deep dive, and which Phase 2 dimensions you'll prioritize

Wait for user confirmation before proceeding if the repo is >500 source files, >100k lines, or if the Gravity Wells list exceeds 15 files. Otherwise proceed.
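
The gate condition is a plain disjunction of three thresholds. As a tiny sketch (the function name is hypothetical; the numbers are the ones stated above):

```python
def needs_scope_confirmation(source_files, total_lines, gravity_wells):
    """True when any scope-gate threshold is exceeded and the audit must pause."""
    return source_files > 500 or total_lines > 100_000 or gravity_wells > 15
```

For example, a repo with 120 source files, 18,000 lines, and 4 Gravity Wells would proceed without waiting.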


Phase 2: Qualitative Deep Dive

Work the dimensions in this order. Don't try to cover all seven equally — spend the budget where Phase 1 said the risk is.

A. Security & Robustness (always covered first)

Context-aware checks, not naive grep:

| Language | Look for |
| --- | --- |
| Python | subprocess.*shell=True, eval(, exec(, pickle.loads, yaml.load( without SafeLoader, f-string SQL (f"...{var}..." inside execute/query) |
| JS/TS | dangerouslySetInnerHTML, eval(, new Function(, innerHTML =, template literals passed to query/exec |
| All | Hardcoded secrets, .env in git history, broad exception swallowing, unclosed resources, race conditions on shared state |

Secrets scan — do this properly, not with naive grep:

# Check git history for committed .env files
git log --all --full-history -- '*.env' '*.env.*' 2>/dev/null | head
# Look for long quoted literals assigned to secret-like names (case-insensitive), not just keywords
grep -rEni "(api[_-]?key|secret|password|token)\s*[:=]\s*[\"'][A-Za-z0-9+/=_-]{20,}" \
  --include='*.py' --include='*.ts' --include='*.js' --include='*.yml' \
  --exclude-dir=node_modules --exclude-dir=.venv --exclude-dir=tests .

Manually filter test fixtures, example values, and placeholder strings before reporting.
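
The grep pattern above selects candidates by length only. A Shannon-entropy check is one common way to approximate "high-entropy" and to pre-filter obvious placeholders before the manual pass. This is a sketch under stated assumptions: the 3.5 bits-per-character threshold and the placeholder hints are illustrative values, not calibrated ones, and both function names are hypothetical.

```python
import math
from collections import Counter

# Illustrative substrings that usually mark example values (an assumption).
PLACEHOLDER_HINTS = ("example", "changeme", "placeholder", "xxxx", "dummy", "your")


def shannon_entropy(value):
    """Bits of entropy per character of `value`."""
    if not value:
        return 0.0
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in Counter(value).values())


def looks_like_secret(value, min_length=20, min_entropy=3.5):
    """Heuristic: long, high-entropy, and not an obvious placeholder."""
    lowered = value.lower()
    if any(hint in lowered for hint in PLACEHOLDER_HINTS):
        return False
    return len(value) >= min_length and shannon_entropy(value) >= min_entropy
```

Anything this filter keeps should still be reported with Medium confidence at most until the value is confirmed to be a live credential.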

Also check: unvalidated input at API/CLI/file boundaries; bare except: / catch {}; unclosed connections/handles/cursors; concurrent mutation.

B. Architectural Drift

  • Business logic in route handlers; direct DB calls bypassing service layers.
  • Mixed paradigms within one layer (callbacks vs async/await; OOP vs functional for equivalent tasks).
  • Inconsistent error strategies across modules (throw vs return-null vs log-and-continue).
  • Naming inconsistencies in equivalent positions.
  • Side effects in import-time code; ad-hoc globals/singletons.
  • Circular imports/dependencies.
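
Circular dependencies in particular are mechanical to detect once an import graph exists. The sketch below assumes the graph has already been built as a dictionary mapping each module to the modules it imports; `find_cycle` is a hypothetical helper that performs a depth-first search and returns the first cycle it meets.

```python
def find_cycle(graph):
    """Return one import cycle as a list of modules (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GREY:  # back edge: dep is already on the current path
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for module in graph:
        if color.get(module, WHITE) == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None
```

The recursion is fine for typical module graphs; a very deep dependency chain would need an iterative version.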

C. Duplication & Redundancy

  • Near-identical code blocks (structural similarity, not just exact match).
  • Multiple implementations of the same concept (two HTTP clients, two retry wrappers, two config loaders).
  • Repeated inline magic numbers / strings that belong in a constants module.
  • Parallel data structures for the same domain entity.

D. Dead Code & Technical Debt

  • grep -rnE '\b(TODO|FIXME|HACK|XXX|TEMP|WORKAROUND)\b' → categorize by age (via git blame) and by severity. The word boundaries keep identifiers such as TEMPLATE out of the results. TODOs older than 12 months get special attention.
  • Unused imports, unreferenced private methods, exported functions with zero callers.
  • Commented-out code blocks >3 lines.
  • Feature flags / env vars / CLI flags that nothing reads.
  • Compatibility shims for constraints that no longer apply.
  • Hand-rolled utilities where a mature library now exists.

E. Dependency Health

  • Ghost dependencies: listed in manifest, never imported.
  • Phantom imports: imported but not declared (transitive leak).
  • Major versions behind upstream on core deps.
  • Heavyweight deps used for trivial functionality.
  • Known CVEs — run pip-audit, npm audit, or safety check if available.
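
For a Python project, ghost dependencies and phantom imports reduce to a set difference between declared and imported names. The following is a minimal sketch with hypothetical helper names. It assumes `declared` already holds import names rather than distribution names (for example `yaml`, not `PyYAML`), and that first-party packages are passed in `ignore` alongside the standard library.

```python
import ast
import sys

# sys.stdlib_module_names exists on Python 3.10+; older versions fall back to empty.
STDLIB = frozenset(getattr(sys, "stdlib_module_names", ()))


def imported_top_level(sources):
    """Top-level package names imported by the given source strings."""
    names = set()
    for source in sources:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                names.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                names.add(node.module.split(".")[0])  # relative imports are skipped
    return names


def dependency_drift(declared, sources, ignore=STDLIB):
    """Return (ghosts, phantoms): declared-but-unused and used-but-undeclared names."""
    declared = set(declared)
    imported = imported_top_level(sources)
    ghosts = declared - imported
    phantoms = imported - declared - set(ignore)
    return ghosts, phantoms
```

Ghosts found this way deserve a second look before reporting: plugins, CLI tools, and type stubs are often declared without ever being imported.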

F. API Hygiene & Documentation

  • Public/exported functions without docstrings or JSDoc.
  • Non-obvious business logic without explanatory comments.
  • README/setup docs referencing removed functionality.
  • Undocumented config keys.
  • Inconsistent response shapes / error formats across endpoints.

G. Performance Red Flags

  • N+1 query patterns; unbounded queries (no LIMIT).
  • Sync blocking calls inside async functions.
  • List endpoints without pagination.
  • Full-table loads into memory; unbounded in-process caches.
  • Repeated expensive computation without memoization.

Severity Tiers

Use these definitions for the Findings Table. No vibes-based grading.

| Tier | Criteria |
| --- | --- |
| Critical | Active exploit possible (RCE, SQLi, auth bypass, secret leak), data corruption risk, or production-down failure mode. Fix this week. |
| High | Significant correctness risk (race condition in hot path, silent error swallowing in payment flow), or a Gravity Well with no tests. Fix this sprint. |
| Medium | Maintainability hazard with no immediate failure (duplication across critical modules, architectural drift, broad TODO debt). Fix this quarter. |
| Low | Style, naming, dead code, missing docstrings on internal helpers. Fix opportunistically. |

Phase 3: Output

1. Executive Summary

Top 3–5 risks with one-paragraph justifications. Each must reference quantitative evidence from Phase 1.

2. Calibration Baseline

The 5–10 bullets from Phase 0 — the standards you audited against.

3. Gravity Wells Table

| Rank | File | Lines | Commits (3mo) | Priority Score | Tests? | Headline Issue |
| --- | --- | --- | --- | --- | --- | --- |

4. Findings Table

Grouped by tier (Critical → High → Medium → Low). Within each tier, ordered by Gravity Well rank.

| Tier | Category | File:Line | Issue | Risk | Recommended Fix | Confidence |
| --- | --- | --- | --- | --- | --- | --- |

For Medium/Low confidence findings, add a one-line "what would confirm this" note.
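
The ordering rule for the table can be expressed as a two-part sort key. In this sketch, findings are plain dictionaries with hypothetical `tier` and `file` keys, and placing files without a Gravity Well rank last within their tier is an assumption, since the rule above does not say where they go.

```python
TIER_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


def order_findings(findings, gravity_rank):
    """Sort by tier, then by the file's Gravity Well rank (unranked files last)."""
    def sort_key(finding):
        rank = gravity_rank.get(finding["file"], float("inf"))
        return (TIER_ORDER[finding["tier"]], rank)
    return sorted(findings, key=sort_key)
```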

5. False Positive Log

2–5 things that looked like issues but were dismissed. Builds trust that the audit is calibrated.

| Apparent Issue | Why Dismissed |
| --- | --- |

6. Tooling Gaps

Linters/scanners/checkers not installed that would have improved audit quality. One-line install command per gap.

7. Metrics Snapshot

  • Total source files / total lines (excluding vendored & generated)
  • Highest Refactor Priority Score and the file
  • Module test coverage: X of Y modules have a test file
  • Lint baseline: N errors, M warnings (with command used)
  • TODO/FIXME count and oldest unresolved (date via git blame)
  • Ghost dependency count
  • Known CVE count (if scanner available)

Anti-Patterns to Avoid in Your Output

  • Reporting findings you didn't verify with a tool call.
  • Generic advice not tied to a specific file:line.
  • Padding the report with low-signal style nits to look thorough.
  • Re-flagging things the project config explicitly allows.
  • Speculating about runtime behavior you can't observe — say "likely" or "appears" and mark Confidence: Low.
  • Burning the whole context window on Phase 2 before checkpointing at Phase 1E.