diff --git a/.claude/skills/implement-feature/SKILL.md b/.claude/skills/implement-feature/SKILL.md new file mode 100644 index 0000000..28ef8ab --- /dev/null +++ b/.claude/skills/implement-feature/SKILL.md @@ -0,0 +1,189 @@ +--- +name: implement-feature +argument-hint: '[slug]' +description: This skill should be used when the user asks to "implement a feature", "run the Feature Runner", "/implement-feature", "implement all issues for ", or "drain the issue queue overnight". Automates the implementation side of the AI-development cycle for one Feature: creates an isolated worktree and branch, runs /tdd on every ready-for-agent issue in dependency order, and opens a PR when done. +--- + +# Implement Feature + +Automate the implementation side of the AI-development cycle for one Feature. Takes a slug, creates an isolated branch, runs `/tdd` on every `ready-for-agent` issue in dependency order, and marks each `resolved` on success. + +**Invocation:** `/implement-feature [slug]` + +## Quick start + +- **Named run** — `/implement-feature pr-review-doc-context-enrichment` — targets a specific Feature slug directly; creates a worktree, runs all `ready-for-agent` issues, opens a PR. +- **Auto-select** — `/implement-feature` with no argument — scans `docs/issues/` and picks the first Feature alphabetically that has at least one `ready-for-agent` issue and no unprepped issues (`needs-triage`, `needs-info`, `needs-specs`). Picks up partial features after a failure fix automatically. +- **Overnight loop** — `/loop /implement-feature` — drains the queue unattended; the runner emits `LOOP_COMPLETE` when no qualifying Feature remains, which terminates the loop. +- **Safe to interrupt** — Ctrl+C during any issue leaves that issue at `ready-for-agent`; re-running resumes from the first unresolved issue. Note: a **/tdd failure** (as opposed to a Ctrl+C interrupt) sets the failing issue to `needs-info`, preventing auto-select from retrying until the developer intervenes. + +## Steps + +### 0. Resolve the slug + +**If a slug argument was provided**, use it directly and proceed to step 1. + +**If no argument was provided**, scan `docs/issues/` for qualifying features: + +1. Use the Bash tool to list immediate subdirectories of `docs/issues/`: + +``` +ls -d docs/issues/*/ +``` + +2. For each subdirectory (potential feature slug), use the Bash tool to list its `NN-*.md` files and use the Read tool to check the `**Status:**` line of each one. A feature **qualifies** if: + + - At least one `NN-*.md` file has status `ready-for-agent`, **and** + - Every `NN-*.md` file has a status in `{ready-for-agent, resolved, closed, rejected, ready-for-human}`. + + Any file with status `needs-triage`, `needs-info`, `needs-specs`, or any unrecognised state **disqualifies the whole feature** — it is not fully prepped for autonomous execution. Features where every file is `resolved`, `closed`, or `rejected` (nothing left to run) are also skipped. + +3. Sort the qualifying slugs alphabetically and select the first one. + +4. **If no qualifying feature exists**, emit the **LOOP_COMPLETE signal** (see `references/runner-output-formats.md`) on its own line and exit cleanly (no error). Do not output anything after it. + +5. **If a qualifying feature was found**, set the slug to that feature's directory name and continue to step 1. + +### 1. Resolve the feature directory and assemble the static context bundle + +The slug argument maps directly to `docs/issues//`. Use the Bash tool to confirm: + +1. The directory exists (e.g. `ls docs/issues//`). +2. It contains at least one `NN-*.md` file (e.g. `ls docs/issues//[0-9]*.md`). +3. At least one of those files has `**Status:** ready-for-agent` (use the Read tool on each file and inspect the `**Status:**` line). + +If any of these checks fails, stop and report the specific reason to the user. Do not create a worktree. + +**Read the PRD:** `docs/issues//PRD.md`. Scan its content for references matching `apps/claude-code//` (any path that starts with that prefix). This determines the ADR scope: + +- **Plugin feature** — one or more `apps/claude-code//` references found → use that plugin's `apps/claude-code//CONTEXT.md` and `apps/claude-code//docs/adr/`. Do **not** also inject root ADRs. +- **Repo/tooling feature** — no such references found → use root `CONTEXT.md` and root `docs/adr/`. + +Also extract the `title:` field from the PRD's YAML frontmatter (the value between the opening `---` and closing `---` at the top of the file). Retain it for use in step 7's PR title derivation. + +**Read the scoped CONTEXT.md** using the Read tool. + +**Read all ADR files** in the scoped ADR directory: list `*.md` files using the Bash tool, then read each one using the Read tool. + +**Get the last 5 git commits** using the Bash tool: + +``` +git log --oneline -5 +``` + +These items (PRD content + title, CONTEXT.md, ADRs, recent commits) are static — gather them once before the issue loop begins. + +### 2. Create the worktree and branch + +First, check whether a worktree from a prior run already exists using the Bash tool: + +``` +ls .claude/worktrees/ +``` + +- **Exists** — reuse it. The branch `feature/afk/` already contains the committed work from the previous run. Skip `git worktree add`. +- **Does not exist** — create it using the Bash tool: + +``` +git worktree add .claude/worktrees/ -b feature/afk/ develop +``` + +The worktree lands at `.claude/worktrees/`. All subsequent implementation work happens inside that worktree. + +### 3. Collect issues, build the dependency graph, and derive execution order + +Use the Bash tool to list **all** `NN-*.md` files in `docs/issues//` (including `resolved` and `closed` — they are needed for graph completeness): + +``` +ls docs/issues//[0-9]*.md +``` + +Use the Read tool to read each file. For every file record: + +- Its **numeric prefix** (the `NN` integer from the filename). +- Its **status** (`**Status:**` line). +- Its **`## Blocked by`** list — the filenames or paths referenced there. `## Blocked by: None`, `## Blocked by: None — can start immediately`, or a missing `## Blocked by` section all mean no predecessors. + +**Missing-blocker check — halt before executing anything if violated:** + +For each `## Blocked by` reference, verify the referenced filename actually exists in `docs/issues//`. If a referenced blocker file is missing (typo, renamed, deleted), halt immediately with the **missing blocker error** (see `references/runner-output-formats.md`), naming the issue and the unresolvable reference. Do not silently treat it as satisfied. + +**Conflict check — halt before executing anything if violated:** + +For each issue A that lists issue B in `## Blocked by`: if B's numeric prefix is greater than A's numeric prefix, the dependency contradicts numerical convention. Halt immediately with the **dependency conflict error** (see `references/runner-output-formats.md`), naming both issues. + +**Build the execution queue:** + +From the dependency graph, compute a topological order over all issues (using `## Blocked by` edges). Filter the topological sequence to only `ready-for-agent` issues — `resolved`, `closed`, and `rejected` issues are satisfied and act as satisfied dependency nodes, not as items to execute. + +**Unsatisfied dependency check — halt before executing anything if violated:** + +For each `ready-for-agent` issue in the execution queue, inspect its `## Blocked by` list. If any listed blocker has status `ready-for-human`, halt immediately with the **unsatisfied dependency error** (see `references/runner-output-formats.md`), naming both issues. + +This ordered list is the execution queue. Record M = number of items in the queue (frozen at this moment — do not recount mid-run). + +### 4. Implement each issue via `/tdd` + +For each issue file in queue order (N = 1, 2, … M), before invoking `/tdd`, emit the **progress line** (see `references/runner-output-formats.md`) substituting N, M, and the issue title (first `# Heading` line of the issue file). + +Invoke the sub-agent using the Agent tool with `subagent_type: general-purpose` — the only stock type with access to both the Skill tool (to load `/tdd`) and the Edit/Write tools (to write code). The issue's `## Acceptance criteria` replaces the interactive planning phase — pass it as the pre-approved plan so the agent skips confirmation and proceeds directly to implementation. + +Before constructing the prompt, use the Read tool to read all sibling issue files (`docs/issues//[0-9]*.md` except the current issue) at their current state — this gives the sub-agent visibility into what is already resolved and what is still pending. + +Construct the prompt using the template in `references/tdd-prompt-template.md`, substituting all `` values at runtime. Pass the constructed prompt to the Agent tool. Wait for the agent to return before continuing. + +**On failure:** If the Agent call signals failure (throws, returns an error, or explicitly reports it could not complete the issue): + +1. Using the Edit tool, change the `**Status:** ready-for-agent` line to `**Status:** needs-info`. This prevents auto-select from picking up this Feature on subsequent loop iterations. + +2. Append the **failure note** (see `references/runner-output-formats.md`) to the issue file using the Edit tool, substituting ``. + +3. Stop the runner immediately. Do not execute any subsequent issues — they may depend on a foundation this issue was meant to lay. + +4. Report to the user: which issue failed, that the worktree is at `.claude/worktrees/` on branch `feature/afk/`, and that no subsequent issues were run. + +### 5. Mark each issue resolved + +After the Agent call for an issue returns **successfully**, update the issue file using the Edit tool: change the `**Status:** ready-for-agent` line to `**Status:** resolved`. + +### 6. Continue until queue is empty + +Repeat steps 4–5 for every issue in the queue. When the last issue is resolved, proceed to step 7. + +### 7. Open a pull request and clean up + +**Push the branch:** + +``` +git -C .claude/worktrees/ push -u origin feature/afk/ +``` + +**Derive the PR title** from the PRD's `title` frontmatter field (already read in step 1) and the slug: + +``` +feat(): +``` + +**List the resolved issues** — all `NN-*.md` files in `docs/issues//` whose status is now `resolved` (every issue the runner just processed, in numerical order). + +**Open the PR** using the Bash tool. The exact `gh pr create` invocation with the heredoc-wrapped body lives in `references/runner-output-formats.md` under "PR body template" — use that form verbatim, substituting ``, ``, and the resolved-issue list before running it. Run from inside the worktree (`git -C .claude/worktrees/`) or pass `--repo` if needed. **Do not run the snippet without substitution** — angle-bracket placeholders are not valid shell. + +**Failure handling for steps 7a and 7b:** If `git push` or `gh pr create` fails (non-zero exit, network error, permission denied, branch protection, etc.): + +1. Stop immediately. Do **not** remove the worktree. +2. Report to the user: which command failed, the error output, that the worktree is at `.claude/worktrees/` on branch `feature/afk/`, and that all issues are still `resolved` (the failure is post-implementation). +3. Suggest re-running `git push` / `gh pr create` manually once the cause is resolved. + +**Remove the worktree** only after the PR is opened successfully (PR URL returned by `gh`): + +``` +git worktree remove .claude/worktrees/ +``` + +Report the PR URL and the list of resolved issues to the user. + +## Supporting Documentation + +- **`references/runner-output-formats.md`** — verbatim strings for the progress line, dependency conflict error, unsatisfied dependency error, failure note, PR body template, and `LOOP_COMPLETE` signal. +- **`references/tdd-prompt-template.md`** — the AFK prompt template passed to the Agent tool for each `/tdd` sub-agent invocation; substitute all `` values at runtime. +- **`docs/agents/feature-runner.md`** — human-facing reference: lifecycle diagram, dependency-satisfaction matrix, ADR scope, historical cleanup convention. Maintained in parallel; consult for broader context, not for runtime instructions. diff --git a/.claude/skills/implement-feature/references/runner-output-formats.md b/.claude/skills/implement-feature/references/runner-output-formats.md new file mode 100644 index 0000000..23ae43f --- /dev/null +++ b/.claude/skills/implement-feature/references/runner-output-formats.md @@ -0,0 +1,106 @@ +# Runner output formats + +Verbatim strings emitted or embedded by the Feature Runner. `SKILL.md` references this file by name rather than repeating these strings inline. + +--- + +## Progress line + +Emitted to the user before each `/tdd` invocation. + +``` +Implementing issue N of M: +``` + +--- + +## Missing blocker error + +Emitted when a `## Blocked by` reference points to a filename that does not exist in `docs/issues//`. The runner halts before executing any issue. + +``` +Feature Runner error: missing blocker reference. + Issue NN- lists "" in ## Blocked by, but that file does not exist in docs/issues//. + Check for a typo, a rename, or a deleted issue. Resolve the reference manually before re-running. +``` + +--- + +## Dependency conflict error + +Emitted when a `## Blocked by` reference points to an issue with a higher numeric prefix than the issue being blocked. The runner halts before executing any issue. + +``` +Feature Runner error: dependency conflict detected. + Issue NN- is blocked by NN-, but NN- has a higher number than NN-. + This conflicts with the numerical ordering convention. Resolve the ordering manually before re-running. +``` + +--- + +## Unsatisfied dependency error + +Emitted when a `ready-for-agent` issue in the execution queue has a `## Blocked by` dependency whose status is `ready-for-human`. The runner halts before executing any issue. + +``` +Feature Runner error: unsatisfied dependency. + Issue NN- is blocked by NN-, but NN- has status ready-for-human. + Complete issue NN- manually (or update its status) before re-running /implement-feature . +``` + +--- + +## Failure note + +Appended to a failing issue file under `## Comments` when `/tdd` cannot complete an issue. The `**Status:**` line is also changed to `needs-info` (see SKILL.md step 4). + +```markdown +## Comments + +> _This was generated by AI during triage._ + +**Feature Runner failure** — `/tdd` could not complete this issue. Status has been set to `needs-info`. + +The worktree at `.claude/worktrees/` has been left in place for inspection. Once the issue is resolved manually, restore `**Status:** ready-for-agent` and re-run `/implement-feature ` to resume. Alternatively, close or reject the issue if it should not be retried. +``` + +--- + +## PR body template + +The body passed to `gh pr create` after all issues in a feature are resolved. + +```markdown +## Feature + +`docs/issues//PRD.md` + +## Resolved issues + +- `docs/issues//NN-.md` +- ... + +🤖 Implemented by the Feature Runner via `/implement-feature ` +``` + +Pass via a bash heredoc: + +``` +gh pr create \ + --base develop \ + --title "feat(): " \ + --body "$(cat <<'EOF' + +EOF +)" +``` + +--- + +## LOOP_COMPLETE signal + +Emitted on its own line when no qualifying feature exists during auto-selection. Must appear alone — nothing before or after it on the same line. This is the stop signal that `/loop /implement-feature` uses to terminate an overnight draining run. + +``` +LOOP_COMPLETE +``` diff --git a/.claude/skills/implement-feature/references/tdd-prompt-template.md b/.claude/skills/implement-feature/references/tdd-prompt-template.md new file mode 100644 index 0000000..ff4a8d9 --- /dev/null +++ b/.claude/skills/implement-feature/references/tdd-prompt-template.md @@ -0,0 +1,27 @@ +# /tdd AFK prompt template + +Passed to the Agent tool for each `/tdd` sub-agent invocation. All `` values are substituted at runtime before the prompt is sent. + +``` +You are running `/tdd` in AFK mode. The interactive planning phase is complete — do not ask for confirmation. Begin by invoking the `tdd` skill via the Skill tool to load its full procedural guidance (red→green→refactor, vertical-slice rule, deep-modules / interface-design / refactoring sub-references), then follow it using the acceptance criteria below as the pre-approved plan and proceed directly to the red→green→refactor loop. + +Working directory: .claude/worktrees/ + +--- ISSUE --- + + +--- PRD (parent context) --- + + +--- SIBLING ISSUES --- + + +--- CONTEXT.md --- + + +--- ADRs --- + + +--- RECENT COMMITS (last 5) --- + +``` diff --git a/.claude/skills/new-plugin/references/package-json-template.md b/.claude/skills/new-plugin/references/package-json-template.md index e4bd0e3..64b77ba 100644 --- a/.claude/skills/new-plugin/references/package-json-template.md +++ b/.claude/skills/new-plugin/references/package-json-template.md @@ -12,7 +12,7 @@ Use this as the starting shape for a new plugin's `package.json`. Copy `packageM "license": "LGPL-3.0-or-later", "type": "module", "packageManager": "", - "engines": { "node": ">=24", "pnpm": ">=10" }, + "engines": { "node": ">=22", "pnpm": ">=10" }, "scripts": { "test": "node --test", "typecheck": "tsc --noEmit --project tsconfig.json", @@ -32,7 +32,7 @@ Use this as the starting shape for a new plugin's `package.json`. Copy `packageM } ``` -`node --test` with no path argument uses Node's built-in test file discovery. It exits 0 with zero tests in Node >=22 — safe here because this repo requires `node >=24`. +`node --test` with no path argument uses Node's built-in test file discovery. It exits 0 with zero tests in Node >=22 — safe here because this repo requires `node >=22`. ## Command-only plugin (no scripts or tests) @@ -46,7 +46,7 @@ Omit `test`, `typecheck`, and the `@types/node`/`@unic/tsconfig`/`typescript` de "license": "LGPL-3.0-or-later", "type": "module", "packageManager": "", - "engines": { "node": ">=24", "pnpm": ">=10" }, + "engines": { "node": ">=22", "pnpm": ">=10" }, "scripts": { "bump": "unic-bump", "sync-version": "unic-sync-version", diff --git a/.out-of-scope/plans-issues-sync.md b/.out-of-scope/plans-issues-sync.md new file mode 100644 index 0000000..e8666e1 --- /dev/null +++ b/.out-of-scope/plans-issues-sync.md @@ -0,0 +1,11 @@ +# docs/plans/ ↔ docs/issues/ Sync Gap + +This project will not build a bridge between `docs/plans/` and `docs/issues/`. + +## Why this is out of scope + +`docs/plans/` is being discontinued. The sync gap between the two systems is a consequence of running them in parallel, not a problem worth solving on its own. Once `docs/plans/` is retired, `docs/issues/` becomes the sole source of truth and the gap disappears. + +## Prior requests + +- `docs/inbox/plans-issues-sync-gap.md` — "no bridge between docs/plans/ specs and docs/issues/ triage files" (2026-05-08) diff --git a/.out-of-scope/plans-numbering-format.md b/.out-of-scope/plans-numbering-format.md new file mode 100644 index 0000000..16d90a4 --- /dev/null +++ b/.out-of-scope/plans-numbering-format.md @@ -0,0 +1,11 @@ +# Plans Numbering Format + +This project will not migrate `docs/plans/` spec file numbering to a 4-digit prefix. + +## Why this is out of scope + +`docs/plans/` is being discontinued. Renaming all existing spec files to a 4-digit scheme (`0001-`, `0002-`, …) would be pure churn on infrastructure that is on its way out, with no functional benefit and a non-trivial risk of breaking tooling or references mid-transition. + +## Prior requests + +- `docs/inbox/migrate-plans-numbering-to-4-digits-prefix.md` — "Migrate plans numbering to 4-digits prefix" (2026-05-03) diff --git a/.out-of-scope/plugin-scaffolding-workflow.md b/.out-of-scope/plugin-scaffolding-workflow.md new file mode 100644 index 0000000..ceb861d --- /dev/null +++ b/.out-of-scope/plugin-scaffolding-workflow.md @@ -0,0 +1,21 @@ +# Plugin Scaffolding Workflow + +This project will not research how to scaffold or create new plugins — the workflow is already established. + +## Why this is out of scope + +The approach has been proven in practice: + +1. If the work is a skill, start with `/write-a-skill`. +2. For a full plugin, use `/plugin-dev:create-plugin` (Anthropic's guided end-to-end plugin creation workflow). +3. For targeted component work, use the relevant sub-command: + - `/plugin-dev:hook-development` + - `/plugin-dev:skill-development` + - `/plugin-dev:plugin-settings` + - …and other `plugin-dev:*` variants. + +Further research into scaffolding approaches would duplicate work already done and encoded in these skills. + +## Prior requests + +- `docs/inbox/research-how-to-scaffold-or-create-new-plugins.md` — "Research how to scaffold or create new plugins" (2026-05-03) diff --git a/.out-of-scope/ralph-orchestrator-alternatives.md b/.out-of-scope/ralph-orchestrator-alternatives.md new file mode 100644 index 0000000..8d15f71 --- /dev/null +++ b/.out-of-scope/ralph-orchestrator-alternatives.md @@ -0,0 +1,11 @@ +# Ralph Orchestrator Alternatives + +This project will not research alternatives to the ralph-orchestrator. + +## Why this is out of scope + +`docs/plans/` and the ralph-orchestrator workflow built on top of it are being discontinued. Researching replacement orchestrators for a system that is being phased out is not a productive use of time. New work follows the `docs/issues/` triage workflow instead. + +## Prior requests + +- `docs/inbox/research-ralph-orchestrator-alternatives.md` — "research ralph-orchestrator alternatives" (2026-05-03) diff --git a/AGENTS.md b/AGENTS.md index 4ec6471..4245f0c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -51,7 +51,7 @@ pnpm --filter ralph # run that plugin's own Spec Runner loop ## Tech stack -- **Runtime**: Node.js ≥ 24 LTS (pinned `24.15.0` via `.nvmrc` + `pnpm-workspace.yaml`) +- **Runtime**: Node.js ≥ 22. `.nvmrc` is the source of truth for local dev (currently `24.15.0`) and is consumed by `actions/setup-node` in CI. - **Package manager**: pnpm 10 (workspace mode, catalog pinning) - **Module system**: ESM (`"type": "module"`) throughout - **Linter/formatter**: Biome 2 for code/JSON; Prettier for Markdown only diff --git a/CONTEXT.md b/CONTEXT.md index ce83036..d66161e 100644 --- a/CONTEXT.md +++ b/CONTEXT.md @@ -28,6 +28,14 @@ _Avoid_: ticket, task, issue, story The agent automation that reads Specs and implements them one at a time. Currently backed by `ralph-orchestrator` but the concept is tool-agnostic. _Avoid_: Ralph, Ralph Orchestrator (tool-specific, not domain terms) +**Feature**: +A self-contained unit of work tracked as a directory under `docs/issues//`, containing a PRD and numbered implementation issues. The atomic input to the Feature Runner. +_Avoid_: ticket, epic, story + +**Feature Runner**: +The skill that implements a Feature's issues end-to-end in one worktree, branch, and pull request. Parallel concept to Spec Runner, but driven by the issue tracker rather than `docs/plans/`. +_Avoid_: issue runner, queue runner + **Consumer**: A repository that installs and uses a Plugin. External to this monorepo. _Avoid_: client, user repo, target repo, host repo @@ -38,6 +46,7 @@ _Avoid_: client, user repo, target repo, host repo - A **Claude Code Plugin** is a **Plugin** — the inverse is not always true - A **Workspace Package** supports **Plugin** development but is not itself a **Plugin** - A **Spec** drives exactly one **Spec Runner** iteration +- A **Feature** drives one **Feature Runner** execution — a Feature is to the Feature Runner what a Spec is to the Spec Runner - A **Consumer** installs one or more **Plugins** ## Example dialogue diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index f38ae2e..ed3f1d0 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -18,7 +18,7 @@ Every plugin must work on **macOS, Windows, and Linux**. Concretely: ### Runtime -- **Node.js ≥ 24** (Active LTS). Version pinned via `pnpm-workspace.yaml#useNodeVersion` and `.nvmrc`. +- **Node.js ≥ 22**. The recommended local version lives in `.nvmrc` (read automatically by nvm/fnm/asdf/volta and by `actions/setup-node` in CI). CI also exercises the matrix on Node 22 and 24. - **ESM only** — `"type": "module"` in every `package.json`; `.mjs` extension for scripts. - **No TypeScript compilation step** — write `.mjs` with `// @ts-check` + JSDoc; `tsc --noEmit` for type-checking only. @@ -56,11 +56,11 @@ LGPL-3.0-or-later for all packages in this monorepo. ## Prerequisites -| Tool | Version | How to get it | -| --------------- | ----------------- | -------------------------------------------------------------------------------- | -| Node.js | ≥ 24 (Active LTS) | [nodejs.org](https://nodejs.org) | -| pnpm | ≥ 10 | `npm install -g pnpm` | -| Claude Code CLI | latest | [claude.ai/code](https://claude.ai/code) — required as the Spec Runner's backend | +| Tool | Version | How to get it | +| --------------- | ----------------------------------------------- | -------------------------------------------------------------------------------- | +| Node.js | ≥ 22 (see `.nvmrc` for the recommended version) | [nodejs.org](https://nodejs.org) | +| pnpm | ≥ 10 | `npm install -g pnpm` | +| Claude Code CLI | latest | [claude.ai/code](https://claude.ai/code) — required as the Spec Runner's backend | Everything else (ralph-orchestrator, Biome, Prettier, TypeScript) is a workspace devDependency and installs with: diff --git a/README.md b/README.md index 7133bfe..62e3424 100644 --- a/README.md +++ b/README.md @@ -28,7 +28,7 @@ Then install individual plugins: ## Development -**Prerequisites:** Node.js ≥ 24, pnpm ≥ 10, Claude Code CLI (for Ralph). +**Prerequisites:** Node.js ≥ 22 (see `.nvmrc` for the recommended version), pnpm ≥ 10, Claude Code CLI (for Ralph). ```sh pnpm install # install all workspace deps diff --git a/apps/claude-code/auto-format/CLAUDE.md b/apps/claude-code/auto-format/CLAUDE.md index b91977d..20e5c2a 100644 --- a/apps/claude-code/auto-format/CLAUDE.md +++ b/apps/claude-code/auto-format/CLAUDE.md @@ -22,7 +22,7 @@ pnpm verify:changelog # Check CHANGELOG structure ## Tech Stack -- **Runtime**: Node.js >=24 (LTS). Version pinned via `pnpm-workspace.yaml#useNodeVersion`. +- **Runtime**: Node.js >=22. `.nvmrc` (currently `24.15.0`) is the recommended local version; CI exercises the matrix on Node 22 and 24. - **Package manager**: pnpm (workspace mode, catalog pinning). - **Module system**: ESM (`"type": "module"`). - **Test runner**: `node:test` built-in. No external framework. diff --git a/apps/claude-code/auto-format/package.json b/apps/claude-code/auto-format/package.json index dc3d5f8..c725412 100644 --- a/apps/claude-code/auto-format/package.json +++ b/apps/claude-code/auto-format/package.json @@ -6,7 +6,7 @@ "type": "module", "packageManager": "pnpm@10.33.0", "engines": { - "node": ">=24", + "node": ">=22", "pnpm": ">=10" }, "scripts": { diff --git a/apps/claude-code/pr-review/.agents/ado-fetcher.md b/apps/claude-code/pr-review/.agents/ado-fetcher.md new file mode 100644 index 0000000..d8321b5 --- /dev/null +++ b/apps/claude-code/pr-review/.agents/ado-fetcher.md @@ -0,0 +1,310 @@ +--- +allowed-tools: ['Bash'] +description: 'Fetch all Azure DevOps read data required for a PR review: PR metadata, latest iteration, changed files, raw diff, and linked work-item IDs. Read-only — no write operations.' +--- + +# ADO Fetcher + +You fetch all Azure DevOps data required for a PR review and return a structured context block. You make no write operations — this agent is purely read-only. + +You receive all required context in this prompt as literal strings. Do not read environment variables — agents do not inherit them. + +--- + +## Inputs + +You receive: + +- `ORG_URL` — the Azure DevOps organisation URL (e.g. `https://dev.azure.com/myorg`) +- `PROJECT` — the ADO project name +- `PR_ID` — the pull request ID (integer as string) +- `PRIOR_ITERATION_ID` — the iteration ID from the prior review (integer as string, or empty string for first-review) +- `PLUGIN_ROOT` — absolute path to this plugin's directory (for Node.js helper scripts) + +--- + +## Step 1 — Fetch PR metadata + +```bash +az repos pr show --id {PR_ID} --org {ORG_URL} --output json +``` + +Capture and remember: + +- `repository.id` → `REPO_ID` +- `repository.project.name` → `PROJECT` (update if it differs from the input) +- `sourceRefName` → `SOURCE_REF` (e.g. `refs/heads/feature/my-branch`) +- `targetRefName` → `TARGET_REF` (e.g. `refs/heads/develop`) +- `title` → `PR_TITLE` +- `description` → `PR_DESCRIPTION` +- `status` — note if already merged (`mergeStatus: succeeded`); continue without error — comments are still useful as a review record + +Strip `refs/heads/` prefix from `SOURCE_REF` and `TARGET_REF` to get plain branch names (`SOURCE_BRANCH`, `TARGET_BRANCH`). + +--- + +## Step 2 — Fetch PR iterations and resolve latest + +```bash +ITERATIONS_JSON=$(az devops invoke \ + --area git \ + --resource pullRequestIterations \ + --route-parameters "project=$PROJECT" "repositoryId=$REPO_ID" "pullRequestId=$PR_ID" \ + --org "$ORG_URL" \ + --api-version "7.1" \ + --output json 2>/tmp/ado_fetcher_iter.err) +ITER_EXIT=$? +``` + +Parse via the helper — returns a discriminated union; empty value array → ABORTED (no implicit iteration fallback): + +```bash +ITER_RESULT=$( + ITER_RESP="$ITERATIONS_JSON" \ + ITER_EXIT_CODE="$ITER_EXIT" \ + PLUGIN_R="$PLUGIN_ROOT" \ + node --input-type=module << 'EOJS' +const { fetchIterations } = await import(`file://${process.env.PLUGIN_R}/scripts/ado/fetch-iterations.mjs`) +const result = fetchIterations({ responseText: process.env.ITER_RESP ?? '', exitCode: Number(process.env.ITER_EXIT_CODE) }) +process.stdout.write(JSON.stringify(result)) +EOJS +) + +ITER_OK=$(echo "$ITER_RESULT" | node -e "process.stdout.write(String(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).ok))") +if [ "$ITER_OK" != "true" ]; then + ITER_REASON=$(echo "$ITER_RESULT" | node -e "process.stdout.write(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).reason ?? '')") + ITER_MSG=$(echo "$ITER_RESULT" | node -e "process.stdout.write(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).message ?? '')") + rm -f /tmp/ado_fetcher_iter.err + if [ "$ITER_REASON" = "auth" ]; then + echo "ERROR: $ITER_MSG. Try \`az devops login\` to re-authenticate." >&2 + elif [ "$ITER_REASON" = "empty-iterations" ]; then + echo "ERROR: iterations endpoint returned empty value array. Cannot sign Review with a valid Iteration ID." >&2 + else + echo "ERROR: $ITER_MSG" >&2 + fi + exit 1 +fi +rm -f /tmp/ado_fetcher_iter.err + +LATEST_ITERATION_ID=$(echo "$ITER_RESULT" | node -e "process.stdout.write(String(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).latestIterationId))") +LATEST_COMMIT_SHA=$(echo "$ITER_RESULT" | node -e "process.stdout.write(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).latestCommitSha)") +``` + +--- + +## Step 3 — List changed files + +```bash +az devops invoke \ + --area git \ + --resource pullRequestIterationChanges \ + --route-parameters "repositoryId=$REPO_ID" "pullRequestId=$PR_ID" "iterationId=$LATEST_ITERATION_ID" \ + --org "$ORG_URL" \ + --api-version "7.1" \ + --output json +``` + +Extract file paths and change types: + +```bash +CHANGED_FILES=$(az devops invoke \ + --area git \ + --resource pullRequestIterationChanges \ + --route-parameters "repositoryId=$REPO_ID" "pullRequestId=$PR_ID" "iterationId=$LATEST_ITERATION_ID" \ + --org "$ORG_URL" \ + --api-version "7.1" \ + --output json | node -e " +const chunks = [] +process.stdin.on('data', c => chunks.push(c)) +process.stdin.on('end', () => { + const data = JSON.parse(Buffer.concat(chunks).toString()) + for (const c of data.changeEntries ?? []) { + const path = c.item?.path ?? '' + const ct = c.changeType ?? '' + process.stdout.write(ct + ': ' + path + '\n') + } +}) +") +``` + +--- + +## Step 4 — Get the raw diff + +Check whether the local branch matches the PR source branch: + +```bash +git branch --show-current +``` + +If it does not match, check out the PR branch: + +```bash +az repos pr checkout --id "$PR_ID" --org "$ORG_URL" \ + || (git fetch origin "$SOURCE_BRANCH" && git checkout "$SOURCE_BRANCH") \ + || { echo "ERROR: could not check out PR source branch $SOURCE_BRANCH" >&2; exit 1; } +``` + +If `PRIOR_ITERATION_ID` is non-empty, determine the incremental diff range. Fetch the prior iteration's commit SHA from the iterations list: + +```bash +PRIOR_COMMIT_SHA=$(echo "$ITERATIONS_JSON" | node -e " +const chunks = [] +process.stdin.on('data', c => chunks.push(c)) +process.stdin.on('end', () => { + const id = Number(process.env.PRIOR_ITER_ID) + const value = JSON.parse(Buffer.concat(chunks).toString()).value ?? [] + const it = value.find(v => v.id === id) + process.stdout.write(it?.sourceRefCommit?.commitId ?? '') +}) +" PRIOR_ITER_ID="$PRIOR_ITERATION_ID") +``` + +### Diff strategy + +Branch on whether `PRIOR_ITERATION_ID` is set and whether commits are available: + +**First-review (`PRIOR_ITERATION_ID` empty) or fallback:** + +```bash +RAW_DIFF=$(git diff "origin/${TARGET_BRANCH}...HEAD") +DIFF_RANGE=full +``` + +**Re-review with resolvable prior commit (`PRIOR_COMMIT_SHA` non-empty, differs from `LATEST_COMMIT_SHA`):** + +```bash +if git fetch origin "$PRIOR_COMMIT_SHA" 2>/dev/null; then + RAW_DIFF=$(git diff "${PRIOR_COMMIT_SHA}..${LATEST_COMMIT_SHA}") + DIFF_RANGE=incremental +else + echo "Warning: prior commit $PRIOR_COMMIT_SHA unreachable — falling back to full diff." + RAW_DIFF=$(git diff "origin/${TARGET_BRANCH}...HEAD") + DIFF_RANGE=full + DIFF_RANGE_FALLBACK=true +fi +``` + +**Re-review with no new commits (`PRIOR_COMMIT_SHA == LATEST_COMMIT_SHA`):** + +```bash +echo "No new commits since last review." +RAW_DIFF="" +DIFF_RANGE=incremental +``` + +--- + +## Step 5 — Fetch linked work-item IDs + +```bash +WI_RESPONSE=$(az devops invoke \ + --area git \ + --resource pullRequestWorkItems \ + --route-parameters "repositoryId=$REPO_ID" "pullRequestId=$PR_ID" \ + --org "$ORG_URL" \ + --api-version "7.1" \ + --output json 2>/tmp/ado_fetcher_wi.err) +WI_EXIT=$? +``` + +Parse with the helper — returns a discriminated union so the Notices step can distinguish EMPTY-BY-DESIGN from a fetch failure: + +```bash +WI_RESULT=$( + WI_RESP="$WI_RESPONSE" \ + WI_EXIT_CODE="$WI_EXIT" \ + PLUGIN_R="$PLUGIN_ROOT" \ + node --input-type=module << 'EOJS' +const { fetchWorkItems } = await import(`file://${process.env.PLUGIN_R}/scripts/ado/fetch-work-items.mjs`) +const result = fetchWorkItems({ responseText: process.env.WI_RESP ?? '', exitCode: Number(process.env.WI_EXIT_CODE) }) +process.stdout.write(JSON.stringify(result)) +EOJS +) + +WI_OK=$(echo "$WI_RESULT" | node -e "process.stdout.write(String(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).ok))") +if [ "$WI_OK" = "true" ]; then + WORK_ITEM_IDS=$(echo "$WI_RESULT" | node -e "process.stdout.write(JSON.stringify(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).ids))") + WI_FAIL_MESSAGE="" +else + WORK_ITEM_IDS="[]" + WI_FAIL_MESSAGE=$(echo "$WI_RESULT" | node -e "process.stdout.write(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).message ?? '')") +fi +rm -f /tmp/ado_fetcher_wi.err +``` + +--- + +## Step 6 — Build the Notices array + +Initialise the per-agent Notices array. Emission sites: + +- **DEGRADED warning** (`kind: diff-range`) — when `DIFF_RANGE_FALLBACK=true` (prior commit unreachable; fell back to full diff). +- **DEGRADED warning** (`kind: work-items`) — when the work-item fetch failed (`WI_OK=false`); message comes from the helper. +- **EMPTY-BY-DESIGN info** (`kind: doc-context`) — when `WORK_ITEM_IDS=[]` and the fetch succeeded (no work items linked to the PR). + +```bash +NOTICES=$( + DIFF_RANGE_FB="${DIFF_RANGE_FALLBACK:-false}" \ + WI_IDS="$WORK_ITEM_IDS" \ + WI_OK="$WI_OK" \ + WI_MSG="$WI_FAIL_MESSAGE" \ + PLUGIN_R="$PLUGIN_ROOT" \ + node --input-type=module << 'EOJS' +const { createNotice } = await import(`file://${process.env.PLUGIN_R}/scripts/ado/notices.mjs`) +const ids = JSON.parse(process.env.WI_IDS || '[]') +const notices = [] +if (process.env.DIFF_RANGE_FB === 'true') { + notices.push(createNotice('warning', 'diff-range', 'Incremental diff unavailable — Coordinator will classify against the full PR diff with conservative downgrades.')) +} +if (process.env.WI_OK !== 'true') { + notices.push(createNotice('warning', 'work-items', process.env.WI_MSG || 'Failed to fetch linked work items. Review proceeded without business context.')) +} else if (ids.length === 0) { + notices.push(createNotice('info', 'doc-context', 'Reviewed without business context — no work items linked to this PR.')) +} +process.stdout.write(JSON.stringify(notices)) +EOJS +) +``` + +--- + +## Output + +Return the following structured context block as your final output. Fill in all values gathered above. This block is consumed verbatim by the orchestrator and downstream agents: + +``` +ADO_FETCHER_RESULT_START +ORG_URL: {ORG_URL} +PROJECT: {PROJECT} +PR_ID: {PR_ID} +REPO_ID: {REPO_ID} +PR_TITLE: {PR_TITLE} +PR_DESCRIPTION: +{PR_DESCRIPTION} +SOURCE_BRANCH: {SOURCE_BRANCH} +TARGET_BRANCH: {TARGET_BRANCH} +LATEST_ITERATION_ID: {LATEST_ITERATION_ID} +LATEST_COMMIT_SHA: {LATEST_COMMIT_SHA} +DIFF_RANGE: {DIFF_RANGE} +WORK_ITEM_IDS: {WORK_ITEM_IDS} +NOTICES: {NOTICES} + +CHANGED_FILES: +{CHANGED_FILES} + +RAW_DIFF: +{RAW_DIFF} +ADO_FETCHER_RESULT_END +``` + +Where: + +- `DIFF_RANGE` is `full` when the diff ran against `origin/${TARGET_BRANCH}...HEAD` (first-review or fallback), or `incremental` when it ran against `${PRIOR_COMMIT_SHA}..${LATEST_COMMIT_SHA}`. When `full` due to a fallback, the `NOTICES` array also contains a `warning`-severity `diff-range` entry. +- `WORK_ITEM_IDS` is the JSON array from Step 5, e.g. `[42, 7]` or `[]` +- `NOTICES` is the JSON array from Step 6, e.g. `[{"severity":"info","kind":"doc-context","message":"..."}]` or `[]` +- `CHANGED_FILES` is the newline-separated list from Step 3, e.g. `edit: /src/api.ts` +- `RAW_DIFF` is the full diff text from Step 4 (may be empty if no new commits) +- `LATEST_COMMIT_SHA` is the latest source-branch commit SHA captured in Step 2; reserved for future diff-range debugging and not consumed by any current downstream agent — the diff-range logic that needed it is now self-contained in Step 4 above. + +**Never add any ADO write operations (POST, PATCH, DELETE) to this agent.** diff --git a/apps/claude-code/pr-review/.agents/ado-writer.md b/apps/claude-code/pr-review/.agents/ado-writer.md new file mode 100644 index 0000000..b727ca3 --- /dev/null +++ b/apps/claude-code/pr-review/.agents/ado-writer.md @@ -0,0 +1,407 @@ +--- +allowed-tools: ['Bash'] +description: 'Post all Azure DevOps write-back operations for a PR review: inline comment threads per finding, Review Summary or delta reply, and completion marker. Write-only — no read operations.' +--- + +# ADO Writer + +You post all Azure DevOps comments for a PR review and return a structured result block. You make no read operations — this agent is purely write-only. + +You receive all required context in this prompt as literal strings. Do not read environment variables — agents do not inherit them. + +--- + +## Inputs + +You receive: + +- `ORG_URL` — the Azure DevOps organisation URL (e.g. `https://dev.azure.com/myorg`) +- `PROJECT` — the ADO project name +- `REPO_ID` — the repository UUID (e.g. `99bf5e9b-...`) +- `PR_ID` — the pull request ID (integer as string) +- `LATEST_ITERATION_ID` — the latest PR iteration ID (integer as string) +- `SUMMARY_THREAD_ID` — the existing summary thread ID from a prior review, or empty string for first-review +- `MODE` — `first-review` or `re-review` +- `PLUGIN_ROOT` — absolute path to this plugin's directory (for Node.js helper scripts) +- `FINDINGS` — a JSON array of compact findings: `{ severity, filePath, startLine, endLine, title, body }[]` +- `NOTICES_JSON` — a JSON array of merged Notices: `{ severity: "info" | "warning", kind, message }[]`. May be `[]`. + +--- + +## Constants + +```bash +SIGNATURE_PREFIX="🤖 *Reviewed by Claude Code*" +SIGNATURE="🤖 *Reviewed by Claude Code* — Iteration ${LATEST_ITERATION_ID}" +FINDINGS_POSTED=0 +NOTICES='[]' +``` + +Every comment posted — inline or summary — **must** end with this trailer: + +``` +--- +🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID} +``` + +--- + +## Helper: parse-write-response + +Use this snippet to route any `az devops invoke` outcome through the canonical HTTP-tier mapping. Capture it once per call site into `PWR_JSON`, then branch on `PWR_OK`/`PWR_TIER`/`PWR_ID`/`PWR_MSG`: + +```bash +PWR_ERR=$(cat "${TMPDIR:-/tmp}/ado_writer_.err" 2>/dev/null) +PWR_JSON=$( + RESP="$" EXIT="$" ERR="$PWR_ERR" PLUGIN_R="$PLUGIN_ROOT" \ + node --input-type=module << 'EOJS' +const { parseWriteResponse } = await import(`file://${process.env.PLUGIN_R}/scripts/ado/parse-write-response.mjs`) +const r = parseWriteResponse({ httpExit: Number(process.env.EXIT), responseText: process.env.RESP, errStream: process.env.ERR }) +process.stdout.write(JSON.stringify(r)) +EOJS +) +PWR_OK=$(printf '%s' "$PWR_JSON" | node -e "const r=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')); process.stdout.write(String(r.ok))") +PWR_TIER=$(printf '%s' "$PWR_JSON" | node -e "const r=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')); process.stdout.write(r.tier||'')") +PWR_ID=$(printf '%s' "$PWR_JSON" | node -e "const r=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')); process.stdout.write(r.id!=null?String(r.id):'')") +PWR_MSG=$(printf '%s' "$PWR_JSON" | node -e "const r=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')); process.stdout.write(r.message||'')") +``` + +**Tier handling at every call site:** + +- `PWR_OK=true` → the write succeeded; use `PWR_ID` if you need the created resource's id. +- `PWR_TIER=aborted` (401/403) → stream the `.err` file to stderr, emit `ERROR: `, and `exit 1`. The orchestrator will surface an abort Trailer. +- `PWR_TIER=degraded` (5xx / network / other-4xx) → stream the `.err` file to stderr; push a DEGRADED Notice to `NOTICES`; continue to the next write. Do NOT exit. + +To push a Notice to the `NOTICES` JSON string: + +```bash +NOTICES=$( + N="$NOTICES" SEV="warning" K="" M="" \ + node -e "const a=JSON.parse(process.env.N); a.push({severity:process.env.SEV,kind:process.env.K,message:process.env.M}); process.stdout.write(JSON.stringify(a))" +) +``` + +--- + +## Step 1 — Post inline comment threads + +For each finding in `FINDINGS`, post one new Inline Comment thread to ADO at the correct file path and line range. + +Use a unique temp file per comment (e.g. `${TMPDIR:-/tmp}/ado_writer_thread_1.json`, `_2.json`, etc.). + +```bash +cat > "${TMPDIR:-/tmp}/ado_writer_thread_N.json" << 'ENDJSON' +{ + "comments": [ + { + "commentType": 1, + "content": "{SEVERITY_EMOJI} **{title}**\n\n{body}\n\n---\n🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID}" + } + ], + "status": 1, + "threadContext": { + "filePath": "{filePath}", + "rightFileEnd": { "line": {endLine}, "offset": 1 }, + "rightFileStart": { "line": {startLine}, "offset": 1 } + } +} +ENDJSON + +THREAD_RESPONSE=$(az devops invoke \ + --area git \ + --resource pullRequestThreads \ + --route-parameters "project=${PROJECT}" "repositoryId=${REPO_ID}" "pullRequestId=${PR_ID}" \ + --org "${ORG_URL}" \ + --http-method POST \ + --in-file "${TMPDIR:-/tmp}/ado_writer_thread_N.json" \ + --api-version "7.1" \ + --output json 2>"${TMPDIR:-/tmp}/ado_writer_thread_N.err") +THREAD_EXIT=$? +``` + +Map severity to emoji before writing the content: + +- `critical` → `🔴` +- `important` → `🟠` +- `minor` → `🟡` +- any other value → use as-is + +### Parse primary POST result + +Apply the [parse-write-response helper](#helper-parse-write-response) with `=thread_N`, `=THREAD_RESPONSE`, `=THREAD_EXIT`. + +- `PWR_OK=true` → `THREAD_ID=$PWR_ID`; skip the fallback section. +- `PWR_TIER=aborted` → stream `.err` to stderr, `echo "ERROR: $PWR_MSG" >&2`, `exit 1`. +- `PWR_TIER=degraded` → try the **threadContext rejection fallback** below. + +### threadContext rejection fallback + +```bash +cat > "${TMPDIR:-/tmp}/ado_writer_thread_N_fallback.json" << 'ENDJSON' +{ + "comments": [ + { + "commentType": 1, + "content": "{SEVERITY_EMOJI} **{title}** ({filePath}:{startLine})\n\n{body}\n\n---\n🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID}" + } + ], + "status": 1 +} +ENDJSON + +THREAD_RESPONSE=$(az devops invoke \ + --area git \ + --resource pullRequestThreads \ + --route-parameters "project=${PROJECT}" "repositoryId=${REPO_ID}" "pullRequestId=${PR_ID}" \ + --org "${ORG_URL}" \ + --http-method POST \ + --in-file "${TMPDIR:-/tmp}/ado_writer_thread_N_fallback.json" \ + --api-version "7.1" \ + --output json 2>"${TMPDIR:-/tmp}/ado_writer_thread_N_fallback.err") +FALLBACK_EXIT=$? +``` + +Apply the helper again with `=thread_N_fallback`, `=THREAD_RESPONSE`, `=FALLBACK_EXIT`. + +- `PWR_OK=true` → `THREAD_ID=$PWR_ID`. +- `PWR_TIER=aborted` → stream `.err` to stderr, `echo "ERROR: $PWR_MSG" >&2`, `exit 1`. +- `PWR_TIER=degraded` → stream both `.err` files to stderr; push a `warning` Notice to `NOTICES`: + - `kind: "inline-post"`, `message: "Failed to post inline thread at {filePath}:{startLine} — {PWR_MSG}."` + - Set `THREAD_ID=""`. + +### Increment counter + +```bash +if [ -n "$THREAD_ID" ]; then + FINDINGS_POSTED=$((FINDINGS_POSTED + 1)) + echo "Thread posted: $THREAD_ID" +fi +``` + +--- + +## Step 2 — Post Review Summary or delta reply + +Branch on `MODE` and the `SUMMARY_THREAD_ID` value. + +--- + +### MODE=first-review — Post full Review Summary + +Compute `NOTICES_BLOCK` first: + +```bash +NOTICES_BLOCK=$( + NJ="$NOTICES_JSON" \ + PLUGIN_R="$PLUGIN_ROOT" \ + node --input-type=module << 'EOJS' +const { formatNoticesAsSummaryBlock } = await import(`file://${process.env.PLUGIN_R}/scripts/ado/notices.mjs`) +const notices = JSON.parse(process.env.NJ || '[]') +process.stdout.write(formatNoticesAsSummaryBlock(notices)) +EOJS +) +``` + +Post one general thread **without** `threadContext`: + +```bash +cat > "${TMPDIR:-/tmp}/ado_writer_summary.json" << 'ENDJSON' +{ + "comments": [ + { + "commentType": 1, + "content": "## PR Review Summary\n\n{SUMMARY_CONTENT}\n\n---\n🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID}" + } + ], + "status": 1 +} +ENDJSON + +SUMMARY_RESPONSE=$(az devops invoke \ + --area git \ + --resource pullRequestThreads \ + --route-parameters "project=${PROJECT}" "repositoryId=${REPO_ID}" "pullRequestId=${PR_ID}" \ + --org "${ORG_URL}" \ + --http-method POST \ + --in-file "${TMPDIR:-/tmp}/ado_writer_summary.json" \ + --api-version "7.1" \ + --output json 2>"${TMPDIR:-/tmp}/ado_writer_summary.err") +SUMMARY_EXIT=$? +``` + +Apply the helper with `=summary`, `=SUMMARY_RESPONSE`, `=SUMMARY_EXIT`. + +- `PWR_OK=true` → `SUMMARY_THREAD_ID=$PWR_ID`; echo `"Summary thread posted: ${SUMMARY_THREAD_ID}"`. +- `PWR_TIER=aborted` → stream `.err` to stderr, `echo "ERROR: $PWR_MSG" >&2`, `exit 1`. +- `PWR_TIER=degraded` → stream `.err` to stderr; push `warning` Notice (`kind: "summary-post"`, `message: "Failed to post Review Summary (${PWR_MSG}). Review findings were posted as inline threads only."`); set `SUMMARY_THREAD_ID=""`; continue. + +The `{SUMMARY_CONTENT}` must be structured as: + +```markdown +{NOTICES_BLOCK} + +### 🔴 Critical (X found) + +- **[{filePath}:{startLine}]** {title} + +### 🟠 Important (X found) + +- **[{filePath}:{startLine}]** {title} + +### 🟡 Minor / Suggestions + +- {title} + +### ✅ What's good + +- (positive observations if any) +``` + +`{NOTICES_BLOCK}` is computed above. When `NOTICES_JSON` is `[]`, the helper returns an empty string and no `## Notices` heading is emitted. + +--- + +### MODE=re-review, zero new findings — skip summary reply + +If `FINDINGS_POSTED=0` (no new findings were posted in Step 1): + +```bash +echo "Re-review: no new findings — skipping summary reply." +``` + +Do not post anything in Step 2. `SUMMARY_THREAD_ID` remains as provided. Step 3 still posts the completion marker on every successful run, even when zero inline findings were posted. + +--- + +### MODE=re-review, at least one new finding — delta reply + +If `FINDINGS_POSTED > 0`: + +#### SUMMARY_THREAD_ID set — post delta reply to existing summary thread + +```bash +cat > "${TMPDIR:-/tmp}/ado_writer_delta.json" << 'ENDJSON' +{ + "content": "🔄 Re-review delta — Iteration {LATEST_ITERATION_ID}\n\n{FINDINGS_POSTED} new finding(s).\n\n{BULLET_LIST_OF_NEW_FINDING_TITLES}\n\n---\n🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID}", + "commentType": 1 +} +ENDJSON + +DELTA_RESPONSE=$(az devops invoke \ + --area git \ + --resource pullRequestThreadComments \ + --route-parameters "project=${PROJECT}" "repositoryId=${REPO_ID}" "pullRequestId=${PR_ID}" "threadId=${SUMMARY_THREAD_ID}" \ + --org "${ORG_URL}" \ + --http-method POST \ + --in-file "${TMPDIR:-/tmp}/ado_writer_delta.json" \ + --api-version "7.1" \ + --output json 2>"${TMPDIR:-/tmp}/ado_writer_delta.err") +DELTA_EXIT=$? +``` + +Apply the helper with `=delta`, `=DELTA_RESPONSE`, `=DELTA_EXIT`. + +- `PWR_OK=true` → echo `"Delta reply posted, comment ${PWR_ID}"`. +- `PWR_TIER=aborted` → stream `.err` to stderr, `echo "ERROR: $PWR_MSG" >&2`, `exit 1`. +- `PWR_TIER=degraded` → stream `.err` to stderr; push `warning` Notice (`kind: "delta-reply"`, `message: "Failed to post re-review delta reply to thread ${SUMMARY_THREAD_ID} (${PWR_MSG}). Inline threads were posted."`); continue. + +`{BULLET_LIST_OF_NEW_FINDING_TITLES}` — one bullet per finding posted in Step 1, format: + +``` +- **[{filePath}:{startLine}]** {title} +``` + +#### SUMMARY_THREAD_ID empty — full summary fallback + +If `SUMMARY_THREAD_ID` is empty, the prior summary thread was deleted. Fall back to first-review mode: post a full Review Summary as a new general thread (use the MODE=first-review code above) and update `SUMMARY_THREAD_ID`. + +--- + +## Step 3 — Post completion marker (final action) + +After Step 2 completes, post one final reply to the summary thread. This is the last write action of every successful run: + +```bash +if [ -n "${SUMMARY_THREAD_ID}" ]; then + cat > "${TMPDIR:-/tmp}/ado_writer_completion.json" << 'ENDJSON' + { + "content": "✅ Review complete — Iteration {LATEST_ITERATION_ID} ({FINDINGS_POSTED} findings posted)\n\n---\n🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID}", + "commentType": 1 + } +ENDJSON + + COMPLETION_RESPONSE=$(az devops invoke \ + --area git \ + --resource pullRequestThreadComments \ + --route-parameters "project=${PROJECT}" "repositoryId=${REPO_ID}" "pullRequestId=${PR_ID}" "threadId=${SUMMARY_THREAD_ID}" \ + --org "${ORG_URL}" \ + --http-method POST \ + --in-file "${TMPDIR:-/tmp}/ado_writer_completion.json" \ + --api-version "7.1" \ + --output json 2>"${TMPDIR:-/tmp}/ado_writer_completion.err") + COMPLETION_EXIT=$? + + # Apply the helper with =completion, =COMPLETION_RESPONSE, =COMPLETION_EXIT. + # PWR_OK=true → echo "Completion marker posted, comment ${PWR_ID}". + # PWR_TIER=aborted → stream .err to stderr, echo "ERROR: $PWR_MSG" >&2, exit 1. + # PWR_TIER=degraded → stream .err to stderr; push warning Notice (kind: "completion-marker", + # message: "Failed to post completion marker to thread ${SUMMARY_THREAD_ID} (${PWR_MSG})."); continue. +else + echo "No summary thread — skipping completion marker." +fi +``` + +The absence of this marker for `LATEST_ITERATION_ID` on the next run signals a partial prior run. + +--- + +## Step 4 — Clean up + +```bash +rm -f "${TMPDIR:-/tmp}"/ado_writer_thread_*.json "${TMPDIR:-/tmp}"/ado_writer_thread_*.err "${TMPDIR:-/tmp}/ado_writer_summary.json" "${TMPDIR:-/tmp}/ado_writer_summary.err" "${TMPDIR:-/tmp}/ado_writer_delta.json" "${TMPDIR:-/tmp}/ado_writer_delta.err" "${TMPDIR:-/tmp}/ado_writer_completion.json" "${TMPDIR:-/tmp}/ado_writer_completion.err" +``` + +Cleanup is unconditional — always remove all temp files, even when notices were emitted. + +--- + +## Output + +Emit the structured result block as your final output, validating it round-trips through the `parseAdoWriterResult` helper before printing. This block is consumed verbatim by the orchestrator: + +```bash +RESULT=$( + SID="${SUMMARY_THREAD_ID}" \ + FP="${FINDINGS_POSTED}" \ + NJ="${NOTICES}" \ + PLUGIN_R="${PLUGIN_ROOT}" \ + node --input-type=module << 'EOJS' +const { parseAdoWriterResult } = await import(`file://${process.env.PLUGIN_R}/scripts/ado-writer.mjs`) +const output = `ADO_WRITER_RESULT_START\nSUMMARY_THREAD_ID: ${process.env.SID}\nFINDINGS_POSTED: ${process.env.FP}\nNOTICES: ${process.env.NJ}\nADO_WRITER_RESULT_END` +// Round-trip through the helper so any malformed block fails fast here, not downstream. +const parsed = parseAdoWriterResult(output) +if (!parsed.ok) { + process.stderr.write(`ado-writer: result block failed to parse (${parsed.reason})\n`) + process.exit(1) +} +process.stdout.write(output) +EOJS +) +echo "$RESULT" +``` + +``` +ADO_WRITER_RESULT_START +SUMMARY_THREAD_ID: {SUMMARY_THREAD_ID} +FINDINGS_POSTED: {FINDINGS_POSTED} +NOTICES: {NOTICES} +ADO_WRITER_RESULT_END +``` + +Where: + +- `SUMMARY_THREAD_ID` is the integer ID of the summary thread (updated if a new one was posted), or empty string if none +- `FINDINGS_POSTED` is the total count of inline comment threads successfully posted +- `NOTICES` is the JSON-serialised array of DEGRADED Notices emitted during this run (may be `[]`) + +**Never add any ADO read operations (GET) to this agent.** diff --git a/apps/claude-code/pr-review/.agents/re-review-coordinator.md b/apps/claude-code/pr-review/.agents/re-review-coordinator.md new file mode 100644 index 0000000..947fe4b --- /dev/null +++ b/apps/claude-code/pr-review/.agents/re-review-coordinator.md @@ -0,0 +1,474 @@ +--- +allowed-tools: ['Bash'] +description: 'Own the full re-review state machine: prior-thread detection, partial-run check, thread classification, finding matching, and reply posting to classified threads. Returns classification counts, fresh findings, and an earlyExit flag.' +--- + +# Re-review Coordinator + +You own the complete re-review state machine. You receive the ADO Fetcher context block (which includes the raw diff), the raw full PR threads JSON, a list of new findings, and the bot signature prefix. You parse the raw diff into diff hunks internally, run all re-review logic, and post replies to classified threads. You never re-fetch ADO data — all inputs are passed to you verbatim. + +You receive all required context in this prompt as literal strings. Do not read environment variables — agents do not inherit them. + +--- + +## Inputs + +You receive: + +- `ADO_FETCHER_RESULT` — the structured context block from the ADO Fetcher agent (between `ADO_FETCHER_RESULT_START` and `ADO_FETCHER_RESULT_END`). Parse fields from it: + - `ORG_URL` + - `PROJECT` + - `REPO_ID` + - `PR_ID` + - `LATEST_ITERATION_ID` + - `RAW_DIFF` — the raw git diff text (may be empty) + - `DIFF_RANGE` — `full` or `incremental`; controls the γ-downgrade in Step 5 +- `RAW_THREADS_JSON` — the full unfiltered ADO thread list as a JSON array (fetched by the orchestrator via `az repos pr thread list`; not re-fetched here) +- `FINDINGS` — a JSON array of new findings: `{ severity, filePath, startLine, endLine, title, body }[]` +- `SIGNATURE_PREFIX` — always `🤖 *Reviewed by Claude Code*` +- `PLUGIN_ROOT` — absolute path to this plugin's directory (for Node.js helper scripts) + +`PRIOR_ITERATION_ID` is recomputed internally from `RAW_THREADS_JSON` by `detect-prior-review` (Step 2); the orchestrator's own `PRIOR_ITERATION_ID` is not passed in. + +--- + +## Constants + +```bash +SIGNATURE_PREFIX="🤖 *Reviewed by Claude Code*" +SIGNATURE="🤖 *Reviewed by Claude Code* — Iteration ${LATEST_ITERATION_ID}" +``` + +--- + +## Step 1 — Parse RAW_DIFF into diff hunks + +Parse the raw diff text into a JSON array of `{ filePath, startLine, endLine }` objects. Store in a temp file. + +```bash +DIFF_HUNKS_FILE="$(mktemp "${TMPDIR:-/tmp}/re_review_hunks_XXXXXX")" +echo '[]' > "$DIFF_HUNKS_FILE" +``` + +Parse hunk boundaries from `RAW_DIFF` via the Node helper `parse-diff-hunks.mjs` (cross-platform; no python3 dependency): + +```bash +RAW_DIFF="$RAW_DIFF" \ +HUNKS_OUT_F="$DIFF_HUNKS_FILE" \ +PLUGIN_R="$PLUGIN_ROOT" \ +node --input-type=module << 'EOJS' +import { writeFileSync } from 'node:fs' +const { parseDiffHunks } = await import(`file://${process.env.PLUGIN_R}/scripts/re-review/parse-diff-hunks.mjs`) +const hunks = parseDiffHunks(process.env.RAW_DIFF ?? '') +writeFileSync(process.env.HUNKS_OUT_F, JSON.stringify(hunks)) +EOJS +``` + +If `RAW_DIFF` is empty, `DIFF_HUNKS_FILE` remains `[]` — this is valid for a no-new-commits path. + +--- + +## Step 2 — Detect prior bot threads + +Call `detect-prior-review` on the raw threads JSON: + +```bash +PRIOR_THREADS_FILE="$(mktemp "${TMPDIR:-/tmp}/re_review_prior_threads_XXXXXX")" + +DETECT_JSON=$( + RAW_THREADS="$RAW_THREADS_JSON" \ + SIG_P="$SIGNATURE_PREFIX" \ + THREADS_OUT_F="$PRIOR_THREADS_FILE" \ + PLUGIN_R="$PLUGIN_ROOT" \ + node --input-type=module << 'EOJS' +import { writeFileSync } from 'node:fs' +const { detectPriorReview } = await import('file://' + process.env.PLUGIN_R + '/scripts/re-review/detect-prior-review.mjs') +const threads = JSON.parse(process.env.RAW_THREADS) +const r = detectPriorReview({ threads, signaturePrefix: process.env.SIG_P }) +writeFileSync(process.env.THREADS_OUT_F, JSON.stringify(r.priorThreads)) +process.stdout.write(JSON.stringify({ + isRereview: r.isRereview, + summaryThreadId: r.summaryThread != null ? r.summaryThread.threadId : '', + priorIterationId: r.priorIterationId, + count: r.priorThreads.length, +})) +EOJS +) + +IS_REREVIEW=$(printf '%s' "$DETECT_JSON" | node -e "process.stdout.write(String(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).isRereview))") +BOT_THREAD_COUNT=$(printf '%s' "$DETECT_JSON" | node -e "process.stdout.write(String(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).count))") +SUMMARY_THREAD_ID=$(printf '%s' "$DETECT_JSON" | node -e "process.stdout.write(String(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).summaryThreadId))") +PRIOR_ITERATION_ID=$(printf '%s' "$DETECT_JSON" | node -e "const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')); process.stdout.write(d.priorIterationId != null ? String(d.priorIterationId) : 'null')") +``` + +If `IS_REREVIEW=false`: no prior bot threads found — return all findings as fresh and exit without classification or replies. Skip to [Step 8 — Return result](#step-8--return-result) with all counts zero, `freshFindings` = `FINDINGS`, `earlyExit: false`. (The coordinator does not switch modes; the orchestrator does not change agent dispatch based on this branch.) + +Log: + +```bash +if [ "$IS_REREVIEW" = "true" ]; then + echo "Detected $BOT_THREAD_COUNT prior bot threads — re-review mode." +else + echo "No prior bot threads detected — returning all findings as fresh; no classification or replies." +fi +``` + +--- + +## Step 3 — Partial-run check + +If `IS_REREVIEW=true`, `SUMMARY_THREAD_ID` is non-empty, and `PRIOR_ITERATION_ID` is not `"null"`, verify the prior review completed. Check the summary thread for the completion marker `✅ Review complete — Iteration {PRIOR_ITERATION_ID}`: + +The Node check distinguishes three outcomes via distinct exit codes — this prevents conflating "marker missing" (legitimate partial prior run; downgrade is correct) with "check crashed" (silent downgrade would re-post every prior thread): + +- exit `0` → marker found → `MARKER_FOUND=true` (proceed normally) +- exit `1` → marker not found → `MARKER_FOUND=false` (legitimate partial run; treat prior threads as absent — all findings will be returned as fresh) +- exit `2` or any other non-zero → the check itself crashed → **abort the coordinator with exit code 3** (do not silently downgrade) + +The orchestrator's Step 7 only treats an `earlyExit: true` block as a non-fatal skip; a non-zero coordinator exit propagates as a fatal failure that surfaces to the user and stops the run — which is the correct behaviour when the partial-run check is itself broken. + +```bash +if [ "$IS_REREVIEW" = "true" ] && [ -n "$SUMMARY_THREAD_ID" ] && [ "$PRIOR_ITERATION_ID" != "null" ]; then + THREADS_F="$PRIOR_THREADS_FILE" SID="$SUMMARY_THREAD_ID" PID="$PRIOR_ITERATION_ID" \ + node --input-type=module << 'EOJS' +import { readFileSync } from 'node:fs' +try { + const threads = JSON.parse(readFileSync(process.env.THREADS_F, 'utf8')) + const sid = Number(process.env.SID) + const prefix = '✅ Review complete — Iteration ' + process.env.PID + const found = threads.some(t => t.threadId === sid && (t.comments ?? []).some(c => (c.content ?? '').startsWith(prefix))) + process.exit(found ? 0 : 1) +} catch (e) { + process.stderr.write('PARTIAL_RUN_CHECK_ERROR: ' + e.message + '\n') + process.exit(2) +} +EOJS + PARTIAL_RUN_EXIT=$? + + case "$PARTIAL_RUN_EXIT" in + 0) MARKER_FOUND="true" ;; + 1) MARKER_FOUND="false" ;; + *) + echo "ERROR: partial-run check crashed unexpectedly (exit ${PARTIAL_RUN_EXIT}); refusing to silently downgrade mode." >&2 + exit 3 + ;; + esac + + if [ "$MARKER_FOUND" = "false" ]; then + echo "No completion marker for Iteration $PRIOR_ITERATION_ID — partial prior run; treating prior threads as absent and returning all findings as fresh." + IS_REREVIEW=false + SUMMARY_THREAD_ID="" + PRIOR_ITERATION_ID="null" + fi +fi +``` + +If `IS_REREVIEW` is now `false` after the partial-run check: no prior bot threads remain valid — return all findings as fresh and exit without classification or replies. Skip to [Step 8 — Return result](#step-8--return-result) with all counts zero, `freshFindings` = `FINDINGS`, `earlyExit: false`. + +--- + +## Step 4 — Early-exit check (no new revisions) + +Compare `PRIOR_ITERATION_ID` with `LATEST_ITERATION_ID`. If they are equal (and both non-null/non-empty), no new commits have been pushed since the prior review. Print pending threads to the console and exit early — **no ADO writes**. + +```bash +if [ "$IS_REREVIEW" = "true" ] && [ "$PRIOR_ITERATION_ID" != "null" ] && [ "$PRIOR_ITERATION_ID" = "$LATEST_ITERATION_ID" ]; then + echo "No new revisions since prior review (both iterations: $LATEST_ITERATION_ID)." + echo "" + echo "Pending threads from prior review:" + THREADS_F="$PRIOR_THREADS_FILE" node --input-type=module << 'EOJS' +import { readFileSync } from 'node:fs' +const threads = JSON.parse(readFileSync(process.env.THREADS_F, 'utf8')) +for (const t of threads) { + if (t.isSummaryThread) continue + if (t.status === 'active' || t.status === 'pending' || t.status === 1) { + const loc = t.filePath ? `${t.filePath} L${t.start?.line ?? '?'}-${t.end?.line ?? '?'}` : '(general)' + process.stdout.write(' ' + loc + '\n') + } +} +EOJS + # Count active/pending threads for the result + PENDING_COUNT=$( + THREADS_F="$PRIOR_THREADS_FILE" node --input-type=module << 'EOJS' +import { readFileSync } from 'node:fs' +const threads = JSON.parse(readFileSync(process.env.THREADS_F, 'utf8')) +const n = threads.filter(t => !t.isSummaryThread && (t.status === 'active' || t.status === 'pending' || t.status === 1)).length +process.stdout.write(String(n)) +EOJS + ) + # Clean up and return early + rm -f "$PRIOR_THREADS_FILE" "$DIFF_HUNKS_FILE" + # Output early-exit result block + cat << RESULT_EOF +RE_REVIEW_COORDINATOR_RESULT_START +earlyExit: true +addressed: 0 +disputed: 0 +pending: ${PENDING_COUNT} +obsolete: 0 +freshFindings: [] +RE_REVIEW_COORDINATOR_RESULT_END +RESULT_EOF + exit 0 +fi +``` + +--- + +## Step 5 — Classify all prior threads + +Parse `DIFF_RANGE` from `ADO_FETCHER_RESULT` (defaults to `incremental` if absent). Classify each non-summary thread using `classify-thread` — passing `diffRange` so the γ-downgrade fires when `DIFF_RANGE=full` — and update `PRIOR_THREADS_FILE` in place with the `classification` field. Capture counts. + +```bash +DIFF_RANGE=$(printf '%s' "$ADO_FETCHER_RESULT" | grep '^DIFF_RANGE:' | awk '{print $2}') +DIFF_RANGE="${DIFF_RANGE:-incremental}" + +CLASSIFY_COUNTS=$( + THREADS_F="$PRIOR_THREADS_FILE" \ + HUNKS_F="$DIFF_HUNKS_FILE" \ + SIG_P="$SIGNATURE_PREFIX" \ + DIFF_R="$DIFF_RANGE" \ + PLUGIN_R="$PLUGIN_ROOT" \ + node --input-type=module << 'EOJS' +import { readFileSync, writeFileSync } from 'node:fs' +const { classifyThread } = await import('file://' + process.env.PLUGIN_R + '/scripts/re-review/classify-thread.mjs') +const threads = JSON.parse(readFileSync(process.env.THREADS_F, 'utf8')) +const diffHunks = JSON.parse(readFileSync(process.env.HUNKS_F, 'utf8')) +const signaturePrefix = process.env.SIG_P +const diffRange = process.env.DIFF_R === 'full' ? 'full' : 'incremental' +const counts = { addressed: 0, disputed: 0, pending: 0, obsolete: 0 } +for (const t of threads) { + if (t.isSummaryThread) continue + const cls = classifyThread({ thread: t, diffHunks, signaturePrefix, diffRange }) + t.classification = cls + counts[cls]++ +} +writeFileSync(process.env.THREADS_F, JSON.stringify(threads)) +process.stdout.write(JSON.stringify(counts)) +EOJS +) + +ADDRESSED_COUNT=$(printf '%s' "$CLASSIFY_COUNTS" | node -e "process.stdout.write(String(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).addressed))") +DISPUTED_COUNT=$(printf '%s' "$CLASSIFY_COUNTS" | node -e "process.stdout.write(String(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).disputed))") +PENDING_COUNT=$(printf '%s' "$CLASSIFY_COUNTS" | node -e "process.stdout.write(String(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).pending))") +OBSOLETE_COUNT=$(printf '%s' "$CLASSIFY_COUNTS" | node -e "process.stdout.write(String(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).obsolete))") + +echo "Thread classification: ${ADDRESSED_COUNT} addressed, ${DISPUTED_COUNT} disputed, ${PENDING_COUNT} pending, ${OBSOLETE_COUNT} obsolete" +``` + +--- + +## Step 6 — Match findings, post replies, collect fresh findings + +For each finding in `FINDINGS`, call `match-finding` to look for a matching prior thread. Track which findings are consumed (matched). Unmatched findings become `freshFindings`. + +Reset the reply counts before iterating: + +```bash +FRESH_FINDINGS_JSON='[]' +NOTICES='[]' +``` + +Process each finding one at a time. For each finding: + +### 6a — Find matching prior thread + +Substitute the `{finding.x}` placeholders below with concrete values from the current `FINDINGS` array element — these are prompt-template tokens, not shell variables. + +```bash +MATCH_EXIT=0 +MATCH=$( + THREADS_F="$PRIOR_THREADS_FILE" \ + FINDING_FILE="{finding.filePath}" \ + FINDING_START="{finding.startLine}" \ + FINDING_END="{finding.endLine}" \ + PLUGIN_R="$PLUGIN_ROOT" \ + node --input-type=module << 'EOJS' +import { readFileSync } from 'node:fs' +const { matchFinding } = await import('file://' + process.env.PLUGIN_R + '/scripts/re-review/match-finding.mjs') +const threads = JSON.parse(readFileSync(process.env.THREADS_F, 'utf8')) +const result = matchFinding({ + finding: { + filePath: process.env.FINDING_FILE, + startLine: Number(process.env.FINDING_START), + endLine: Number(process.env.FINDING_END), + }, + priorThreads: threads, +}) +process.stdout.write(result != null ? JSON.stringify(result) : '') +EOJS +) || MATCH_EXIT=$? + +if [ "$MATCH_EXIT" -ne 0 ]; then + NOTICES=$( + N="$NOTICES" SEV="warning" K="thread-match" \ + M="Could not classify finding at {finding.filePath}:{finding.startLine} — falling back to no-match." \ + node -e "const a=JSON.parse(process.env.N); a.push({severity:process.env.SEV,kind:process.env.K,message:process.env.M}); process.stdout.write(JSON.stringify(a))" + ) + CLASSIFICATION="" + THREAD_ID="" +else + CLASSIFICATION=$(printf '%s' "$MATCH" | node -e "const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')||'{}'); process.stdout.write(d.classification ?? '')") + THREAD_ID=$(printf '%s' "$MATCH" | node -e "const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')||'{}'); process.stdout.write(String(d.threadId ?? ''))") +fi +``` + +### 6b — Dispatch on classification + +**No match (`MATCH` is empty) → add to freshFindings** + +The finding has no prior thread. Add it to `FRESH_FINDINGS_JSON` (do not post here — the orchestrator will pass fresh findings to the ADO Writer). + +**`obsolete` → skip** + +No action. Do not post. + +**`pending` → evaluate for new evidence** + +Read the most recent bot comment from the matched thread (last comment whose content contains `SIGNATURE_PREFIX`). Compare its text against the current finding's body text. + +- If **no new evidence** (same issue, same analysis): skip. Do not post. +- If the matched thread has `filePath = null` (general pending thread): always skip. +- If **new evidence** (additional analysis, different suggested fix, new code examples): post a new-evidence reply: + +```bash +cat > "${TMPDIR:-/tmp}/re_review_reply_${THREAD_ID}.json" << ENDJSON +{ + "content": "{NEW_EVIDENCE_CONTENT}\n\n---\n🤖 *Reviewed by Claude Code* — Iteration ${LATEST_ITERATION_ID}", + "commentType": 1 +} +ENDJSON + +az devops invoke \ + --area git \ + --resource pullRequestThreadComments \ + --route-parameters "project=${PROJECT}" "repositoryId=${REPO_ID}" "pullRequestId=${PR_ID}" "threadId=${THREAD_ID}" \ + --org "${ORG_URL}" \ + --http-method POST \ + --in-file "${TMPDIR:-/tmp}/re_review_reply_${THREAD_ID}.json" \ + --api-version "7.1" \ + --output json | node -e "process.stdout.write('New-evidence reply posted, comment ' + String(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).id ?? ''))" +``` + +**`disputed` → post dispute acknowledgement** + +Briefly acknowledge the reviewer's perspective without re-asserting the finding. Always include the ADO nudge before the signature: + +```bash +cat > "${TMPDIR:-/tmp}/re_review_reply_${THREAD_ID}.json" << ENDJSON +{ + "content": "{BRIEF_ACKNOWLEDGEMENT}\n\nIf you consider this resolved, please mark the thread as fixed in Azure DevOps.\n\n---\n🤖 *Reviewed by Claude Code* — Iteration ${LATEST_ITERATION_ID}", + "commentType": 1 +} +ENDJSON + +az devops invoke \ + --area git \ + --resource pullRequestThreadComments \ + --route-parameters "project=${PROJECT}" "repositoryId=${REPO_ID}" "pullRequestId=${PR_ID}" "threadId=${THREAD_ID}" \ + --org "${ORG_URL}" \ + --http-method POST \ + --in-file "${TMPDIR:-/tmp}/re_review_reply_${THREAD_ID}.json" \ + --api-version "7.1" \ + --output json | node -e "process.stdout.write('Dispute acknowledgement posted, comment ' + String(JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')).id ?? ''))" +``` + +**`addressed` → PATCH thread status to fixed** + +```bash +# PATCH thread status to fixed (2) +cat > "${TMPDIR:-/tmp}/re_review_patch_${THREAD_ID}.json" << ENDJSON +{ "status": 2 } +ENDJSON + +PATCH_RESP=$(az devops invoke \ + --area git \ + --resource pullRequestThreads \ + --route-parameters "project=${PROJECT}" "repositoryId=${REPO_ID}" "pullRequestId=${PR_ID}" "threadId=${THREAD_ID}" \ + --org "${ORG_URL}" \ + --http-method PATCH \ + --in-file "${TMPDIR:-/tmp}/re_review_patch_${THREAD_ID}.json" \ + --api-version "7.1" \ + --output json 2>"${TMPDIR:-/tmp}/re_review_patch_${THREAD_ID}.err") +PATCH_EXIT=$? + +PWR_ERR=$(cat "${TMPDIR:-/tmp}/re_review_patch_${THREAD_ID}.err" 2>/dev/null) +PWR_JSON=$( + RESP="$PATCH_RESP" EXIT="$PATCH_EXIT" ERR="$PWR_ERR" PLUGIN_R="$PLUGIN_ROOT" \ + node --input-type=module << 'EOJS' +const { parseWriteResponse } = await import(`file://${process.env.PLUGIN_R}/scripts/ado/parse-write-response.mjs`) +const r = parseWriteResponse({ httpExit: Number(process.env.EXIT), responseText: process.env.RESP, errStream: process.env.ERR }) +process.stdout.write(JSON.stringify(r)) +EOJS +) +PWR_OK=$(printf '%s' "$PWR_JSON" | node -e "const r=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')); process.stdout.write(String(r.ok))") +PWR_TIER=$(printf '%s' "$PWR_JSON" | node -e "const r=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')); process.stdout.write(r.tier||'')") +PWR_MSG=$(printf '%s' "$PWR_JSON" | node -e "const r=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')); process.stdout.write(r.message||'')") + +if [ "$PWR_OK" = "true" ]; then + echo "Thread ${THREAD_ID} patched to fixed" +elif [ "$PWR_TIER" = "aborted" ]; then + cat "${TMPDIR:-/tmp}/re_review_patch_${THREAD_ID}.err" >&2 + echo "ERROR: Could not mark thread ${THREAD_ID} as fixed — ${PWR_MSG}. Try \`az devops login\` to re-authenticate." >&2 + exit 1 +else + cat "${TMPDIR:-/tmp}/re_review_patch_${THREAD_ID}.err" >&2 + NOTICES=$( + N="$NOTICES" SEV="warning" K="patch-to-fixed" \ + M="Could not mark thread ${THREAD_ID} as fixed (${PWR_MSG}). Thread remains active and will be re-evaluated on next re-review." \ + node -e "const a=JSON.parse(process.env.N); a.push({severity:process.env.SEV,kind:process.env.K,message:process.env.M}); process.stdout.write(JSON.stringify(a))" + ) +fi +``` + +--- + +## Step 7 — Clean up temp files + +```bash +rm -f "$PRIOR_THREADS_FILE" "$DIFF_HUNKS_FILE" +rm -f "${TMPDIR:-/tmp}"/re_review_reply_*.json "${TMPDIR:-/tmp}"/re_review_patch_*.json "${TMPDIR:-/tmp}"/re_review_patch_*.err +``` + +--- + +## Step 8 — Return result + +Return the following structured block as your final output. This block is consumed verbatim by the orchestrator. + +`freshFindings` contains only the findings that had **no matching prior thread** — the orchestrator passes these to the ADO Writer to post as new threads. Findings that matched a prior thread (any classification) are consumed here and **not** included in `freshFindings`. + +`earlyExit` is `true` only on the no-new-revisions path (Step 4). On all other paths — including normal completion with zero fresh findings — `earlyExit` is `false`. + +``` +RE_REVIEW_COORDINATOR_RESULT_START +earlyExit: false +addressed: {ADDRESSED_COUNT} +disputed: {DISPUTED_COUNT} +pending: {PENDING_COUNT} +obsolete: {OBSOLETE_COUNT} +freshFindings: {FRESH_FINDINGS_JSON} +NOTICES: {NOTICES} +RE_REVIEW_COORDINATOR_RESULT_END +``` + +Where: + +- `earlyExit` — `true` only when prior and latest iteration IDs were equal (no-new-revisions path); `false` otherwise +- `addressed` — count of prior threads classified as addressed (and PATCHed to fixed) +- `disputed` — count of prior threads classified as disputed (and replied to with acknowledgement) +- `pending` — count of prior threads classified as pending (may include threads that received a new-evidence reply or were skipped) +- `obsolete` — count of prior threads classified as obsolete +- `freshFindings` — JSON array of unmatched findings in the same shape as the input `FINDINGS` array; empty array `[]` if all findings matched prior threads or if `earlyExit` is `true` +- `NOTICES` — JSON array of DEGRADED Notices emitted during this run (may be `[]`); each entry has `{ severity: "warning", kind: "thread-match", message }` + +--- + +## Important invariants + +- **No ADO reads**: do not call `az devops invoke` for GET operations. All data is passed as inputs. +- **No re-fetch of threads**: the orchestrator already captured `RAW_THREADS_JSON` during mode detection — do not call `az repos pr thread list` again. +- **Early exit has no ADO writes**: the no-new-revisions path (Step 4) only prints to console and returns the result block — it never posts replies or PATCHes threads. +- **All four count fields are always present** in the result block, even when zero. +- **Matched findings are consumed**: a finding matched to any classified prior thread is excluded from `freshFindings`, regardless of whether a reply was posted. +- The completion marker is posted by the ADO Writer, not by this coordinator. diff --git a/apps/claude-code/pr-review/.claude-plugin/marketplace.json b/apps/claude-code/pr-review/.claude-plugin/marketplace.json index cd2173a..53d4d3f 100644 --- a/apps/claude-code/pr-review/.claude-plugin/marketplace.json +++ b/apps/claude-code/pr-review/.claude-plugin/marketplace.json @@ -21,7 +21,7 @@ "name": "pr-review", "source": "./", "tags": ["code-quality", "azure-devops"], - "version": "0.9.1" + "version": "1.2.10" } ] } diff --git a/apps/claude-code/pr-review/.claude-plugin/plugin.json b/apps/claude-code/pr-review/.claude-plugin/plugin.json index 1d22639..989cb7b 100644 --- a/apps/claude-code/pr-review/.claude-plugin/plugin.json +++ b/apps/claude-code/pr-review/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "pr-review", - "version": "0.9.1", + "version": "1.2.10", "description": "Review Azure DevOps pull requests with multi-agent analysis and post threaded comments back to the PR.", "author": { "name": "Unic AG", diff --git a/apps/claude-code/pr-review/CHANGELOG.md b/apps/claude-code/pr-review/CHANGELOG.md index 810bfd5..8188eff 100644 --- a/apps/claude-code/pr-review/CHANGELOG.md +++ b/apps/claude-code/pr-review/CHANGELOG.md @@ -11,6 +11,230 @@ ### Fixed - (none) +## [1.2.10] — 2026-05-14 + +### Breaking +- (none) + +### Added +- (none) + +### Changed +- (none) + +### Fixed +- `ado-writer.mjs` NOTICES block JSON parse failure now returns `{ ok: false, reason: 'malformed' }` instead of silently dropping all Writer-emitted Notices and returning `{ ok: true, notices: [] }`. +- `parse-write-response.mjs` now appends `errStream` content to the error message for all classified failure tiers (auth/transient), not only the malformed-response path — giving auth failures meaningful context when the response body is empty. +- `notices.mjs` `formatTrailer` aborted branch no longer emits a stray ` — ` separator when `abortReason` is absent. +- `fetch-work-items.mjs` now routes non-zero exit codes through `classifyHttpError`, returning `reason: 'auth'` (401/403), `reason: 'malformed'` (4xx malformed-request), or `reason: 'transient'` (5xx / network) instead of the generic `reason: 'fetch-failed'`. `@returns` JSDoc updated to use a literal union. +- `fetch-work-items.mjs` guards against `null` / non-object elements in the ADO `value` array to prevent `TypeError: Cannot read properties of null (reading 'id')`. +- `fetch-iterations.mjs` `malformed-request` HTTP kind now maps to `reason: 'malformed'` instead of `reason: 'transient'`, preventing structural ADO API errors from being retried as transient network failures. +- `fetch-iterations.mjs` guards against `null` / non-object elements in the `value` array before calling `.reduce()`. +- `detect-default-branch.mjs` `source: 'none'` result now includes a `warning` Notice (`kind: 'default-branch'`) so the caller can surface the abort reason through the Notice pipeline. Previously returned no notice. +- `detect-default-branch.mjs` trims whitespace from `remoteHeadBranch` before the truthy check, preventing a whitespace-only string from being returned as the detected branch name. +- `detect-default-branch.mjs` local `Notice` typedef replaced with canonical import from `notices.mjs`, ensuring `kind` is validated against `NoticeKind` rather than `string`. +- `classify-thread.mjs` JSDoc rule list corrected to 5 rules matching the actual evaluation order (status-check → obsolete-check → intersection-check → disputed-check → pending); previous comment conflated rules 1 and 3. + +## [1.2.9] — 2026-05-14 + +### Breaking +- (none) + +### Added +- New helper `scripts/pre-pr/detect-default-branch.mjs` — pure function `detectDefaultBranch({ branchExists, remoteHeadBranch })` returning `{ branch, source, notice? }`. Fallback chain: `remote-show` → `origin/develop` → `origin/main` → `origin/master` → `none`. Emits a `warning` Notice (`kind: default-branch`) for every fallback level; no notice for `remote-show`. `{ branch: null, source: 'none' }` aborts the Pre-PR run. 7 unit cases covering all branches. + +### Changed +- `buildPrePrContext` return type extended to include `notices: Notice[]`. Suspicious-shape detection: non-empty diff containing ≥ 1 `diff --git` header but yielding zero parsed paths emits a DEGRADED Notice (`kind: diff-parse`). Normal diffs return `notices: []`. +- Pre-PR mode Step A now calls `detectDefaultBranch` (Gitflow-aware fallback chain) instead of the hardcoded `main` fallback. On `branch: null` the run aborts with a clear stderr message and a Trailer aborted line. Any fallback notice is collected in `PRE_PR_NOTICES`. +- Pre-PR mode Step B merges `buildPrePrContext().notices` into `PRE_PR_NOTICES` via `mergeNotices`. +- Pre-PR mode Step E prints all Notices (via `formatNoticesAsPrePrPreamble`) before findings, and passes `PRE_PR_NOTICES` to `formatTrailer` so the Trailer reflects the actual notice count. + +### Fixed +- Pre-PR mode default-branch detection no longer silently falls through to `main` when `git remote show origin` is offline or returns an unexpected format. The fallback chain (`develop` → `main` → `master`) now emits a visible warning Notice naming the actually-used branch, and `none` aborts the run with an actionable error message. +- Pre-PR mode malformed diffs (non-empty input with `diff --git` headers but zero parsed paths) now surface a DEGRADED Notice instead of silently proceeding with an empty file list. + +## [1.2.8] — 2026-05-14 + +### Breaking +- (none) + +### Added +- (none) + +### Changed +- Re-review Coordinator PATCH-to-fixed call site now routes through `parse-write-response.mjs`. On `tier: aborted` (401/403) the Coordinator exits non-zero with a clear stderr message and the orchestrator surfaces a Trailer abort line. On `tier: degraded` (5xx/network/other-4xx) a per-thread DEGRADED Notice (`kind: patch-to-fixed`) is pushed to the Coordinator's `NOTICES` array and iteration continues. 404 and 409 continue silently (canonical OK). +- Orchestrator Step 7 now handles a missing coordinator result block (coordinator exited non-zero): infers `abortKind` from output and calls `formatTrailer` before stopping. + +### Fixed +- PATCH-to-fixed 401/403 auth failures were previously logged as a "PATCH warning" string on stdout that nothing read — the run continued silently. They now abort the Coordinator with a clear stderr message. +- PATCH-to-fixed 409 catch-all replaced by the canonical HTTP-tier mapping; 404 is now also treated as OK (deleted thread is a domain success). + +## [1.2.7] — 2026-05-14 + +### Breaking +- (none) + +### Added +- (none) + +### Changed +- `matchFinding` now throws a `TypeError` when `priorThreads` is not an array or when `finding` is missing required typed fields (`filePath: string`, `startLine: number`, `endLine: number`). Previously, malformed input could produce an uncaught exception that was silently swallowed as a no-match. +- Re-review Coordinator Step 6a wraps the `match-finding` call in a try/catch. On throw, a DEGRADED Notice (`kind: thread-match`) is pushed to the Coordinator's `NOTICES` array and the finding falls through to the unclassified (no-match) path. The Coordinator result block now includes a `NOTICES: [...]` field. +- Orchestrator Step 7 extracts `NOTICES` from the Coordinator result block; Step 8 includes them in the combined `mergeNotices` call alongside Fetcher and Writer notices. + +### Fixed +- Match-finding parse errors were previously silently swallowed by `2>/dev/null || echo ""` guards in the Coordinator, causing the affected finding to be treated as no-match and re-posted as a duplicate inline thread with no visible signal. The throw contract and DEGRADED Notice surface the cause to the reviewer. + +## [1.2.6] — 2026-05-14 + +### Breaking +- (none) + +### Added +- (none) + +### Changed +- `addressed` threads are now silently resolved — the Re-review Coordinator PATCHes the thread status to fixed (status 2) without posting a Reply comment. Previously a "Resolved — thanks!" reply was posted, generating an ADO notification for every thread participant. +- `classifyThread` now accepts a `diffRange: 'full' | 'incremental'` parameter (default `'incremental'`). When `'full'`, outputs `addressed` and `obsolete` are remapped to `pending` (γ-downgrade per ADR-0004) since diff-position evidence is unreliable on a widened range. `disputed` is unaffected. +- Re-review Coordinator (Step 5) parses `DIFF_RANGE` from `ADO_FETCHER_RESULT` and threads it into every `classify-thread` invocation. + +### Fixed +- Re-reviews that fell back to a full diff (prior commit unreachable) no longer produce false-confidence `addressed` or `obsolete` classifications; all such threads are conservatively downgraded to `pending`. + +## [1.2.4] — 2026-05-14 + +### Breaking +- (none) + +### Added +- (none) + +### Changed +- `parseAdoWriterResult` now returns a discriminated union `{ ok: true, summaryThreadId, findingsPosted, notices } | { ok: false, reason: 'missing-block' | 'malformed' }` instead of a partial object with null fields. Callers must branch on `result.ok` before accessing result fields. + +### Fixed +- Writer crash no longer silently reported as success: the orchestrator now emits a clear stderr error and an aborted Trailer when the Writer's result block is missing or malformed. + +## [1.2.3] — 2026-05-14 + +### Breaking +- (none) + +### Added +- `scripts/ado/parse-write-response.mjs` — pure function `parseWriteResponse({ httpExit, responseText, errStream })` returning `{ ok: true, id } | { ok: false, tier, kind, message }`. Composes `classifyHttpError` with response-id parsing; 404/409 map to `{ ok: true, id: null }` (canonical OK with no resource created); 200 without a numeric id maps to `{ ok: false, tier: 'degraded', kind: 'malformed-response' }`. Covered by `tests/parse-write-response.test.mjs` (13 unit cases spanning all branches). + +### Changed +- ADO Writer prompt routes every `az devops invoke` POST/PATCH call site through `parseWriteResponse`. On `tier: 'aborted'` (401/403), the Writer streams the `.err` file to stderr and exits non-zero. On `tier: 'degraded'` (5xx/network/other-4xx), the Writer pushes a typed DEGRADED Notice to its internal `NOTICES` array and continues to the next call site. `ADO_WRITER_RESULT_START/END` block gains a `NOTICES: [...]` field. +- Orchestrator Step 8 now parses Writer `NOTICES` from the result block and merges them into `NOTICES_JSON` via `mergeNotices` before printing the Trailer, so all Notice counts reflect both Fetcher and Writer sources. +- `parseAdoWriterResult` return type extended to `{ summaryThreadId, findingsPosted, notices: Notice[] }`. Legacy blocks without a `NOTICES` field return `notices: []`. + +### Fixed +- ADO Writer inline-POST auth failures (HTTP 401/403) now abort the Writer immediately with a clear stderr message. Previously they were silently logged and the run continued, leaving subsequent writes potentially authenticated against stale credentials. + +## [1.2.2] — 2026-05-13 + +### Breaking +- (none) + +### Added +- (none) + +### Changed +- ADO Fetcher result block (`ADO_FETCHER_RESULT_START/END`) now includes a `DIFF_RANGE: full | incremental` field reflecting which diff strategy was used. Orchestrator parses the field; the Coordinator γ-downgrade that consumes it is deferred to PRD B issue B3. + +### Fixed +- Diff-range fallback in the ADO Fetcher no longer fires silently. When the prior iteration's commit is unreachable and the Fetcher falls back to `origin/${TARGET_BRANCH}...HEAD`, a `warning` Notice (`kind: diff-range`) is now emitted in the Fetcher's `NOTICES` array so the reviewer sees the degraded state in the Summary. + +## [1.2.1] — 2026-05-13 + +### Breaking +- (none) + +### Added +- `scripts/ado/fetch-iterations.mjs` — pure function `fetchIterations({ responseText, exitCode })` returning `{ ok: true, latestIterationId, latestCommitSha } | { ok: false, reason, message }`. Subsumes `parseIterations`; uses `classifyHttpError` for HTTP failures; distinguishes empty-iterations ABORTED from auth/transient/malformed. Covered by `tests/fetch-iterations.test.mjs` (8 unit cases spanning all reason branches). + +### Changed +- ADO Fetcher prompt Step 2 (iterations fetch) now delegates to `fetchIterations` via `await import`. On `{ ok: false }`, the Fetcher exits non-zero with a clear stderr message and the orchestrator emits a Trailer aborted line. + +### Fixed +- `parseIterations` and its silent `iterationId=1` fallback for empty-iterations are removed; an empty iterations endpoint response now aborts the run instead of silently signing comments with `Iteration 1`. + +## [1.2.0] — 2026-05-13 + +### Breaking +- (none) + +### Added +- `scripts/ado/classify-http-error.mjs` — pure function `classifyHttpError({ status, body, exitCode })` implementing the canonical HTTP-tier mapping (200/201/404/409 → OK; 401/403 → ABORTED; 5xx/other-4xx → DEGRADED; network/exit-code → DEGRADED). Covered by `tests/classify-http-error.test.mjs` (16 unit cases spanning every mapping row, malformed-body paths, and network-exit-code paths). +- `scripts/ado/fetch-work-items.mjs` — pure function `fetchWorkItems({ responseText, exitCode })` returning `{ ok: true, ids } | { ok: false, reason, message }`. Subsumes `parseWorkItemIds`; distinguishes EMPTY-BY-DESIGN (`{ ok: true, ids: [] }`) from fetch failure (`{ ok: false }`). Covered by `tests/fetch-work-items.test.mjs` (9 unit cases). +- ADR 0015 (`docs/adr/0015-canonical-http-tier-mapping.md`) recording the HTTP-tier mapping table, the 401/403 abort rule, and the no-retries-in-v1 stance. + +### Changed +- ADO Fetcher prompt Step 5 (`work-item fetch`) now delegates to `fetchWorkItems` via `await import`. On `{ ok: false }`, emits a DEGRADED Notice (`kind: work-items`) into the `NOTICES` array. On `{ ok: true, ids: [] }`, still emits the existing EMPTY-BY-DESIGN `info` Notice (`kind: doc-context`). + +### Fixed +- `parseWorkItemIds` is removed; callers that received an empty array on auth failure can no longer conflate a fetch failure with a legitimately empty work-item list. + +## [1.1.0] — 2026-05-13 + +### Breaking +- (none) + +### Added +- `scripts/ado/notices.mjs` — pure helpers (`createNotice`, `mergeNotices`, `formatNoticesAsSummaryBlock`, `formatNoticesAsPrePrPreamble`, `formatTrailer`) implementing the four-tier Notice doctrine (OK / EMPTY-BY-DESIGN / DEGRADED / ABORTED). Covered by `tests/notices.test.mjs` (14 unit cases). +- ADR 0014 (`docs/adr/0014-notice-tier-doctrine-and-failure-classification-helpers.md`) recording the four-tier doctrine, the no-fifth-ASK-tier rule, the Notice shape (`{ severity, kind, message }`), the canonical `kind` enum, the mandatory end-of-run Trailer convention, and the helper-layer refinement to ADR 0013. +- ADO Fetcher `ADO_FETCHER_RESULT_START`/`_END` block now carries a `NOTICES` JSON array. When `WORK_ITEM_IDS=[]`, the Fetcher emits an `info` Notice (`kind: doc-context`, message: "Reviewed without business context — no work items linked to this PR."). + +### Changed +- Orchestrator (`commands/review-pr.md`) parses `NOTICES` from the Fetcher result block, sets `NOTICES_JSON` via `mergeNotices`, and threads it into the ADO Writer prompt. New `Step 8 — End-of-run Trailer` prints one mandatory `formatTrailer` line in the Claude interface for every run (success, abort). Pre-PR mode's completion line is now also a `formatTrailer` call (`mode: 'pre-pr'`) so AFK invokers see the same trailer shape across modes. +- ADO Writer (`.agents/ado-writer.md`) accepts a new `NOTICES_JSON` input and renders a `## Notices` block above severity-grouped findings in the Review Summary content (heading bare; `ℹ️` / `⚠` prefixes per item). Empty `NOTICES_JSON` produces no `## Notices` heading. +- Orchestrator `## Constants` section removed; the `SIGNATURE_PREFIX` invariant is now expressed inline at every call site that needed it (the constant value was already inlined; the section was documentation only). + +### Fixed +- (none) + +## [1.0.0] — 2026-05-12 + +### Breaking +- (none) + +### Added +- Orchestrator split: `review-pr.md` refactored from a monolithic command to a thin orchestrator (≤ 200 lines per PRD acceptance criterion) that delegates ADO API calls and coordination logic to three focused agents +- ADO Fetcher agent: handles all Azure DevOps REST API fetches (diff, threads, iterations) in a single dedicated context window +- Re-review Coordinator agent: classifies prior bot threads, computes incremental diffs, and decides per-thread reply actions +- ADO Writer agent: posts all inline thread comments and the summary comment back to ADO, keeping write operations isolated from analysis +- Pre-PR mode: invoke `/pr-review:review-pr` without an ADO URL to review a local branch diff before the PR is created; findings are printed to the terminal instead of posted to ADO +- Compact sub-agent output: all review-aspect agent prompts now include an explicit JSON output contract, keeping reasoning inside each agent's context window and returning only structured `{ severity, filePath, startLine, endLine, title, body }[]` arrays to the orchestrator +- New `scripts/re-review/parse-diff-hunks.mjs` helper module (with 7 unit tests) that parses raw `git diff` text into per-hunk `{ filePath, startLine, endLine }` entries — pure function, no I/O, slash-prefixed file paths. +- New `scripts/mode-detection.mjs` helper that consolidates `Step 4` re-review detection and exports both `detectMode()` and `formatModeEnv()` used by the orchestrator. + +### Changed +- Trim `commands/review-pr.md` from 297 lines to ≤ 200 lines to meet the PRD acceptance criterion: extracted mode-detection to a helper, factored the duplicated `MODE`/`SUMMARY_THREAD_ID` write-back into a single ADO Writer prompt, consolidated the compact finding schema into one shared block referenced by Step 6 and Pre-PR Step D, and tightened instructional prose. Realigned the compact-output guidance tests to assert against the shared schema block + each section's reference, removing fragile section-slice substring assertions. + +### Fixed +- Convert static imports of helper modules to `await import(...)` in agent prompts — static `import` does not accept dynamic specifiers. +- Port the re-review diff-hunk parser from a `python3` heredoc to a Node helper (`parse-diff-hunks.mjs`) in `re-review-coordinator.md` Step 1 — Windows-native CI and developer machines have no `python3`, breaking the cross-platform rule. +- Replace bare `/tmp/` literals with `${TMPDIR:-/tmp}/` across `re-review-coordinator.md` (reply/patch/error files in Steps 6 and 7) and `ado-writer.md` (thread, fallback, summary, delta, completion files in Steps 1–4) so temp files honour the OS-configured temp directory. +- Drop the `.json` suffix from `mktemp ".../re_review_hunks_XXXXXX"` / `re_review_prior_threads_XXXXXX` patterns — BSD `mktemp` on macOS rejects suffixes after the `X` template. +- H1 — ADO Writer Step 1 no longer bumps `FINDINGS_POSTED` unconditionally after the threadContext fallback. The substring `"message"` heuristic is replaced by a structural check (exit code + numeric `id` parsed by Node); on confirmed failure the writer logs the captured stderr from the `*.err` file and continues to the next finding rather than miscounting a missing post as success. +- H2 — ADO Writer Step 2 no longer swallows summary/delta POST failures. The summary POST and the re-review delta-reply POST now capture exit code + parsed numeric `id`; on failure the writer aborts with a non-zero exit and a clear stderr message, because the completion marker and the next re-review's detection both depend on a valid `SUMMARY_THREAD_ID` — silent failure here corrupts re-review state forever. +- H3 — Orchestrator Step 4 no longer coerces `az repos pr thread list` failures to `[]`. The fetch is now captured separately; on non-zero exit the orchestrator emits a clear stderr error ("ERROR: failed to fetch PR threads via Azure CLI. Try `az devops login` to re-authenticate.") and exits `1`, preventing a fetch failure from being mistaken for "no prior threads" and triggering a duplicate-post storm on re-review. +- H4 — Re-review Coordinator Step 3 partial-run check no longer conflates "marker missing" with "check crashed". The Node heredoc now wraps its body in try/catch and exits with distinct codes (`0` = found, `1` = not found, `2` = crash); the bash side branches on those codes and aborts the coordinator with exit `3` on a crash instead of silently downgrading to first-review mode and re-posting every prior thread. +- H5 — ADO Fetcher Step 4 branch-checkout fallback is now an executable `||` chain instead of a literal shell comment. If `az repos pr checkout` fails, the agent now actually runs `git fetch origin "$SOURCE_BRANCH" && git checkout "$SOURCE_BRANCH"`, and aborts with a clear stderr error if both fail — previously the comment-form fallback never ran and the agent silently continued on the wrong branch. +- Re-review Coordinator inline cross-references in Steps 2 and 3 pointed to a non-existent `Step 7 — Return result` section (the actual return-result heading is Step 8, after `Step 7 — Clean up`). Anchors now resolve and use the same numbering as the headings. +- Re-review Coordinator Inputs section now states explicitly that `PRIOR_ITERATION_ID` is recomputed internally by `detect-prior-review` from `RAW_THREADS_JSON`; the orchestrator's own `PRIOR_ITERATION_ID` is not threaded in, preventing redundant input plumbing. +- Re-review Coordinator no-prior-threads and partial-run branches no longer claim to "fall back to first-review mode" — the coordinator does not switch modes, it returns a result block with zero counts and `freshFindings = FINDINGS`, and the orchestrator does not change agent dispatch based on this. Prose corrected in Step 2, Step 3, and the two associated `echo` log lines. +- Re-review Coordinator Step 6a now states up front that `{finding.filePath}` / `{finding.startLine}` / `{finding.endLine}` are prompt-template placeholders to be substituted by the agent for the current `FINDINGS` element, not bash variables. +- ADO Writer Step 2's `MODE=re-review, zero new findings` branch now notes that Step 3 still posts the completion marker on every successful run, resolving the apparent contradiction with the "Do not post anything" line. +- ADO Fetcher output documentation now flags `LATEST_COMMIT_SHA` as reserved for future diff-range debugging and unused by any current downstream agent (the diff-range logic that needed it is self-contained in Step 4) — prevents future contributors from threading it through new agents under the assumption it is consumed. +- Orchestrator Step 6 prose no longer claims the review-aspect-agent prompts receive `PR_TITLE` and `PR_DESCRIPTION`. The Fetcher captures them for downstream use, but the orchestrator does not parse them, so the prose now reads "full diff and changed file contents" only — removing the contradiction with Step 5's parse list. +- Pre-PR mode default-branch detection no longer silently leaves `DEFAULT_BRANCH` empty when `git remote show origin` produces no `HEAD branch:` line. The pipeline now filters empty awk output through `grep .` so the `|| echo "main"` fallback fires for real, instead of being short-circuited by a still-zero-exit awk. +- `shouldSkipFile` now uses the lower-cased path for the `/generated/` directory check too, so capitalised `.NET`-style paths like `/Source/Generated/ApiClient.cs` are skipped consistently with the other rules. +- `parseChangedFilesFromDiff` now splits the diff text on `/\r?\n/` (matching the sibling `parseDiffHunks` helper), so CRLF-formatted diffs from Windows Git no longer produce paths with a trailing `\r`. + +### Fixed +- (none) + ## [0.9.1] — 2026-05-08 ### Breaking diff --git a/apps/claude-code/pr-review/CLAUDE.md b/apps/claude-code/pr-review/CLAUDE.md index 008ff1a..794f9c6 100644 --- a/apps/claude-code/pr-review/CLAUDE.md +++ b/apps/claude-code/pr-review/CLAUDE.md @@ -18,10 +18,16 @@ A Claude Code plugin (`pr-review`) that adds a `/pr-review:review-pr` command. W plugin.json # Plugin manifest (name, version, description) marketplace.json # Marketplace listing metadata commands/ - review-pr.md # The slash command definition — this is the core logic + review-pr.md # The slash command definition — thin orchestrator +.agents/ + ado-fetcher.md # ADO Fetcher — all Azure DevOps REST API fetches + re-review-coordinator.md # Re-review Coordinator — thread classification + incremental diff + ado-writer.md # ADO Writer — posts inline threads and summary comment to ADO + doc-context-orchestrator.md # Doc Context Orchestrator — work item + Confluence fetching + doc-context-synthesizer.md # Doc Context Synthesizer — produces business-context narrative ``` -The entire behaviour of the plugin lives in `commands/review-pr.md`. There are no build steps, no transpilation, no dependencies to install. +`commands/review-pr.md` is a thin orchestrator (≤ 200 lines per PRD acceptance criterion). It delegates ADO API calls and coordination logic to the focused agents in `.agents/`. Pure helpers used by both the orchestrator and the agents live under `scripts/` (`ado-fetcher.mjs`, `ado-writer.mjs`, `pre-pr.mjs`, `mode-detection.mjs`, `confluence-client.mjs`, `re-review/*.mjs`) with tests under `tests/`. There are no build steps, no transpilation, no dependencies to install. ## Plugin metadata @@ -33,8 +39,10 @@ When bumping the version, update it in **both** files: ## Command conventions (`commands/review-pr.md`) - YAML frontmatter declares `allowed-tools` — add any new tools the command needs there -- Auto-generated files are explicitly skipped in Step 6 (serialization YAMLs, `*.g.cs`, generated types output, `swagger.md`) +- Auto-generated files are skipped during file-content reading by the `shouldSkipFile` helper in `scripts/pre-pr.mjs` (serialization YAMLs, `*.g.cs`, generated types output, `swagger.md`) - All comments posted to ADO **must** end with the exact signature: `---\n🤖 *Reviewed by Claude Code* — Iteration N` (where N = LATEST_ITERATION_ID) +- ADO REST calls (`pullRequestThreads`, thread replies, iteration fetches) are handled by the focused agents in `.agents/`, not inline in the orchestrator command +- ADO Fetcher (`ado-fetcher.md`) owns all read operations; ADO Writer (`ado-writer.md`) owns all write operations; Re-review Coordinator (`re-review-coordinator.md`) owns thread classification and incremental diff logic - Inline threads use ADO REST `pullRequestThreads` via `az devops invoke`; file paths must match ADO format (leading `/`, forward slashes) - Always use the latest iteration of the PR. `iterationId=1` is never used. Re-reviews additionally compute `PRIOR_ITERATION_ID` from the prior review's signature — see spec 02. - If `az devops invoke` returns a `threadContext` error, fall back to posting without `threadContext` (general comment) diff --git a/apps/claude-code/pr-review/CONTEXT.md b/apps/claude-code/pr-review/CONTEXT.md index ca2c8f8..1b896d6 100644 --- a/apps/claude-code/pr-review/CONTEXT.md +++ b/apps/claude-code/pr-review/CONTEXT.md @@ -77,6 +77,34 @@ _Avoid_: merger, aggregator, deduplicator A self-contained plugin agent that orchestrates the entire Doc Context gathering phase — fetching work item details, running the Confluence credential check once, spawning Work Item Summarizer and Confluence Fetcher agents in parallel, and delegating final synthesis to the Doc Context Synthesizer. Returns the Synthesizer's output verbatim as a plain markdown string. _Avoid_: context orchestrator, doc orchestrator, gathering agent +### Operating modes + +**Pre-PR mode**: +A Review run without a PR URL, targeting a local branch diff. No ADO write-back occurs; findings are presented in the Claude interface only. +_Avoid_: local review, offline review, draft review + +**First-review mode**: +A Review run against an ADO PR where no prior Bot Signature is found. Produces a full set of Inline Comments and a Review Summary posted to ADO. +_Avoid_: initial review, fresh review + +**Re-review mode**: +A Review run against an ADO PR where a prior **Bot Signature** is found in the PR's threads. Focuses on commits since the last Review, performs Thread Classification, and replies to or resolves existing Review Threads rather than duplicating them. +_Avoid_: incremental review, follow-up review, second pass + +### Orchestration agents + +**ADO Fetcher**: +A plugin agent that retrieves PR metadata, iterations, changed files, and the raw diff from Azure DevOps. Used by first-review and re-review modes; not invoked in pre-PR mode. +_Avoid_: fetcher, data agent, ADO client + +**Re-review Coordinator**: +A plugin agent that owns the full re-review state machine — prior thread detection, partial-run check, Thread Classification, finding matching, and reply posting to classified threads. Invoked only in re-review mode. +_Avoid_: re-review agent, rereview handler + +**ADO Writer**: +A plugin agent responsible for all ADO write-back operations — posting Inline Comments, patching thread status, and posting the Review Summary or delta reply. Used by first-review and re-review modes. +_Avoid_: writer agent, comment poster, ADO publisher + ### Re-review classification **Thread Classification**: @@ -95,6 +123,30 @@ A Thread Classification state. No action taken; the issue still exists in the ne **obsolete**: A Thread Classification state. The relevant code was deleted or moved; the comment no longer applies. +### Platform-failure handling + +**Notice**: +A user-facing message emitted by an orchestration agent when a Review operation completed in a non-OK Notice Tier. Carries `severity` (`info` or `warning`), `kind` (a small enum identifying the failed operation), and a one-line `message`. Notices are merged across agents by the orchestrator, rendered in the Review Summary, included in the end-of-run Trailer, and (for Pre-PR mode) printed in the Claude interface before findings. +_Avoid_: warning, error, log line + +**Notice Tier**: +A four-state classification of every Review operation outcome: **OK**, **EMPTY-BY-DESIGN**, **DEGRADED**, **ABORTED**. The tier choice IS the gating decision — there is no fifth "ask the user" tier. Failure modes that tempt one are reclassified as ABORTED. + +**OK**: +A Notice Tier. The operation completed with a non-empty result. No Notice emitted. + +**EMPTY-BY-DESIGN**: +A Notice Tier. The operation completed with an empty result that is a legitimate domain state (no work-items linked, no Confluence pages, no prior threads). Currently emits an `info` Notice only for the Doc Context family; other empty states are inherent to the Review type and stay silent. + +**DEGRADED**: +A Notice Tier. The operation failed but the Review can still complete with reduced coverage. Emits a `warning` Notice; the Review still posts. + +**ABORTED**: +A Notice Tier. The operation failed and continuing would corrupt cross-run state (Bot Signature drift, Summary thread desync, mode misdetection). The run stops before the Review Summary is composed; the failure goes to stderr plus the end-of-run Trailer. + +**Trailer**: +A single end-of-run line printed by the orchestrator to the Claude interface, regardless of mode or success state. Carries findings count by severity, Notice counts by severity, and (for ADO modes) the PR URL. Designed for AFK skim: the invoker sees outcome status without opening the PR. + ## Relationships - A **Review** produces one **Review Summary**, zero or more **Inline Comments**, and zero or more **General Comments** @@ -106,6 +158,10 @@ A Thread Classification state. The relevant code was deleted or moved; the comme - A **Doc Context** is assembled via a three-tier pipeline: the **Doc Context Orchestrator** spawns **Work Item Summarizer** and **Confluence Fetcher** agents (Doc Context Sub-agents) in parallel, then delegates their outputs to the **Doc Context Synthesizer**, which produces the final `DOC_CONTEXT` narrative injected into every Review Aspect agent - A **Doc Context Sub-agent** operates on a single source (work item or Confluence page) and receives the changed files list and the local diff when available - The **Doc Context Orchestrator** returns the **Doc Context Synthesizer**'s output verbatim; it does not rewrite or reformat the narrative +- The **ADO Fetcher** is invoked by first-review and re-review modes; **Pre-PR mode** skips it entirely and goes directly to Review Aspect agents +- The **Re-review Coordinator** is invoked only when the mode is re-review; first-review and pre-PR modes never load it +- The **ADO Writer** is invoked by first-review and re-review modes; **Pre-PR mode** does not write back to ADO +- Every operation in an orchestration agent terminates in one of the four **Notice Tiers**. **DEGRADED** and **EMPTY-BY-DESIGN**-with-message operations emit a **Notice** that flows from the agent's structured result block, through the orchestrator's merge step, into the **Review Summary** (for ADO modes) or the printed pre-findings block (for **Pre-PR mode**). The end-of-run **Trailer** carries Notice counts so the invoker sees them without opening the PR. ## Example dialogue diff --git a/apps/claude-code/pr-review/commands/review-pr.md b/apps/claude-code/pr-review/commands/review-pr.md index a297a48..962ed64 100644 --- a/apps/claude-code/pr-review/commands/review-pr.md +++ b/apps/claude-code/pr-review/commands/review-pr.md @@ -6,990 +6,182 @@ description: 'Review an Azure DevOps pull request: fetch diff, run multi-agent a # Azure DevOps PR Review -Perform a comprehensive code review for an Azure DevOps pull request, then post findings as threaded comments directly on the PR (inline where possible) and one general summary comment. - **Arguments:** "$ARGUMENTS" ---- +Thin orchestrator that detects one of three modes — Pre-PR, First-review, Re-review — and delegates to focused agents. The `SIGNATURE_PREFIX` `🤖 *Reviewed by Claude Code*` is sacred (re-review detection depends on it) and appears inline at every call site that needs it. -## Prerequisites check +### Compact finding schema -Before starting, verify: +Every review aspect agent prompt (Step 6, Step D) ends with this exact contract: -```bash -az --version 2>&1 | head -1 -az extension list --output table 2>&1 | grep azure-devops ``` +Return your findings as a JSON array. Each element must have exactly these six fields: +- severity: "critical" | "important" | "minor" +- filePath: string — leading /, forward slashes, matching ADO format (e.g. /src/foo.ts) +- startLine: integer — first line of the relevant range +- endLine: integer — last line of the relevant range (same as startLine for single-line findings) +- title: string — one line, ≤ 80 chars +- body: string — one paragraph; the exact text to post as the ADO comment or local-interface comment -If `azure-devops` extension is missing: `az extension add --name azure-devops` - -Also verify `pr-review-toolkit` is available by checking if the agent `pr-review-toolkit:code-reviewer` can be invoked. If that plugin is not installed and enabled, stop immediately and tell the user: - -> This command requires the `pr-review-toolkit` plugin (from `anthropics/claude-plugins-official`) to be installed and enabled. Enable it via Claude Code settings → Plugins, then re-run this command. - ---- - -## Step 1 — Parse the PR URL - -Extract from `$ARGUMENTS`. Expected ADO format: - -```txt -https://dev.azure.com/{org}/{project}/_git/{repo}/pullrequest/{id} +Keep reasoning and supporting evidence inside your own context. Do not include code quotes, prose reasoning, or any text outside the JSON array in your return value. ``` -Variables to extract: +### Aspect-filter selection (used in Step 6 and Pre-PR Step D) -- `ORG_URL` = `https://dev.azure.com/{org}` -- `PROJECT` = `{project}` -- `PR_ID` = `{id}` +Parse `$ARGUMENTS` for an aspect filter (`code` | `errors` | `tests` | `comments` | `types` | `all`); default `all`. Always run `pr-review-toolkit:code-reviewer` and `pr-review-toolkit:silent-failure-hunter`. Also run `pr-review-toolkit:pr-test-analyzer` if test files changed, `pr-review-toolkit:comment-analyzer` if docs/comments were added, and `pr-review-toolkit:type-design-analyzer` if new types were introduced. -**GitHub URLs** (`https://github.com/...`) are not supported — tell the user and stop. +## Step 1 — Prerequisites -If no URL provided, run `az repos pr list --status active --output table` to help them pick one. +Verify `pr-review-toolkit` is enabled (e.g. the `pr-review-toolkit:code-reviewer` agent exists). If missing, stop with installation instructions. Verify `git --version` succeeds. ---- +## Step 2 — Parse arguments and detect mode -## Step 2 — Check the default `az` org +Extract a PR URL from `$ARGUMENTS`. Expected format: `https://dev.azure.com/{org}/{project}/_git/{repo}/pullrequest/{id}`. GitHub URLs are not supported. -```bash -az devops configure --list -``` +- **No URL** → `MODE=pre-pr` → jump to [Pre-PR mode](#pre-pr-mode). +- **URL present** → extract `ORG_URL`, `PROJECT`, `PR_ID` and continue. -Note the configured `organization`. If it differs from `ORG_URL`, pass `--org {ORG_URL}` explicitly in every `az` command below. +## Step 3 — Azure CLI check (PR modes only) ---- - -## Step 3 — Fetch PR metadata - -```bash -az repos pr show --id {PR_ID} --org {ORG_URL} --output json -``` +Run `az --version` and `az extension list | grep azure-devops`. If missing: `az extension add --name azure-devops`. -Capture and remember: +## Step 4 — Re-review detection -- `repository.id` → `REPO_ID` (UUID, e.g. `99bf5e9b-...`) -- `sourceRefName` → source branch (e.g. `refs/heads/feature/my-branch`) -- `targetRefName` → target branch (e.g. `refs/heads/develop`) -- `title`, `description` -- `status` — note if already merged (`mergeStatus: succeeded`); continue anyway, comments are still useful as a review record -- `createdBy.displayName` - -Strip `refs/heads/` prefix to get plain branch names for git commands. - -Capture additionally: - -- `repository.project.name` → `PROJECT` - ---- - -## Step 3.5 — Detect prior review - -Fetch all existing PR threads and check for prior Claude Code comments. This step runs **unconditionally** and performs **no write actions**. - -### Variables exported by this step - -| Variable | Type | Description | -| -------------------- | ------------------- | -------------------------------------------------------------- | -| `IS_REREVIEW` | `true`/`false` | Whether a prior Claude Code review was found | -| `PRIOR_THREADS_FILE` | path | Temp file — jq-readable JSON array of prior bot threads | -| `SUMMARY_THREAD_ID` | integer or `""` | Thread ID of the prior summary thread (if any) | -| `PRIOR_ITERATION_ID` | integer or `"null"` | Iteration number parsed from the most recent prior bot comment | - -### Fetch all threads (paginated) +Fetch the thread list **once**; never re-fetch downstream. ```bash -PRIOR_THREADS_RAW="$(mktemp "${TMPDIR:-/tmp}/pr_threads_raw_XXXXXX.json")" -PRIOR_THREADS_ALL="$(mktemp "${TMPDIR:-/tmp}/pr_threads_all_XXXXXX.json")" -echo '[]' > "$PRIOR_THREADS_ALL" - -CONTINUATION_TOKEN="" -while true; do - EXTRA_ARGS=() - if [ -n "$CONTINUATION_TOKEN" ]; then - EXTRA_ARGS=(--query-parameters "continuationToken=$CONTINUATION_TOKEN") - fi - - az devops invoke \ - --area git \ - --resource pullRequestThreads \ - --route-parameters "project=$PROJECT" "repositoryId=$REPO_ID" "pullRequestId=$PR_ID" \ - --org "$ORG_URL" \ - --api-version "7.1" \ - "${EXTRA_ARGS[@]}" \ - --output json > "$PRIOR_THREADS_RAW" - - jq -s '.[0] + .[1].value' "$PRIOR_THREADS_ALL" "$PRIOR_THREADS_RAW" \ - > "${PRIOR_THREADS_ALL}.tmp" \ - && mv "${PRIOR_THREADS_ALL}.tmp" "$PRIOR_THREADS_ALL" - - CONTINUATION_TOKEN=$(jq -r '.continuationToken // empty' "$PRIOR_THREADS_RAW") - [ -z "$CONTINUATION_TOKEN" ] && break -done -rm -f "$PRIOR_THREADS_RAW" -``` - -### Parse bot threads +PR_THREADS_ERR="${TMPDIR:-/tmp}/pr_threads.err" +RAW_THREADS_JSON=$(az repos pr thread list --id "$PR_ID" --org "$ORG_URL" --output json 2>"$PR_THREADS_ERR") || { + echo "ERROR: failed to fetch PR threads via Azure CLI. Try \`az devops login\` to re-authenticate." >&2 + cat "$PR_THREADS_ERR" >&2; exit 1; } -```bash -PRIOR_THREADS_FILE="$(mktemp "${TMPDIR:-/tmp}/pr_prior_threads_XXXXXX.json")" -SIGNATURE_PREFIX="🤖 *Reviewed by Claude Code*" - -DETECT_JSON=$( - THREADS_ALL_F="$PRIOR_THREADS_ALL" \ - SIG_P="$SIGNATURE_PREFIX" \ - THREADS_OUT_F="$PRIOR_THREADS_FILE" \ - PLUGIN_R="${CLAUDE_PLUGIN_ROOT}" \ +eval "$( + RAW_T="$RAW_THREADS_JSON" SIG_P="🤖 *Reviewed by Claude Code*" PLUGIN_R="${CLAUDE_PLUGIN_ROOT}" \ node --input-type=module << 'EOJS' -import { readFileSync, writeFileSync } from 'node:fs' -const { detectPriorReview } = await import('file://' + process.env.PLUGIN_R + '/scripts/re-review/detect-prior-review.mjs') -const threads = JSON.parse(readFileSync(process.env.THREADS_ALL_F, 'utf8')) -const r = detectPriorReview({ threads, signaturePrefix: process.env.SIG_P }) -writeFileSync(process.env.THREADS_OUT_F, JSON.stringify(r.priorThreads)) -process.stdout.write(JSON.stringify({ - isRereview: r.isRereview, - summaryThreadId: r.summaryThread != null ? r.summaryThread.threadId : '', - priorIterationId: r.priorIterationId, - count: r.priorThreads.length, -})) +const { detectMode, formatModeEnv } = await import(`file://${process.env.PLUGIN_R}/scripts/mode-detection.mjs`) +const threads = JSON.parse(process.env.RAW_T || '[]') +process.stdout.write(formatModeEnv(detectMode({ threads, signaturePrefix: process.env.SIG_P }))) EOJS -) -rm -f "$PRIOR_THREADS_ALL" -``` - -### Set detection variables +)" -```bash -IS_REREVIEW=$(echo "$DETECT_JSON" | jq -r '.isRereview') -BOT_THREAD_COUNT=$(echo "$DETECT_JSON" | jq -r '.count') -SUMMARY_THREAD_ID=$(echo "$DETECT_JSON" | jq -r '.summaryThreadId // ""') -PRIOR_ITERATION_ID=$(echo "$DETECT_JSON" | jq -r 'if .priorIterationId == null then "null" else (.priorIterationId | tostring) end') - -if [ "$IS_REREVIEW" = "true" ]; then - echo "Detected $BOT_THREAD_COUNT prior Claude Code threads — re-review mode ON" -else - SUMMARY_THREAD_ID="" - PRIOR_ITERATION_ID="null" - echo "Detected 0 prior Claude Code threads — re-review mode OFF" -fi +echo "Mode detected: $MODE" ``` -### Partial-prior-run check - -If `IS_REREVIEW=true`, verify the prior review completed **before** committing to re-review mode. This ensures the entire pipeline — diff range, agent analysis, and comment-posting — uses a consistent mode. +Sets `MODE`, `IS_REREVIEW`, `PRIOR_ITERATION_ID`, `SUMMARY_THREAD_ID`. -If the summary thread is known, check it for a completion marker for `PRIOR_ITERATION_ID`. If none is found, the prior run was partial — reset to first-review mode now so Steps 4–10 all run in the same path. +## Step 5 — ADO Fetcher -Skip this check when `PRIOR_ITERATION_ID` is `"null"` (no iteration suffix was parsed from the prior signature) — assume the prior run completed: +Launch the ADO Fetcher agent and **wait for its result** before anything else (the PRD requires the Fetcher to complete before downstream agents run). -```bash -if [ "$IS_REREVIEW" = "true" ] && [ -n "$SUMMARY_THREAD_ID" ] && [ "$PRIOR_ITERATION_ID" != "null" ]; then - MARKER_FOUND=$( - THREADS_F="$PRIOR_THREADS_FILE" SID="$SUMMARY_THREAD_ID" PID="$PRIOR_ITERATION_ID" \ - node --input-type=module << 'EOJS' -import { readFileSync } from 'node:fs' -const threads = JSON.parse(readFileSync(process.env.THREADS_F, 'utf8')) -const sid = Number(process.env.SID) -const prefix = '✅ Review complete — Iteration ' + process.env.PID -const found = threads.some(t => t.threadId === sid && (t.comments ?? []).some(c => (c.content ?? '').startsWith(prefix))) -console.log(found ? 'true' : 'false') -EOJS - ) || { echo "ERROR: partial-run check script failed — falling back to first-review mode for safety."; MARKER_FOUND="false"; } - - if [ "$MARKER_FOUND" != "true" ] && [ "$MARKER_FOUND" != "false" ]; then - echo "ERROR: unexpected MARKER_FOUND value '${MARKER_FOUND}' — falling back to first-review mode for safety." - MARKER_FOUND="false" - fi - - if [ "$MARKER_FOUND" = "false" ]; then - echo "No completion marker for Iteration $PRIOR_ITERATION_ID — partial prior run. Falling back to first-review mode." - IS_REREVIEW=false - SUMMARY_THREAD_ID="" - PRIOR_ITERATION_ID="null" - fi -fi -``` - ---- - -## Step 3.6 — Fetch PR iterations - -Resolve the latest iteration ID and capture its commit SHA. These values drive the file-list query (Step 4) and the incremental diff baseline (spec 04). - -```bash -ITERATIONS_JSON=$(az devops invoke \ - --area git \ - --resource pullRequestIterations \ - --route-parameters "project=$PROJECT" "repositoryId=$REPO_ID" "pullRequestId=$PR_ID" \ - --org "$ORG_URL" \ - --api-version "7.1" \ - --output json) - -ITERATIONS_VALUE=$(echo "$ITERATIONS_JSON" | jq '.value // []') -ITERATION_COUNT=$(echo "$ITERATIONS_VALUE" | jq 'length') - -if [ "$ITERATION_COUNT" -eq 0 ]; then - echo "Warning: no iterations returned — defaulting to iteration 1" - LATEST_ITERATION_ID=1 - LATEST_COMMIT_ID="" -else - LATEST_ITERATION_ID=$(echo "$ITERATIONS_VALUE" | jq 'max_by(.id) | .id') - LATEST_COMMIT_ID=$(echo "$ITERATIONS_VALUE" | jq -r --argjson id "$LATEST_ITERATION_ID" \ - '.[] | select(.id == $id) | .sourceRefCommit.commitId // ""') -fi -echo "Latest iteration: $LATEST_ITERATION_ID (commit: ${LATEST_COMMIT_ID:-n/a})" -``` - -When `IS_REREVIEW=true`, resolve the prior commit for spec 04's incremental diff: - -```bash -if [ "$IS_REREVIEW" = "true" ]; then - if [ "$PRIOR_ITERATION_ID" != "null" ]; then - # Iteration ID was parsed directly from the "— Iteration N" signature suffix - PRIOR_COMMIT_ID=$(echo "$ITERATIONS_VALUE" | jq -r --argjson id "$PRIOR_ITERATION_ID" \ - '.[] | select(.id == $id) | .sourceRefCommit.commitId // ""') - else - # Timestamp fallback: the prior comment had no "— Iteration N" suffix. - # Find the max publishedDate across all prior bot comments, then pick the - # highest iteration whose createdDate is still ≤ that timestamp. - PRIOR_MAX_DATE=$(jq -r '[.[].comments[].publishedDate // empty] | max // ""' "$PRIOR_THREADS_FILE") - if [ -n "$PRIOR_MAX_DATE" ]; then - PRIOR_ITERATION_ID=$(echo "$ITERATIONS_VALUE" | jq -r --arg d "$PRIOR_MAX_DATE" \ - '[.[] | select(.createdDate <= $d)] | max_by(.id) | .id // "null"') - if [ "$PRIOR_ITERATION_ID" != "null" ]; then - PRIOR_COMMIT_ID=$(echo "$ITERATIONS_VALUE" | jq -r --argjson id "$PRIOR_ITERATION_ID" \ - '.[] | select(.id == $id) | .sourceRefCommit.commitId // ""') - else - PRIOR_COMMIT_ID="" - fi - else - PRIOR_COMMIT_ID="" - fi - fi - echo "Prior iteration: $PRIOR_ITERATION_ID (commit: ${PRIOR_COMMIT_ID:-n/a})" -fi -``` - ---- - -## Step 4 — List changed files - -Use the ADO REST API (note: `az repos pr` has no file-list subcommand): - -```bash -az devops invoke \ - --area git \ - --resource pullRequestIterationChanges \ - --route-parameters "repositoryId={REPO_ID}" "pullRequestId={PR_ID}" "iterationId=$LATEST_ITERATION_ID" \ - --org {ORG_URL} \ - --api-version "7.1" \ - --output json | python3 -c " -import json, sys -data = json.load(sys.stdin) -for c in data.get('changeEntries', []): - path = c.get('item', {}).get('path', '') - ct = c.get('changeType', '') - print(f'{ct}: {path}') -" -``` - ---- - -## Step 4a — Gather Doc Context (work items + Confluence pages) - -```bash -DOC_CONTEXT='' -``` - -Fetch work items linked to the PR and capture the output: - -```bash -WI_JSON=$(az devops invoke \ - --area git \ - --resource pullRequestWorkItems \ - --route-parameters "repositoryId={REPO_ID}" "pullRequestId={PR_ID}" \ - --org {ORG_URL} \ - --api-version "7.1" \ - --output json 2>/dev/null) || WI_JSON="" -``` - -Extract the work item IDs into a comma-separated string: - -```bash -WI_IDS=$(echo "$WI_JSON" | jq -r '[.value[]?.id | tostring] | join(",")' 2>/dev/null) || WI_IDS="" +```txt +Agent( + subagent_type: "pr-review:ado-fetcher", + prompt: "Fetch all ADO data for this PR review. + ORG_URL: {ORG_URL} + PROJECT: {PROJECT} + PR_ID: {PR_ID} + PRIOR_ITERATION_ID: {PRIOR_ITERATION_ID} + PLUGIN_ROOT: {CLAUDE_PLUGIN_ROOT}" +) ``` -If `WI_JSON` is empty, the command failed, or `WI_IDS` is empty (the `value` array -had no entries), leave `DOC_CONTEXT=''` and skip the orchestrator spawn — step 5 (diff) continues independently. +Store the full output as `ADO_FETCHER_RESULT`. If the `ADO_FETCHER_RESULT_START`/`_END` block is absent (Fetcher exited non-zero), determine the abort kind from the output (output contains `az devops login` → `abortKind: 'auth'`; otherwise `abortKind: 'fetcher'`), call `formatTrailer({ mode: 'aborted', abortKind, abortReason: })` from `scripts/ado/notices.mjs`, and stop. Otherwise parse `LATEST_ITERATION_ID`, `REPO_ID`, `CHANGED_FILES`, `RAW_DIFF`, `DIFF_RANGE`, `WORK_ITEM_IDS`, and `NOTICES` from the block. Store `DIFF_RANGE`; the Re-review Coordinator (Step 7) parses it from `ADO_FETCHER_RESULT` to apply the γ-downgrade when `DIFF_RANGE=full`. Set `NOTICES_JSON` to `mergeNotices(NOTICES)` via `scripts/ado/notices.mjs`. -Otherwise, wait for the diff from step 5 to be available (step 4a and step 5 run -concurrently up to this point; only the orchestrator spawn waits for the diff). +## Step 6 — Doc Context Orchestrator + review aspect agents (parallel) -Resolve the plugin path: +Launch both groups concurrently in a **single message**. -```bash -CONFLUENCE_CLIENT_PATH="${CLAUDE_PLUGIN_ROOT}/scripts/confluence-client.mjs" -``` - -Delegate to the Doc Context Orchestrator agent: +**Doc Context Orchestrator** — gathers business context. The returned text is stored as `DOC_CONTEXT` and surfaced in the final user-facing summary; it is **not** prepended to review aspect agent prompts (those run in parallel with the orchestrator and cannot block on its output). ```txt Agent( subagent_type: "pr-review:doc-context-orchestrator", prompt: "Orchestrate Doc Context gathering. - ORG_URL: {ORG_URL} PR_ID: {PR_ID} - Work item IDs: {WI_IDS} - Confluence client path: {CONFLUENCE_CLIENT_PATH} - + Work item IDs: {WORK_ITEM_IDS} + Confluence client path: {CLAUDE_PLUGIN_ROOT}/scripts/confluence-client.mjs Changed files: - {CHANGED_FILES_LIST} - + {CHANGED_FILES} Diff: {RAW_DIFF} - - Return the complete Doc Context markdown block, or an empty string if no - meaningful context could be gathered." + Return the complete Doc Context markdown block, or an empty string." ) ``` -Store the agent's output as `DOC_CONTEXT`. - -Step 4a pre-fetch (work item IDs) runs in parallel with step 5. The orchestrator agent spawn waits for the diff from step 5. Step 8 waits for the orchestrator agent to complete before launching review agents. - ---- - -## Step 5 — Get the diff locally - -Check if the local branch matches the PR source branch: - -```bash -git branch --show-current -``` - -If it does not match, check out the PR branch: - -```bash -az repos pr checkout --id {PR_ID} --org {ORG_URL} -# or: git fetch origin {source-branch} && git checkout {source-branch} -``` - -Create the diff hunks output file (consumed by spec 05 for thread classification): - -```bash -DIFF_HUNKS_FILE="$(mktemp "${TMPDIR:-/tmp}/pr_diff_hunks_XXXXXX.json")" -echo '[]' > "$DIFF_HUNKS_FILE" -``` - -### Diff strategy - -Branch on `IS_REREVIEW` to decide which diff range to use. - -#### Path A — First-time review (`IS_REREVIEW=false`) - -Run the full branch diff: - -```bash -git diff origin/{target-branch}...HEAD --name-only -RAW_DIFF=$(git diff origin/{target-branch}...HEAD) -``` - -Then [parse hunk boundaries](#hunk-boundary-parsing). - -#### Path B — Re-review, no prior commit (`IS_REREVIEW=true`, `PRIOR_COMMIT_ID` empty) - -```bash -echo "Warning: could not resolve prior commit — falling back to full diff." -git diff origin/{target-branch}...HEAD --name-only -RAW_DIFF=$(git diff origin/{target-branch}...HEAD) -``` - -Then [parse hunk boundaries](#hunk-boundary-parsing). - -#### Path B2 — Re-review, no latest commit (`IS_REREVIEW=true`, `LATEST_COMMIT_ID` empty) - -```bash -echo "Warning: could not resolve latest commit — falling back to full diff." -git diff origin/{target-branch}...HEAD --name-only -RAW_DIFF=$(git diff origin/{target-branch}...HEAD) -``` - -Then [parse hunk boundaries](#hunk-boundary-parsing). - -#### Path C — Re-review, no new commits (`IS_REREVIEW=true`, `PRIOR_COMMIT_ID == LATEST_COMMIT_ID`) - -```bash -echo "No new commits since last review." -echo "" -echo "Pending threads from prior review:" -jq -r '.[] | select(.status == "active" or .status == "pending") | - " \(.filePath // "(general)") L\(.start.line // "?")-\(.end.line // "?")"' "$PRIOR_THREADS_FILE" -``` - -**Stop here — do not proceed to Steps 5.5–11.** Clean up temp files and return to the user: - -```bash -rm -f "$PRIOR_THREADS_FILE" "$DIFF_HUNKS_FILE" -``` - -#### Path D — Re-review, new commits (`IS_REREVIEW=true`, `PRIOR_COMMIT_ID != LATEST_COMMIT_ID`) - -Attempt to fetch the prior commit, then diff only the new range: - -```bash -if git fetch origin "$PRIOR_COMMIT_ID" 2>/dev/null; then - git diff "${PRIOR_COMMIT_ID}".."${LATEST_COMMIT_ID}" --name-only - RAW_DIFF=$(git diff "${PRIOR_COMMIT_ID}".."${LATEST_COMMIT_ID}") -else - echo "Warning: prior commit ${PRIOR_COMMIT_ID} unreachable; latest commit ${LATEST_COMMIT_ID} — falling back to full diff." - git diff origin/{target-branch}...HEAD --name-only - RAW_DIFF=$(git diff origin/{target-branch}...HEAD) -fi -``` - -Then [parse hunk boundaries](#hunk-boundary-parsing). - -### Hunk boundary parsing - -After obtaining `$RAW_DIFF` in Paths A, B, B2, or D, parse file paths and line ranges into `DIFF_HUNKS_FILE`: - -```bash -echo "$RAW_DIFF" | python3 -c " -import sys, json, re -hunks = [] -current_file = None -for line in sys.stdin: - m = re.match(r'^diff --git a/.* b/(.*)', line.rstrip()) - if m: - current_file = '/' + m.group(1) - continue - m = re.match(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@', line) - if m and current_file: - start = int(m.group(1)) - count = int(m.group(2)) if m.group(2) is not None else 1 - end = start + max(count - 1, 0) - hunks.append({'filePath': current_file, 'startLine': start, 'endLine': end}) -print(json.dumps(hunks)) -" > "$DIFF_HUNKS_FILE" -``` - -If the diff is very large (>500 lines), focus on the most significant changed files rather than trying to pass the entire diff to agents. - ---- - -## Step 5.5 — Classify existing threads - -For each non-summary thread in `PRIOR_THREADS_FILE`, assign exactly one classification using diff hunks from `DIFF_HUNKS_FILE`. This step runs **unconditionally** — it is a no-op when `PRIOR_THREADS_FILE` is empty. - -**Classification rules (evaluated in order):** - -1. **`addressed`** — ADO status is `fixed`, `wontFix`, `closed`, or `byDesign` (string or numeric 2–5), **or** status is `active`/`pending` and the thread's `[start.line, end.line]` range intersects a changed hunk. -2. **`obsolete`** — `filePath` is non-null and does not appear in the diff at all. -3. **`disputed`** — status is `active` and at least one comment does not contain the signature prefix `🤖 *Reviewed by Claude Code*`. -4. **`pending`** — status is `active` and all comments contain the signature prefix (bot-only thread). - -General threads (`filePath = null`, non-summary): rules 1 (intersection) and 2 do not apply; classify as `disputed` or `pending` only. - -```bash -THREADS_FILE="$PRIOR_THREADS_FILE" \ -HUNKS_FILE="$DIFF_HUNKS_FILE" \ -SIG_P="$SIGNATURE_PREFIX" \ -PLUGIN_R="${CLAUDE_PLUGIN_ROOT}" \ -node --input-type=module << 'EOJS' -import { readFileSync, writeFileSync } from 'node:fs' -const { classifyThread } = await import('file://' + process.env.PLUGIN_R + '/scripts/re-review/classify-thread.mjs') -const threads = JSON.parse(readFileSync(process.env.THREADS_FILE, 'utf8')) -const diffHunks = JSON.parse(readFileSync(process.env.HUNKS_FILE, 'utf8')) -const signaturePrefix = process.env.SIG_P -const counts = { addressed: 0, disputed: 0, pending: 0, obsolete: 0 } -for (const t of threads) { - if (t.isSummaryThread) continue - const cls = classifyThread({ thread: t, diffHunks, signaturePrefix }) - t.classification = cls - counts[cls]++ -} -writeFileSync(process.env.THREADS_FILE, JSON.stringify(threads)) -console.log(`Threads: ${counts.addressed} addressed, ${counts.disputed} disputed, ${counts.pending} pending, ${counts.obsolete} obsolete`) -EOJS -``` - ---- - -## Step 6 — Read key changed files - -Use the `Read` tool on the most important changed files (application logic, hooks, contracts, config). Skip auto-generated files: - -- `*/generate-types/output/**` -- `*.Designer.cs`, `*.g.cs`, `*.generated.*` -- `**/serialization/**/*.yml` (Sitecore serialization) -- `**/swagger.md` (generated API contract) - ---- - -## Step 7 — Determine review aspects - -Parse `$ARGUMENTS` for an aspect filter: `code`, `errors`, `tests`, `comments`, `types`, `all` (default). - -Map aspects to agents: - -- `code` → `pr-review-toolkit:code-reviewer` (always run) -- `errors` → `pr-review-toolkit:silent-failure-hunter` (always run) -- `tests` → `pr-review-toolkit:pr-test-analyzer` (if test files changed) -- `comments` → `pr-review-toolkit:comment-analyzer` (if docs/comments added) -- `types` → `pr-review-toolkit:type-design-analyzer` (if new types introduced) - ---- - -## Step 8 — Launch review agents in parallel - -Launch at least `code-reviewer` and `silent-failure-hunter` in a **single message** (parallel). For each agent, provide a self-contained prompt including: - -1. The Doc Context block from step 4a (if `DOC_CONTEXT` is non-empty) -2. The PR title and description -3. The full diff (or the most important sections if large) -4. The content of key changed files (from Step 6) -5. Project conventions from `CLAUDE.md` if present -6. File paths and language context +**Review aspect agents** — apply the [aspect-filter selection](#aspect-filter-selection-used-in-step-6-and-pre-pr-step-d) above. For each selected agent, pass the full diff and changed file contents (the Fetcher captures PR title and description for downstream use only; they are not parsed by the orchestrator). Every prompt **must** end with the [compact finding schema](#compact-finding-schema) block verbatim. Collect returned JSON arrays, deduplicate, sort by severity (`critical` first); assemble `FINDINGS` as `{ severity, filePath, startLine, endLine, title, body }[]`. -Inject `DOC_CONTEXT` as a preamble before the diff content. If `DOC_CONTEXT` is empty, omit the preamble and agents receive the same prompt as today. +## Step 7 — Write-back (branch on mode) -Prompt structure when `DOC_CONTEXT` is non-empty: - -``` -{DOC_CONTEXT} - -## Diff -{diff content} - -## Changed files -{file contents} -``` - -**Example agent invocations (parallel):** +**Re-review only** — first run the coordinator, parse `RE_REVIEW_COORDINATOR_RESULT_START`/`_END`, extract `earlyExit`, `freshFindings`, and `NOTICES` (store as `coordinatorNotices`; default `[]` if absent). If the result block is absent (coordinator exited non-zero), infer `abortKind` from output (contains `az devops login` → `'auth'`; else `'coordinator'`), call `formatTrailer({ mode: 'aborted', abortKind, abortReason: })`, and stop. If `earlyExit: true`, stop; otherwise reassign `FINDINGS_JSON` to `freshFindings`. ```txt Agent( - subagent_type: "pr-review-toolkit:code-reviewer", - prompt: "Review PR '{title}' targeting {target-branch}. {DOC_CONTEXT if non-empty}\n\n## Diff\n[diff content]\n\n## Changed files\n[key file contents]\n\n[CLAUDE.md conventions]" -) - -Agent( - subagent_type: "pr-review-toolkit:silent-failure-hunter", - prompt: "Review PR '{title}' for silent failures. {DOC_CONTEXT if non-empty}\n\n## Diff\n[diff content]\n\n## Changed files\n[key file contents]" + subagent_type: "pr-review:re-review-coordinator", + prompt: "Run the re-review state machine. + ADO_FETCHER_RESULT: + {ADO_FETCHER_RESULT} + RAW_THREADS_JSON: + {RAW_THREADS_JSON} + FINDINGS: {FINDINGS_JSON} + SIGNATURE_PREFIX: 🤖 *Reviewed by Claude Code* + PLUGIN_ROOT: {CLAUDE_PLUGIN_ROOT}" ) ``` ---- - -## Step 9 — Aggregate findings - -Combine results from all agents. For each finding assign: +**Both modes** — invoke ADO Writer. For first-review, `MODE=first-review` and `SUMMARY_THREAD_ID=""`. For re-review, both come from Step 4. -- **Severity**: 🔴 Critical / 🟠 Important / 🟡 Minor -- **File path** — exactly as it appears in the ADO PR (leading `/`, forward slashes, e.g. `/fe/src/pages/_app.tsx`) -- **Line number(s)** — use the **right/new file** line numbers (post-diff) -- **Comment text** — clear, actionable, with a suggested fix where possible - ---- - -## Step 10 — Post inline comments - -Initialize the findings-posted counter and re-review delta counters: - -```bash -FINDINGS_POSTED=0 -NEW_THREAD_COUNT=0 -ADDRESSED_COUNT=0 -DISPUTED_COUNT=0 -PENDING_COUNT=0 -``` - -Branch on `IS_REREVIEW`. - ---- - -### Path A — IS_REREVIEW=false (first-review flow) - -For each finding with a known file and line, post a PR thread: - -```bash -cat > /tmp/pr_thread_N.json << 'ENDJSON' -{ - "comments": [ - { - "commentType": 1, - "content": "{COMMENT_TEXT}\n\n---\n🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID}" - } - ], - "status": 1, - "threadContext": { - "filePath": "/{path/to/file}", - "rightFileEnd": { "line": END_LINE, "offset": 1 }, - "rightFileStart": { "line": START_LINE, "offset": 1 } - } -} -ENDJSON - -az devops invoke \ - --area git \ - --resource pullRequestThreads \ - --route-parameters "repositoryId={REPO_ID}" "pullRequestId={PR_ID}" \ - --org {ORG_URL} \ - --http-method POST \ - --in-file /tmp/pr_thread_N.json \ - --api-version "7.1" \ - --output json | python3 -c "import json,sys; d=json.load(sys.stdin); print('Thread', d.get('id'), d.get('status'))" - -FINDINGS_POSTED=$((FINDINGS_POSTED + 1)) -NEW_THREAD_COUNT=$((NEW_THREAD_COUNT + 1)) -``` - -**Rules:** - -- File paths: leading `/`, forward slashes, must match ADO exactly (as listed in Step 4) -- Line numbers: new/right file (post-diff), not original file -- `offset` can always be `1` -- Multi-line findings: set `rightFileStart.line` to first line, `rightFileEnd.line` to last -- If exact line is unknown, omit `threadContext` entirely (becomes a general comment) -- Use a unique temp file name per comment (e.g. `/tmp/pr_thread_1.json`, `/tmp/pr_thread_2.json`) - ---- - -### Path B — IS_REREVIEW=true (re-review reply flow) - -#### Thread matching - -For each finding (`{FINDING_FILE}`, line range `{FINDING_START}`–`{FINDING_END}`), search `PRIOR_THREADS_FILE` for a matching prior thread using filePath equality and line-range overlap with ±3 line drift: - -```bash -MATCH=$( - THREADS_F="$PRIOR_THREADS_FILE" \ - FINDING_F="{FINDING_FILE}" \ - FINDING_S="{FINDING_START}" \ - FINDING_E="{FINDING_END}" \ - PLUGIN_R="${CLAUDE_PLUGIN_ROOT}" \ - node --input-type=module << 'EOJS' -import { readFileSync } from 'node:fs' -const { matchFinding } = await import('file://' + process.env.PLUGIN_R + '/scripts/re-review/match-finding.mjs') -const threads = JSON.parse(readFileSync(process.env.THREADS_F, 'utf8')) -const result = matchFinding({ - finding: { filePath: process.env.FINDING_F, startLine: Number(process.env.FINDING_S), endLine: Number(process.env.FINDING_E) }, - priorThreads: threads, -}) -process.stdout.write(result != null ? JSON.stringify(result) : '') -EOJS +```txt +Agent( + subagent_type: "pr-review:ado-writer", + prompt: "Post all ADO comments for this {MODE} run. + ORG_URL: {ORG_URL} + PROJECT: {PROJECT} + REPO_ID: {REPO_ID} + PR_ID: {PR_ID} + LATEST_ITERATION_ID: {LATEST_ITERATION_ID} + SUMMARY_THREAD_ID: {SUMMARY_THREAD_ID} + MODE: {MODE} + PLUGIN_ROOT: {CLAUDE_PLUGIN_ROOT} + FINDINGS: {FINDINGS_JSON} + NOTICES_JSON: {NOTICES_JSON}" ) - -CLASSIFICATION=$(printf '%s' "$MATCH" | jq -r '.classification // ""' 2>/dev/null || echo "") -THREAD_ID=$(printf '%s' "$MATCH" | jq -r '.threadId // ""' 2>/dev/null || echo "") ``` -- If `MATCH` is empty → **no prior thread**: post a fresh thread via Path A (increment `FINDINGS_POSTED` and `NEW_THREAD_COUNT`). -- If `MATCH` is non-empty → **prior thread found**: dispatch on `CLASSIFICATION` below. +## Step 8 — Parse Writer result + Trailer -#### `obsolete` — skip +Parse the Writer output via `parseAdoWriterResult` from `scripts/ado-writer.mjs`. On `{ ok: false }`, emit `ERROR: Writer did not return a valid result block (). The Summary may or may not have been posted; verify on ADO.` to stderr and print the Trailer aborted line, then stop. Otherwise extract `result.notices` and merge with fetcher and coordinator notices into `NOTICES_JSON` via `mergeNotices([...fetcherNotices, ...coordinatorNotices, ...result.notices])` from `scripts/ado/notices.mjs`; print Trailer via `formatTrailer({ mode, findings, notices: NOTICES_JSON, prUrl })`: reduce `FINDINGS_JSON` to `{ critical, important, minor }` counts; build `prUrl` from `ORG_URL`/`PROJECT`/`PR_ID`. Pre-PR: Step E. -No action. Do not post. Do not increment `FINDINGS_POSTED`. +## Pre-PR mode -#### `pending` — evaluate for new evidence +No PR URL provided — reviewing the local branch diff; no ADO calls are made. Initialize `PRE_PR_NOTICES=[]`. -Increment `PENDING_COUNT` for each matched `pending` thread (whether replied to or skipped): +### Step A — Detect default branch + compute diff -```bash -PENDING_COUNT=$((PENDING_COUNT + 1)) -``` +Run `git remote show origin 2>/dev/null` and parse the `HEAD branch:` line as `REMOTE_HEAD` (empty string if absent); define `branchExists(name)` as exits 0 when `git rev-parse --verify --quiet refs/remotes/origin/$name` succeeds. Via `await import`, call `detectDefaultBranch({ remoteHeadBranch: REMOTE_HEAD, branchExists })` from `scripts/pre-pr/detect-default-branch.mjs`. On `{ branch: null }`: emit a clear stderr message, call `formatTrailer({ mode: 'pre-pr', findings: {}, notices: [] })` from `scripts/ado/notices.mjs`, and stop. If `result.notice` exists, push it to `PRE_PR_NOTICES`. Compute `RAW_DIFF=$(git diff "origin/${result.branch}...HEAD") || { echo "git diff failed"; exit 1; }`. -Read the most recent bot comment from the matched thread (last entry in `matched_thread['comments']` where the content contains `SIGNATURE_PREFIX`). Compare its text against the current finding's comment. +### Step B — Parse changed files -- **No new evidence** (same issue, no additional analysis): skip. Do not post. Do not increment `FINDINGS_POSTED`. -- General `pending` threads with no `filePath` (non-summary): always skip. +Via `await import`, call `buildPrePrContext(RAW_DIFF)` from `scripts/pre-pr.mjs`; merge `context.notices` into `PRE_PR_NOTICES` via `mergeNotices` from `scripts/ado/notices.mjs`; set `FILTERED_FILES` from `context.filteredFiles`. Read the contents of each file in `FILTERED_FILES`, skipping deleted ones. -- **New evidence** (additional analysis, different suggested fix, new code examples not present in the prior comment): reply with only the new content: +### Step C — Resolve aspect filter -```bash -cat > /tmp/pr_reply_N.json << 'ENDJSON' -{ - "content": "{NEW_EVIDENCE_CONTENT}\n\n---\n🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID}", - "commentType": 1 -} -ENDJSON - -az devops invoke \ - --area git \ - --resource pullRequestThreadComments \ - --route-parameters "repositoryId={REPO_ID}" "pullRequestId={PR_ID}" "threadId=$THREAD_ID" \ - --org {ORG_URL} \ - --http-method POST \ - --in-file /tmp/pr_reply_N.json \ - --api-version "7.1" \ - --output json | python3 -c "import json,sys; d=json.load(sys.stdin); print('Reply posted, comment', d.get('id'))" - -FINDINGS_POSTED=$((FINDINGS_POSTED + 1)) -``` +Apply the [aspect-filter selection](#aspect-filter-selection-used-in-step-6-and-pre-pr-step-d) defined above. -#### `disputed` — acknowledge the author's point +### Step D — Run review aspect agents -Reply without re-asserting the finding. Briefly acknowledge the author's perspective. Always include the ADO nudge before the signature: +Doc Context is skipped (no work items without a PR). Launch all selected review aspect agents in a **single message**, passing `RAW_DIFF` and changed file contents. Every prompt **must** end with the [compact finding schema](#compact-finding-schema) verbatim; in Pre-PR mode the `body` field reads "exact text to post as the comment" (rendered in the Claude interface, not written back to ADO). -```bash -cat > /tmp/pr_reply_N.json << 'ENDJSON' -{ - "content": "{BRIEF_ACKNOWLEDGEMENT}\n\nIf you consider this resolved, please mark the thread as fixed in Azure DevOps.\n\n---\n🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID}", - "commentType": 1 -} -ENDJSON - -az devops invoke \ - --area git \ - --resource pullRequestThreadComments \ - --route-parameters "repositoryId={REPO_ID}" "pullRequestId={PR_ID}" "threadId=$THREAD_ID" \ - --org {ORG_URL} \ - --http-method POST \ - --in-file /tmp/pr_reply_N.json \ - --api-version "7.1" \ - --output json | python3 -c "import json,sys; d=json.load(sys.stdin); print('Reply posted, comment', d.get('id'))" - -FINDINGS_POSTED=$((FINDINGS_POSTED + 1)) -DISPUTED_COUNT=$((DISPUTED_COUNT + 1)) -``` +Collect, dedupe, and sort returned JSON arrays into `FINDINGS` (`critical` first). -#### `addressed` — confirm resolution and mark thread fixed +### Step E — Present findings -Reply to confirm the fix, then PATCH the thread status to `fixed` (`status: 2`). Log 409 and continue: +Print Notices from `PRE_PR_NOTICES` via `formatNoticesAsPrePrPreamble(PRE_PR_NOTICES)` from `scripts/ado/notices.mjs`, then print each finding grouped by severity (`critical`, `important`, `minor`): -```bash -# 1. Post reply -cat > /tmp/pr_reply_N.json << 'ENDJSON' -{ - "content": "Resolved as of Iteration {LATEST_ITERATION_ID} — thanks!\n\n---\n🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID}", - "commentType": 1 -} -ENDJSON - -az devops invoke \ - --area git \ - --resource pullRequestThreadComments \ - --route-parameters "repositoryId={REPO_ID}" "pullRequestId={PR_ID}" "threadId=$THREAD_ID" \ - --org {ORG_URL} \ - --http-method POST \ - --in-file /tmp/pr_reply_N.json \ - --api-version "7.1" \ - --output json | python3 -c "import json,sys; d=json.load(sys.stdin); print('Reply posted, comment', d.get('id'))" - -# 2. PATCH thread status to fixed (2) -cat > /tmp/pr_thread_patch_N.json << 'ENDJSON' -{ "status": 2 } -ENDJSON - -az devops invoke \ - --area git \ - --resource pullRequestThreads \ - --route-parameters "repositoryId={REPO_ID}" "pullRequestId={PR_ID}" "threadId=$THREAD_ID" \ - --org {ORG_URL} \ - --http-method PATCH \ - --in-file /tmp/pr_thread_patch_N.json \ - --api-version "7.1" \ - --output json 2>/tmp/pr_patch_err_N.json | \ - python3 -c " -import json, sys -try: - d = json.load(sys.stdin) - print('Thread patched to fixed') -except Exception: - err = open('/tmp/pr_patch_err_N.json').read() - if '409' in err or 'conflict' in err.lower(): - print('409 Conflict — thread resolved concurrently. Continuing.') - else: - print('PATCH warning:', err[:200]) -" - -FINDINGS_POSTED=$((FINDINGS_POSTED + 1)) -ADDRESSED_COUNT=$((ADDRESSED_COUNT + 1)) ``` - ---- - -## Step 11 — Post summary comment - -Branch on `IS_REREVIEW` and the counters set in Step 10. - ---- - -### IS_REREVIEW=false — full summary (unchanged behaviour) - -Post one general thread **without** `threadContext`: - -```bash -cat > /tmp/pr_summary.json << 'ENDJSON' -{ - "comments": [ - { - "commentType": 1, - "content": "## PR Review Summary — {PR_TITLE}\n\n{SUMMARY_CONTENT}\n\n---\n🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID}" - } - ], - "status": 1 -} -ENDJSON - -SUMMARY_RESPONSE=$(az devops invoke \ - --area git \ - --resource pullRequestThreads \ - --route-parameters "repositoryId={REPO_ID}" "pullRequestId={PR_ID}" \ - --org {ORG_URL} \ - --http-method POST \ - --in-file /tmp/pr_summary.json \ - --api-version "7.1" \ - --output json) -echo "$SUMMARY_RESPONSE" | python3 -c "import json,sys; d=json.load(sys.stdin); print('Summary thread', d.get('id'), d.get('status'))" -# Always update SUMMARY_THREAD_ID to the newly posted thread so Step 11.5 posts the -# completion marker to the current run's summary thread, not the prior one. -SUMMARY_THREAD_ID=$(echo "$SUMMARY_RESPONSE" | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('id',''))") -``` - -**Summary structure:** - -```markdown -## PR Review Summary — {title} - -### 🔴 Critical (X found) - -- **[file:line]** Issue description - -### 🟠 Important (X found) - -- **[file:line]** Issue description - -### 🟡 Minor / Suggestions - -- Suggestion - -### ✅ What's good - -- Positive observation - ---- - -🤖 _Reviewed by Claude Code_ — Iteration {N} -``` - ---- - -### IS_REREVIEW=true, all counters zero — skip - -If `NEW_THREAD_COUNT=0` AND `ADDRESSED_COUNT=0` AND `DISPUTED_COUNT=0`: - -```bash -echo "Re-review: nothing changed — skipping summary comment." -``` - -Do not post anything. `SUMMARY_THREAD_ID` remains set from Step 3.5 so Step 11.5 can still post the completion marker to the existing summary thread. - ---- - -### IS_REREVIEW=true, at least one counter > 0 — delta reply or fallback - -#### SUMMARY_THREAD_ID set — post delta reply to existing summary thread - -Reply to the existing summary thread via `pullRequestThreadComments`: - -```bash -cat > /tmp/pr_delta.json << 'ENDJSON' -{ - "content": "🤖 *Reviewed by Claude Code* — Re-review delta (Iteration {LATEST_ITERATION_ID})\n\n{NEW_THREAD_COUNT} new findings, {ADDRESSED_COUNT} resolved, {DISPUTED_COUNT} disputed, {PENDING_COUNT} pending.\n\n{BULLET_LIST_OF_NEW_FINDING_TITLES}\n\n---\n🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID}", - "commentType": 1 -} -ENDJSON - -az devops invoke \ - --area git \ - --resource pullRequestThreadComments \ - --route-parameters "repositoryId={REPO_ID}" "pullRequestId={PR_ID}" "threadId=$SUMMARY_THREAD_ID" \ - --org {ORG_URL} \ - --http-method POST \ - --in-file /tmp/pr_delta.json \ - --api-version "7.1" \ - --output json | python3 -c "import json,sys; d=json.load(sys.stdin); print('Delta reply posted, comment', d.get('id'))" +[{severity}] {filePath} L{startLine}–{endLine} +{title} +{body} ``` -`{BULLET_LIST_OF_NEW_FINDING_TITLES}` — one bullet per new thread posted in Step 10, format: - -``` -- **[{filePath}:{startLine}]** {one-line finding title} -``` - -Include only threads created in this run (`NEW_THREAD_COUNT` threads). No prose, no section headings. - -`SUMMARY_THREAD_ID` is **not** updated — it already points to the existing summary thread for Step 11.5 to use. - -#### SUMMARY_THREAD_ID empty — full summary fallback - -The prior summary thread was deleted. Fall back to first-review mode: post a full summary as a new general thread (use the IS_REREVIEW=false code above) and update `SUMMARY_THREAD_ID`. - ---- - -## Step 11.5 — Post completion marker - -After Step 11 completes, post one final reply to the summary thread **if `SUMMARY_THREAD_ID` is set**. Skip silently if it is empty (this can happen when prior bot threads exist but no summary thread was detected). This is the last write action of every successful run: - -```bash -if [ -n "$SUMMARY_THREAD_ID" ]; then - cat > /tmp/pr_completion_marker.json << 'ENDJSON' -{ - "content": "✅ Review complete — Iteration {LATEST_ITERATION_ID} ({FINDINGS_POSTED} findings posted)\n\n---\n🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID}", - "commentType": 1 -} -ENDJSON - - az devops invoke \ - --area git \ - --resource pullRequestThreadComments \ - --route-parameters "repositoryId={REPO_ID}" "pullRequestId={PR_ID}" "threadId=$SUMMARY_THREAD_ID" \ - --org {ORG_URL} \ - --http-method POST \ - --in-file /tmp/pr_completion_marker.json \ - --api-version "7.1" \ - --output json | python3 -c "import json,sys; d=json.load(sys.stdin); print('Completion marker posted, comment', d.get('id'))" -else - echo "No summary thread — skipping completion marker." -fi -``` - -The absence of this marker for `LATEST_ITERATION_ID` on the next run signals a partial prior run — Step 3.5 detects this and falls back to first-review mode. - ---- - -## Step 12 — Clean up - -```bash -rm -f /tmp/pr_thread_*.json /tmp/pr_reply_*.json /tmp/pr_thread_patch_*.json /tmp/pr_patch_err_*.json /tmp/pr_completion_marker.json /tmp/pr_summary.json /tmp/pr_delta.json -rm -f "$PRIOR_THREADS_FILE" "$DIFF_HUNKS_FILE" -``` - ---- - -## Comment signature - -Every comment — inline or summary — **must** end with this trailer on its own line: - -```txt ---- -🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID} -``` - -Two constants govern signature generation: - -- `SIGNATURE_PREFIX` = `🤖 *Reviewed by Claude Code*` -- `SIGNATURE` = `🤖 *Reviewed by Claude Code* — Iteration {LATEST_ITERATION_ID}` (resolved at post time) - -Never alter the prefix — re-review detection depends on it. - ---- - -## Notes - -- The PR may already be merged — post comments anyway as a review record. -- Use `az repos pr checkout --id {PR_ID} --org {ORG_URL}` if the local branch doesn't match the source branch. -- Always use the latest iteration of the PR (`LATEST_ITERATION_ID`). Re-reviews additionally compute `PRIOR_ITERATION_ID` — see Step 3.5 and Step 3.6. -- If `az devops invoke` returns an error on `threadContext` (e.g. file not found in the diff), retry without `threadContext` to post as a general comment. -- The detection prefix is `🤖 *Reviewed by Claude Code*` (substring match). The full emitted form is `🤖 *Reviewed by Claude Code* — Iteration N`. Never alter the prefix — re-review detection depends on it. +End with one Trailer line via `formatTrailer({ mode: 'pre-pr', findings, notices: PRE_PR_NOTICES })` from `scripts/ado/notices.mjs` (reduce `FINDINGS` to `{ critical, important, minor }` counts). The line reads `✅ Pre-PR review complete: findings (...) · warning notices`. diff --git a/apps/claude-code/pr-review/docs/adr/0004-incremental-diff-baseline.md b/apps/claude-code/pr-review/docs/adr/0004-incremental-diff-baseline.md index 4503f64..256e17e 100644 --- a/apps/claude-code/pr-review/docs/adr/0004-incremental-diff-baseline.md +++ b/apps/claude-code/pr-review/docs/adr/0004-incremental-diff-baseline.md @@ -14,3 +14,9 @@ When a prior review is detected, the baseline commit is read from the prior revi - Re-review cost and noise are proportional to the delta since the last review. - If the PR was rebased and the baseline commit is no longer in the history, the plugin falls back to a full diff. + +## Degraded baseline + +When `DIFF_RANGE=full` because the incremental fallback fired, the ADO Fetcher emits a `warning`-severity Notice (`kind: diff-range`, message: "Incremental diff unavailable — Coordinator will classify against the full PR diff with conservative downgrades."). + +When the Coordinator receives `DIFF_RANGE=full`, it MAY classify against the full diff but MUST apply the γ-downgrade rule: outputs of `addressed` and `obsolete` are remapped to `pending`, since those verdicts depend on diff-position evidence that is unreliable when the diff range is wider than the delta since the last review. `disputed` is unaffected (its derivation is reviewer-reply-based, not diff-position-based). The γ-downgrade rule is implemented in PRD B issue B3. diff --git a/apps/claude-code/pr-review/docs/adr/0006-reply-not-duplicate-auto-resolve.md b/apps/claude-code/pr-review/docs/adr/0006-reply-not-duplicate-auto-resolve.md index d59a442..e6d77e6 100644 --- a/apps/claude-code/pr-review/docs/adr/0006-reply-not-duplicate-auto-resolve.md +++ b/apps/claude-code/pr-review/docs/adr/0006-reply-not-duplicate-auto-resolve.md @@ -9,7 +9,7 @@ Re-reviews that open duplicate comments for already-noted issues create noise an ## Decision - For **pending** and **disputed** threads: post a reply noting whether the issue persists or has been escalated. -- For **addressed** threads: post a reply confirming the fix and resolve the thread. +- For **addressed** threads: resolve the thread silently via PATCH to `fixed` (status 2) — no reply comment is posted. - Never open a new thread for an issue that already has an active thread. ## Consequences @@ -17,3 +17,5 @@ Re-reviews that open duplicate comments for already-noted issues create noise an - PR comment threads remain linear and readable. - Addressed threads are automatically resolved, reducing the reviewer's manual work. - Incorrectly classified threads (e.g. false "addressed") will be auto-resolved; the reviewer may need to reopen them. + +**Revised:** 2026-05-14 — Removed the reply comment for `addressed` threads. Reason: the "Resolved — thanks!" reply generated an ADO notification for every thread participant; developers often self-resolve threads before the bot runs, causing the bot to comment on already-closed threads (notification spam). The thread status PATCH to `fixed` remains unchanged. diff --git a/apps/claude-code/pr-review/docs/adr/0013-orchestrator-split-for-review-pr.md b/apps/claude-code/pr-review/docs/adr/0013-orchestrator-split-for-review-pr.md new file mode 100644 index 0000000..b50287e --- /dev/null +++ b/apps/claude-code/pr-review/docs/adr/0013-orchestrator-split-for-review-pr.md @@ -0,0 +1,55 @@ +# 0013. Split review-pr.md into a thin orchestrator and focused agents + +**Status:** Accepted (2026-05) + +## Context + +`review-pr.md` has grown to ~1000 lines as the re-review state machine, ADO write-back logic, and doc-context orchestration were added. This creates two compounding problems: + +1. **Token budget pressure.** The full command file is loaded into the parent context on every invocation. Combined with tool-call results flowing back from parallel review agents, average PR reviews reach +100 K tokens — unsustainable as the command grows further. + +2. **Growth risk.** A pre-PR mode (review without opening a PR) is an emerging user request. Adding a third operating mode to the current monolith would push the file toward ~1300 lines and worsen the token problem. + +The root cause is architectural: `review-pr.md` conflates orchestration (which mode are we in? what agents to launch?) with platform integration (fetch ADO threads, post inline comments) and re-review state management (classify threads, match findings, reply). + +The right model for `review-pr.md` is a thin coordinator: prerequisites block, mode detection block, and one delegation block per mode. The three focused agents own all data-fetch and write-back ADO operations. The one allowed inline ADO call is the mode-detection `az repos pr thread list` in the mode detection block — an orchestration concern, not a data-fetch or write-back operation; no `az devops invoke` commands remain in the orchestrator. + +## Decision + +Refactor `review-pr.md` into a **thin orchestrator** of ~200 lines that: + +1. Validates prerequisites and parses the PR URL (or detects absence of URL for pre-PR mode). +2. Detects the operating mode: **pre-PR**, **first-review**, or **re-review**. +3. Delegates immediately to a focused agent per mode. + +Three focused agents live in the plugin's `.agents/` directory (not in `pr-review-toolkit`, which is a read-only dependency): + +- **`pr-review:ado-fetcher`** — fetches PR metadata, iterations, changed files, and raw diff from ADO. Used by first-review and re-review modes. +- **`pr-review:re-review-coordinator`** — owns prior thread detection, partial-run check, thread classification, finding matching, and reply posting to classified threads. Used only in re-review mode. +- **`pr-review:ado-writer`** — owns the ADO write-back pipeline: posting inline threads, patching thread status, and posting the summary comment. Used by first-review and re-review modes. + +Pre-PR mode skips the ADO fetcher and writer entirely; it goes straight from the orchestrator to the `pr-review-toolkit` review agents and presents findings locally. + +**Compact sub-agent output.** Review agents (`pr-review-toolkit:code-reviewer`, etc.) are asked via the review-agent launch step in `review-pr.md` to return structured findings (`severity`, `filePath`, `startLine`, `endLine`, `title`, `body`) rather than prose with embedded code quotes. This keeps what flows back into the parent context small. This guidance stays in `review-pr.md`'s prompt, not in the toolkit agent definitions, because `pr-review-toolkit` is not owned by this plugin. + +**Re-review logic ownership.** The four Node.js modules in `scripts/re-review/` are already algorithmically platform-agnostic; only their input shapes are ADO-specific. When a second write-back platform (GitHub) is built, normalising to a canonical thread shape and lifting these modules to `pr-review-toolkit` is the correct move. That work is deferred until a second platform consumer exists. + +**Alternatives considered:** + +_Keep the monolith_ — continue adding to `review-pr.md`. Rejected because the token budget problem compounds with each new feature, and the pre-PR mode would require significant branching inside an already large file. + +_Lift re-review modules to pr-review-toolkit now_ — move the four Node.js modules to the toolkit as shared library code. Rejected because there is no second platform consumer yet; any canonical thread schema designed now would be speculative and likely wrong. + +_Option B: re-review coordinator as a procedural agent_ — keep re-review logic in a dedicated agent that reasons about edge cases rather than pure procedural code. Accepted in part: the `pr-review:re-review-coordinator` agent replaces the procedural inline steps, but the four Node.js modules remain as pure functions called from it. + +## Consequences + +- The parent context for a first-review or pre-PR run no longer loads re-review logic. +- Each focused agent only receives the context it needs; intermediate state (prior threads JSON, classification results, diff hunks) does not accumulate in the orchestrator context. +- Adding a fourth operating mode (e.g. post-merge audit) requires only a new agent plus a new branch in the ~200-line orchestrator. +- The three new agents must be documented in the plugin's `CONTEXT.md` under the appropriate relationship entries. + +**See also:** + +- `docs/issues/pr-review-orchestrator-split/PRD.md` for the feature PRD and implementation issues that deliver this split +- ADR 0008 (soft dependency on `pr-review-toolkit`) diff --git a/apps/claude-code/pr-review/docs/adr/0014-notice-tier-doctrine-and-failure-classification-helpers.md b/apps/claude-code/pr-review/docs/adr/0014-notice-tier-doctrine-and-failure-classification-helpers.md new file mode 100644 index 0000000..0fbc725 --- /dev/null +++ b/apps/claude-code/pr-review/docs/adr/0014-notice-tier-doctrine-and-failure-classification-helpers.md @@ -0,0 +1,77 @@ +# 0014. Notice Tier doctrine and failure-classification helpers + +**Status:** Accepted (2026-05) + +## Context + +ADO Fetcher reads and ADO Writer writes can fail in many ways — auth revoked, transient 5xx, a fetch returning an empty array that is sometimes a legitimate domain state and sometimes a degradation. Before this ADR, every failure was treated independently: + +- `parseIterations([])` silently defaulted to `{ latestIterationId: 1 }`, violating the CLAUDE.md "iteration 1 is never used" rule. +- `parseWorkItemIds(null)` returned `[]`, indistinguishable from a legitimate "no work items linked" PR. +- The diff-range fallback from incremental to full was silent — the Coordinator could not tell which range it was classifying threads against. +- The H1 hardening from PR #29 made the inline-post call site log-and-continue on errors, but the rule was call-site-local; the rest of the Writer's POSTs still had ad-hoc handling. + +The result was a Review that could be posted on degraded inputs (corrupting re-review detection forever afterwards on iteration drift) without the reviewer or the invoker seeing any signal. + +ADR 0013 split `review-pr.md` into a thin orchestrator and three focused agents but kept failure-handling logic inside the agent prompts as inline bash-and-Node heredocs. Those heredocs are hard to test in isolation and hard to keep consistent: the canonical HTTP-tier mapping (401 means abort everywhere; 5xx means degrade everywhere) cannot be enforced when each call site re-implements it. + +## Decision + +Adopt a **four-state Notice Tier doctrine** for every orchestration-agent operation: + +- **OK** — operation completed with a non-empty result. No Notice emitted. +- **EMPTY-BY-DESIGN** — operation completed with an empty result that is a legitimate domain state. Silent for most operations; the Doc-Context family is the one carve-out (when `WORK_ITEM_IDS=[]` the orchestrator emits an `info` Notice, because the reviewer cannot tell from the PR alone whether the bot considered linked business context). +- **DEGRADED** — operation failed but the Review can still complete with reduced coverage. Emits a `warning` Notice; the Review still posts. +- **ABORTED** — operation failed and continuing would corrupt cross-run state (Bot Signature drift, Summary thread desync, mode misdetection). The run stops before the Review Summary is composed; the failure goes to stderr plus the end-of-run Trailer. + +There is **no fifth ASK tier**. AFK invocations never block on user input. Failure modes that tempt an ASK tier are reclassified as ABORTED. + +Each orchestration agent emits a `NOTICES` JSON array as a new field in its structured result block. The orchestrator parses each agent's array, merges them via `mergeNotices` (deduplicating by `kind`), and passes the merged array to the ADO Writer alongside `FINDINGS`. The ADO Writer renders a `## Notices` block above the severity-grouped findings in the Review Summary content. Each item carries its own per-severity emoji prefix (`ℹ️` for `info`, `⚠` for `warning`); the heading stays bare so a mixed list does not require the heading emoji to misrepresent one tier. + +Notice shape: + +```js +{ severity: 'info' | 'warning', kind: NoticeKind, message: string } +``` + +`kind` is a small enum: `doc-context`, `diff-range`, `work-items`, `iterations`, `default-branch`, `partial-run-check`, `thread-match`, `thread-classify`, `inline-post`, `summary-post`, `patch-to-fixed`, `diff-parse`, `delta-reply`, `completion-marker`. Free-form strings and severity-coded numerics were rejected — the enum lets the merge step dedup by `kind` without parsing message text. Each `kind` value has exactly one source agent — this is the invariant that makes first-wins dedup safe. + +A mandatory single-line **Trailer** is printed to the Claude interface at end-of-run, regardless of mode or outcome: + +- ADO modes: `✅ Review posted: findings ( critical, important) · warning notices · info notices → ` +- Pre-PR mode: `✅ Pre-PR review complete: findings ( critical, important) · warning notices` +- Aborted: `❌ Review aborted: ` + +The same `NOTICES` array drives both the Summary rendering and the Trailer counts. Designed for AFK skim: the invoker sees outcome status without opening the PR. + +**Helper layer refinement of ADR 0013.** Failure classification moves from inline bash-and-Node heredocs to pure JS helpers under `scripts/ado/`. ADR 0013 keeps orchestration in agent prompts; this ADR refines that — orchestration still lives in agent prompts, but **failure classification** lives in helpers that the prompts call via `await import(...)`. New helper modules: + +- `scripts/ado/notices.mjs` — pure helpers `createNotice`, `mergeNotices`, `formatNoticesAsSummaryBlock`, `formatNoticesAsPrePrPreamble`, `formatTrailer`. +- `scripts/ado/classify-http-error.mjs` — canonical HTTP-tier mapping (added in PRD A slice A2; covered by ADR 0015). +- `scripts/ado/fetch-iterations.mjs`, `scripts/ado/fetch-work-items.mjs` — discriminated-union refactors of the existing parsers (added in A2 and A3, replacing `parseIterations` / `parseWorkItemIds`). + +Helpers come with `node:test` unit tests in the prior-art style of `tests/parse-diff-hunks.test.mjs`. + +## Consequences + +- The Bot Signature is never signed with an empty Iteration ID again (the discriminated-union refactor of `parseIterations` will reclassify the empty-`value` case as ABORTED in slice A3). +- A failure that today is silent (work-item fetch failed, diff-range fallback fired) becomes a Notice in the Summary and a count in the Trailer. +- A consequential failure (401/403 on iterations) becomes a fast abort with a clear stderr message instead of a corrupted Review. +- Agent prompts shrink. The bash side around a failure-classification call becomes uniform `if [ "$RESULT_OK" != "true" ]; then ...`. +- The doctrine and helper layer are reused by PRD B (the consumer side): the ADO Writer call sites, the Re-review Coordinator's PATCH-to-fixed and `match-finding` flows, and the Pre-PR `parseChangedFilesFromDiff` / default-branch-fallback all route through the same helpers. +- Adding a new failure mode is a `createNotice` call plus a `kind` enum entry, not a new ad-hoc bash branch. + +**Alternatives considered:** + +_Three tiers (OK / DEGRADED / ABORTED)._ Rejected because the legitimate empty cases (no work items, no prior threads, no findings) need a distinct classification — they are not failures. Conflating them with DEGRADED would either silence them entirely (losing the Doc-Context info signal) or fire false-positive Notices on every clean PR. + +_Five tiers including ASK._ Rejected because the plugin's deployment model is AFK — there is no user to ask. Every failure must be decidable from data the agent already has. ASK-flavoured failures are reclassified as ABORTED. + +_Free-form Notice strings rather than `{ severity, kind, message }`._ Rejected because dedup across agents (Fetcher and Doc-Context Orchestrator both noticing a Confluence outage) requires a stable key. + +**See also:** + +- ADR 0013 — orchestrator split for `review-pr.md` (this ADR refines its testing posture for failure classification). +- ADR 0015 — canonical HTTP-tier mapping (the concrete mapping consumed by the helper layer). +- ADR 0004 — incremental diff baseline (amended in slice A4 with the γ-downgrade rule consumed by PRD B). +- `docs/issues/pr-review-ado-fetcher-reliability/PRD.md` for the feature PRD and the slices that deliver the doctrine. diff --git a/apps/claude-code/pr-review/docs/adr/0015-canonical-http-tier-mapping.md b/apps/claude-code/pr-review/docs/adr/0015-canonical-http-tier-mapping.md new file mode 100644 index 0000000..282ab67 --- /dev/null +++ b/apps/claude-code/pr-review/docs/adr/0015-canonical-http-tier-mapping.md @@ -0,0 +1,80 @@ +# ADR 0015 — Canonical HTTP-Tier Mapping + +**Status:** Accepted +**Date:** 2026-05-13 +**Deciders:** Oriol Torrent Florensa +**Context:** ADR 0014 (failure-classification helper layer) + +--- + +## Context + +ADR 0014 introduced the four-tier Notice doctrine (OK / EMPTY-BY-DESIGN / DEGRADED / ABORTED) and moved +failure classification into pure JS helpers under `scripts/ado/`. The doctrine describes the four tiers; +this ADR records the exact HTTP status → tier mapping that every helper and call site must apply +consistently so that `401` means the same thing everywhere and no future contributor invents a divergent +mapping. + +--- + +## Decision + +### Canonical mapping table + +| HTTP outcome | Tier | Notes | +| --------------------- | -------- | --------------------------------------------------------------- | +| 200 / 201 | OK | Normal success. No Notice. | +| 404 | OK | Domain "the thing is already gone." Treat as success. | +| 409 | OK | Domain "state already changed." Treat as success. | +| 401 | ABORTED | Token expired or revoked. All subsequent writes will also fail. | +| 403 | ABORTED | Permission revoked. Same abort rule applies. | +| 5xx | DEGRADED | Transient backend failure. Emit Notice; continue if possible. | +| Other 4xx (400 / 422) | DEGRADED | Malformed request — likely a plugin bug. Emit Notice; continue. | +| Network error | DEGRADED | Treat identically to 5xx transient. | + +### 401 / 403 abort rule + +When a 401 or 403 response is received on any ADO operation: + +- **Read operations** (Fetcher): if the response is on a critical path (iterations), abort the run with a + clear stderr message naming `az devops login` as the remedy. If non-critical (work items), emit a + DEGRADED Notice and continue. +- **Write operations** (Writer, Coordinator): abort the writer/coordinator immediately. Subsequent writes + would all fail with the same auth error; aborting avoids partial writes and preserves the state needed + for re-review detection. + +### No retries in v1 + +Retries are not implemented. Reasons: + +1. Retries add latency that is already painful in AFK runs. +2. Retries introduce a new failure mode (retry storm) that the Notice surface does not yet describe. +3. The DEGRADED Notice produced without retries is accurate information: the operation failed once. + A retry that eventually succeeds would suppress a Notice the user might want to see. + +Re-evaluate if 5xx Notices prove painful in practice; retries can be added behind the same Notice +surface without changing the doctrine. + +### Implementation + +The canonical mapping is implemented in `scripts/ado/classify-http-error.mjs`: + +```js +classifyHttpError({ status, body, exitCode }) +// → { tier: 'ok' | 'degraded' | 'aborted', kind: string, message: string } +``` + +Every ADO call site that needs tier classification calls this helper. Per-call-site helpers +(`fetch-work-items.mjs`, `fetch-iterations.mjs`, `parse-write-response.mjs`) compose it with +their own response-parsing logic. + +--- + +## Consequences + +- Every HTTP failure in the plugin is classified by one function. Adding a new status code mapping + requires editing one file; the change propagates to all consumers automatically. +- 404 and 409 are OK — callers that previously had explicit 404/409 catch blocks can remove them. +- ABORTED on 401/403 is non-negotiable: a caller cannot downgrade to DEGRADED. +- Network errors (process exits with non-zero exit code and no HTTP status) are DEGRADED, not ABORTED, + because network errors are transient by nature. diff --git a/apps/claude-code/pr-review/docs/adr/README.md b/apps/claude-code/pr-review/docs/adr/README.md index 5c2fbe7..75cb0c9 100644 --- a/apps/claude-code/pr-review/docs/adr/README.md +++ b/apps/claude-code/pr-review/docs/adr/README.md @@ -18,3 +18,6 @@ See the root `docs/adr/README.md` for format and numbering conventions. | 0009 | Re-review summary delta is posted as a reply to the existing summary thread | Accepted | | 0010 | Inline Confluence client | Accepted | | 0011 | Additive parallel paths for doc-context extensibility | Accepted | +| 0012 | Plain-text Doc-Context agent return | Accepted | +| 0013 | Orchestrator split for review-pr | Accepted | +| 0014 | Notice Tier doctrine and failure-classification helpers | Accepted | diff --git a/apps/claude-code/pr-review/package.json b/apps/claude-code/pr-review/package.json index 05b6354..58e45e1 100644 --- a/apps/claude-code/pr-review/package.json +++ b/apps/claude-code/pr-review/package.json @@ -1,16 +1,16 @@ { "name": "pr-review", - "version": "0.9.1", + "version": "1.2.10", "private": true, "license": "LGPL-3.0-or-later", "type": "module", "packageManager": "pnpm@10.33.0", "engines": { - "node": ">=24", + "node": ">=22", "pnpm": ">=10" }, "scripts": { - "test": "node --test tests/parse-signature.test.mjs tests/classify-thread.test.mjs tests/match-finding.test.mjs tests/detect-prior-review.test.mjs tests/confluence-client.test.mjs", + "test": "node --test tests/parse-signature.test.mjs tests/classify-thread.test.mjs tests/match-finding.test.mjs tests/detect-prior-review.test.mjs tests/confluence-client.test.mjs tests/ado-fetcher.test.mjs tests/ado-writer.test.mjs tests/pre-pr.test.mjs tests/parse-diff-hunks.test.mjs tests/mode-detection.test.mjs tests/notices.test.mjs tests/classify-http-error.test.mjs tests/fetch-work-items.test.mjs tests/fetch-iterations.test.mjs tests/parse-write-response.test.mjs tests/detect-default-branch.test.mjs", "bump": "unic-bump", "sync-version": "unic-sync-version", "tag": "unic-tag", diff --git a/apps/claude-code/pr-review/scripts/ado-writer.mjs b/apps/claude-code/pr-review/scripts/ado-writer.mjs new file mode 100644 index 0000000..d2bece7 --- /dev/null +++ b/apps/claude-code/pr-review/scripts/ado-writer.mjs @@ -0,0 +1,46 @@ +// @ts-check + +/** + * @typedef {{ severity: string, kind: string, message: string }} Notice + * @typedef {{ ok: true, summaryThreadId: number | null, findingsPosted: number, notices: Notice[] }} AdoWriterResultOk + * @typedef {{ ok: false, reason: 'missing-block' | 'malformed', message?: string }} AdoWriterResultErr + * @typedef {AdoWriterResultOk | AdoWriterResultErr} AdoWriterResult + */ + +/** + * Parses the ADO Writer agent's output block into a discriminated-union result. + * Returns { ok: false, reason: 'missing-block' } when the result block is absent. + * Returns { ok: false, reason: 'malformed' } when the block is present but FINDINGS_POSTED is missing. + * + * @param {string} output + * @returns {AdoWriterResult} + */ +export function parseAdoWriterResult(output) { + const blockMatch = output.match(/ADO_WRITER_RESULT_START([\s\S]*?)ADO_WRITER_RESULT_END/) + if (!blockMatch) { + return { ok: false, reason: 'missing-block' } + } + + const block = blockMatch[1] + + const threadIdMatch = block.match(/SUMMARY_THREAD_ID:\s*(\d+)/) + const summaryThreadId = threadIdMatch ? Number(threadIdMatch[1]) : null + + const findingsMatch = block.match(/FINDINGS_POSTED:\s*(\d+)/) + if (!findingsMatch) { + return { ok: false, reason: 'malformed' } + } + const findingsPosted = Number(findingsMatch[1]) + + const noticesMatch = block.match(/NOTICES:\s*([\s\S]+?)(?=\n[A-Z_]|\n*$)/) + let notices = /** @type {Notice[]} */ ([]) + if (noticesMatch) { + try { + notices = JSON.parse(noticesMatch[1].trim()) + } catch { + return { ok: false, reason: 'malformed', message: 'Failed to parse NOTICES JSON from ADO Writer output' } + } + } + + return { ok: true, summaryThreadId, findingsPosted, notices } +} diff --git a/apps/claude-code/pr-review/scripts/ado/classify-http-error.mjs b/apps/claude-code/pr-review/scripts/ado/classify-http-error.mjs new file mode 100644 index 0000000..e49ce0b --- /dev/null +++ b/apps/claude-code/pr-review/scripts/ado/classify-http-error.mjs @@ -0,0 +1,49 @@ +// @ts-check + +const OK_STATUSES = new Set([200, 201, 404, 409]) +const ABORTED_STATUSES = new Set([401, 403]) + +/** + * Maps an HTTP outcome to a Notice tier. + * + * @param {{ status?: number, body?: string, exitCode?: number }} input + * @returns {{ tier: 'ok' | 'degraded' | 'aborted', kind: string, message: string }} + */ +export function classifyHttpError({ status = 0, body = '', exitCode = 0 } = {}) { + // Network/process error: no usable HTTP status, non-zero exit + if (!status && exitCode !== 0) { + const detail = body ? `: ${body.slice(0, 200)}` : '' + return { tier: 'degraded', kind: 'network', message: `Network error (exit ${exitCode})${detail}` } + } + + if (OK_STATUSES.has(status)) { + return { tier: 'ok', kind: 'ok', message: '' } + } + + if (ABORTED_STATUSES.has(status)) { + const detail = body ? ` — ${body.slice(0, 200)}` : '' + return { + tier: 'aborted', + kind: 'auth', + message: `HTTP ${status}: authentication/authorization failure${detail}. Try \`az devops login\` to re-authenticate.`, + } + } + + if (status >= 500 && status < 600) { + const detail = body ? ` — ${body.slice(0, 200)}` : '' + return { tier: 'degraded', kind: 'transient', message: `HTTP ${status}: server error${detail}` } + } + + if (status >= 400 && status < 500) { + const detail = body ? ` — ${body.slice(0, 200)}` : '' + return { tier: 'degraded', kind: 'malformed-request', message: `HTTP ${status}: request error${detail}` } + } + + // Non-zero exit with a status we don't recognise, or no status + zero exit (treat as ok) + if (exitCode !== 0) { + const detail = body ? `: ${body.slice(0, 200)}` : '' + return { tier: 'degraded', kind: 'network', message: `Process exited with code ${exitCode}${detail}` } + } + + return { tier: 'ok', kind: 'ok', message: '' } +} diff --git a/apps/claude-code/pr-review/scripts/ado/fetch-iterations.mjs b/apps/claude-code/pr-review/scripts/ado/fetch-iterations.mjs new file mode 100644 index 0000000..96de07f --- /dev/null +++ b/apps/claude-code/pr-review/scripts/ado/fetch-iterations.mjs @@ -0,0 +1,91 @@ +// @ts-check + +import { classifyHttpError } from './classify-http-error.mjs' + +/** + * @typedef {{ id: number, sourceRefCommit?: { commitId?: string } | null }} ADOIteration + */ + +/** + * Parses the raw response from the ADO pullRequestIterations endpoint. + * Returns a discriminated union so the Fetcher prompt can branch on ok/not-ok + * without falling back to the invalid `iterationId=1` default. + * + * @param {{ responseText: string, exitCode?: number }} input + * @returns {{ ok: true, latestIterationId: number, latestCommitSha: string } + * | { ok: false, reason: 'empty-iterations' | 'auth' | 'transient' | 'malformed', message: string }} + */ +export function fetchIterations({ responseText, exitCode = 0 }) { + // Try to extract an HTTP status code from the response body (ADO embeds statusCode in error JSON) + let status = 0 + /** @type {any} */ + let parsed = null + + if (responseText?.trim()) { + try { + parsed = JSON.parse(responseText) + status = typeof parsed?.statusCode === 'number' ? parsed.statusCode : 0 + } catch { + // parse failed — handled below + } + } + + // Route HTTP / network failures through the canonical tier mapper + if (exitCode !== 0 || status >= 400) { + const classification = classifyHttpError({ status, body: responseText, exitCode }) + if (classification.tier !== 'ok') { + const reason = + classification.tier === 'aborted' + ? 'auth' + : classification.kind === 'malformed-request' + ? 'malformed' + : 'transient' + return { ok: false, reason, message: classification.message } + } + } + + // No response body at all (and exitCode was 0, so not caught above) + if (!responseText || !responseText.trim()) { + return { ok: false, reason: 'malformed', message: 'Iterations fetch returned an empty response' } + } + + // JSON parse failed + if (parsed === null) { + return { + ok: false, + reason: 'malformed', + message: `Iterations response was not valid JSON: ${responseText.slice(0, 100)}`, + } + } + + // Missing value array + if (!Array.isArray(parsed?.value)) { + return { ok: false, reason: 'malformed', message: 'Iterations response missing `value` array' } + } + + // Empty value array → ABORTED (cannot sign a review without a valid iteration ID) + if (parsed.value.length === 0) { + return { + ok: false, + reason: 'empty-iterations', + message: 'Iterations endpoint returned empty value array. Cannot sign Review with a valid Iteration ID.', + } + } + + // Find the latest iteration by id; guard against null elements that ADO may return + const iterations = /** @type {ADOIteration[]} */ (parsed.value.filter((it) => it != null && typeof it === 'object')) + if (iterations.length === 0) { + return { + ok: false, + reason: 'empty-iterations', + message: 'Iterations endpoint returned empty value array. Cannot sign Review with a valid Iteration ID.', + } + } + const latest = iterations.reduce((max, it) => (it.id > max.id ? it : max), iterations[0]) + return { + ok: true, + latestIterationId: latest.id, + // '' when the iteration has no sourceRefCommit (pre-commit PR or detached HEAD) + latestCommitSha: latest.sourceRefCommit?.commitId ?? '', + } +} diff --git a/apps/claude-code/pr-review/scripts/ado/fetch-work-items.mjs b/apps/claude-code/pr-review/scripts/ado/fetch-work-items.mjs new file mode 100644 index 0000000..7cb9f18 --- /dev/null +++ b/apps/claude-code/pr-review/scripts/ado/fetch-work-items.mjs @@ -0,0 +1,68 @@ +// @ts-check + +import { classifyHttpError } from './classify-http-error.mjs' + +/** + * Parses the raw response from the ADO pullRequestWorkItems endpoint. + * Returns a discriminated union so callers can branch on ok/not-ok without + * conflating EMPTY-BY-DESIGN (no items linked) with a fetch failure. + * + * @param {{ responseText: string, exitCode?: number }} input + * @returns {{ ok: true, ids: number[] } | { ok: false, reason: 'auth' | 'transient' | 'malformed' | 'empty-response', message: string }} + */ +export function fetchWorkItems({ responseText, exitCode = 0 }) { + // Try to extract an HTTP status code from the response body (ADO embeds statusCode in error JSON) + let status = 0 + /** @type {any} */ + let parsed = null + + if (responseText?.trim()) { + try { + parsed = JSON.parse(responseText) + status = typeof parsed?.statusCode === 'number' ? parsed.statusCode : 0 + } catch { + // parse failed — handled below + } + } + + // Route HTTP / network failures through the canonical tier mapper + if (exitCode !== 0 || status >= 400) { + const classification = classifyHttpError({ status, body: responseText, exitCode }) + if (classification.tier !== 'ok') { + let reason + if (classification.tier === 'aborted') { + reason = /** @type {const} */ ('auth') + } else if (classification.kind === 'malformed-request') { + reason = /** @type {const} */ ('malformed') + } else { + reason = /** @type {const} */ ('transient') + } + return { ok: false, reason, message: classification.message } + } + } + + if (!responseText || !responseText.trim()) { + return { ok: false, reason: 'empty-response', message: 'Work-item fetch returned an empty response' } + } + + // JSON parse failed + if (parsed === null) { + return { + ok: false, + reason: 'malformed', + message: `Work-item response was not valid JSON: ${responseText.slice(0, 100)}`, + } + } + + if (!Array.isArray(parsed?.value)) { + return { ok: false, reason: 'malformed', message: 'Work-item response missing `value` array' } + } + + const ids = parsed.value + .filter( + (/** @type {unknown} */ wi) => + wi != null && typeof wi === 'object' && typeof (/** @type {any} */ (wi).id) === 'number' + ) + .map((/** @type {{ id: number }} */ wi) => wi.id) + return { ok: true, ids } +} diff --git a/apps/claude-code/pr-review/scripts/ado/notices.mjs b/apps/claude-code/pr-review/scripts/ado/notices.mjs new file mode 100644 index 0000000..3a533e0 --- /dev/null +++ b/apps/claude-code/pr-review/scripts/ado/notices.mjs @@ -0,0 +1,120 @@ +// @ts-check + +/** + * @typedef {'info' | 'warning'} NoticeSeverity + * @typedef {'doc-context' | 'diff-range' | 'work-items' | 'iterations' | 'default-branch' | 'partial-run-check' | 'thread-match' | 'thread-classify' | 'inline-post' | 'summary-post' | 'patch-to-fixed' | 'diff-parse' | 'delta-reply' | 'completion-marker'} NoticeKind + * @typedef {{ severity: NoticeSeverity, kind: NoticeKind, message: string }} Notice + * @typedef {'first-review' | 're-review' | 'pre-pr' | 'aborted'} TrailerMode + * @typedef {{ critical: number, important: number, minor: number }} FindingCounts + */ + +const SEVERITY_EMOJI = { + info: 'ℹ️', + warning: '⚠', +} + +/** + * Creates a Notice object with the canonical shape. + * + * @param {NoticeSeverity} severity + * @param {NoticeKind} kind + * @param {string} message + * @returns {Notice} + */ +export function createNotice(severity, kind, message) { + return { severity, kind, message } +} + +/** + * Merges multiple Notice arrays, deduplicating by `kind` (first wins). + * + * @param {...Notice[]} sources + * @returns {Notice[]} + */ +export function mergeNotices(...sources) { + const seen = new Set() + const out = [] + for (const list of sources) { + for (const notice of list ?? []) { + if (seen.has(notice.kind)) continue + seen.add(notice.kind) + out.push(notice) + } + } + return out +} + +/** + * Renders Notices as a markdown block for the ADO Review Summary content. + * Heading stays bare so mixed info/warning lists are not misrepresented; + * each item carries its own per-severity emoji prefix. + * Returns an empty string when there are no notices. + * + * @param {Notice[]} notices + * @returns {string} + */ +export function formatNoticesAsSummaryBlock(notices) { + if (!notices || notices.length === 0) return '' + const lines = ['## Notices', ''] + for (const n of notices) { + lines.push(`${SEVERITY_EMOJI[n.severity]} ${n.message}`) + } + return lines.join('\n') +} + +/** + * Renders Notices as a preamble block for Pre-PR mode output in the Claude + * interface — same per-item shape as the Summary block, without the heading. + * Returns an empty string when there are no notices. + * + * @param {Notice[]} notices + * @returns {string} + */ +export function formatNoticesAsPrePrPreamble(notices) { + if (!notices || notices.length === 0) return '' + return notices.map((n) => `${SEVERITY_EMOJI[n.severity]} ${n.message}`).join('\n') +} + +/** + * Renders the mandatory end-of-run Trailer line for the Claude interface. + * Carries findings counts (with severity breakdown), notice counts by severity, + * and (for ADO modes) the PR URL. + * Minor findings are excluded from the parenthetical breakdown to keep the + * trailer concise; only critical and important counts are surfaced inline. + * + * @param {object} input + * @param {TrailerMode} input.mode + * @param {FindingCounts} [input.findings] + * @param {Notice[]} [input.notices] + * @param {string} [input.prUrl] + * @param {string} [input.abortKind] + * @param {string} [input.abortReason] + * @returns {string} + */ +export function formatTrailer(input) { + if (input.mode === 'aborted') { + const kind = input.abortKind ?? 'unknown' + return input.abortReason ? `❌ Review aborted: ${kind} — ${input.abortReason}` : `❌ Review aborted: ${kind}` + } + const findings = input.findings ?? { critical: 0, important: 0, minor: 0 } + const notices = input.notices ?? [] + const total = findings.critical + findings.important + findings.minor + const warnings = notices.filter((n) => n.severity === 'warning').length + const infos = notices.filter((n) => n.severity === 'info').length + const findingsPart = `${total} ${plural(total, 'finding')} (${findings.critical} critical, ${findings.important} important)` + const warnPart = `${warnings} ${plural(warnings, 'warning notice')}` + const infoPart = `${infos} ${plural(infos, 'info notice')}` + if (input.mode === 'pre-pr') { + return `✅ Pre-PR review complete: ${findingsPart} · ${warnPart}` + } + const url = input.prUrl ?? '' + return `✅ Review posted: ${findingsPart} · ${warnPart} · ${infoPart} → ${url}` +} + +/** + * @param {number} n + * @param {string} word + */ +function plural(n, word) { + return n === 1 ? word : `${word}s` +} diff --git a/apps/claude-code/pr-review/scripts/ado/parse-write-response.mjs b/apps/claude-code/pr-review/scripts/ado/parse-write-response.mjs new file mode 100644 index 0000000..0838d00 --- /dev/null +++ b/apps/claude-code/pr-review/scripts/ado/parse-write-response.mjs @@ -0,0 +1,51 @@ +// @ts-check + +import { classifyHttpError } from './classify-http-error.mjs' + +/** + * Routes an ADO write-call outcome through the canonical HTTP-tier mapping and + * extracts the created resource's `id` from the response body. + * + * @param {{ httpExit: number, responseText: string, errStream?: string }} input + * @returns {{ ok: true, id: number | null } | { ok: false, tier: string, kind: string, message: string }} + */ +export function parseWriteResponse({ httpExit, responseText, errStream = '' }) { + let bodyStatus = 0 + /** @type {any} */ + let parsed = null + + if (responseText?.trim()) { + try { + parsed = JSON.parse(responseText) + bodyStatus = typeof parsed?.statusCode === 'number' ? parsed.statusCode : 0 + } catch { + // parse failed — handled below + } + } + + const classified = classifyHttpError({ status: bodyStatus, body: responseText, exitCode: httpExit }) + + if (classified.tier !== 'ok') { + const errDetail = errStream ? ` — ${errStream.slice(0, 200)}` : '' + return { ok: false, tier: classified.tier, kind: classified.kind, message: classified.message + errDetail } + } + + // tier is 'ok' — try to extract a numeric id from the response body + if (parsed !== null && typeof parsed?.id === 'number') { + return { ok: true, id: parsed.id } + } + + // 404 and 409 are canonical-ok with no id (thread gone / state already changed) + if (bodyStatus === 404 || bodyStatus === 409) { + return { ok: true, id: null } + } + + // 200/201 without a numeric id — the write response is malformed + const errDetail = errStream ? ` — ${errStream.slice(0, 200)}` : '' + return { + ok: false, + tier: 'degraded', + kind: 'malformed-response', + message: `Write response did not contain a numeric id field${errDetail}`, + } +} diff --git a/apps/claude-code/pr-review/scripts/mode-detection.mjs b/apps/claude-code/pr-review/scripts/mode-detection.mjs new file mode 100644 index 0000000..4665b7d --- /dev/null +++ b/apps/claude-code/pr-review/scripts/mode-detection.mjs @@ -0,0 +1,59 @@ +// @ts-check + +import { detectPriorReview } from './re-review/detect-prior-review.mjs' + +/** + * @typedef {{ + * mode: 'first-review' | 're-review', + * isRereview: boolean, + * priorIterationId: string, + * summaryThreadId: string, + * }} ModeDetectionResult + */ + +/** + * Classifies a PR as `first-review` or `re-review` from its already-fetched + * thread list. Wraps `detectPriorReview` and stringifies the optional numeric + * fields so the orchestrator can consume them via plain shell. + * + * Pure function. No I/O. + * + * @param {{ threads: unknown[], signaturePrefix: string }} input + * @returns {ModeDetectionResult} + */ +export function detectMode({ threads, signaturePrefix }) { + const r = detectPriorReview({ + // detect-prior-review accepts the raw ADO thread shape; the orchestrator + // passes whatever `az repos pr thread list` returned, untouched. + // @ts-expect-error -- runtime-validated by detectPriorReview's own guards + threads: Array.isArray(threads) ? threads : [], + signaturePrefix, + }) + return { + mode: r.isRereview ? 're-review' : 'first-review', + isRereview: r.isRereview, + priorIterationId: r.priorIterationId != null ? String(r.priorIterationId) : '', + summaryThreadId: r.summaryThread != null ? String(r.summaryThread.threadId) : '', + } +} + +/** + * Formats a `ModeDetectionResult` as four newline-separated shell-friendly + * lines, intended to be eval-captured by the orchestrator: + * + * MODE=first-review + * IS_REREVIEW=false + * PRIOR_ITERATION_ID= + * SUMMARY_THREAD_ID= + * + * @param {ModeDetectionResult} result + * @returns {string} + */ +export function formatModeEnv(result) { + return [ + `MODE=${result.mode}`, + `IS_REREVIEW=${result.isRereview ? 'true' : 'false'}`, + `PRIOR_ITERATION_ID=${result.priorIterationId}`, + `SUMMARY_THREAD_ID=${result.summaryThreadId}`, + ].join('\n') +} diff --git a/apps/claude-code/pr-review/scripts/pre-pr.mjs b/apps/claude-code/pr-review/scripts/pre-pr.mjs new file mode 100644 index 0000000..f6da5b4 --- /dev/null +++ b/apps/claude-code/pr-review/scripts/pre-pr.mjs @@ -0,0 +1,96 @@ +// @ts-check + +/** + * @typedef {{ severity: 'info' | 'warning', kind: string, message: string }} Notice + * @typedef {{ changedFiles: string[], filteredFiles: string[], rawDiff: string, notices: Notice[] }} PrePrContext + */ + +/** + * Skip-list patterns for files that should not be passed to review agents. + * Matches: + * - C# source-generated files (*.g.cs) + * - swagger.md / swagger.json + * - Serialization YAML files (*.serialization.yaml / *.serialization.yml) + * - Files named generated-types.* or under a generated/ directory + * + * @param {string} filePath - Leading-slash forward-slash path, e.g. /src/foo.ts + * @returns {boolean} true if the file should be excluded from review + */ +export function shouldSkipFile(filePath) { + const lower = filePath.toLowerCase() + + // C# source-generated files + if (lower.endsWith('.g.cs')) return true + + // swagger + if (lower.endsWith('swagger.md') || lower.endsWith('swagger.json')) return true + + // Serialization YAMLs + if (lower.endsWith('.serialization.yaml') || lower.endsWith('.serialization.yml')) return true + + // generated-types.* (e.g. generated-types.ts, generated-types.d.ts) + const basename = filePath.split('/').pop() ?? '' + if (basename.toLowerCase().startsWith('generated-types.')) return true + + // files under a generated/ directory segment (case-insensitive: e.g. /Generated/ on .NET) + if (lower.includes('/generated/')) return true + + return false +} + +/** + * Parses the file paths touched by a `git diff` output. + * Extracts the `b/` path from each `diff --git` header and returns unique + * paths with a leading slash, matching the ADO path format. + * + * @param {string} diffText - Raw output of `git diff` + * @returns {string[]} Unique file paths with leading slash + */ +export function parseChangedFilesFromDiff(diffText) { + if (!diffText) return [] + + const seen = new Set() + const paths = [] + + for (const line of diffText.split(/\r?\n/)) { + const m = line.match(/^diff --git a\/.*? b\/(.+)$/) + if (m) { + const filePath = `/${m[1]}` + if (!seen.has(filePath)) { + seen.add(filePath) + paths.push(filePath) + } + } + } + + return paths +} + +/** + * Builds the Pre-PR context object from a raw git diff string. + * Returns all changed files, the subset that should be reviewed (filtered), + * the raw diff text, and any structural Notices emitted during parsing. + * + * Suspicious-shape detection: if diffText is non-empty and contains at least + * one `diff --git` header but parseChangedFilesFromDiff yields zero paths, + * a DEGRADED Notice (kind: diff-parse) is pushed to the notices array. + * + * @param {string} diffText - Raw output of `git diff origin/...HEAD` + * @returns {PrePrContext} + */ +export function buildPrePrContext(diffText) { + const changedFiles = parseChangedFilesFromDiff(diffText) + const filteredFiles = changedFiles.filter((f) => !shouldSkipFile(f)) + /** @type {Notice[]} */ + const notices = [] + + if (diffText && changedFiles.length === 0 && /^diff --git /m.test(diffText)) { + notices.push({ + severity: 'warning', + kind: 'diff-parse', + message: 'Pre-PR diff parsed to zero files but contained diff headers — input may be malformed.', + }) + } + + return { changedFiles, filteredFiles, rawDiff: diffText, notices } +} diff --git a/apps/claude-code/pr-review/scripts/pre-pr/detect-default-branch.mjs b/apps/claude-code/pr-review/scripts/pre-pr/detect-default-branch.mjs new file mode 100644 index 0000000..3efa4ad --- /dev/null +++ b/apps/claude-code/pr-review/scripts/pre-pr/detect-default-branch.mjs @@ -0,0 +1,59 @@ +// @ts-check + +/** @typedef {import('../ado/notices.mjs').Notice} Notice */ +/** @typedef {{ branch: string | null, source: 'remote-show' | 'develop-fallback' | 'main-fallback' | 'master-fallback' | 'none', notice?: Notice }} DetectResult */ + +/** + * Detects the default branch via a prioritized fallback chain. + * + * Chain order: + * 1. remoteHeadBranch (non-empty) — parsed from `git remote show origin` HEAD branch line + * 2. 'develop' checked via branchExists + * 3. 'main' checked via branchExists + * 4. 'master' checked via branchExists + * 5. none — returns { branch: null, source: 'none' } + * + * Emits a warning Notice for every fallback level (levels 2–5). Level 1 is + * considered authoritative so no notice is emitted. Level 5 also emits a + * warning notice; the caller is expected to abort on branch: null. + * + * @param {{ branchExists: (name: string) => boolean, remoteHeadBranch: string }} input + * @returns {DetectResult} + */ +export function detectDefaultBranch({ branchExists, remoteHeadBranch }) { + if (remoteHeadBranch?.trim()) { + return { branch: remoteHeadBranch.trim(), source: 'remote-show' } + } + + /** @type {Array<[string, 'develop-fallback' | 'main-fallback' | 'master-fallback']>} */ + const fallbacks = [ + ['develop', 'develop-fallback'], + ['main', 'main-fallback'], + ['master', 'master-fallback'], + ] + + for (const [name, source] of fallbacks) { + if (branchExists(name)) { + return { + branch: name, + source, + notice: { + severity: 'warning', + kind: 'default-branch', + message: `Default branch not detected via remote-show; computed diff against origin/${name} (${source}).`, + }, + } + } + } + + return { + branch: null, + source: 'none', + notice: { + severity: 'warning', + kind: 'default-branch', + message: + 'Could not detect a default branch: remote-show failed and no develop/main/master branch found locally. Pre-PR run aborted.', + }, + } +} diff --git a/apps/claude-code/pr-review/scripts/re-review/classify-thread.mjs b/apps/claude-code/pr-review/scripts/re-review/classify-thread.mjs index 1b67fe3..4336b9f 100644 --- a/apps/claude-code/pr-review/scripts/re-review/classify-thread.mjs +++ b/apps/claude-code/pr-review/scripts/re-review/classify-thread.mjs @@ -10,49 +10,65 @@ const RESOLVED_STATUSES = new Set(['fixed', 'wontFix', 'closed', 'byDesign', 2, /** * Classifies a prior review thread into one of four states using diff hunk data. - * Rules evaluated in order (spec 05): - * 1. addressed — ADO status is resolved OR line range intersects a diff hunk + * Rules evaluated in order (ADR-0004): + * 1. addressed — ADO thread status is in RESOLVED_STATUSES (fixed / wontFix / closed / byDesign / 2–5) * 2. obsolete — filePath non-null and absent from diff (or file was deleted) - * 3. disputed — at least one comment has no bot signature - * 4. pending — all comments carry the bot signature + * 3. addressed — line range intersects a changed diff hunk + * 4. disputed — at least one comment has no bot signature + * 5. pending — all comments carry the bot signature * - * @param {{ thread: PriorThread, diffHunks: DiffHunk[], signaturePrefix: string }} input + * γ-downgrade (ADR-0004): when diffRange is 'full', outputs 'addressed' and 'obsolete' + * are remapped to 'pending' since diff-position evidence is unreliable on a widened range. + * 'disputed' is unaffected (its derivation is reviewer-reply-based, not diff-position-based). + * + * @param {{ thread: PriorThread, diffHunks: DiffHunk[], signaturePrefix: string, diffRange?: 'full' | 'incremental' }} input * @returns {'addressed' | 'disputed' | 'pending' | 'obsolete'} */ -export function classifyThread({ thread, diffHunks, signaturePrefix }) { +export function classifyThread({ thread, diffHunks, signaturePrefix, diffRange = 'incremental' }) { const { filePath, start, end, comments, status } = thread - if (RESOLVED_STATUSES.has(status)) return 'addressed' + /** @type {'addressed' | 'disputed' | 'pending' | 'obsolete'} */ + let result - const diffFiles = new Set(diffHunks.map((h) => h.filePath)) + if (RESOLVED_STATUSES.has(status)) { + result = 'addressed' + } else { + const diffFiles = new Set(diffHunks.map((h) => h.filePath)) - /** @type {Map>} */ - const hunkMap = new Map() - for (const h of diffHunks) { - const ranges = hunkMap.get(h.filePath) ?? [] - ranges.push([h.startLine, h.endLine]) - hunkMap.set(h.filePath, ranges) - } + /** @type {Map>} */ + const hunkMap = new Map() + for (const h of diffHunks) { + const ranges = hunkMap.get(h.filePath) ?? [] + ranges.push([h.startLine, h.endLine]) + hunkMap.set(h.filePath, ranges) + } - // Files whose every hunk is [0, 0] were deleted from the PR - const deletedFiles = new Set( - [...hunkMap.entries()].filter(([, ranges]) => ranges.every(([s, e]) => s === 0 && e === 0)).map(([fp]) => fp) - ) + // Files whose every hunk is [0, 0] were deleted from the PR + const deletedFiles = new Set( + [...hunkMap.entries()].filter(([, ranges]) => ranges.every(([s, e]) => s === 0 && e === 0)).map(([fp]) => fp) + ) - if (filePath !== null && (!diffFiles.has(filePath) || deletedFiles.has(filePath))) { - return 'obsolete' - } + if (filePath !== null && (!diffFiles.has(filePath) || deletedFiles.has(filePath))) { + result = 'obsolete' + } else { + const startLine = start?.line ?? null + const endLine = end?.line ?? null + const intersects = + filePath !== null && + startLine !== null && + endLine !== null && + (hunkMap.get(filePath) ?? []).some(([hs, he]) => Math.max(startLine, hs) <= Math.min(endLine, he)) - const startLine = start?.line ?? null - const endLine = end?.line ?? null - const intersects = - filePath !== null && - startLine !== null && - endLine !== null && - (hunkMap.get(filePath) ?? []).some(([hs, he]) => Math.max(startLine, hs) <= Math.min(endLine, he)) - - if (intersects) return 'addressed' + if (intersects) { + result = 'addressed' + } else { + const hasHuman = comments.some((c) => !(c.content ?? '').includes(signaturePrefix)) + result = hasHuman ? 'disputed' : 'pending' + } + } + } - const hasHuman = comments.some((c) => !(c.content ?? '').includes(signaturePrefix)) - return hasHuman ? 'disputed' : 'pending' + // γ-downgrade: full diff range makes diff-position verdicts unreliable + if (diffRange === 'full' && (result === 'addressed' || result === 'obsolete')) return 'pending' + return result } diff --git a/apps/claude-code/pr-review/scripts/re-review/match-finding.mjs b/apps/claude-code/pr-review/scripts/re-review/match-finding.mjs index b257b5b..d91bd70 100644 --- a/apps/claude-code/pr-review/scripts/re-review/match-finding.mjs +++ b/apps/claude-code/pr-review/scripts/re-review/match-finding.mjs @@ -11,11 +11,24 @@ * and line-range overlap with ±driftLines tolerance (default 3). * Summary threads are always skipped. * + * Returns `null` for a legitimate no-match. Throws a TypeError when the inputs + * are structurally invalid (distinguishable from a legitimate no-match so callers + * can surface a DEGRADED Notice instead of silently treating it as no-match). + * * @param {{ finding: Finding, priorThreads: PriorThread[], driftLines?: number }} input * @returns {PriorThread | null} */ export function matchFinding({ finding, priorThreads, driftLines = 3 }) { + if (!Array.isArray(priorThreads)) { + throw new TypeError('priorThreads must be an array') + } + if (finding == null || typeof finding !== 'object') { + throw new TypeError('finding must be an object with filePath, startLine, and endLine') + } const { filePath, startLine, endLine } = finding + if (typeof filePath !== 'string' || typeof startLine !== 'number' || typeof endLine !== 'number') { + throw new TypeError('finding must have filePath (string), startLine (number), and endLine (number)') + } const fs = startLine - driftLines const fe = endLine + driftLines diff --git a/apps/claude-code/pr-review/scripts/re-review/parse-diff-hunks.mjs b/apps/claude-code/pr-review/scripts/re-review/parse-diff-hunks.mjs new file mode 100644 index 0000000..1e75c08 --- /dev/null +++ b/apps/claude-code/pr-review/scripts/re-review/parse-diff-hunks.mjs @@ -0,0 +1,50 @@ +// @ts-check + +/** + * @typedef {{ filePath: string, startLine: number, endLine: number }} DiffHunk + */ + +const FILE_HEADER_RE = /^diff --git a\/.* b\/(.*)$/ +const HUNK_HEADER_RE = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@/ + +/** + * Parses a unified `git diff` text into an array of per-hunk ranges on the +side. + * + * Each entry is `{ filePath, startLine, endLine }`. `filePath` is slash-prefixed + * (e.g. `/src/foo.ts`) to match the format consumed by `classify-thread` and + * `match-finding`. No deduplication is performed — every hunk produces an entry. + * + * Hunk headers without a `+side` (binary diffs, pure deletes) are skipped. + * Hunk headers appearing before any `diff --git` line are ignored. + * CRLF line endings are handled transparently. + * + * Pure function. No I/O. + * + * @param {string} rawDiff + * @returns {DiffHunk[]} + */ +export function parseDiffHunks(rawDiff) { + if (!rawDiff) return [] + /** @type {DiffHunk[]} */ + const hunks = [] + /** @type {string | null} */ + let currentFile = null + + const lines = rawDiff.split(/\r?\n/) + for (const line of lines) { + const fileMatch = line.match(FILE_HEADER_RE) + if (fileMatch) { + currentFile = `/${fileMatch[1]}` + continue + } + const hunkMatch = line.match(HUNK_HEADER_RE) + if (hunkMatch && currentFile) { + const startLine = Number(hunkMatch[1]) + const count = hunkMatch[2] != null ? Number(hunkMatch[2]) : 1 + const endLine = startLine + Math.max(count - 1, 0) + hunks.push({ filePath: currentFile, startLine, endLine }) + } + } + + return hunks +} diff --git a/apps/claude-code/pr-review/tests/ado-fetcher.test.mjs b/apps/claude-code/pr-review/tests/ado-fetcher.test.mjs new file mode 100644 index 0000000..aeeb769 --- /dev/null +++ b/apps/claude-code/pr-review/tests/ado-fetcher.test.mjs @@ -0,0 +1,108 @@ +// @ts-check + +import assert from 'node:assert/strict' +import { readFileSync } from 'node:fs' +import { describe, it } from 'node:test' + +/** Reads the ado-fetcher agent markdown for content assertions */ +const agentContent = readFileSync(new URL('../.agents/ado-fetcher.md', import.meta.url), 'utf8') + +describe('ado-fetcher agent content', () => { + it('contains no ADO write HTTP methods (POST/PATCH/DELETE)', () => { + // Allow POST only in comments/explanatory text preceded by 'no' or 'Never' + // The guard: strip lines that are clearly explanatory (contain "Never" or "no write") + const lines = agentContent.split('\n') + const suspectLines = lines.filter((line) => { + const trimmed = line.trim() + // Skip comment lines and the "Never add" instruction line itself + if (trimmed.startsWith('#')) return false + if (trimmed.toLowerCase().includes('never add')) return false + if (trimmed.toLowerCase().includes('no write')) return false + // Flag --http-method POST/PATCH/DELETE + return /--http-method\s+(POST|PATCH|DELETE)/i.test(trimmed) + }) + assert.deepEqual(suspectLines, [], `Agent contains write operations: ${suspectLines.join(' | ')}`) + }) + + it('declares allowed-tools in frontmatter', () => { + assert.ok(agentContent.startsWith('---'), 'Missing YAML frontmatter') + assert.ok(agentContent.includes('allowed-tools:'), 'Missing allowed-tools key') + }) + + it('outputs a structured context block with required fields', () => { + const requiredFields = [ + 'ADO_FETCHER_RESULT_START', + 'ADO_FETCHER_RESULT_END', + 'REPO_ID', + 'PR_TITLE', + 'LATEST_ITERATION_ID', + 'LATEST_COMMIT_SHA', + 'DIFF_RANGE', + 'WORK_ITEM_IDS', + 'CHANGED_FILES', + 'RAW_DIFF', + ] + for (const field of requiredFields) { + assert.ok(agentContent.includes(field), `Missing required output field: ${field}`) + } + }) + + it('emits DIFF_RANGE=incremental on successful incremental diff and DIFF_RANGE=full on fallback', () => { + assert.ok(agentContent.includes('DIFF_RANGE=incremental'), 'Missing DIFF_RANGE=incremental assignment') + assert.ok(agentContent.includes('DIFF_RANGE=full'), 'Missing DIFF_RANGE=full assignment') + }) + + it('sets DIFF_RANGE_FALLBACK=true when prior commit is unreachable', () => { + assert.ok( + agentContent.includes('DIFF_RANGE_FALLBACK=true'), + 'Fallback branch must set DIFF_RANGE_FALLBACK=true so Step 6 can emit the diff-range Notice' + ) + }) + + it('emits a warning diff-range Notice when DIFF_RANGE_FALLBACK is set', () => { + assert.ok( + agentContent.includes('diff-range'), + 'Step 6 must check DIFF_RANGE_FALLBACK and emit a warning Notice with kind: diff-range' + ) + assert.ok( + agentContent.includes('DIFF_RANGE_FB') || agentContent.includes('DIFF_RANGE_FALLBACK'), + 'Step 6 must pass the fallback flag into the Notice-building script' + ) + }) + + it('aborts on empty iterations (no iterationId=1 default)', () => { + assert.ok( + !agentContent.includes('defaulting to iteration 1') && !agentContent.includes('iterationId=1'), + 'Agent must not fall back to iterationId=1 — empty iterations must abort the run' + ) + assert.ok( + agentContent.includes('empty-iterations') || + agentContent.includes('fetch-iterations') || + agentContent.includes('fetchIterations'), + 'Agent must delegate iteration parsing to fetchIterations helper' + ) + }) + + it('documents that merged PRs are handled without error', () => { + assert.ok( + agentContent.includes('already merged') || + agentContent.includes('mergeStatus') || + agentContent.includes('continue without error'), + 'Agent must document handling of already-merged PRs' + ) + }) + + it('invokes the fetchIterations helper from scripts/ado/fetch-iterations.mjs', () => { + assert.ok( + agentContent.includes('fetchIterations') || agentContent.includes('fetch-iterations'), + 'Agent must delegate iteration parsing to fetchIterations helper' + ) + }) + + it('invokes the fetchWorkItems helper from scripts/ado/fetch-work-items.mjs', () => { + assert.ok( + agentContent.includes('fetchWorkItems') || agentContent.includes('fetch-work-items'), + 'Agent must delegate work-item fetching to fetchWorkItems helper' + ) + }) +}) diff --git a/apps/claude-code/pr-review/tests/ado-writer.test.mjs b/apps/claude-code/pr-review/tests/ado-writer.test.mjs new file mode 100644 index 0000000..8154091 --- /dev/null +++ b/apps/claude-code/pr-review/tests/ado-writer.test.mjs @@ -0,0 +1,269 @@ +// @ts-check + +import assert from 'node:assert/strict' +import { readFileSync } from 'node:fs' +import { describe, it } from 'node:test' +import { parseAdoWriterResult } from '../scripts/ado-writer.mjs' + +/** Reads the ado-writer agent markdown for content assertions */ +const agentContent = readFileSync(new URL('../.agents/ado-writer.md', import.meta.url), 'utf8') + +describe('ado-writer agent content', () => { + it('declares allowed-tools in frontmatter', () => { + assert.ok(agentContent.startsWith('---'), 'Missing YAML frontmatter') + assert.ok(agentContent.includes('allowed-tools:'), 'Missing allowed-tools key') + }) + + it('contains no ADO read-only operations (GET)', () => { + const lines = agentContent.split('\n') + const suspectLines = lines.filter((line) => { + const trimmed = line.trim() + if (trimmed.startsWith('#')) return false + return /--http-method\s+GET/i.test(trimmed) + }) + assert.deepEqual( + suspectLines, + [], + `Agent contains GET operations (reads should stay in ado-fetcher): ${suspectLines.join(' | ')}` + ) + }) + + it('accepts all required input fields', () => { + const requiredInputs = [ + 'ORG_URL', + 'PROJECT', + 'REPO_ID', + 'PR_ID', + 'LATEST_ITERATION_ID', + 'SUMMARY_THREAD_ID', + 'MODE', + ] + for (const field of requiredInputs) { + assert.ok(agentContent.includes(field), `Missing required input field: ${field}`) + } + }) + + it('accepts a findings list with the compact finding schema', () => { + const requiredFindingFields = ['severity', 'filePath', 'startLine', 'endLine', 'title', 'body'] + for (const field of requiredFindingFields) { + assert.ok(agentContent.includes(field), `Missing compact finding field: ${field}`) + } + }) + + it('posts inline comment threads using POST to pullRequestThreads', () => { + assert.ok( + agentContent.includes('pullRequestThreads') && agentContent.includes('--http-method POST'), + 'Agent must POST to pullRequestThreads for inline comments' + ) + }) + + it('includes threadContext with filePath and line range in inline comments', () => { + assert.ok(agentContent.includes('threadContext'), 'Agent must use threadContext for inline comments') + assert.ok(agentContent.includes('rightFileStart'), 'Agent must set rightFileStart in threadContext') + assert.ok(agentContent.includes('rightFileEnd'), 'Agent must set rightFileEnd in threadContext') + }) + + it('appends canonical Bot Signature to every comment', () => { + assert.ok(agentContent.includes('🤖 *Reviewed by Claude Code*'), 'Agent must include the canonical Bot Signature') + assert.ok(agentContent.includes('LATEST_ITERATION_ID'), 'Agent must include LATEST_ITERATION_ID in the signature') + }) + + it('posts full Review Summary on first-review mode', () => { + assert.ok( + agentContent.includes('first-review') || + agentContent.includes('first_review') || + agentContent.includes('IS_REREVIEW=false'), + 'Agent must handle first-review mode' + ) + assert.ok(agentContent.includes('PR Review Summary'), 'Agent must post PR Review Summary on first-review') + }) + + it('posts delta reply to existing summary thread on re-review with findings', () => { + assert.ok( + agentContent.includes('re-review') || agentContent.includes('IS_REREVIEW=true'), + 'Agent must handle re-review mode' + ) + assert.ok( + agentContent.includes('pullRequestThreadComments'), + 'Agent must POST to pullRequestThreadComments for delta reply' + ) + }) + + it('skips summary reply on re-review with zero new findings', () => { + assert.ok( + agentContent.includes('zero') || + agentContent.includes('no new findings') || + agentContent.includes('FINDINGS_POSTED=0') || + agentContent.includes('nothing to report') || + agentContent.includes('skip'), + 'Agent must document skipping summary reply when there are no new findings' + ) + }) + + it('retries without threadContext on ADO rejection', () => { + assert.ok( + agentContent.includes('threadContext') && + (agentContent.includes('retry') || + agentContent.includes('without') || + agentContent.includes('fallback') || + agentContent.includes('fall back') || + agentContent.includes('general comment')), + 'Agent must retry without threadContext when ADO rejects the inline placement' + ) + }) + + it('posts completion marker as final action', () => { + assert.ok(agentContent.includes('✅ Review complete'), 'Agent must post completion marker reply') + assert.ok( + agentContent.includes('completion marker') || + agentContent.includes('Completion marker') || + agentContent.includes('final action'), + 'Agent must document completion marker as final action' + ) + }) + + it('returns structured output block with SUMMARY_THREAD_ID and FINDINGS_POSTED', () => { + const requiredOutputFields = [ + 'ADO_WRITER_RESULT_START', + 'ADO_WRITER_RESULT_END', + 'SUMMARY_THREAD_ID', + 'FINDINGS_POSTED', + ] + for (const field of requiredOutputFields) { + assert.ok(agentContent.includes(field), `Missing required output field: ${field}`) + } + }) + + it('invokes parseAdoWriterResult helper from ado-writer.mjs', () => { + assert.ok( + agentContent.includes('parseAdoWriterResult'), + 'Agent must delegate output parsing to parseAdoWriterResult helper' + ) + }) +}) + +describe('parseAdoWriterResult', () => { + it('parses a valid result block into summaryThreadId and findingsPosted', () => { + const output = ` +ADO_WRITER_RESULT_START +SUMMARY_THREAD_ID: 42 +FINDINGS_POSTED: 5 +NOTICES: [] +ADO_WRITER_RESULT_END +`.trim() + const result = parseAdoWriterResult(output) + assert.equal(result.ok, true) + assert.equal(result.summaryThreadId, 42) + assert.equal(result.findingsPosted, 5) + assert.deepEqual(result.notices, []) + }) + + it('returns summaryThreadId=null when SUMMARY_THREAD_ID is empty', () => { + const output = ` +ADO_WRITER_RESULT_START +SUMMARY_THREAD_ID: +FINDINGS_POSTED: 0 +NOTICES: [] +ADO_WRITER_RESULT_END +`.trim() + const result = parseAdoWriterResult(output) + assert.equal(result.ok, true) + assert.equal(result.summaryThreadId, null) + assert.equal(result.findingsPosted, 0) + assert.deepEqual(result.notices, []) + }) + + it('returns { ok: false, reason: "missing-block" } when no result block is present', () => { + const result = parseAdoWriterResult('No result block here') + assert.equal(result.ok, false) + assert.equal(result.reason, 'missing-block') + }) + + it('returns { ok: false, reason: "malformed" } when block is present but FINDINGS_POSTED is absent', () => { + const output = `ADO_WRITER_RESULT_START\nSUMMARY_THREAD_ID: 5\nNOTICES: []\nADO_WRITER_RESULT_END` + const result = parseAdoWriterResult(output) + assert.equal(result.ok, false) + assert.equal(result.reason, 'malformed') + }) + + it('handles FINDINGS_POSTED=0 explicitly', () => { + const output = `ADO_WRITER_RESULT_START\nSUMMARY_THREAD_ID: 7\nFINDINGS_POSTED: 0\nNOTICES: []\nADO_WRITER_RESULT_END` + const result = parseAdoWriterResult(output) + assert.equal(result.ok, true) + assert.equal(result.summaryThreadId, 7) + assert.equal(result.findingsPosted, 0) + }) + + it('handles output with extra content around the result block', () => { + const output = [ + 'Posting inline comments...', + 'ADO_WRITER_RESULT_START', + 'SUMMARY_THREAD_ID: 99', + 'FINDINGS_POSTED: 3', + 'NOTICES: []', + 'ADO_WRITER_RESULT_END', + 'Done.', + ].join('\n') + const result = parseAdoWriterResult(output) + assert.equal(result.ok, true) + assert.equal(result.summaryThreadId, 99) + assert.equal(result.findingsPosted, 3) + }) + + it('parses NOTICES array from result block', () => { + const notices = [ + { + severity: 'warning', + kind: 'inline-post', + message: 'Failed to post inline thread at /src/foo.ts:42 (HTTP 503).', + }, + ] + const output = [ + 'ADO_WRITER_RESULT_START', + 'SUMMARY_THREAD_ID: 10', + 'FINDINGS_POSTED: 2', + `NOTICES: ${JSON.stringify(notices)}`, + 'ADO_WRITER_RESULT_END', + ].join('\n') + const result = parseAdoWriterResult(output) + assert.equal(result.ok, true) + assert.deepEqual(result.notices, notices) + }) + + it('returns empty notices when NOTICES field is absent (legacy block)', () => { + const output = `ADO_WRITER_RESULT_START\nSUMMARY_THREAD_ID: 5\nFINDINGS_POSTED: 1\nADO_WRITER_RESULT_END` + const result = parseAdoWriterResult(output) + assert.equal(result.ok, true) + assert.deepEqual(result.notices, []) + }) + + it('returns { ok: false, reason: "malformed" } when NOTICES field is malformed JSON', () => { + const output = `ADO_WRITER_RESULT_START\nSUMMARY_THREAD_ID: 5\nFINDINGS_POSTED: 1\nNOTICES: [broken\nADO_WRITER_RESULT_END` + const result = parseAdoWriterResult(output) + assert.equal(result.ok, false) + assert.equal(result.reason, 'malformed') + }) +}) + +describe('ado-writer agent uses parse-write-response helper', () => { + it('references parse-write-response.mjs in the agent prompt', () => { + assert.ok( + agentContent.includes('parse-write-response.mjs') || agentContent.includes('parseWriteResponse'), + 'Agent must delegate write-response parsing to parse-write-response helper' + ) + }) + + it('emits a NOTICES array in the result block', () => { + assert.ok(agentContent.includes('NOTICES:'), 'Agent result block must include a NOTICES field') + }) + + it('initialises a NOTICES array before posting begins', () => { + assert.ok( + agentContent.includes('NOTICES=') || + agentContent.includes('NOTICES =(') || + agentContent.includes("NOTICES='[]'") || + agentContent.includes('NOTICES="[]"'), + 'Agent must initialise a NOTICES array before writing begins' + ) + }) +}) diff --git a/apps/claude-code/pr-review/tests/classify-http-error.test.mjs b/apps/claude-code/pr-review/tests/classify-http-error.test.mjs new file mode 100644 index 0000000..f22ca2b --- /dev/null +++ b/apps/claude-code/pr-review/tests/classify-http-error.test.mjs @@ -0,0 +1,105 @@ +// @ts-check + +import assert from 'node:assert/strict' +import { describe, it } from 'node:test' +import { classifyHttpError } from '../scripts/ado/classify-http-error.mjs' + +describe('classifyHttpError — OK tier', () => { + it('HTTP 200 → ok', () => { + const r = classifyHttpError({ status: 200, body: '{"id":1}', exitCode: 0 }) + assert.equal(r.tier, 'ok') + }) + + it('HTTP 201 → ok', () => { + const r = classifyHttpError({ status: 201, body: '{"id":2}', exitCode: 0 }) + assert.equal(r.tier, 'ok') + }) + + it('HTTP 404 → ok (domain: thing already gone)', () => { + const r = classifyHttpError({ status: 404, body: '', exitCode: 0 }) + assert.equal(r.tier, 'ok') + }) + + it('HTTP 409 → ok (domain: state already changed)', () => { + const r = classifyHttpError({ status: 409, body: '', exitCode: 0 }) + assert.equal(r.tier, 'ok') + }) +}) + +describe('classifyHttpError — ABORTED tier', () => { + it('HTTP 401 → aborted with kind=auth', () => { + const r = classifyHttpError({ status: 401, body: 'Unauthorized', exitCode: 1 }) + assert.equal(r.tier, 'aborted') + assert.equal(r.kind, 'auth') + assert.ok(r.message.length > 0) + }) + + it('HTTP 403 → aborted with kind=auth', () => { + const r = classifyHttpError({ status: 403, body: 'Forbidden', exitCode: 1 }) + assert.equal(r.tier, 'aborted') + assert.equal(r.kind, 'auth') + }) +}) + +describe('classifyHttpError — DEGRADED tier', () => { + it('HTTP 500 → degraded with kind=transient', () => { + const r = classifyHttpError({ status: 500, body: 'Internal Server Error', exitCode: 1 }) + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'transient') + }) + + it('HTTP 503 → degraded with kind=transient', () => { + const r = classifyHttpError({ status: 503, body: 'Service Unavailable', exitCode: 1 }) + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'transient') + }) + + it('HTTP 400 → degraded with kind=malformed-request', () => { + const r = classifyHttpError({ status: 400, body: 'Bad Request', exitCode: 1 }) + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'malformed-request') + }) + + it('HTTP 422 → degraded with kind=malformed-request', () => { + const r = classifyHttpError({ status: 422, body: 'Unprocessable Entity', exitCode: 1 }) + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'malformed-request') + }) + + it('network error (exitCode=1, no status) → degraded with kind=network', () => { + const r = classifyHttpError({ status: 0, body: '', exitCode: 1 }) + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'network') + }) + + it('network error (exitCode=2, no status) → degraded with kind=network', () => { + const r = classifyHttpError({ status: 0, body: 'connection refused', exitCode: 2 }) + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'network') + assert.ok(r.message.includes('connection refused') || r.message.includes('2')) + }) +}) + +describe('classifyHttpError — message content', () => { + it('includes the HTTP status in the message for 5xx errors', () => { + const r = classifyHttpError({ status: 503, body: 'Service Unavailable', exitCode: 1 }) + assert.ok(r.message.includes('503'), `expected message to contain "503", got: ${r.message}`) + }) + + it('includes body excerpt in message for 4xx errors', () => { + const body = 'Invalid parameter: filePath must start with /' + const r = classifyHttpError({ status: 400, body, exitCode: 1 }) + assert.ok(r.message.includes('400'), `expected message to contain "400", got: ${r.message}`) + }) + + it('malformed JSON body does not crash — uses status to determine tier', () => { + const r = classifyHttpError({ status: 401, body: '<<>>', exitCode: 1 }) + assert.equal(r.tier, 'aborted') + assert.equal(r.kind, 'auth') + }) + + it('ok tier returns empty message', () => { + const r = classifyHttpError({ status: 200, body: '{"id":1}', exitCode: 0 }) + assert.equal(r.message, '') + }) +}) diff --git a/apps/claude-code/pr-review/tests/classify-thread.test.mjs b/apps/claude-code/pr-review/tests/classify-thread.test.mjs index d27415f..04f8482 100644 --- a/apps/claude-code/pr-review/tests/classify-thread.test.mjs +++ b/apps/claude-code/pr-review/tests/classify-thread.test.mjs @@ -166,4 +166,127 @@ describe('classifyThread', () => { } assert.equal(classifyThread({ thread, diffHunks: noChangeDiff, signaturePrefix: SIGNATURE_PREFIX }), 'addressed') }) + + it('γ-downgrade: diffRange=full, line intersects hunk → pending (not addressed)', () => { + /** @type {import('../scripts/re-review/classify-thread.mjs').PriorThread} */ + const thread = { + threadId: 200, + filePath: '/src/utils.ts', + start: { line: 10 }, + end: { line: 15 }, + comments: [{ content: `Finding.\n---\n${SIGNATURE_PREFIX} — Iteration 1` }], + status: 'active', + } + /** @type {import('../scripts/re-review/classify-thread.mjs').DiffHunk[]} */ + const hunks = [{ filePath: '/src/utils.ts', startLine: 12, endLine: 13 }] + assert.equal( + classifyThread({ thread, diffHunks: hunks, signaturePrefix: SIGNATURE_PREFIX, diffRange: 'full' }), + 'pending' + ) + }) + + it('γ-downgrade: diffRange=full, file absent from diff → pending (not obsolete)', () => { + /** @type {import('../scripts/re-review/classify-thread.mjs').PriorThread} */ + const thread = { + threadId: 201, + filePath: '/src/legacy.ts', + start: { line: 5 }, + end: { line: 5 }, + comments: [{ content: `Finding.\n---\n${SIGNATURE_PREFIX} — Iteration 1` }], + status: 'active', + } + assert.equal( + classifyThread({ thread, diffHunks: withChangesDiff, signaturePrefix: SIGNATURE_PREFIX, diffRange: 'full' }), + 'pending' + ) + }) + + it('γ-downgrade: diffRange=full, disputed thread → still disputed (unaffected)', () => { + const thread = toThread(loadFixture('threads-disputed').value[0]) + assert.equal( + classifyThread({ thread, diffHunks: withChangesDiff, signaturePrefix: SIGNATURE_PREFIX, diffRange: 'full' }), + 'disputed' + ) + }) + + it('γ-downgrade: diffRange=full, status=fixed → pending (status-based addressed downgraded)', () => { + /** @type {import('../scripts/re-review/classify-thread.mjs').PriorThread} */ + const thread = { + threadId: 202, + filePath: '/src/feature.ts', + start: { line: 10 }, + end: { line: 10 }, + comments: [{ content: `Finding.\n---\n${SIGNATURE_PREFIX} — Iteration 1` }], + status: 'fixed', + } + assert.equal( + classifyThread({ thread, diffHunks: noChangeDiff, signaturePrefix: SIGNATURE_PREFIX, diffRange: 'full' }), + 'pending' + ) + }) + + it('string status wontFix → addressed', () => { + /** @type {import('../scripts/re-review/classify-thread.mjs').PriorThread} */ + const thread = { + threadId: 106, + filePath: '/src/api.ts', + start: { line: 1 }, + end: { line: 1 }, + comments: [{ content: `Finding.\n---\n${SIGNATURE_PREFIX} — Iteration 1` }], + status: 'wontFix', + } + assert.equal(classifyThread({ thread, diffHunks: noChangeDiff, signaturePrefix: SIGNATURE_PREFIX }), 'addressed') + }) + + it('string status closed → addressed', () => { + /** @type {import('../scripts/re-review/classify-thread.mjs').PriorThread} */ + const thread = { + threadId: 107, + filePath: '/src/api.ts', + start: { line: 1 }, + end: { line: 1 }, + comments: [{ content: `Finding.\n---\n${SIGNATURE_PREFIX} — Iteration 1` }], + status: 'closed', + } + assert.equal(classifyThread({ thread, diffHunks: noChangeDiff, signaturePrefix: SIGNATURE_PREFIX }), 'addressed') + }) + + it('string status byDesign → addressed', () => { + /** @type {import('../scripts/re-review/classify-thread.mjs').PriorThread} */ + const thread = { + threadId: 108, + filePath: '/src/api.ts', + start: { line: 1 }, + end: { line: 1 }, + comments: [{ content: `Finding.\n---\n${SIGNATURE_PREFIX} — Iteration 1` }], + status: 'byDesign', + } + assert.equal(classifyThread({ thread, diffHunks: noChangeDiff, signaturePrefix: SIGNATURE_PREFIX }), 'addressed') + }) + + it('numeric status 3 → addressed', () => { + /** @type {import('../scripts/re-review/classify-thread.mjs').PriorThread} */ + const thread = { + threadId: 109, + filePath: '/src/api.ts', + start: { line: 1 }, + end: { line: 1 }, + comments: [{ content: `Finding.\n---\n${SIGNATURE_PREFIX} — Iteration 1` }], + status: 3, + } + assert.equal(classifyThread({ thread, diffHunks: noChangeDiff, signaturePrefix: SIGNATURE_PREFIX }), 'addressed') + }) + + it('numeric status 4 → addressed', () => { + /** @type {import('../scripts/re-review/classify-thread.mjs').PriorThread} */ + const thread = { + threadId: 110, + filePath: '/src/api.ts', + start: { line: 1 }, + end: { line: 1 }, + comments: [{ content: `Finding.\n---\n${SIGNATURE_PREFIX} — Iteration 1` }], + status: 4, + } + assert.equal(classifyThread({ thread, diffHunks: noChangeDiff, signaturePrefix: SIGNATURE_PREFIX }), 'addressed') + }) }) diff --git a/apps/claude-code/pr-review/tests/detect-default-branch.test.mjs b/apps/claude-code/pr-review/tests/detect-default-branch.test.mjs new file mode 100644 index 0000000..a7e9467 --- /dev/null +++ b/apps/claude-code/pr-review/tests/detect-default-branch.test.mjs @@ -0,0 +1,73 @@ +// @ts-check + +import assert from 'node:assert/strict' +import { describe, it } from 'node:test' +import { detectDefaultBranch } from '../scripts/pre-pr/detect-default-branch.mjs' + +const noBranch = () => false +const allBranches = () => true + +describe('detectDefaultBranch', () => { + it('remoteHeadBranch set → returns it as branch with source remote-show, no notice', () => { + const result = detectDefaultBranch({ branchExists: noBranch, remoteHeadBranch: 'main' }) + assert.equal(result.branch, 'main') + assert.equal(result.source, 'remote-show') + assert.equal(result.notice, undefined) + }) + + it('remoteHeadBranch = "develop" → returns develop with source remote-show, no notice', () => { + const result = detectDefaultBranch({ branchExists: noBranch, remoteHeadBranch: 'develop' }) + assert.equal(result.branch, 'develop') + assert.equal(result.source, 'remote-show') + assert.equal(result.notice, undefined) + }) + + it('remoteHeadBranch empty, develop exists → develop-fallback + warning notice', () => { + const result = detectDefaultBranch({ branchExists: (n) => n === 'develop', remoteHeadBranch: '' }) + assert.equal(result.branch, 'develop') + assert.equal(result.source, 'develop-fallback') + assert.equal(result.notice?.severity, 'warning') + assert.equal(result.notice?.kind, 'default-branch') + assert.ok(result.notice?.message.includes('develop')) + }) + + it('remoteHeadBranch empty, no develop, main exists → main-fallback + warning notice', () => { + const result = detectDefaultBranch({ branchExists: (n) => n === 'main', remoteHeadBranch: '' }) + assert.equal(result.branch, 'main') + assert.equal(result.source, 'main-fallback') + assert.equal(result.notice?.kind, 'default-branch') + assert.ok(result.notice?.message.includes('main')) + }) + + it('remoteHeadBranch empty, no develop/main, master exists → master-fallback + warning notice', () => { + const result = detectDefaultBranch({ branchExists: (n) => n === 'master', remoteHeadBranch: '' }) + assert.equal(result.branch, 'master') + assert.equal(result.source, 'master-fallback') + assert.equal(result.notice?.kind, 'default-branch') + assert.ok(result.notice?.message.includes('master')) + }) + + it('remoteHeadBranch is whitespace-only → falls through to develop fallback', () => { + const result = detectDefaultBranch({ + branchExists: (name) => name === 'develop', + remoteHeadBranch: ' ', + }) + assert.equal(result.branch, 'develop') + assert.equal(result.source, 'develop-fallback') + }) + + it('remoteHeadBranch empty, no branches → source none, branch null, notice present', () => { + const result = detectDefaultBranch({ branchExists: noBranch, remoteHeadBranch: '' }) + assert.equal(result.branch, null) + assert.equal(result.source, 'none') + assert.equal(result.notice?.severity, 'warning') + assert.equal(result.notice?.kind, 'default-branch') + assert.ok(result.notice?.message.length > 0) + }) + + it('fallback chain prioritises develop over main over master', () => { + const result = detectDefaultBranch({ branchExists: allBranches, remoteHeadBranch: '' }) + assert.equal(result.branch, 'develop') + assert.equal(result.source, 'develop-fallback') + }) +}) diff --git a/apps/claude-code/pr-review/tests/fetch-iterations.test.mjs b/apps/claude-code/pr-review/tests/fetch-iterations.test.mjs new file mode 100644 index 0000000..e111fef --- /dev/null +++ b/apps/claude-code/pr-review/tests/fetch-iterations.test.mjs @@ -0,0 +1,83 @@ +// @ts-check + +import assert from 'node:assert/strict' +import { describe, it } from 'node:test' +import { fetchIterations } from '../scripts/ado/fetch-iterations.mjs' + +const iter = (id, commitId) => ({ + id, + sourceRefCommit: commitId != null ? { commitId } : null, +}) + +const okResponse = (iterations) => JSON.stringify({ value: iterations }) + +describe('fetchIterations', () => { + it('single iteration → ok with its id and commit SHA', () => { + const result = fetchIterations({ responseText: okResponse([iter(1, 'abc123')]), exitCode: 0 }) + assert.deepEqual(result, { ok: true, latestIterationId: 1, latestCommitSha: 'abc123' }) + }) + + it('multiple iterations → ok with the max id and its commit SHA', () => { + const result = fetchIterations({ + responseText: okResponse([iter(1, 'aaa'), iter(3, 'ccc'), iter(2, 'bbb')]), + exitCode: 0, + }) + assert.deepEqual(result, { ok: true, latestIterationId: 3, latestCommitSha: 'ccc' }) + }) + + it('iteration with null sourceRefCommit → ok with empty commitSha', () => { + const result = fetchIterations({ responseText: okResponse([iter(2, null)]), exitCode: 0 }) + assert.deepEqual(result, { ok: true, latestIterationId: 2, latestCommitSha: '' }) + }) + + it('empty value array → empty-iterations failure', () => { + const result = fetchIterations({ responseText: okResponse([]), exitCode: 0 }) + assert.equal(result.ok, false) + assert.equal(result.reason, 'empty-iterations') + assert.ok(result.message.length > 0) + }) + + it('non-zero exit with no body → transient failure (network)', () => { + const result = fetchIterations({ responseText: '', exitCode: 1 }) + assert.equal(result.ok, false) + assert.equal(result.reason, 'transient') + assert.match(result.message, /exit 1/) + }) + + it('401 status in response body → auth failure', () => { + const body = JSON.stringify({ statusCode: 401, message: 'Unauthorized' }) + const result = fetchIterations({ responseText: body, exitCode: 1 }) + assert.equal(result.ok, false) + assert.equal(result.reason, 'auth') + assert.match(result.message, /401/) + }) + + it('5xx status in response body → transient failure', () => { + const body = JSON.stringify({ statusCode: 503, message: 'Service Unavailable' }) + const result = fetchIterations({ responseText: body, exitCode: 1 }) + assert.equal(result.ok, false) + assert.equal(result.reason, 'transient') + assert.match(result.message, /503/) + }) + + it('malformed JSON response with zero exit → malformed failure', () => { + const result = fetchIterations({ responseText: 'not-valid-json', exitCode: 0 }) + assert.equal(result.ok, false) + assert.equal(result.reason, 'malformed') + }) + + it('exitCode=0 but value key absent → { ok: false, reason: malformed }', () => { + const r = fetchIterations({ responseText: JSON.stringify({ count: 0 }), exitCode: 0 }) + assert.equal(r.ok, false) + if (!r.ok) assert.equal(r.reason, 'malformed') + }) + + it('HTTP 400 response → { ok: false, reason: malformed }', () => { + const r = fetchIterations({ + responseText: JSON.stringify({ statusCode: 400, message: 'Bad Request' }), + exitCode: 0, + }) + assert.equal(r.ok, false) + if (!r.ok) assert.equal(r.reason, 'malformed') + }) +}) diff --git a/apps/claude-code/pr-review/tests/fetch-work-items.test.mjs b/apps/claude-code/pr-review/tests/fetch-work-items.test.mjs new file mode 100644 index 0000000..b63931b --- /dev/null +++ b/apps/claude-code/pr-review/tests/fetch-work-items.test.mjs @@ -0,0 +1,119 @@ +// @ts-check + +import assert from 'node:assert/strict' +import { describe, it } from 'node:test' +import { fetchWorkItems } from '../scripts/ado/fetch-work-items.mjs' + +describe('fetchWorkItems — OK results', () => { + it('empty value array → { ok: true, ids: [] } (EMPTY-BY-DESIGN)', () => { + const r = fetchWorkItems({ responseText: JSON.stringify({ value: [] }), exitCode: 0 }) + assert.deepEqual(r, { ok: true, ids: [] }) + }) + + it('populated work items → { ok: true, ids: [...] }', () => { + const r = fetchWorkItems({ responseText: JSON.stringify({ value: [{ id: 42 }, { id: 7 }] }), exitCode: 0 }) + assert.deepEqual(r, { ok: true, ids: [42, 7] }) + }) + + it('preserves order of IDs', () => { + const r = fetchWorkItems({ + responseText: JSON.stringify({ value: [{ id: 3 }, { id: 1 }, { id: 2 }] }), + exitCode: 0, + }) + assert.ok(r.ok) + if (r.ok) assert.deepEqual(r.ids, [3, 1, 2]) + }) + + it('null elements in value array are skipped silently', () => { + const r = fetchWorkItems({ + responseText: JSON.stringify({ value: [null, { id: 5 }, null, { id: 9 }] }), + exitCode: 0, + }) + assert.ok(r.ok) + if (r.ok) assert.deepEqual(r.ids, [5, 9]) + }) + + it('non-object elements in value array are skipped', () => { + const r = fetchWorkItems({ + responseText: JSON.stringify({ value: [{ id: 1 }, 'stray-string', { id: 2 }] }), + exitCode: 0, + }) + assert.ok(r.ok) + if (r.ok) assert.deepEqual(r.ids, [1, 2]) + }) +}) + +describe('fetchWorkItems — failure results', () => { + it('non-zero exit code (auth body) → { ok: false, reason: "auth" }', () => { + const r = fetchWorkItems({ + responseText: JSON.stringify({ statusCode: 401, message: 'TF400813: unauthorized' }), + exitCode: 1, + }) + assert.equal(r.ok, false) + if (!r.ok) { + assert.equal(r.reason, 'auth') + assert.ok(typeof r.message === 'string') + } + }) + + it('non-zero exit code (5xx body) → { ok: false, reason: "transient" }', () => { + const r = fetchWorkItems({ + responseText: JSON.stringify({ statusCode: 503, message: 'Service unavailable' }), + exitCode: 1, + }) + assert.equal(r.ok, false) + if (!r.ok) assert.equal(r.reason, 'transient') + }) + + it('non-zero exit code (4xx non-auth body) → { ok: false, reason: "malformed" }', () => { + const r = fetchWorkItems({ + responseText: JSON.stringify({ statusCode: 400, message: 'Bad request' }), + exitCode: 1, + }) + assert.equal(r.ok, false) + if (!r.ok) assert.equal(r.reason, 'malformed') + }) + + it('non-zero exit with no parseable body → { ok: false, reason: "transient" }', () => { + const r = fetchWorkItems({ responseText: '', exitCode: 1 }) + assert.equal(r.ok, false) + if (!r.ok) { + assert.equal(r.reason, 'transient') + assert.ok(typeof r.message === 'string') + } + }) + + it('non-zero exit with auth body excerpt → message includes auth-related text', () => { + const r = fetchWorkItems({ responseText: 'TF401349: OAuth token is not valid', exitCode: 1 }) + assert.equal(r.ok, false) + if (!r.ok) assert.ok(r.message.includes('TF401349')) + }) + + it('exitCode=0 but empty responseText → { ok: false }', () => { + const r = fetchWorkItems({ responseText: '', exitCode: 0 }) + assert.equal(r.ok, false) + }) + + it('exitCode=0 but malformed JSON → { ok: false, reason: malformed }', () => { + const r = fetchWorkItems({ responseText: '<<>>', exitCode: 0 }) + assert.equal(r.ok, false) + if (!r.ok) assert.equal(r.reason, 'malformed') + }) + + it('exitCode=0 but response has no value key → { ok: false, reason: malformed }', () => { + const r = fetchWorkItems({ responseText: JSON.stringify({ count: 0 }), exitCode: 0 }) + assert.equal(r.ok, false) + if (!r.ok) assert.equal(r.reason, 'malformed') + }) + + it('ADO error response body (non-zero exit, 401 status) → { ok: false, reason: "auth" }', () => { + const errorBody = JSON.stringify({ + statusCode: 401, + message: 'VS403487: The client is unauthorized.', + errorCode: 0, + }) + const r = fetchWorkItems({ responseText: errorBody, exitCode: 1 }) + assert.equal(r.ok, false) + if (!r.ok) assert.equal(r.reason, 'auth') + }) +}) diff --git a/apps/claude-code/pr-review/tests/match-finding.test.mjs b/apps/claude-code/pr-review/tests/match-finding.test.mjs index 72939f6..c75d281 100644 --- a/apps/claude-code/pr-review/tests/match-finding.test.mjs +++ b/apps/claude-code/pr-review/tests/match-finding.test.mjs @@ -74,4 +74,18 @@ describe('matchFinding', () => { const result = matchFinding({ finding, priorThreads }) assert.equal(result, null) }) + + it('throws TypeError when priorThreads is not an array', () => { + const finding = { filePath: '/src/api.ts', startLine: 42, endLine: 42 } + assert.throws(() => matchFinding({ finding, priorThreads: 'not-an-array' }), TypeError) + }) + + it('throws TypeError when finding is null', () => { + assert.throws(() => matchFinding({ finding: null, priorThreads: [] }), TypeError) + }) + + it('throws TypeError when finding has wrong field types', () => { + const finding = { filePath: '/src/api.ts', startLine: '42', endLine: 42 } + assert.throws(() => matchFinding({ finding, priorThreads: [] }), TypeError) + }) }) diff --git a/apps/claude-code/pr-review/tests/mode-detection.test.mjs b/apps/claude-code/pr-review/tests/mode-detection.test.mjs new file mode 100644 index 0000000..aef8671 --- /dev/null +++ b/apps/claude-code/pr-review/tests/mode-detection.test.mjs @@ -0,0 +1,72 @@ +// @ts-check + +import assert from 'node:assert/strict' +import { describe, it } from 'node:test' +import { detectMode, formatModeEnv } from '../scripts/mode-detection.mjs' + +const SIGNATURE_PREFIX = '🤖 *Reviewed by Claude Code*' + +describe('detectMode', () => { + it('no threads → first-review with empty fields', () => { + const r = detectMode({ threads: [], signaturePrefix: SIGNATURE_PREFIX }) + assert.equal(r.mode, 'first-review') + assert.equal(r.isRereview, false) + assert.equal(r.priorIterationId, '') + assert.equal(r.summaryThreadId, '') + }) + + it('non-array threads → first-review (defensive)', () => { + // @ts-expect-error — exercising defensive path + const r = detectMode({ threads: null, signaturePrefix: SIGNATURE_PREFIX }) + assert.equal(r.mode, 'first-review') + assert.equal(r.isRereview, false) + }) + + it('threads without signature → first-review', () => { + const threads = [{ id: 1, comments: [{ content: 'hello from a human' }] }] + const r = detectMode({ threads, signaturePrefix: SIGNATURE_PREFIX }) + assert.equal(r.mode, 'first-review') + assert.equal(r.isRereview, false) + }) + + it('thread with signature and iteration → re-review with stringified fields', () => { + const threads = [ + { + id: 42, + threadContext: null, + comments: [ + { + content: `## PR Review Summary\n\nfoo\n\n---\n${SIGNATURE_PREFIX} — Iteration 3`, + }, + ], + }, + ] + const r = detectMode({ threads, signaturePrefix: SIGNATURE_PREFIX }) + assert.equal(r.mode, 're-review') + assert.equal(r.isRereview, true) + assert.equal(r.priorIterationId, '3') + assert.equal(r.summaryThreadId, '42') + }) +}) + +describe('formatModeEnv', () => { + it('emits four KEY=value lines for first-review', () => { + const r = detectMode({ threads: [], signaturePrefix: SIGNATURE_PREFIX }) + const env = formatModeEnv(r) + assert.equal( + env, + ['MODE=first-review', 'IS_REREVIEW=false', 'PRIOR_ITERATION_ID=', 'SUMMARY_THREAD_ID='].join('\n') + ) + }) + + it('emits stringified IDs for re-review', () => { + const r = { + /** @type {'re-review'} */ mode: /** @type {const} */ ('re-review'), + isRereview: true, + priorIterationId: '3', + summaryThreadId: '42', + } + const env = formatModeEnv(r) + assert.equal(env, ['MODE=re-review', 'IS_REREVIEW=true', 'PRIOR_ITERATION_ID=3', 'SUMMARY_THREAD_ID=42'].join('\n')) + }) +}) diff --git a/apps/claude-code/pr-review/tests/notices.test.mjs b/apps/claude-code/pr-review/tests/notices.test.mjs new file mode 100644 index 0000000..805ff77 --- /dev/null +++ b/apps/claude-code/pr-review/tests/notices.test.mjs @@ -0,0 +1,139 @@ +// @ts-check + +import assert from 'node:assert/strict' +import { describe, it } from 'node:test' +import { + createNotice, + formatNoticesAsPrePrPreamble, + formatNoticesAsSummaryBlock, + formatTrailer, + mergeNotices, +} from '../scripts/ado/notices.mjs' + +describe('createNotice', () => { + it('returns a Notice with the canonical shape', () => { + const n = createNotice('info', 'doc-context', 'hello') + assert.deepEqual(n, { severity: 'info', kind: 'doc-context', message: 'hello' }) + }) +}) + +describe('mergeNotices', () => { + it('returns [] when no sources are passed', () => { + assert.deepEqual(mergeNotices(), []) + }) + + it('returns [] when all sources are empty', () => { + assert.deepEqual(mergeNotices([], []), []) + }) + + it('preserves order across sources', () => { + const a = [createNotice('warning', 'work-items', 'a')] + const b = [createNotice('warning', 'diff-range', 'b')] + assert.deepEqual(mergeNotices(a, b), [ + { severity: 'warning', kind: 'work-items', message: 'a' }, + { severity: 'warning', kind: 'diff-range', message: 'b' }, + ]) + }) + + it('dedupes by kind across sources — first wins', () => { + const a = [createNotice('warning', 'work-items', 'first')] + const b = [createNotice('warning', 'work-items', 'second')] + assert.deepEqual(mergeNotices(a, b), [{ severity: 'warning', kind: 'work-items', message: 'first' }]) + }) +}) + +describe('formatNoticesAsSummaryBlock', () => { + it('returns empty string for empty input', () => { + assert.equal(formatNoticesAsSummaryBlock([]), '') + }) + + it('renders heading + per-severity emoji lines', () => { + const notices = [ + createNotice('info', 'doc-context', 'No work items linked.'), + createNotice('warning', 'diff-range', 'Incremental diff unavailable.'), + ] + const out = formatNoticesAsSummaryBlock(notices) + assert.ok(out.startsWith('## Notices')) + assert.ok(out.includes('ℹ️ No work items linked.')) + assert.ok(out.includes('⚠ Incremental diff unavailable.')) + }) +}) + +describe('formatNoticesAsPrePrPreamble', () => { + it('returns empty string for empty input', () => { + assert.equal(formatNoticesAsPrePrPreamble([]), '') + }) + + it('omits the heading and renders one per-severity line per Notice', () => { + const notices = [createNotice('warning', 'default-branch', 'Default branch fallback.')] + assert.equal(formatNoticesAsPrePrPreamble(notices), '⚠ Default branch fallback.') + }) +}) + +describe('formatTrailer', () => { + it('first-review with findings and one info notice', () => { + const out = formatTrailer({ + mode: 'first-review', + findings: { critical: 1, important: 2, minor: 0 }, + notices: [createNotice('info', 'doc-context', 'x')], + prUrl: 'https://example.com/pr/1', + }) + assert.equal( + out, + '✅ Review posted: 3 findings (1 critical, 2 important) · 0 warning notices · 1 info notice → https://example.com/pr/1' + ) + }) + + it('singular "finding" when only one', () => { + const out = formatTrailer({ + mode: 'first-review', + findings: { critical: 0, important: 1, minor: 0 }, + notices: [], + prUrl: 'https://example.com/pr/2', + }) + assert.ok(out.startsWith('✅ Review posted: 1 finding (')) + }) + + it('pre-pr mode omits info-notice count and PR URL', () => { + const out = formatTrailer({ + mode: 'pre-pr', + findings: { critical: 0, important: 0, minor: 3 }, + notices: [createNotice('warning', 'default-branch', 'fb')], + }) + assert.equal(out, '✅ Pre-PR review complete: 3 findings (0 critical, 0 important) · 1 warning notice') + }) + + it('aborted mode prints reason + kind', () => { + const out = formatTrailer({ mode: 'aborted', abortKind: 'auth', abortReason: 'token expired' }) + assert.equal(out, '❌ Review aborted: auth — token expired') + }) + + it('aborted mode with missing fields produces a still-readable line', () => { + assert.equal(formatTrailer({ mode: 'aborted' }), '❌ Review aborted: unknown') + }) + + it('aborted with no abortReason omits separator', () => { + assert.equal(formatTrailer({ mode: 'aborted', abortKind: 'auth' }), '❌ Review aborted: auth') + }) + + it('re-review mode produces same trailer format as first-review', () => { + const out = formatTrailer({ + mode: 're-review', + findings: { critical: 1, important: 0, minor: 0 }, + notices: [], + prUrl: 'https://dev.azure.com/org/proj/_git/repo/pullrequest/42', + }) + assert.ok(out.startsWith('✅ Review posted:')) + assert.ok(out.includes('https://dev.azure.com')) + }) +}) + +describe('mergeNotices', () => { + it('mergeNotices tolerates null and undefined sources', () => { + const n = createNotice('info', 'doc-context', 'test') + // @ts-ignore — intentional test of runtime tolerance for null/undefined + const result = mergeNotices(null, [n], undefined) + assert.equal(result.length, 1) + assert.equal(result[0].kind, 'doc-context') + }) +}) diff --git a/apps/claude-code/pr-review/tests/parse-diff-hunks.test.mjs b/apps/claude-code/pr-review/tests/parse-diff-hunks.test.mjs new file mode 100644 index 0000000..933d9a3 --- /dev/null +++ b/apps/claude-code/pr-review/tests/parse-diff-hunks.test.mjs @@ -0,0 +1,80 @@ +// @ts-check + +import assert from 'node:assert/strict' +import { describe, it } from 'node:test' +import { parseDiffHunks } from '../scripts/re-review/parse-diff-hunks.mjs' + +describe('parseDiffHunks', () => { + it('returns [] for empty input', () => { + assert.deepEqual(parseDiffHunks(''), []) + }) + + it('parses a single-file single-hunk diff into one slash-prefixed entry', () => { + const diff = [ + 'diff --git a/src/foo.ts b/src/foo.ts', + 'index abc..def 100644', + '--- a/src/foo.ts', + '+++ b/src/foo.ts', + '@@ -10,3 +10,5 @@', + ' context', + '+added', + '+added', + ].join('\n') + const result = parseDiffHunks(diff) + assert.deepEqual(result, [{ filePath: '/src/foo.ts', startLine: 10, endLine: 14 }]) + }) + + it('preserves per-hunk granularity across multi-file diff (no dedup)', () => { + const diff = [ + 'diff --git a/src/a.ts b/src/a.ts', + '@@ -1,2 +1,2 @@', + ' x', + '+y', + '@@ -20,1 +20,3 @@', + '+a', + '+b', + '+c', + 'diff --git a/src/b.ts b/src/b.ts', + '@@ -5,1 +5,1 @@', + '-old', + '+new', + ].join('\n') + const result = parseDiffHunks(diff) + assert.deepEqual(result, [ + { filePath: '/src/a.ts', startLine: 1, endLine: 2 }, + { filePath: '/src/a.ts', startLine: 20, endLine: 22 }, + { filePath: '/src/b.ts', startLine: 5, endLine: 5 }, + ]) + }) + + it('defaults count to 1 when hunk header omits the count (@@ -1 +5 @@)', () => { + const diff = ['diff --git a/x.md b/x.md', '@@ -1 +5 @@', '+only-line'].join('\n') + const result = parseDiffHunks(diff) + assert.deepEqual(result, [{ filePath: '/x.md', startLine: 5, endLine: 5 }]) + }) + + it('skips hunk headers that lack the +side (binary diff or pure delete header)', () => { + const diff = [ + 'diff --git a/bin/blob.png b/bin/blob.png', + 'Binary files a/bin/blob.png and b/bin/blob.png differ', + 'diff --git a/src/keep.ts b/src/keep.ts', + '@@ -3,2 +3,2 @@', + '-old', + '+new', + ].join('\n') + const result = parseDiffHunks(diff) + assert.deepEqual(result, [{ filePath: '/src/keep.ts', startLine: 3, endLine: 4 }]) + }) + + it('is robust to CRLF line endings', () => { + const diff = ['diff --git a/src/foo.ts b/src/foo.ts', '@@ -10,2 +12,3 @@', ' ctx', '+a', '+b'].join('\r\n') + const result = parseDiffHunks(diff) + assert.deepEqual(result, [{ filePath: '/src/foo.ts', startLine: 12, endLine: 14 }]) + }) + + it('ignores hunk headers that appear before any diff --git line (no current file)', () => { + const diff = ['@@ -1,2 +1,2 @@', ' a', '+b'].join('\n') + const result = parseDiffHunks(diff) + assert.deepEqual(result, []) + }) +}) diff --git a/apps/claude-code/pr-review/tests/parse-write-response.test.mjs b/apps/claude-code/pr-review/tests/parse-write-response.test.mjs new file mode 100644 index 0000000..68a89a3 --- /dev/null +++ b/apps/claude-code/pr-review/tests/parse-write-response.test.mjs @@ -0,0 +1,165 @@ +// @ts-check + +import assert from 'node:assert/strict' +import { describe, it } from 'node:test' +import { parseWriteResponse } from '../scripts/ado/parse-write-response.mjs' + +describe('parseWriteResponse — OK tier', () => { + it('HTTP 200 with numeric id → { ok: true, id: N }', () => { + const r = parseWriteResponse({ httpExit: 0, responseText: '{"id":123,"url":"https://dev.azure.com/..."}' }) + assert.deepEqual(r, { ok: true, id: 123 }) + }) + + it('HTTP 201 with numeric id → { ok: true, id: N }', () => { + const r = parseWriteResponse({ httpExit: 0, responseText: '{"id":42}' }) + assert.deepEqual(r, { ok: true, id: 42 }) + }) + + it('HTTP 404 (domain ok — thread deleted) → { ok: true, id: null }', () => { + const r = parseWriteResponse({ + httpExit: 1, + responseText: '{"statusCode":404,"message":"Thread not found"}', + }) + assert.deepEqual(r, { ok: true, id: null }) + }) + + it('HTTP 409 (domain ok — state already changed) → { ok: true, id: null }', () => { + const r = parseWriteResponse({ + httpExit: 1, + responseText: '{"statusCode":409,"message":"Status already fixed"}', + }) + assert.deepEqual(r, { ok: true, id: null }) + }) +}) + +describe('parseWriteResponse — ABORTED tier', () => { + it('HTTP 401 → { ok: false, tier: aborted, kind: auth }', () => { + const r = parseWriteResponse({ + httpExit: 1, + responseText: '{"statusCode":401,"message":"Unauthorized"}', + }) + assert.equal(r.ok, false) + if (!r.ok) { + assert.equal(r.tier, 'aborted') + assert.equal(r.kind, 'auth') + assert.ok(r.message.length > 0) + } + }) + + it('HTTP 403 → { ok: false, tier: aborted, kind: auth }', () => { + const r = parseWriteResponse({ + httpExit: 1, + responseText: '{"statusCode":403,"message":"Forbidden"}', + }) + assert.equal(r.ok, false) + if (!r.ok) { + assert.equal(r.tier, 'aborted') + assert.equal(r.kind, 'auth') + } + }) +}) + +describe('parseWriteResponse — DEGRADED tier', () => { + it('HTTP 500 → { ok: false, tier: degraded, kind: transient }', () => { + const r = parseWriteResponse({ + httpExit: 1, + responseText: '{"statusCode":500,"message":"Internal Server Error"}', + }) + assert.equal(r.ok, false) + if (!r.ok) { + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'transient') + } + }) + + it('HTTP 503 → { ok: false, tier: degraded, kind: transient }', () => { + const r = parseWriteResponse({ + httpExit: 1, + responseText: '{"statusCode":503,"message":"Service Unavailable"}', + }) + assert.equal(r.ok, false) + if (!r.ok) { + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'transient') + } + }) + + it('network error (exitCode=1, no body) → { ok: false, tier: degraded, kind: network }', () => { + const r = parseWriteResponse({ httpExit: 1, responseText: '' }) + assert.equal(r.ok, false) + if (!r.ok) { + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'network') + } + }) + + it('network error (exitCode=2, plain text body) → { ok: false, tier: degraded, kind: network }', () => { + const r = parseWriteResponse({ httpExit: 2, responseText: 'connection refused' }) + assert.equal(r.ok, false) + if (!r.ok) { + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'network') + } + }) + + it('HTTP 400 response → { ok: false, tier: degraded, kind: malformed-request }', () => { + const r = parseWriteResponse({ httpExit: 0, responseText: JSON.stringify({ statusCode: 400 }) }) + assert.equal(r.ok, false) + if (!r.ok) { + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'malformed-request') + } + }) + + it('HTTP 422 response → { ok: false, tier: degraded, kind: malformed-request }', () => { + const r = parseWriteResponse({ httpExit: 0, responseText: JSON.stringify({ statusCode: 422 }) }) + assert.equal(r.ok, false) + if (!r.ok) { + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'malformed-request') + } + }) + + it('malformed JSON body with non-zero exit → { ok: false, tier: degraded, kind: network }', () => { + const r = parseWriteResponse({ httpExit: 1, responseText: '<<>>' }) + assert.equal(r.ok, false) + if (!r.ok) { + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'network') + } + }) + + it('malformed JSON body with zero exit → { ok: false, tier: degraded, kind: malformed-response }', () => { + const r = parseWriteResponse({ httpExit: 0, responseText: '<<>>' }) + assert.equal(r.ok, false) + if (!r.ok) { + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'malformed-response') + } + }) + + it('missing id field on 200 response → { ok: false, tier: degraded, kind: malformed-response }', () => { + const r = parseWriteResponse({ + httpExit: 0, + responseText: '{"result":"ok","type":"comment"}', + }) + assert.equal(r.ok, false) + if (!r.ok) { + assert.equal(r.tier, 'degraded') + assert.equal(r.kind, 'malformed-response') + } + }) + + it('errStream content appears in malformed-response message', () => { + const r = parseWriteResponse({ + httpExit: 0, + responseText: '{"result":"ok"}', + errStream: 'az: error: something went wrong', + }) + assert.equal(r.ok, false) + if (!r.ok) { + assert.equal(r.kind, 'malformed-response') + assert.ok(r.message.includes('az: error: something went wrong')) + } + }) +}) diff --git a/apps/claude-code/pr-review/tests/pre-pr.test.mjs b/apps/claude-code/pr-review/tests/pre-pr.test.mjs new file mode 100644 index 0000000..99ce23f --- /dev/null +++ b/apps/claude-code/pr-review/tests/pre-pr.test.mjs @@ -0,0 +1,391 @@ +// @ts-check + +import assert from 'node:assert/strict' +import { readFileSync } from 'node:fs' +import { describe, it } from 'node:test' +import { buildPrePrContext, parseChangedFilesFromDiff, shouldSkipFile } from '../scripts/pre-pr.mjs' + +/** Reads the review-pr command for content assertions */ +const commandContent = readFileSync(new URL('../commands/review-pr.md', import.meta.url), 'utf8') + +// --------------------------------------------------------------------------- +// parseChangedFilesFromDiff +// --------------------------------------------------------------------------- + +describe('parseChangedFilesFromDiff', () => { + it('empty diff → returns empty array', () => { + assert.deepEqual(parseChangedFilesFromDiff(''), []) + }) + + it('single changed file → returns one path with leading slash', () => { + const diff = `diff --git a/src/api.ts b/src/api.ts +index 1234567..abcdefg 100644 +--- a/src/api.ts ++++ b/src/api.ts +@@ -1,3 +1,4 @@ + unchanged ++added line +` + const result = parseChangedFilesFromDiff(diff) + assert.deepEqual(result, ['/src/api.ts']) + }) + + it('multiple changed files → returns all paths', () => { + const diff = `diff --git a/src/api.ts b/src/api.ts +index 1234567..abcdefg 100644 +--- a/src/api.ts ++++ b/src/api.ts +@@ -1,1 +1,2 @@ ++added +diff --git a/tests/api.test.ts b/tests/api.test.ts +index 1111111..2222222 100644 +--- a/tests/api.test.ts ++++ b/tests/api.test.ts +@@ -1,1 +1,2 @@ ++test added +` + const result = parseChangedFilesFromDiff(diff) + assert.deepEqual(result, ['/src/api.ts', '/tests/api.test.ts']) + }) + + it('renamed file uses b/ path (new name)', () => { + const diff = `diff --git a/old/name.ts b/new/name.ts +similarity index 90% +rename from old/name.ts +rename to new/name.ts +` + const result = parseChangedFilesFromDiff(diff) + assert.deepEqual(result, ['/new/name.ts']) + }) + + it('deduplicates identical paths from multiple hunks', () => { + const diff = `diff --git a/src/index.ts b/src/index.ts +index 111..222 100644 +--- a/src/index.ts ++++ b/src/index.ts +@@ -1,2 +1,3 @@ ++first hunk +diff --git a/src/index.ts b/src/index.ts +@@ -10,2 +11,3 @@ ++second hunk +` + const result = parseChangedFilesFromDiff(diff) + assert.deepEqual(result, ['/src/index.ts']) + }) + + it('nested directory paths preserved', () => { + const diff = `diff --git a/a/b/c/deep.ts b/a/b/c/deep.ts +index 000..111 100644 +` + const result = parseChangedFilesFromDiff(diff) + assert.deepEqual(result, ['/a/b/c/deep.ts']) + }) + + it('CRLF-separated diff produces clean paths (no trailing \\r)', () => { + const diff = + 'diff --git a/src/foo.ts b/src/foo.ts\r\nindex 000..111 100644\r\n--- a/src/foo.ts\r\n+++ b/src/foo.ts\r\n' + const result = parseChangedFilesFromDiff(diff) + assert.deepEqual(result, ['/src/foo.ts']) + }) +}) + +// --------------------------------------------------------------------------- +// shouldSkipFile +// --------------------------------------------------------------------------- + +describe('shouldSkipFile', () => { + it('non-generated .ts file → false (keep)', () => { + assert.equal(shouldSkipFile('/src/api.ts'), false) + }) + + it('*.g.cs file → true (skip)', () => { + assert.equal(shouldSkipFile('/Generated/Models/UserModel.g.cs'), true) + }) + + it('swagger.md → true (skip)', () => { + assert.equal(shouldSkipFile('/docs/swagger.md'), true) + }) + + it('swagger.json → true (skip)', () => { + assert.equal(shouldSkipFile('/api/swagger.json'), true) + }) + + it('serialization YAML ending in .serialization.yaml → true (skip)', () => { + assert.equal(shouldSkipFile('/config/types.serialization.yaml'), true) + }) + + it('regular .yaml file → false (keep)', () => { + assert.equal(shouldSkipFile('/config/pipeline.yaml'), false) + }) + + it('regular .yml file → false (keep)', () => { + assert.equal(shouldSkipFile('/config/ci.yml'), false) + }) + + it('file named generated-types.ts → true (skip)', () => { + assert.equal(shouldSkipFile('/src/generated-types.ts'), true) + }) + + it('file under a generated/ directory → true (skip)', () => { + assert.equal(shouldSkipFile('/src/generated/api-client.ts'), true) + }) + + it('file under a capitalised Generated/ directory (.NET-style) → true (skip)', () => { + assert.equal(shouldSkipFile('/Source/Generated/ApiClient.cs'), true) + }) + + it('normal source file with no skip pattern → false (keep)', () => { + assert.equal(shouldSkipFile('/src/services/user.service.ts'), false) + }) +}) + +// --------------------------------------------------------------------------- +// buildPrePrContext +// --------------------------------------------------------------------------- + +describe('buildPrePrContext', () => { + it('returns rawDiff unchanged', () => { + const diff = `diff --git a/src/foo.ts b/src/foo.ts\nindex 000..111 100644\n` + const ctx = buildPrePrContext(diff) + assert.equal(ctx.rawDiff, diff) + }) + + it('changedFiles contains all parsed paths', () => { + const diff = `diff --git a/src/foo.ts b/src/foo.ts\nindex 000..111 100644\n` + const ctx = buildPrePrContext(diff) + assert.deepEqual(ctx.changedFiles, ['/src/foo.ts']) + }) + + it('filteredFiles excludes skipped files', () => { + const diff = [ + 'diff --git a/src/api.ts b/src/api.ts', + 'index 000..111 100644', + 'diff --git a/Generated/Foo.g.cs b/Generated/Foo.g.cs', + 'index 222..333 100644', + ].join('\n') + const ctx = buildPrePrContext(diff) + assert.deepEqual(ctx.changedFiles, ['/src/api.ts', '/Generated/Foo.g.cs']) + assert.deepEqual(ctx.filteredFiles, ['/src/api.ts']) + }) + + it('empty diff → all arrays empty', () => { + const ctx = buildPrePrContext('') + assert.deepEqual(ctx.changedFiles, []) + assert.deepEqual(ctx.filteredFiles, []) + assert.equal(ctx.rawDiff, '') + }) + + it('returns empty notices for a normal diff', () => { + const diff = `diff --git a/src/foo.ts b/src/foo.ts\nindex 000..111 100644\n` + const ctx = buildPrePrContext(diff) + assert.deepEqual(ctx.notices, []) + }) + + it('suspicious-shape: diff --git header present but zero paths parsed → DEGRADED diff-parse Notice', () => { + // A line that looks like a diff header but has an empty b/ path won't match the regex + const diff = `diff --git a/foo b/\nindex 000..111 100644\n` + const ctx = buildPrePrContext(diff) + assert.equal(ctx.changedFiles.length, 0) + assert.equal(ctx.notices.length, 1) + assert.equal(ctx.notices[0].kind, 'diff-parse') + assert.equal(ctx.notices[0].severity, 'warning') + }) + + it('no suspicious-shape Notice when diff is empty (not malformed)', () => { + const ctx = buildPrePrContext('') + assert.deepEqual(ctx.notices, []) + }) +}) + +// --------------------------------------------------------------------------- +// review-pr.md command content — compact sub-agent output guidance +// --------------------------------------------------------------------------- + +describe('review-pr command — compact sub-agent output guidance', () => { + /** Slice of Step 6 — the review-agent launch step in ADO modes */ + const step6Section = commandContent.slice(commandContent.indexOf('## Step 6'), commandContent.indexOf('## Step 7')) + + /** Pre-PR step D — the review-agent launch step in pre-PR mode */ + const stepDSection = commandContent.slice(commandContent.indexOf('### Step D'), commandContent.indexOf('### Step E')) + + /** The shared "Compact finding schema" block referenced by both Step 6 and Step D */ + const schemaStart = commandContent.indexOf('### Compact finding schema') + const schemaEnd = commandContent.indexOf('### Aspect-filter selection') + const schemaSection = schemaStart >= 0 && schemaEnd > schemaStart ? commandContent.slice(schemaStart, schemaEnd) : '' + + it('orchestrator defines a single Compact finding schema block', () => { + assert.ok(schemaSection.length > 0, 'review-pr.md must define a "### Compact finding schema" block') + }) + + it('Step 6 references the compact finding schema', () => { + assert.ok( + step6Section.toLowerCase().includes('compact finding schema'), + 'Step 6 must reference the shared compact finding schema' + ) + }) + + it('Step D references the compact finding schema', () => { + assert.ok( + stepDSection.toLowerCase().includes('compact finding schema'), + 'Step D must reference the shared compact finding schema' + ) + }) + + it('schema instructs agents to return a JSON array of findings', () => { + assert.ok( + schemaSection.includes('JSON') && schemaSection.includes('array'), + 'Compact finding schema must instruct review agents to return a JSON array of findings' + ) + }) + + it('schema requires all six finding fields', () => { + const requiredFields = ['severity', 'filePath', 'startLine', 'endLine', 'title', 'body'] + for (const field of requiredFields) { + assert.ok(schemaSection.includes(field), `Compact finding schema must mention required field: ${field}`) + } + }) + + it('schema instructs agents to omit code quotes from return value', () => { + assert.ok( + schemaSection.toLowerCase().includes('code quote'), + 'Compact finding schema must instruct agents to omit code quotes from the return value' + ) + }) + + it('schema instructs agents to omit prose reasoning from return value', () => { + assert.ok( + schemaSection.toLowerCase().includes('reasoning') || + schemaSection.toLowerCase().includes('prose') || + schemaSection.toLowerCase().includes('analysis'), + 'Compact finding schema must instruct agents to keep reasoning inside their own context, not in return value' + ) + }) + + it('schema severity values are exactly critical / important / minor', () => { + assert.ok(schemaSection.includes('critical'), 'Compact finding schema must specify "critical" as a severity value') + assert.ok( + schemaSection.includes('important'), + 'Compact finding schema must specify "important" as a severity value' + ) + assert.ok(schemaSection.includes('minor'), 'Compact finding schema must specify "minor" as a severity value') + }) + + it('schema requires filePath to use leading slash and forward slashes', () => { + assert.ok( + schemaSection.includes('leading') || + schemaSection.includes('forward slash') || + schemaSection.includes('leading /'), + 'Compact finding schema must require filePath with leading slash and forward slashes matching ADO format' + ) + }) + + it('schema requires title to be one line capped at 80 chars', () => { + assert.ok( + schemaSection.includes('80') || schemaSection.includes('one line') || schemaSection.includes('≤ 80'), + 'Compact finding schema must require title to be one line, at most 80 characters' + ) + }) + + it('schema describes body as the exact text to post as the ADO or local-interface comment', () => { + assert.ok( + schemaSection.includes('body') && (schemaSection.includes('post') || schemaSection.includes('comment')), + 'Compact finding schema must describe body as the exact text to post as the ADO or local-interface comment' + ) + }) +}) + +// --------------------------------------------------------------------------- +// review-pr.md command content — Pre-PR mode section +// --------------------------------------------------------------------------- + +describe('review-pr command — Pre-PR mode', () => { + it('no longer contains "not yet implemented" stub', () => { + assert.ok( + !commandContent.includes('not yet implemented'), + 'Pre-PR mode stub must be replaced with real implementation' + ) + }) + + it('prints a console message confirming Pre-PR mode', () => { + assert.ok( + commandContent.includes('Pre-PR mode') || commandContent.includes('pre-PR mode'), + 'Command must print a Pre-PR mode confirmation message' + ) + }) + + it('uses git diff against upstream to get the local branch diff', () => { + assert.ok( + commandContent.includes('git diff') && commandContent.includes('origin/'), + 'Command must use git diff origin/...HEAD for Pre-PR mode' + ) + }) + + it('uses pre-pr.mjs helpers for diff parsing', () => { + assert.ok(commandContent.includes('pre-pr.mjs'), 'Command must import from pre-pr.mjs in Pre-PR mode') + }) + + it('launches review aspect agents in Pre-PR mode', () => { + assert.ok( + commandContent.includes('pr-review-toolkit:code-reviewer') && + commandContent.includes('pr-review-toolkit:silent-failure-hunter'), + 'Command must launch pr-review-toolkit review agents in Pre-PR mode' + ) + }) + + it('presents findings with severity, filePath, line range, title, body', () => { + const preprSection = commandContent.slice(commandContent.indexOf('## Pre-PR mode')) + assert.ok(preprSection.includes('severity'), 'Findings must include severity') + assert.ok(preprSection.includes('filePath'), 'Findings must include filePath') + assert.ok( + preprSection.includes('startLine') || preprSection.includes('line range') || preprSection.includes('line'), + 'Findings must include line range' + ) + assert.ok(preprSection.includes('title'), 'Findings must include title') + assert.ok(preprSection.includes('body'), 'Findings must include body') + }) + + it('contains no ADO API calls (az devops invoke / az repos)', () => { + const preprSection = commandContent.slice(commandContent.indexOf('## Pre-PR mode')) + assert.ok( + !preprSection.includes('az devops invoke') && !preprSection.includes('az repos'), + 'Pre-PR mode must not make ADO API calls' + ) + }) + + it('respects aspect filter from $ARGUMENTS', () => { + const preprSection = commandContent.slice(commandContent.indexOf('## Pre-PR mode')) + assert.ok( + preprSection.includes('aspect') || preprSection.includes('ARGUMENTS') || preprSection.includes('filter'), + 'Pre-PR mode must respect the aspect filter from $ARGUMENTS' + ) + }) + + it('does not invoke ADO Fetcher agent in Pre-PR mode', () => { + const preprSection = commandContent.slice(commandContent.indexOf('## Pre-PR mode')) + assert.ok(!preprSection.includes('ado-fetcher'), 'Pre-PR mode must not invoke the ado-fetcher agent') + }) + + it('does not invoke ADO Writer agent in Pre-PR mode', () => { + const preprSection = commandContent.slice(commandContent.indexOf('## Pre-PR mode')) + assert.ok(!preprSection.includes('ado-writer'), 'Pre-PR mode must not invoke the ado-writer agent') + }) + + it('does not invoke Re-review Coordinator in Pre-PR mode', () => { + const preprSection = commandContent.slice(commandContent.indexOf('## Pre-PR mode')) + assert.ok( + !preprSection.includes('re-review-coordinator'), + 'Pre-PR mode must not invoke the re-review-coordinator agent' + ) + }) + + it('prints a clear completion message when done', () => { + const preprSection = commandContent.slice(commandContent.indexOf('## Pre-PR mode')) + assert.ok( + preprSection.includes('complete') || + preprSection.includes('done') || + preprSection.includes('finished') || + preprSection.includes('✅'), + 'Pre-PR mode must print a completion message' + ) + }) +}) diff --git a/apps/claude-code/unic-confluence/CONTRIBUTING.md b/apps/claude-code/unic-confluence/CONTRIBUTING.md index 6f1d2cb..b2fd947 100644 --- a/apps/claude-code/unic-confluence/CONTRIBUTING.md +++ b/apps/claude-code/unic-confluence/CONTRIBUTING.md @@ -4,11 +4,11 @@ This project uses a spec-driven development workflow. New features and fixes are ## Prerequisites -| Tool | Version | How to get it | -| --------------- | ----------------- | ---------------------------------------------------------------------- | -| Node.js | ≥ 24 (Active LTS) | [nodejs.org](https://nodejs.org) | -| pnpm | ≥ 10 | `npm install -g pnpm` | -| Claude Code CLI | latest | [claude.ai/code](https://claude.ai/code) — required as Ralph's backend | +| Tool | Version | How to get it | +| --------------- | ----------------------------------------------- | ---------------------------------------------------------------------- | +| Node.js | ≥ 22 (see `.nvmrc` for the recommended version) | [nodejs.org](https://nodejs.org) | +| pnpm | ≥ 10 | `npm install -g pnpm` | +| Claude Code CLI | latest | [claude.ai/code](https://claude.ai/code) — required as Ralph's backend | Everything else (Ralph Orchestrator, Biome, TypeScript) is a project devDependency and installs with: diff --git a/apps/claude-code/unic-confluence/package.json b/apps/claude-code/unic-confluence/package.json index 202705b..21a4ad3 100644 --- a/apps/claude-code/unic-confluence/package.json +++ b/apps/claude-code/unic-confluence/package.json @@ -10,7 +10,7 @@ "typescript": "catalog:" }, "engines": { - "node": ">=24", + "node": ">=22", "pnpm": ">=10" }, "license": "LGPL-3.0-or-later", diff --git a/docs/adr/0027-feature-runner-context-bundle.md b/docs/adr/0027-feature-runner-context-bundle.md new file mode 100644 index 0000000..7936644 --- /dev/null +++ b/docs/adr/0027-feature-runner-context-bundle.md @@ -0,0 +1,43 @@ +# 0027. Feature Runner injects a scoped context bundle into every `/tdd` sub-agent invocation + +**Status:** Accepted (2026-05) + +## Context + +The Feature Runner invokes `/tdd` as a non-interactive sub-agent (via the Agent tool) for each issue in a feature. The first draft of the Feature Runner PRD specified "the issue file content as context" as the sole input to each `/tdd` invocation. + +This is insufficient. `/tdd` is designed to be interactive: its planning phase asks the user to confirm interface changes and approve which behaviours to test before writing any code. In AFK mode there is no user to ask. Without additional context, `/tdd` reasons from a single vertical slice with no access to the "why" behind the feature, the architectural constraints that apply, or the vocabulary the codebase uses. This creates a risk of a correct-but-wrong implementation — code that satisfies the issue's literal description but diverges from the intent established during the grilling and PRD sessions. + +Matt Pocock's reference AFK loop (`afk.sh`) injects all issue files plus recent commits as a single prompt string before invoking `/tdd`, establishing the precedent that the agent needs broader context than a single work item. + +## Decision + +The Feature Runner assembles a **context bundle** for each `/tdd` sub-agent invocation: + +1. **Issue file** — `## What to build` and `## Acceptance criteria` serve as the pre-answered planning conversation (see ADR-0029). +2. **PRD** — read from the feature's `docs/issues//PRD.md` (the slug is the feature directory name; the PRD lives at a fixed path within it). Carries the shared vision from the grilling session and the "why" behind the feature. Without it, the agent lacks the context needed to judge correctness beyond the literal issue description. +3. **Sibling issue files** — all other `NN-*.md` files in the feature directory. Provides dependency awareness and a "what is already resolved" signal without requiring the runner to summarise prior work. +4. **Scoped CONTEXT.md** — the domain glossary for the feature's domain (see scoping rule below). Ensures test names and interface vocabulary match the project's language. +5. **Scoped ADRs** — the architectural decisions constraining the implementation (see scoping rule below). +6. **Recent commits** — the last 5 git commits. The grilling and PRD process typically produces changes to CONTEXT.md and ADRs that land in commits before the Feature Runner runs. These commits carry the ideation trail that informed the PRD. + +### ADR scoping rule + +ADRs and CONTEXT.md are scoped to the domain of the feature, not the monorepo: + +- **Plugin feature** (PRD references paths under `apps/claude-code//`) → inject that plugin's `docs/adr/` and `CONTEXT.md`. +- **Repo/tooling feature** (PRD references paths outside `apps/`, e.g. `.claude/`, `docs/`, `packages/`) → inject the root `docs/adr/` and root `CONTEXT.md`. + +Scope is inferred by scanning the PRD for `apps/claude-code/` path references. Root ADRs cover versioning, tagging, and CI tooling — they are noise for plugin implementation work and must not be injected into plugin feature runs. + +## Considered options + +- **Lazy discovery** — let `/tdd` explore the codebase and find ADRs and CONTEXT.md on its own. Rejected: `/tdd` does instruct the agent to use the domain glossary and respect ADRs, but in non-interactive sub-agent mode this exploration is unreliable. Injection is guaranteed; discovery is not. +- **Issue file only** — the minimal approach from the first PRD draft. Rejected: `/tdd` loses the PRD's "why", the sibling issues' dependency signal, and the architectural constraints, all of which are needed to produce implementations that match the grilled intent. +- **Inject everything** — all ADRs from all directories. Rejected: root ADRs are irrelevant to plugin work and add context noise without signal. + +## Consequences + +- The Feature Runner skill reads the PRD from the fixed slug-derived path `docs/issues//PRD.md`. The `## Parent` link on issue files is informational for human readers; the runner does not parse it. Features without a `PRD.md` at that path cannot be run by the Feature Runner without manual intervention. +- The Feature Runner skill must scan the PRD for `apps/claude-code/` references to determine which CONTEXT.md and ADR directory to inject. +- The context bundle grows with the number of sibling issues; for features with many issues, later invocations carry more sibling context than earlier ones. This is acceptable — it mirrors the growing "what is done" signal available in real commits. diff --git a/docs/adr/0028-blocked-by-canonical-sequencing.md b/docs/adr/0028-blocked-by-canonical-sequencing.md new file mode 100644 index 0000000..1f82faa --- /dev/null +++ b/docs/adr/0028-blocked-by-canonical-sequencing.md @@ -0,0 +1,32 @@ +# 0028. `## Blocked by` is the canonical sequencing signal for Feature Runner issue execution + +**Status:** Accepted (2026-05) + +## Context + +Issues in `docs/issues//` are named with a numeric prefix (`NN-*.md`) produced by the `to-issues` skill, which publishes issues in dependency order so blockers get lower numbers. This makes numerical filename order a reliable proxy for execution order in practice. + +However, `to-issues` also records explicit dependency information in each issue's `## Blocked by` field. The numeric prefix is a UX convenience — it makes the dependency graph human-readable at a glance in a file browser. It is not an execution contract. The `to-issues` skill's "Blocked by" field is the canonical representation of the dependency graph: it can express non-linear dependencies that numerical order cannot (e.g. issue 03 blocking issue 02 after a user reorders slices during review). + +Treating numerical order as the execution contract would make the Feature Runner silently incorrect whenever `## Blocked by` and filename order diverge — a failure mode that would be invisible until a downstream issue ran on a broken foundation. + +## Decision + +The Feature Runner builds a **topological order** from `## Blocked by` references before executing any issue. Numerical filename order is used only as a tiebreaker when two issues have no dependency relationship between them. + +If `## Blocked by` references conflict with numerical filename order (i.e. a lower-numbered issue declares a blocker that is a higher-numbered issue), the Feature Runner halts with an error and surfaces the conflict to the user rather than proceeding in the wrong order. Silent execution on a potentially wrong order is not acceptable. + +Issues with `## Blocked by: None` (or equivalent) have no predecessors and may be placed anywhere in the topological order consistent with their number. + +## Considered options + +- **Numerical order only** — simpler to implement; no graph parsing required. Rejected: not an execution contract; silently wrong when user reorders slices or when `to-issues` produces a non-linear dependency graph. +- **`## Blocked by` order, silent fallback to numerical on conflict** — avoids halting. Rejected: a conflict between the two signals indicates a malformed feature (either the issue was hand-edited or `to-issues` produced unexpected output); proceeding silently would compound the error. +- **`## Blocked by` order, halt on conflict** — chosen. Forces the human to resolve ambiguity before the autonomous run begins, preventing downstream issues from inheriting a broken foundation. + +## Consequences + +- The Feature Runner skill must parse `## Blocked by` fields and construct a dependency graph before beginning execution. +- Features where `## Blocked by` and numerical order disagree will not run until the conflict is resolved by the developer. +- The `to-issues` skill's practice of publishing issues in dependency order (blockers first) remains a useful convention that keeps numerical order and the dependency graph aligned in the common case. +- The dependency graph also reveals which issues are parallelisable (those with `## Blocked by: None` and no dependents). The Feature Runner serialises all execution regardless — see the Feature Runner PRD Out of Scope for the rationale. diff --git a/docs/adr/0029-feature-runner-afk-invocation.md b/docs/adr/0029-feature-runner-afk-invocation.md new file mode 100644 index 0000000..59e12d6 --- /dev/null +++ b/docs/adr/0029-feature-runner-afk-invocation.md @@ -0,0 +1,36 @@ +# 0029. Feature Runner invokes `/tdd` non-interactively; issue acceptance criteria replace the planning phase + +**Status:** Accepted (2026-05) + +## Context + +`/tdd`'s planning phase is interactive: before writing any code it asks the user to confirm interface changes, prioritise which behaviours to test, and approve the plan. This is by design — it prevents the agent from outrunning its headlights on ambiguous requirements. + +The Feature Runner's core use case is autonomous, overnight execution (composable with `/loop`). There is no user present to answer planning questions. Invoking `/tdd` as a sub-agent via the Agent tool means there is no TTY — interactive prompts cannot be issued and the planning phase cannot execute as designed. + +Matt Pocock's reference AFK loop (`afk.sh`) resolves this by running Claude with `--print` (non-interactive mode) and injecting issue files as the implicit plan. The Agent tool in Claude Code is the equivalent mechanism: sub-agents run without a TTY and must infer their plan from the provided context. + +## Decision + +The Feature Runner invokes `/tdd` via the Agent tool (non-interactive). The issue's `## Acceptance criteria` section serves as the pre-answered planning conversation: + +- `## What to build` answers "what interface changes are needed" +- `## Acceptance criteria` answers "which behaviours to test" and "what does done look like" + +Because issues are produced by `to-issues` (which slices the PRD vertically and quizzes the user on the breakdown) and then reviewed by the user before reaching `ready-for-agent`, the acceptance criteria represent a human-approved definition of done. The interactive planning phase is not bypassed in substance — it was completed during the grilling and issue-writing pipeline; the Feature Runner simply does not repeat it at runtime. + +`/tdd` is not modified. No AFK flag or non-interactive variant is introduced. + +## Considered options + +- **Fork `/tdd` into a non-interactive variant** (`/tdd-afk` or similar) — would allow explicit suppression of planning prompts. Rejected: creates a maintenance burden; the two variants would diverge over time; the Agent tool already provides non-interactive execution without any skill changes. +- **Add an AFK flag to `/tdd`** — e.g. a frontmatter option or a prompt prefix that skips planning. Rejected: couples the `/tdd` skill to the Feature Runner's invocation model; `to-issues` already produces the information that the planning phase would gather. +- **Non-interactive Agent tool invocation with injected context bundle** — chosen. No skill modifications required; the planning information is supplied via the context bundle (ADR-0027) rather than elicited at runtime. + +## Consequences + +- Issues that reach the Feature Runner must have specific, human-reviewed `## Acceptance criteria`. Vague criteria (e.g. "the feature works correctly") remove the planning substitute and leave `/tdd` without a concrete definition of done. +- The `to-issues` + `/triage` → `ready-for-agent` pipeline is load-bearing: it is the point at which the planning conversation occurs. The Feature Runner depends on that pipeline having been followed correctly. +- `/tdd` remains unchanged and continues to work interactively when invoked directly by the user. +- When the queue is empty (no `ready-for-agent` features remain), the Feature Runner outputs `LOOP_COMPLETE` as its stop signal. This is what makes `/loop /implement-feature` composable for overnight draining: the `/loop` skill catches `LOOP_COMPLETE` and terminates the loop cleanly rather than spinning on an empty queue. This mirrors the Spec Runner's `completion_promise: LOOP_COMPLETE` in `ralph.yml` and Matt Pocock's `NO MORE TASKS` in `afk.sh`. +- The Feature Runner's prompt template explicitly instructs the sub-agent to invoke the `tdd` skill via the Skill tool, rather than relying on auto-discovery via trigger phrases. This makes the procedural-guidance load deterministic across Claude Code surfaces. diff --git a/docs/agents/feature-runner.md b/docs/agents/feature-runner.md new file mode 100644 index 0000000..e013e5a --- /dev/null +++ b/docs/agents/feature-runner.md @@ -0,0 +1,124 @@ +# Feature Runner + +The Feature Runner is the `/implement-feature` skill. It automates the implementation side of the AI-development cycle: given a Feature slug, it creates an isolated branch, works through all `ready-for-agent` issues in dependency order using `/tdd`, opens a pull request, and marks each issue `resolved`. It is the issue-tracker-driven counterpart to the Spec Runner. + +Invoke it with `/implement-feature ` (named Feature) or `/implement-feature` (auto-select). Compose it with `/loop` for overnight queue draining. + +## Lifecycle + +``` +feature selected → worktree created → issues implemented in topological order → PR opened → issues closed on merge +``` + +### 1. Feature selection + +- **Named**: `/implement-feature ` targets `docs/issues//` directly. +- **Auto-select**: `/implement-feature` with no argument scans `docs/issues/` and picks the first Feature (alphabetically by slug) where at least one `NN-*.md` file is `ready-for-agent` and no file is in an unprepped state (`needs-triage`, `needs-info`, `needs-specs`). Partially-completed Features (mix of `resolved` and `ready-for-agent`) are included — the runner resumes from where it stopped. Features where every issue is `resolved`, `closed`, or `rejected` are skipped (nothing left to run). Issues with status `ready-for-human` or `rejected` do not disqualify a feature. +- **Empty queue**: when no qualifying Feature exists, the runner emits `LOOP_COMPLETE` on its own line and exits cleanly. This is the stop signal that `/loop` uses to terminate an overnight run. + +### 2. Worktree creation + +The runner creates (or reuses) a git worktree and branch: + +- Branch: `feature/afk/` +- Worktree path: `.claude/worktrees/` + +If `.claude/worktrees/` already exists (prior failed run), the runner reuses it — `git worktree add` is skipped and the existing branch retains its committed work. If it does not exist, the runner creates it from `develop`. On success, the worktree is removed after the PR is opened. On failure, it is left in place for inspection. + +### 3. Issue implementation + +Issues are executed via `/tdd` sub-agent invocations in **topological order** derived from `## Blocked by` references (see [Dependency ordering](#dependency-ordering) below). Only `ready-for-agent` issues are executed. `resolved`, `closed`, and `rejected` issues satisfy dependencies but are skipped. `ready-for-human` issues are unsatisfied — if a `ready-for-agent` issue depends on one, the runner halts before executing anything. + +Before each invocation the runner outputs: + +``` +Implementing issue N of M: +``` + +After a successful `/tdd` invocation, the issue file is updated: `Status: ready-for-agent` → `Status: resolved`. + +On failure, a note is appended to the failing issue under `## Comments` (see [Failure behaviour](#failure-behaviour)) and the runner stops. No subsequent issues run. + +### 4. PR and cleanup + +When all issues are resolved, the runner: + +1. Pushes `feature/afk/` to origin. +2. Opens a pull request targeting `develop` via `gh pr create`. The PR title is `feat(): ` and the body references the PRD and lists all resolved issues. +3. Removes the worktree. + +Issues remain at `Status: resolved` until the PR is merged, at which point they are manually marked `closed` (or via a future hook). + +## Context bundle + +Each `/tdd` sub-agent invocation receives a six-part context bundle assembled by the runner: + +| Part | What it contains | Why | +| --------------------- | ------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- | +| **Issue file** | `## What to build` and `## Acceptance criteria` | Replaces the interactive planning phase in AFK mode | +| **PRD** | `docs/issues//PRD.md` | Carries the "why" and shared vision from the grilling session | +| **Sibling issues** | All other `NN-*.md` files in the feature directory at current state | Shows what is already resolved and what is still pending | +| **Scoped CONTEXT.md** | Plugin or root `CONTEXT.md` (see ADR scope below) | Ensures interface vocabulary and test names match the domain glossary | +| **Scoped ADRs** | All `*.md` files in the scoped ADR directory | Communicates the architectural constraints that bind the implementation | +| **Recent commits** | `git log --oneline -5` | Carries the ideation trail from grilling and PRD work that landed just before the runner | + +### ADR scope + +ADRs are scoped to the domain of the Feature: + +- **Plugin Feature**: the PRD references one or more paths under `apps/claude-code//` → inject `apps/claude-code//CONTEXT.md` and `apps/claude-code//docs/adr/`. Root ADRs are **not** injected. +- **Repo/tooling Feature**: no `apps/claude-code//` references in the PRD → inject root `CONTEXT.md` and root `docs/adr/`. + +## Dependency ordering + +`## Blocked by` is the canonical dependency signal, not numeric filename order. Numeric order is a UX convenience produced by `/to-issues` (it publishes blockers first so numbers usually match), but it is not an execution contract. + +The runner builds a topological execution order from `## Blocked by` references across all `NN-*.md` files before executing anything. + +**Conflict detection**: if an issue A lists issue B in `## Blocked by`, and B has a higher numeric prefix than A, the dependency contradicts the numeric convention. The runner halts before executing any issue and reports: + +``` +Feature Runner error: dependency conflict detected. + Issue NN- is blocked by NN-, but NN- has a higher number than NN-. + This conflicts with the numerical ordering convention. Resolve the ordering manually before re-running. +``` + +`## Blocked by: None`, `## Blocked by: None — can start immediately`, or a missing `## Blocked by` section all mean the issue has no predecessors. + +## Failure behaviour + +When a `/tdd` sub-agent cannot complete an issue: + +1. The issue status is changed to `needs-info`. This prevents the auto-selection path from picking up this Feature on subsequent `/loop` iterations until the developer investigates and restores `ready-for-agent` (or closes/rejects the issue). +2. The runner appends a failure note to the issue file under `## Comments`: + +```markdown +## Comments + +> _This was generated by AI during triage._ + +**Feature Runner failure** — `/tdd` could not complete this issue. Status has been set to `needs-info`. + +The worktree at `.claude/worktrees/` has been left in place for inspection. Once the issue is resolved manually, restore `**Status:** ready-for-agent` and re-run `/implement-feature ` to resume. Alternatively, close or reject the issue if it should not be retried. +``` + +3. The runner stops. No subsequent issues in the Feature are executed. +4. The worktree is left at `.claude/worktrees/` on `feature/afk/` for inspection. + +Re-running `/implement-feature ` after a manual fix resumes from the first `ready-for-agent` issue (already-resolved issues are skipped via the topological filter). + +## Historical cleanup convention + +When a Spec in `docs/plans/` is marked `done` and a corresponding `docs/issues//` directory exists, the issues in that directory were implemented via the Spec Runner, not the Feature Runner. They will never be processed by `/implement-feature` and should not remain at `ready-for-agent`. + +**Convention**: manually mark all `NN-*.md` files in `docs/issues//` as `closed` and append a note: + +```markdown +## Comments + +> _This was generated by AI during triage._ + +Marked `closed` — implemented via the Spec Runner (see `docs/plans/.md`, marked `done`). The Feature Runner was not used for this Feature. +``` + +This prevents the auto-selection path from picking up stale Features and keeps the issue tracker accurate. diff --git a/docs/inbox/adapt-release-package-to-gitflow.md b/docs/inbox/adapt-release-package-to-gitflow.md index 8bdd2b6..1b3cda9 100644 --- a/docs/inbox/adapt-release-package-to-gitflow.md +++ b/docs/inbox/adapt-release-package-to-gitflow.md @@ -11,3 +11,15 @@ created: 2026-05-03 Adapt release package to gitflow If I'm using Git-flow, shouldn't the release process be adapted? And CI? Now I need to remember to merge main into develop before starting a new feature branch. + +## Triage Notes + +**Nature:** Release tooling + CI changes for Gitflow hygiene. + +The pain point is that after a hotfix or release merges to `main`, `develop` falls behind and the developer must remember to backfill it manually. Both `packages/release-tools/` and the CI workflows in `.github/workflows/` may need adjusting. + +**What grilling needs to resolve:** + +- Which scenarios create the drift? Hotfix merges? Release PRs from `develop` → `main`? +- Should the fix be a GitHub Actions workflow step (auto-merge `main` back into `develop` after a release merges), a documented manual step, or a `release-tools` script? +- Are there edge cases where auto-backfill would be dangerous (e.g. `main` has a hotfix that conflicts with in-flight feature work on `develop`)? diff --git a/docs/inbox/add-github-copilot-config-file.md b/docs/inbox/add-github-copilot-config-file.md deleted file mode 100644 index ba2430a..0000000 --- a/docs/inbox/add-github-copilot-config-file.md +++ /dev/null @@ -1,11 +0,0 @@ ---- -title: Add GitHub Copilot config file -created: 2026-05-03 ---- - -**Status:** needs-specs -**Category:** enhancement - -> _This was generated by AI during triage._ - -Add GitHub Copilot config file. No reviews on generated or external files, like `.agents/` or folders from raw ideas like `docs/inbox` diff --git a/docs/inbox/add-github-support-to-pr-review.md b/docs/inbox/add-github-support-to-pr-review.md index 5ea0bfb..6859644 100644 --- a/docs/inbox/add-github-support-to-pr-review.md +++ b/docs/inbox/add-github-support-to-pr-review.md @@ -9,3 +9,17 @@ created: 2026-05-03 > _This was generated by AI during triage._ Add GitHub support to pr-review + +## Triage Notes + +**Nature:** Large feature — currently `pr-review` is ADO-only; this would add a GitHub PR review path. + +The plugin communicates with ADO via `ado-fetcher` and `ado-writer` sub-agents. Supporting GitHub PRs would require equivalent `github-fetcher` and `github-writer` agents using the GitHub REST API (or `gh` CLI), plus a top-level dispatch in the orchestrator to route based on the detected remote. + +**What grilling needs to resolve:** + +- Does GitHub support run alongside ADO (auto-detect remote from `git remote`), or is it configured explicitly? +- Authentication: `gh` CLI token, `GITHUB_TOKEN` env var, or a stored PAT? +- Thread model mapping: GitHub uses inline review comments and PR-level comments — how does the existing classification logic (`addressed`, `pending`, `disputed`, `obsolete`) map onto GitHub's review state machine? +- Does re-review work the same way on GitHub (detect prior bot comments by signature)? +- Scope: MVP only (post comments) or full feature parity with the ADO path from day one? diff --git a/docs/inbox/alternative-doc-sources-for-doc-context.md b/docs/inbox/alternative-doc-sources-for-doc-context.md index b4599d1..eeaa3ab 100644 --- a/docs/inbox/alternative-doc-sources-for-doc-context.md +++ b/docs/inbox/alternative-doc-sources-for-doc-context.md @@ -27,3 +27,14 @@ which doc sources are active. Credential handling per source also needs design. Relates to: `alternative-work-item-sources-for-doc-context.md` (same extensibility dimension, different axis). + +## Triage Notes + +**Nature:** Multi-source doc client design — additive extension to the doc context enrichment layer. + +Blocked on two open design questions that need grilling before a spec can be written: + +1. **Dispatch strategy** — URL-pattern auto-detection (simpler UX, fragile for private URLs) vs. explicit config listing active doc sources (more setup, more predictable). +2. **Credential handling** — each source (Notion, SharePoint, GitHub Wiki) has a different auth model; needs a consistent discovery pattern (env vars? config file? per-source entry?). + +Should be grilled together with `alternative-work-item-sources-for-doc-context.md` — both share the same extensibility architecture. diff --git a/docs/inbox/alternative-work-item-sources-for-doc-context.md b/docs/inbox/alternative-work-item-sources-for-doc-context.md index 2445ac7..832a6c9 100644 --- a/docs/inbox/alternative-work-item-sources-for-doc-context.md +++ b/docs/inbox/alternative-work-item-sources-for-doc-context.md @@ -26,3 +26,15 @@ each configured source — no rewrite of the ADO path needed, purely additive. Architecture note: if the number of supported sources grows, consider a config file (similar to `setup-matt-pocock-skills/issue-tracker-*.md`) that declares which work item trackers are active for a given install. Needs grilling before implementation. + +## Triage Notes + +**Nature:** Multi-source work item client design — additive extension to the doc context enrichment layer. + +Blocked on design decisions that need grilling: + +1. **Source discovery** — how does the plugin know which tracker a linked URL belongs to? URL pattern matching, or explicit config? +2. **Credential handling** — Jira uses API tokens + Basic auth; GitHub Issues uses `gh` CLI or a PAT. Needs a consistent abstraction across clients. +3. **Config file shape** — the architecture note suggests a declarative config; grilling should nail down the exact format before implementation. + +Should be grilled together with `alternative-doc-sources-for-doc-context.md` — both involve the same extensibility pattern, different axes. diff --git a/docs/inbox/automate-qa-in-github.md b/docs/inbox/automate-qa-in-github.md index 3a7f170..2d7cb89 100644 --- a/docs/inbox/automate-qa-in-github.md +++ b/docs/inbox/automate-qa-in-github.md @@ -42,3 +42,16 @@ My usual workflow: ``` Is this worth for this repo only? Or for `pr-review` app too? + +## Triage Notes + +**Nature:** QA workflow automation — encode the maintainer's end-to-end PR quality loop as a first-class skill or command. + +Closely related to `review-pr-review-command-process.md` — both describe the same loop from different angles. Strong candidate for merging into a single PRD. Should be grilled together. + +**What grilling needs to resolve:** + +- One feature or two? If merged, what is the slug? +- Target scope: this repo only (monorepo-specific) or a general skill usable across any repo? +- Delivery vehicle: a new slash command, an extension to `/pr-review-toolkit:review-pr`, or a CLAUDE.md prompt template? +- Context-rot mitigation via sub-agents — how does the orchestration differ from the existing `pr-review-toolkit` sub-agent model? diff --git a/docs/inbox/ci-test-job-missing-pr-review.md b/docs/inbox/ci-test-job-missing-pr-review.md new file mode 100644 index 0000000..d6690b8 --- /dev/null +++ b/docs/inbox/ci-test-job-missing-pr-review.md @@ -0,0 +1,40 @@ +--- +title: CI test job filter omits pr-review +created: 2026-05-13 +--- + +**Status:** needs-triage +**Category:** ci + +> _This was generated by AI during triage of PR #29._ + +`.github/workflows/ci.yml` defines a `test` matrix job that runs `pnpm --filter test` only when the corresponding package changed. The matrix currently includes `auto-format`, `unic-confluence` (under the output name `confluence-publish` — separate naming bug), and `release-tools`. **`pr-review` is not in the matrix.** + +Consequences: + +- Every PR that changes `apps/claude-code/pr-review/**` skips CI tests entirely. The `Detect changed packages` job runs the filter (which already declares `pr-review:` correctly), but the downstream `test` job's `if:` condition (lines 48–51 in `ci.yml`) does not include `pr-review` and the matrix `package:` list does not include it either. +- PR #29 (orchestrator split) added the helpers `scripts/ado-fetcher.mjs`, `scripts/ado-writer.mjs`, `scripts/pre-pr.mjs`, `scripts/mode-detection.mjs`, `scripts/re-review/parse-diff-hunks.mjs` and accompanying tests under `tests/`. Local `pnpm --filter pr-review test` runs 142 tests across all of them. None of these run in CI. + +There's also a latent output-key mismatch on line 28: the filter has key `unic-confluence` but the outputs declare `confluence-publish`. Even though the `if:` clause uses `unic-confluence` (the filter key), the outputs declaration wouldn't expose it — that probably already silently masks confluence tests too. + +## Fix sketch + +In `.github/workflows/ci.yml`: + +1. Add `pr-review: ${{ steps.filter.outputs.pr-review }}` to the `changes` job outputs. +2. Add `needs.changes.outputs.pr-review == 'true'` to the `test` job's `if:` condition. +3. Add the package to the matrix: + ```yaml + - name: pr-review + changed: ${{ needs.changes.outputs.pr-review }} + ``` +4. Fix the `unic-confluence` / `confluence-publish` output-key mismatch while in there. + +## What grilling needs to resolve + +- Does the `pr-review` test suite have any cross-platform-flaky tests that would slow the Windows / macOS matrix? Quick local run suggests no (pure helpers, no fs/network). +- Should this be batched with a broader CI audit (the `confluence-publish` typo, the auto-format / release-tools filter coverage)? + +## Source + +PR #29 review (triage step 2 — CI checks). The `Test ...` job appears as `skipping` on every pr-review PR; root cause is matrix filter omission rather than a check failure. diff --git a/docs/inbox/define-and-freeze-conventional-commits-scopes.md b/docs/inbox/define-and-freeze-conventional-commits-scopes.md deleted file mode 100644 index c4e8773..0000000 --- a/docs/inbox/define-and-freeze-conventional-commits-scopes.md +++ /dev/null @@ -1,17 +0,0 @@ ---- -title: Define and freeze conventional commits scopes -created: 2026-05-03 ---- - -**Status:** needs-specs -**Category:** enhancement - -> _This was generated by AI during triage._ - -Define and freeze conventional commits scopes - -1. chore: - 1. per technology (like prettier, biome, js, ts, etc) - 2. per dev-tech (like VSCode, WebStorm, etc) -2. feat or fix: per app or per package (like unic-confluence, pr-review, biome-config) -3. docs for PRDs, ARDs or plans: per app or package or whole repo (like pr-review, release-tools, unic-agents-plugins) diff --git a/docs/inbox/fix-warnings-in-ci-after-merging-a-branch.md b/docs/inbox/fix-warnings-in-ci-after-merging-a-branch.md deleted file mode 100644 index a391be3..0000000 --- a/docs/inbox/fix-warnings-in-ci-after-merging-a-branch.md +++ /dev/null @@ -1,19 +0,0 @@ ---- -title: Fix warnings in CI after merging a branch -created: 2026-05-03 ---- - -**Status:** needs-specs -**Category:** bug - -> _This was generated by AI during triage._ - -Fix warnings in CI after merging a branch - -Actions running after merging a branch have warnings: - -```txt -[**Detect changed packages**](https://github.com/unic/unic-agents-plugins/actions/runs/25256449326/job/74056537707#step:7:6) - -Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/checkout@v4, dorny/paths-filter@v3. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: [https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/](https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/) -``` diff --git a/docs/inbox/migrate-plans-numbering-to-4-digits-prefix.md b/docs/inbox/migrate-plans-numbering-to-4-digits-prefix.md deleted file mode 100644 index 8ccabb6..0000000 --- a/docs/inbox/migrate-plans-numbering-to-4-digits-prefix.md +++ /dev/null @@ -1,11 +0,0 @@ ---- -title: Migrate plans numbering to 4-digits prefix -created: 2026-05-03 ---- - -**Status:** needs-specs -**Category:** enhancement - -> _This was generated by AI during triage._ - -Migrate plans numbering to 4-digits prefix diff --git a/docs/inbox/plans-issues-sync-gap.md b/docs/inbox/plans-issues-sync-gap.md deleted file mode 100644 index 2bb89df..0000000 --- a/docs/inbox/plans-issues-sync-gap.md +++ /dev/null @@ -1,35 +0,0 @@ ---- -title: no bridge between docs/plans/ specs and docs/issues/ triage files -created: 2026-05-08 ---- - -**Status:** needs-specs -**Category:** enhancement - -> _This was generated by AI during triage._ - -no bridge between docs/plans/ specs and docs/issues/ triage files - -## The gap - -The repo has two parallel systems for tracking work, and they are never automatically synced: - -- **`docs/plans/NN-slug.md`** — Ralph's spec queue. Written manually. Ralph marks them `done` after implementing. Consumed by `pnpm ralph`. -- **`docs/issues//PRD.md` + `NN-*.md`** — Triage issue tracker. Created by `/to-prd` + `/to-issues`. Managed by `/triage`. Consumed by AFK agents directly. - -No skill writes to `docs/plans/`. No skill reads `docs/issues/` to feed Ralph. The two systems are created and maintained independently, by hand. - -## How drift happens - -1. A session runs `grill → /to-prd → /to-issues`, producing `docs/issues//`. -2. Separately (in the same or a different session), the user manually writes a `docs/plans/NN-slug.md` spec for Ralph. -3. Ralph runs the spec and marks it `done` in `docs/plans/README.md`. -4. The corresponding `docs/issues/` files are never updated — they stay at `ready-for-agent` indefinitely. - -Real example: `apps/claude-code/pr-review/docs/plans/11-doc-context-spawn-reliability.md` is `Status: done — 2026-05-08`, but `docs/issues/pr-review-doc-context-spawn-reliability/*.md` all still read `ready-for-agent`. - -## What's missing - -- A convention (or skill step) that marks `docs/issues/` files `resolved` / `closed` when the corresponding Ralph spec is marked `done`. -- Or alternatively: a single workflow where either Ralph reads from `docs/issues/` directly, or `/to-issues` writes into `docs/plans/` format, so there's only one source of truth. -- At minimum: documentation making it explicit that the two systems are independent and must be kept in sync manually. diff --git a/docs/inbox/plugin-naming-conventions-add-unic-prefix.md b/docs/inbox/plugin-naming-conventions-add-unic-prefix.md deleted file mode 100644 index d2f9ca3..0000000 --- a/docs/inbox/plugin-naming-conventions-add-unic-prefix.md +++ /dev/null @@ -1,35 +0,0 @@ ---- -title: plugin naming conventions - add unic prefix -created: 2026-05-07 ---- - -**Status:** needs-specs -**Category:** enhancement - -> _This was generated by AI during triage._ - -plugin naming conventions - add unic prefix - -## How it works - -The slash command name is assembled by Claude Code from two parts: - -``` -/: -``` - -- **``** — the `"name"` field in `.claude-plugin/plugin.json` -- **``** — the filename under `commands/` without `.md` - -Example: plugin name `pr-review` + command file `review-pr.md` → `/pr-review:review-pr` - -`unic-confluence` already follows the desired pattern: plugin name `unic-confluence` + command `unic-confluence.md` → `/unic-confluence:unic-confluence`. - -## To add a `unic` prefix to all plugins - -Rename the `"name"` field in each plugin's `.claude-plugin/plugin.json`: - -- `pr-review` → `unic-pr-review` (command becomes `/unic-pr-review:review-pr`) -- `auto-format` → `unic-auto-format` - -**Breaking change:** existing installs have `"pr-review@unic": true` in `enabledPlugins` — they'd need to update to `"unic-pr-review@unic": true`. Safe to do now if the plugin isn't widely distributed. diff --git a/docs/inbox/pr-review-prompt-content-tests-brittleness.md b/docs/inbox/pr-review-prompt-content-tests-brittleness.md new file mode 100644 index 0000000..4a6e151 --- /dev/null +++ b/docs/inbox/pr-review-prompt-content-tests-brittleness.md @@ -0,0 +1,39 @@ +--- +title: pr-review prompt-content tests are brittle +created: 2026-05-13 +--- + +**Status:** needs-triage +**Category:** tech-debt + +> _This was generated by AI during triage of PR #29._ + +The pr-review test suite includes ~60% of its lines in "prompt-content assertions" against `.agents/*.md` and `commands/review-pr.md`. They string-grep markdown prose for substrings like `"no code quotes"`, `"leading slash"`, `"≤ 80"` etc. These work today but are brittle: a behaviourally-equivalent rewrite of the prompt prose (e.g. "no inline code quotes" instead of "no code quotes") breaks the test, and several assertions OR three to seven substring fallbacks because the author already knew the wording would drift. + +The orchestrator-split PR (#29) realigned some of these tests to read from a single shared `### Compact finding schema` block instead of slicing Step 6 / Step D — already an improvement. But the underlying anti-pattern is still present in: + +- `tests/ado-fetcher.test.mjs` — frontmatter + Step 1/2/3 prose substring assertions, plus a "no ADO write HTTP methods" test that uses a regex over a stripped slice of the markdown (with multiple opt-out clauses). +- `tests/ado-writer.test.mjs` — mirror "GET-forbidden" test on the writer prompt with the same fragility, plus a 5-way OR substring assertion on the zero-findings branch wording. +- `tests/pre-pr.test.mjs` — Pre-PR "absence" assertions (`does not invoke ADO Fetcher` etc.) which are valuable, and the compact-output guidance assertions which are now schema-block focused (good). + +The PRD's own testing decision says "No new unit tests required for the three new agents — their behaviour is best verified by integration against a real ADO PR (smoke test)." The current tests stretch that — and the section-slice approach silently passes when `indexOf` returns `-1` because `slice(-1, ...)` yields an empty string. + +## Proposed direction + +Replace prompt-prose substring assertions with one of: + +- **Structured contract blocks inside the prompts.** E.g. an `` / `` fence in `.agents/ado-fetcher.md` listing the output fields, parsed by a `parseFetcherContract` helper. Tests then assert against the parsed structure, not the prose. +- **Header-level structural assertions.** "ADO Fetcher must declare an Inputs section listing exactly these fields" — parse the markdown headings. +- **A single static snapshot test of the contract block** that updates with an intentional `--update-snapshot` flag. + +For the substring-OR-chain assertions ("zero" || "no new findings" || "FINDINGS_POSTED=0" || "nothing to report" || "skip"), the assertion is so permissive it carries no signal — drop or replace with a structural marker. + +## What grilling needs to resolve + +- Is the right replacement a structured contract block, snapshot tests, or just deleting the noisiest assertions? +- Should the prompts evolve a small "spec frontmatter" convention (parseable by tests) so we can stop string-matching prose? +- How does this interact with the existing 4 re-review module tests (which are good and should stay)? + +## Source + +PR #29 pr-test-analyzer review. Affected tests are unmodified by PR #29 except for the Step 6 / Step D realignment that landed alongside the orchestrator trim. diff --git a/docs/inbox/pr-review-request-user-confirmation-before.md b/docs/inbox/pr-review-request-user-confirmation-before.md index 37909c3..c9c217a 100644 --- a/docs/inbox/pr-review-request-user-confirmation-before.md +++ b/docs/inbox/pr-review-request-user-confirmation-before.md @@ -9,3 +9,15 @@ created: 2026-05-07 > _This was generated by AI during triage._ pr-review request user confirmation before proceeding to checkout the branch to be reviewed + +## Triage Notes + +**Nature:** Small UX safety feature — avoid silent branch checkout that may disrupt the user's working state. + +The plugin currently checks out the PR branch automatically as part of the review flow. If the user has uncommitted work or is on a different branch, this can be disruptive with no warning. + +**What grilling needs to resolve:** + +- Exact trigger: confirmation before any checkout, or only when the working tree is dirty / a branch switch is needed? +- UX: a yes/no prompt via `AskUserQuestion`, or a `--no-checkout` flag that reviews from the remote diff only? +- Should the plugin stash/restore working changes automatically, or just warn and abort? diff --git a/docs/inbox/pr-review-supress-thanks-comments-on-addressed-threads.md b/docs/inbox/pr-review-supress-thanks-comments-on-addressed-threads.md deleted file mode 100644 index d416421..0000000 --- a/docs/inbox/pr-review-supress-thanks-comments-on-addressed-threads.md +++ /dev/null @@ -1,41 +0,0 @@ ---- -title: pr-review - suppress thanks comments on addressed threads -created: 2026-05-07 ---- - -**Status:** needs-specs -**Category:** enhancement - -> _This was generated by AI during triage._ - -pr-review - suppress thanks comments on addressed threads - -## How pr-review works (simple version) - -The plugin runs in **iterations**. Each iteration is identified by a number stamped in a signature on every bot comment: `🤖 *Reviewed by Claude Code* — Iteration N`. - -On a **re-review**, it: - -1. Fetches all PR threads and finds the ones it posted before (via that signature) -2. Computes an incremental diff (only changes since the last review) -3. **Classifies** each prior thread into one of four states: - - `addressed` — ADO thread status is resolved (`fixed`, `wontFix`, etc.) OR the code changed at those lines - - `disputed` — still active, but a human replied - - `pending` — still active, bot-only thread - - `obsolete` — the file/lines no longer exist in the diff - -Then for each new finding it decides whether to open a fresh thread or reply to an existing one. - ---- - -## Your actual question: are those "Resolved as of Iteration 6 — thanks!" comments needed? - -**No, they are purely cosmetic.** They can be skipped without any functional impact. - -The actual resolution signal the system relies on is the **Azure DevOps thread status PATCH** — when the bot marks a thread `addressed`, it also PATCHes its status to `fixed` (status=2) via the ADO API. That's what `classifyThread()` reads on the next re-review. - -The comment text is never parsed, compared, or used in any conditional logic. The system would behave identically if the reply said "done", "👍", or nothing at all. - ---- - -**TL;DR:** If you want to suppress those reply comments on addressed threads, you can remove the reply step safely — the PATCH on the thread status is the only thing that matters for the re-review to recognize a finding as resolved. diff --git a/docs/inbox/research-how-to-scaffold-or-create-new-plugins.md b/docs/inbox/research-how-to-scaffold-or-create-new-plugins.md deleted file mode 100644 index 8b54cf5..0000000 --- a/docs/inbox/research-how-to-scaffold-or-create-new-plugins.md +++ /dev/null @@ -1,13 +0,0 @@ ---- -title: Research how to scaffold or create new plugins -created: 2026-05-03 ---- - -**Status:** needs-triage -**Category:** enhancement - -> _This was generated by AI during triage._ - -Research how to scaffold or create new plugins - -`/grill-me` custom skills may not take in consideration Anthropic's `/create-dev:*` commands/skills. Would it be worth calling them? At start? At the end? diff --git a/docs/inbox/research-ralph-orchestrator-alternatives.md b/docs/inbox/research-ralph-orchestrator-alternatives.md deleted file mode 100644 index 6b5c4c0..0000000 --- a/docs/inbox/research-ralph-orchestrator-alternatives.md +++ /dev/null @@ -1,11 +0,0 @@ ---- -title: research ralph-orchestrator alternatives -created: 2026-05-03 ---- - -**Status:** needs-info -**Category:** enhancement - -> _This was generated by AI during triage._ - -research ralph-orchestrator alternatives diff --git a/docs/inbox/review-pr-review-command-process.md b/docs/inbox/review-pr-review-command-process.md index b72f336..1e5f1ab 100644 --- a/docs/inbox/review-pr-review-command-process.md +++ b/docs/inbox/review-pr-review-command-process.md @@ -10,6 +10,12 @@ created: 2026-05-03 review pr-review command process. This idea is very related to `./automate-qa-in-github.md` +## Triage Notes + +**Nature:** Improvement to the pr-review command UX and process — formalise the maintainer's manual prompt into a proper skill/command with sub-agent support to prevent context-rot. + +Closely related to `automate-qa-in-github.md`; both describe the same end-to-end QA loop from slightly different angles. Should be grilled together — likely merges into a single PRD covering the full automated QA workflow. + Check Prompt I use for GitHub PRs: ```prompt diff --git a/docs/inbox/using-pr-review-on-active-ado-prs-wrongly.md b/docs/inbox/using-pr-review-on-active-ado-prs-wrongly.md index 68a7344..0c031c6 100644 --- a/docs/inbox/using-pr-review-on-active-ado-prs-wrongly.md +++ b/docs/inbox/using-pr-review-on-active-ado-prs-wrongly.md @@ -9,3 +9,16 @@ created: 2026-05-04 > _This was generated by AI during triage._ using pr-review on active ADO PRs wrongly identified as merged + +## Triage Notes + +**Nature:** Bug — the plugin misclassifies an active (open) ADO PR as merged, causing incorrect behaviour. + +Very sparse report; no repro steps or error output provided. + +**What we still need to reproduce and fix:** + +- What is the ADO PR status at the time of the wrong identification? (Active, Draft, something else?) +- Which code path reads the PR status — `ado-fetcher`? Which field on the ADO PR object is being checked (`status`, `mergeStatus`, something else)? +- Does this happen on all PRs or only under specific conditions (e.g. auto-complete enabled, a specific branch naming pattern, a particular reviewer state)? +- Exact error or unexpected output observed when the misclassification occurs. diff --git a/docs/issues/auto-format-config/01-config-module.md b/docs/issues/auto-format-config/01-config-module.md index d616655..76b08e6 100644 --- a/docs/issues/auto-format-config/01-config-module.md +++ b/docs/issues/auto-format-config/01-config-module.md @@ -1,6 +1,6 @@ # Extract `lib/config.mjs` with `DEFAULTS`, `loadConfig`, and tests -**Status:** resolved +**Status:** closed **Category:** refactor ## Parent diff --git a/docs/issues/auto-format-config/02-wire-up.md b/docs/issues/auto-format-config/02-wire-up.md index 2d7dddf..f14b0ea 100644 --- a/docs/issues/auto-format-config/02-wire-up.md +++ b/docs/issues/auto-format-config/02-wire-up.md @@ -1,6 +1,6 @@ # Update `format-hook.mjs` to use `lib/config.mjs` -**Status:** resolved +**Status:** closed **Category:** refactor ## Parent diff --git a/docs/issues/auto-format-config/03-version-bump.md b/docs/issues/auto-format-config/03-version-bump.md index 2b4a859..5959abb 100644 --- a/docs/issues/auto-format-config/03-version-bump.md +++ b/docs/issues/auto-format-config/03-version-bump.md @@ -1,6 +1,6 @@ # Version bump and CHANGELOG entry -**Status:** resolved +**Status:** closed **Category:** release ## Parent diff --git a/docs/issues/auto-format-runners/01-formatter-descriptor-type.md b/docs/issues/auto-format-runners/01-formatter-descriptor-type.md index f18b50d..b1848a8 100644 --- a/docs/issues/auto-format-runners/01-formatter-descriptor-type.md +++ b/docs/issues/auto-format-runners/01-formatter-descriptor-type.md @@ -1,6 +1,6 @@ # Add `FormatterDescriptor` typedef to `lib/types.mjs` -**Status:** resolved +**Status:** closed **Category:** refactor ## Parent diff --git a/docs/issues/auto-format-runners/02-runner-module.md b/docs/issues/auto-format-runners/02-runner-module.md index b80a39d..975faed 100644 --- a/docs/issues/auto-format-runners/02-runner-module.md +++ b/docs/issues/auto-format-runners/02-runner-module.md @@ -1,6 +1,6 @@ # Extract `lib/runners.mjs` with `runFormatter` and tests -**Status:** resolved +**Status:** closed **Category:** refactor ## Parent diff --git a/docs/issues/auto-format-runners/03-replace-runner-functions.md b/docs/issues/auto-format-runners/03-replace-runner-functions.md index cadd350..48c1530 100644 --- a/docs/issues/auto-format-runners/03-replace-runner-functions.md +++ b/docs/issues/auto-format-runners/03-replace-runner-functions.md @@ -1,6 +1,6 @@ # Replace runner functions with descriptors in `format-hook.mjs` -**Status:** resolved +**Status:** closed **Category:** refactor ## Parent diff --git a/docs/issues/auto-format-runners/04-version-bump.md b/docs/issues/auto-format-runners/04-version-bump.md index a62ca5f..be38de1 100644 --- a/docs/issues/auto-format-runners/04-version-bump.md +++ b/docs/issues/auto-format-runners/04-version-bump.md @@ -1,6 +1,6 @@ # Version bump and CHANGELOG entry -**Status:** resolved +**Status:** closed **Category:** release ## Parent diff --git a/docs/issues/ci-node24-upgrade/PRD.md b/docs/issues/ci-node24-upgrade/PRD.md new file mode 100644 index 0000000..9b0e9c3 --- /dev/null +++ b/docs/issues/ci-node24-upgrade/PRD.md @@ -0,0 +1,40 @@ +--- +title: CI — Upgrade GitHub Actions to Node.js 24-compatible versions +created: 2026-05-12 +--- + +**Status:** ready-for-agent +**Category:** bug + +> _This was generated by AI during triage._ + +## Problem Statement + +CI workflows emit deprecation warnings on every run because `actions/checkout@v4` and +`dorny/paths-filter@v3` still run on the Node.js 20 runtime, which GitHub is sunsetting: + +- **June 2, 2026** — Node.js 24 becomes the forced default; Node 20 actions may break. +- **September 16, 2026** — Node.js 20 is removed from runners entirely. + +The warnings appear in the "Detect changed packages" step and will eventually become +hard failures, breaking all CI runs for this repo. + +## Solution + +Update the pinned action versions in every workflow file under `.github/workflows/` to +releases that declare Node.js 24 support: + +- `actions/checkout` — check the latest `v4.x` patch or `v5` if available. +- `dorny/paths-filter` — check for a release that ships a Node 24 runtime. + +Alternatively, set `FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true` as a repo-level env var +as a temporary workaround, but prefer upgrading the action versions so the fix is durable. + +After updating, verify that CI passes on all three OS matrices (macOS, Windows, Linux) +and that no new warnings appear in the action logs. + +## Acceptance Criteria + +- No Node.js 20 deprecation warnings appear in any CI job after the change. +- All existing CI checks (lint, test, typecheck, verify:changelog) still pass. +- Action versions are pinned to a specific SHA or semver tag (not `@main`). diff --git a/docs/issues/conventional-commits-scopes/PRD.md b/docs/issues/conventional-commits-scopes/PRD.md new file mode 100644 index 0000000..eabb5dd --- /dev/null +++ b/docs/issues/conventional-commits-scopes/PRD.md @@ -0,0 +1,41 @@ +--- +title: Define and freeze conventional commits scopes for the monorepo +created: 2026-05-12 +--- + +**Status:** ready-for-agent +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Problem Statement + +Commit messages in this repo use conventional commits (`feat:`, `fix:`, `chore:`, etc.) +but the scope vocabulary is informal and inconsistent. Different contributors (human and +AI agents) invent scopes ad hoc, making the git log harder to scan, changelogs harder to +generate, and PR reviews harder to reason about. + +## Solution + +Define a canonical scope vocabulary and document it in `CLAUDE.md` (or a dedicated +`docs/conventions/commits.md` linked from `CLAUDE.md`). The proposed taxonomy: + +| Type | Scope examples | Notes | +| ---------------- | ------------------------------------------------------------------------------------------ | ---------------------------- | +| `feat`, `fix` | `pr-review`, `auto-format`, `unic-confluence`, `release-tools`, `biome-config`, `tsconfig` | One scope per app or package | +| `chore` | `ci`, `deps`, `biome`, `prettier`, `eslint`, `ts`, `vscode` | Per tooling concern | +| `docs` | `pr-review`, `auto-format`, `unic-agents-plugins` | Per plugin or repo-wide | +| `chore(release)` | _(no scope, or plugin name)_ | Version bumps, tags | +| `test` | plugin name or package name | Matches `feat`/`fix` scope | + +Once the vocabulary is agreed, optionally add a `commitlint` config +(`commitlint.config.mjs`) enforcing it via the existing `commit-msg` hook slot in +`.github/` or a local husky setup. + +## Acceptance Criteria + +- A written scope vocabulary exists in the repo and is referenced from `CLAUDE.md`. +- The scope list covers `feat`/`fix`, `chore`, `docs`, `test`, and `chore(release)`. +- (Optional) A `commitlint` config enforces the scopes in CI or via a local hook. +- Existing commits are not retroactively rewritten — the convention applies from the + merge date forward. diff --git a/docs/issues/feature-runner/01-skill-scaffold.md b/docs/issues/feature-runner/01-skill-scaffold.md new file mode 100644 index 0000000..fdd896c --- /dev/null +++ b/docs/issues/feature-runner/01-skill-scaffold.md @@ -0,0 +1,29 @@ +# Skill scaffold and minimal execution loop + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +Create `.claude/skills/implement-feature/SKILL.md`. Implement the full happy path for a named-slug invocation on a feature with one or more issues: create a `feature/afk/` worktree from `develop`, read all `NN-*.md` files in the feature directory, filter to those with `Status: ready-for-agent` (skip `resolved` and `closed`), sort numerically, invoke `/tdd` as a non-interactive sub-agent for each issue in order with the issue file and PRD as context, mark each issue `Status: resolved` on successful completion. + +All file and git operations must use Claude tool calls — no shell scripts, no POSIX paths. This is the tracer bullet: the entire end-to-end happy path without failure handling, PR creation, or progress output. + +## Acceptance criteria + +- [ ] `.claude/skills/implement-feature/SKILL.md` exists and is invocable as `/implement-feature ` +- [ ] Running `/implement-feature ` creates a `feature/afk/` branch and worktree from `develop` +- [ ] Only issues at `Status: ready-for-agent` are processed; `resolved` and `closed` issues are skipped +- [ ] Each issue is handed to `/tdd` as a non-interactive sub-agent invocation with the issue file and PRD as context +- [ ] Each issue file is updated to `Status: resolved` after successful `/tdd` completion +- [ ] All file and git operations use Claude tool calls, not shell scripts (cross-platform requirement) + +## Blocked by + +None — can start immediately. diff --git a/docs/issues/feature-runner/02-failure-handling.md b/docs/issues/feature-runner/02-failure-handling.md new file mode 100644 index 0000000..f3fde0a --- /dev/null +++ b/docs/issues/feature-runner/02-failure-handling.md @@ -0,0 +1,26 @@ +# Failure handling + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +Extend the skill to handle `/tdd` sub-agent failures. When `/tdd` fails on an issue, append a failure note to that issue file under a `## Comments` heading, leave the issue at `Status: ready-for-agent`, stop the runner, and leave the worktree in place for inspection. Subsequent issues in the feature must not run — they could inherit a broken foundation. + +## Acceptance criteria + +- [ ] When `/tdd` fails on an issue, a failure note is appended to that issue file under `## Comments` (prefixed with the AI-generated disclaimer) +- [ ] The failing issue remains at `Status: ready-for-agent` after the failure note is appended +- [ ] The runner stops after the first failure; no subsequent issues in the feature are executed +- [ ] The worktree is left in place (not removed) on failure, for inspection +- [ ] The runner surfaces a clear failure message to the user naming which issue failed + +## Blocked by + +`docs/issues/feature-runner/01-skill-scaffold.md` diff --git a/docs/issues/feature-runner/03-pr-creation.md b/docs/issues/feature-runner/03-pr-creation.md new file mode 100644 index 0000000..84c03cb --- /dev/null +++ b/docs/issues/feature-runner/03-pr-creation.md @@ -0,0 +1,28 @@ +# PR creation and worktree cleanup + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +Extend the skill so that when all issues in a feature are `resolved`, it opens a pull request targeting `develop` using the `gh` CLI. The PR title is derived from the feature slug and the PRD's `title` frontmatter field. The PR body references the feature PRD and lists all resolved issues. After the PR is opened successfully, the worktree is removed. + +Blocked on failure handling (02) so the runner never reaches PR creation when an issue has failed. + +## Acceptance criteria + +- [ ] When all issues in the feature are `resolved`, `gh pr create` is called targeting `develop` +- [ ] PR title is derived from the feature slug and the PRD's `title` frontmatter field +- [ ] PR body includes a reference to `docs/issues//PRD.md` and a list of all resolved issues +- [ ] The worktree is removed after the PR is opened successfully +- [ ] No PR is opened if the runner stopped due to a `/tdd` failure (failure handling from 02 prevents this) + +## Blocked by + +`docs/issues/feature-runner/02-failure-handling.md` diff --git a/docs/issues/feature-runner/04-progress-reporting.md b/docs/issues/feature-runner/04-progress-reporting.md new file mode 100644 index 0000000..6f3ce24 --- /dev/null +++ b/docs/issues/feature-runner/04-progress-reporting.md @@ -0,0 +1,24 @@ +# Progress reporting + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +Extend the skill to emit a progress line before each `/tdd` invocation so the user can judge when to check back. The line should name the current issue number, the total count, and the issue title. + +## Acceptance criteria + +- [ ] Before each `/tdd` invocation, the runner outputs a line in the format: `Implementing issue N of M: ` +- [ ] N counts from 1 and M is the total number of `ready-for-agent` issues in the feature (excluding already `resolved` or `closed` ones at run start) +- [ ] The progress line is emitted regardless of whether failure handling or PR creation are in place + +## Blocked by + +`docs/issues/feature-runner/01-skill-scaffold.md` diff --git a/docs/issues/feature-runner/05-full-context-bundle.md b/docs/issues/feature-runner/05-full-context-bundle.md new file mode 100644 index 0000000..5b9b26c --- /dev/null +++ b/docs/issues/feature-runner/05-full-context-bundle.md @@ -0,0 +1,27 @@ +# Full context bundle for /tdd sub-agent invocations + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +Extend the `/tdd` sub-agent invocation to use the full context bundle defined in ADR-0027, replacing the minimal bundle (issue file + PRD only) from the scaffold slice. The bundle adds: all sibling issue files in the feature directory, domain-scoped `CONTEXT.md`, domain-scoped ADRs, and the last 5 git commits. + +Scope is inferred by scanning the PRD for `apps/claude-code/` path references. If found, inject that plugin's `docs/adr/` and `CONTEXT.md`. If not found, inject the root `docs/adr/` and root `CONTEXT.md`. + +## Acceptance criteria + +- [ ] Each `/tdd` invocation receives the full bundle: issue file, PRD, all sibling issue files, scoped `CONTEXT.md`, scoped ADRs, last 5 git commits +- [ ] For a feature whose PRD references paths under `apps/claude-code//`, the plugin's `docs/adr/` and `CONTEXT.md` are injected +- [ ] For a feature whose PRD does not reference any `apps/claude-code//` path, the root `docs/adr/` and root `CONTEXT.md` are injected +- [ ] Root ADRs are not injected into plugin feature runs + +## Blocked by + +`docs/issues/feature-runner/01-skill-scaffold.md` diff --git a/docs/issues/feature-runner/06-dependency-graph.md b/docs/issues/feature-runner/06-dependency-graph.md new file mode 100644 index 0000000..8de3d22 --- /dev/null +++ b/docs/issues/feature-runner/06-dependency-graph.md @@ -0,0 +1,28 @@ +# Dependency graph and topological ordering + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +Extend the skill to parse `## Blocked by` references from all issue files in the feature, build a topological execution order, and replace the numerical sort from the scaffold slice. If any `## Blocked by` reference conflicts with numerical filename order, halt with a descriptive error before executing any issue. Issues with `## Blocked by: None` (or equivalent) have no predecessors. + +See ADR-0028 for the rationale: `## Blocked by` is the canonical dependency signal; numerical order is a UX convenience, not an execution contract. + +## Acceptance criteria + +- [ ] `## Blocked by` references are parsed from all `NN-*.md` files in the feature directory before execution begins +- [ ] Issues are executed in topological order derived from `## Blocked by`, not numerical filename order +- [ ] Issues with `## Blocked by: None` or no `## Blocked by` section have no predecessors in the graph +- [ ] If a `## Blocked by` reference points to an issue with a higher number than the blocking issue (conflict with numerical order), the runner halts with a descriptive error before executing any issue +- [ ] The error message names the conflicting issues so the developer can resolve the ordering + +## Blocked by + +`docs/issues/feature-runner/01-skill-scaffold.md` diff --git a/docs/issues/feature-runner/07-auto-selection.md b/docs/issues/feature-runner/07-auto-selection.md new file mode 100644 index 0000000..9110287 --- /dev/null +++ b/docs/issues/feature-runner/07-auto-selection.md @@ -0,0 +1,26 @@ +# Auto-selection and LOOP_COMPLETE + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +Extend the skill to handle no-argument invocation. When invoked without a slug, scan `docs/issues/` for features where every `NN-*.md` file is at `Status: ready-for-agent`. Select the first such feature alphabetically and run it. When no qualifying feature exists, emit `LOOP_COMPLETE` on its own line and exit cleanly. This makes `/loop /implement-feature` composable for overnight queue draining — the `/loop` skill catches `LOOP_COMPLETE` and terminates. + +## Acceptance criteria + +- [ ] Running `/implement-feature` with no argument scans `docs/issues/` and selects the first feature (alphabetically by slug) where all `NN-*.md` files are `Status: ready-for-agent` +- [ ] The selected feature is then executed using the same logic as the named-slug path +- [ ] When no qualifying feature exists, the skill outputs `LOOP_COMPLETE` on its own line and exits cleanly with no error +- [ ] Running `/loop /implement-feature` terminates cleanly after `LOOP_COMPLETE` is emitted +- [ ] Features with a mix of `ready-for-agent` and `resolved`/`closed` issues are not selected (partial features are skipped) + +## Blocked by + +`docs/issues/feature-runner/01-skill-scaffold.md` diff --git a/docs/issues/feature-runner/08-feature-runner-docs.md b/docs/issues/feature-runner/08-feature-runner-docs.md new file mode 100644 index 0000000..cd5029a --- /dev/null +++ b/docs/issues/feature-runner/08-feature-runner-docs.md @@ -0,0 +1,38 @@ +# `docs/agents/feature-runner.md` reference document + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +Write `docs/agents/feature-runner.md` — the agent reference document for the Feature Runner. It must describe the full lifecycle as implemented (not as planned), and document the historical cleanup convention. It must not duplicate content from `docs/agents/issue-tracker.md`, which is overwritten by `setup-matt-pocock-skills`. + +The document should cover: + +- The Feature Runner lifecycle: feature selected → worktree created → issues implemented in topological order → PR opened → issues marked `closed` on merge +- The context bundle injected into each `/tdd` invocation (what it contains and why) +- The `## Blocked by` sequencing rule and what happens on conflict +- The historical cleanup convention: when a Spec in `docs/plans/` is marked `done` and a corresponding `docs/issues//` folder exists, manually mark all issue files `closed` and append a note referencing the Spec + +## Acceptance criteria + +- [ ] `docs/agents/feature-runner.md` exists +- [ ] It documents the full Feature Runner lifecycle with all phases named correctly using domain vocabulary from `CONTEXT.md` +- [ ] It documents the context bundle contents (issue, PRD, siblings, scoped CONTEXT.md, scoped ADRs, recent commits) +- [ ] It documents the `## Blocked by` sequencing rule and conflict behaviour +- [ ] It documents the historical cleanup convention with a concrete example of the note to append +- [ ] It does not duplicate content from `docs/agents/issue-tracker.md` + +## Blocked by + +- `docs/issues/feature-runner/03-pr-creation.md` +- `docs/issues/feature-runner/04-progress-reporting.md` +- `docs/issues/feature-runner/05-full-context-bundle.md` +- `docs/issues/feature-runner/06-dependency-graph.md` +- `docs/issues/feature-runner/07-auto-selection.md` diff --git a/docs/issues/feature-runner/09-retry-on-tdd-failure.md b/docs/issues/feature-runner/09-retry-on-tdd-failure.md new file mode 100644 index 0000000..75ef81f --- /dev/null +++ b/docs/issues/feature-runner/09-retry-on-tdd-failure.md @@ -0,0 +1,21 @@ +# Retry mechanism on /tdd failure + +**Status:** rejected +**Category:** enhancement + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## Summary + +Add a retry mechanism to `implement-feature` so the runner tries the `/tdd` invocation again before stopping on failure. + +## Rejection reasoning + +Grilled 2026-05-09. Both failure modes that could trigger a retry are not worth automating: + +- **Agent tool errors** (safety classifier, quota exceeded) have unpredictable resolution windows (minutes to hours). An immediate or short-delay retry is a coin flip and provides no reliable value. +- **Sub-agent gave up** — `/tdd` already has a built-in red-green loop that exhausts retries internally. If it exits with failure, a second invocation hits the same wall. This is a signal for human review, not automation. + +The existing "stop + note + leave worktree" behaviour from issue 02 is already the correct response for every failure mode that can occur in practice. diff --git a/docs/issues/feature-runner/10-references-split.md b/docs/issues/feature-runner/10-references-split.md new file mode 100644 index 0000000..a03ccff --- /dev/null +++ b/docs/issues/feature-runner/10-references-split.md @@ -0,0 +1,72 @@ +# Extract protocol strings to references/runner-output-formats.md + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +Create `.claude/skills/implement-feature/references/runner-output-formats.md` containing all verbatim output strings that the runner emits or embeds. Remove those strings from `SKILL.md` and replace each with a reference to the file by name. + +The verbatim strings to extract are: + +1. **Progress line** — `Implementing issue N of M: ` +2. **Dependency conflict error block** — the four-line `Feature Runner error: dependency conflict detected.` message (including the two indented detail lines and the resolution instruction) +3. **Failure note** — the full `## Comments` markdown block appended to a failing issue file (including the blockquote attribution and the bold `Feature Runner failure —` sentence) +4. **PR body template** — the heredoc body passed to `gh pr create` (Feature section, Resolved issues list, and the 🤖 attribution line) +5. **LOOP_COMPLETE signal** — the bare `LOOP_COMPLETE` string with a note that it must appear on its own line + +Each entry in `references/runner-output-formats.md` should have a short heading, the exact string (in a fenced code block where multiline), and one sentence explaining when the runner emits it. + +`SKILL.md` procedural steps should reference the file by name rather than repeating the strings inline, reducing SKILL.md by approximately 50 lines. + +## Acceptance criteria + +- [ ] `.claude/skills/implement-feature/references/runner-output-formats.md` exists and contains all five strings listed above +- [ ] Each entry has a heading, a fenced code block with the exact string, and a one-sentence context note +- [ ] `SKILL.md` no longer contains the verbatim strings inline; each step references `references/runner-output-formats.md` by name instead +- [ ] The procedural logic in `SKILL.md` is unchanged — only the literal strings move +- [ ] `SKILL.md` line count is reduced to approximately 160 lines or fewer + +## Blocked by + +`docs/issues/feature-runner/01-skill-scaffold.md` + +## Comments + +> _This was generated by AI during triage._ + +## Agent Brief + +**Category:** enhancement +**Summary:** Extract verbatim runner output strings from `implement-feature` SKILL.md into a dedicated `references/runner-output-formats.md` file. + +**Current behavior:** +Five verbatim output strings are embedded inline inside the procedural steps of the `implement-feature` skill: the progress line, the dependency conflict error block, the failure note markdown block, the PR body heredoc, and the LOOP_COMPLETE signal. Updating any of these strings requires editing the step prose rather than a named reference file. + +**Desired behavior:** +A `references/runner-output-formats.md` file exists inside the `implement-feature` skill directory alongside `SKILL.md`. It contains all five strings, each under a short heading, in a fenced code block, with one sentence explaining when the runner emits it. The procedural steps in `SKILL.md` reference the file by name instead of repeating the strings inline. Procedural logic is unchanged — only the literal strings move. + +**Key interfaces:** + +- The `implement-feature` skill directory structure — gains a `references/` subdirectory, consistent with the `new-plugin` and `verify-spec` skills which also use `references/` +- `references/runner-output-formats.md` — new file; its five sections must exactly match the strings currently in SKILL.md (no editorial changes to the strings themselves) + +**Acceptance criteria:** + +- [ ] `references/runner-output-formats.md` exists in the `implement-feature` skill directory and contains all five strings listed in `## What to build` +- [ ] Each entry has a heading, a fenced code block with the exact string, and a one-sentence context note +- [ ] `SKILL.md` no longer contains the verbatim strings inline; each relevant step references `references/runner-output-formats.md` by name +- [ ] The procedural logic in `SKILL.md` is unchanged — only the literal strings move +- [ ] `SKILL.md` line count is reduced to approximately 160 lines or fewer + +**Out of scope:** + +- Changing the content of any output string (this is a structural move only) +- Adding new output strings or removing existing ones +- Updating `docs/agents/feature-runner.md` (it already documents the strings in prose) diff --git a/docs/issues/feature-runner/11-quick-start.md b/docs/issues/feature-runner/11-quick-start.md new file mode 100644 index 0000000..31656b3 --- /dev/null +++ b/docs/issues/feature-runner/11-quick-start.md @@ -0,0 +1,67 @@ +# Add Quick start section to SKILL.md + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +Add a `## Quick start` section near the top of `.claude/skills/implement-feature/SKILL.md`, immediately after the opening paragraph and before `## Steps`. The section should show the three realistic invocation patterns as annotated bullet points, so a user can orient themselves before reading the full step-by-step. + +The three patterns: + +1. **Named run** — `/implement-feature pr-review-doc-context-enrichment` — targets a specific feature slug directly +2. **Auto-select** — `/implement-feature` with no argument — scans `docs/issues/` and picks the first fully `ready-for-agent` feature alphabetically +3. **Overnight loop** — `/loop /implement-feature` — composes with the `/loop` skill to drain the queue unattended; the runner emits `LOOP_COMPLETE` when no qualifying feature remains, which terminates the loop + +No separate `EXAMPLES.md` file is needed — `docs/agents/feature-runner.md` already covers lifecycle and mechanics in prose. + +## Acceptance criteria + +- [ ] A `## Quick start` section exists in `SKILL.md` between the opening paragraph and `## Steps` +- [ ] The section contains exactly the three invocation patterns listed above, each as a bullet with a concrete example command and a one-line description +- [ ] No new files are created (no `EXAMPLES.md`) +- [ ] The rest of `SKILL.md` is unchanged + +## Blocked by + +`docs/issues/feature-runner/01-skill-scaffold.md` + +## Comments + +> _This was generated by AI during triage._ + +## Agent Brief + +**Category:** enhancement +**Summary:** Add a `## Quick start` section to the `implement-feature` skill so users can orient themselves before reading the full 7-step workflow. + +**Current behavior:** +The `implement-feature` SKILL.md opens with a one-paragraph description and jumps directly into `## Steps`. A user invoking the skill for the first time must read all 7 steps to understand the three basic usage patterns. + +**Desired behavior:** +A `## Quick start` section appears between the opening paragraph and `## Steps`. It contains exactly three bullet points — one per invocation pattern — each with a concrete example command and a one-line description of what it does. No new files are created. + +**Key interfaces:** + +- The `implement-feature` SKILL.md top-level structure: frontmatter → opening paragraph → `## Quick start` → `## Steps` +- The three invocation patterns (content defined in `## What to build`) must match the behavior already implemented in Step 0 of SKILL.md + +**Acceptance criteria:** + +- [ ] A `## Quick start` section exists in SKILL.md between the opening paragraph and `## Steps` +- [ ] The section contains exactly three bullet points covering: named run, auto-select, and overnight loop via `/loop` +- [ ] Each bullet includes a concrete example command and a one-line description +- [ ] No new files are created (no `EXAMPLES.md` or other companion files) +- [ ] All other sections of SKILL.md are unchanged + +**Out of scope:** + +- Changing any step logic or the 7-step structure +- Creating an `EXAMPLES.md` file +- Updating `docs/agents/feature-runner.md` diff --git a/docs/issues/feature-runner/12-heredoc-note-in-references.md b/docs/issues/feature-runner/12-heredoc-note-in-references.md new file mode 100644 index 0000000..2c448ca --- /dev/null +++ b/docs/issues/feature-runner/12-heredoc-note-in-references.md @@ -0,0 +1,74 @@ +# Add heredoc wrapping note to PR body template in runner-output-formats.md + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +The PR body template in `references/runner-output-formats.md` shows the multiline body content but gives no instruction on how to pass it to `gh pr create`. Step 7 of SKILL.md currently shows `--body ""` as a placeholder, but an agent executing the step has no guidance on quoting a multiline string for the `--body` flag. + +Add a "how to use" note directly under the PR body template in `references/runner-output-formats.md` showing the bash heredoc wrapper: + +``` +Pass via a bash heredoc: + +gh pr create \ + --base develop \ + --title "feat(): " \ + --body "$(cat <<'EOF' + +EOF +)" +``` + +No changes to SKILL.md are needed — step 7 already directs the agent to `references/runner-output-formats.md` for the body template. + +## Acceptance criteria + +- [ ] The PR body template section in `references/runner-output-formats.md` includes a note explaining that the body must be passed via a bash heredoc +- [ ] The note shows a complete, correct `gh pr create` command with the heredoc wrapper +- [ ] SKILL.md step 7 is unchanged +- [ ] No other sections of `references/runner-output-formats.md` are modified + +## Blocked by + +`docs/issues/feature-runner/10-references-split.md` + +## Comments + +> _This was generated by AI during triage._ + +## Agent Brief + +**Category:** enhancement +**Summary:** Add heredoc wrapping instruction to the PR body template in `references/runner-output-formats.md` so agents know how to pass multiline content to `gh pr create`. + +**Current behavior:** +The PR body template section shows the body content in a fenced code block but gives no instruction on how to embed it in a `gh pr create` call. Step 7 of SKILL.md shows `--body ""` as a placeholder — an agent executing this step has no guidance on quoting a multiline string for `--body`. + +**Desired behavior:** +The PR body template section in `references/runner-output-formats.md` includes a follow-on note showing the complete `gh pr create` command with the body content wrapped in a bash heredoc (`--body "$(cat <<'EOF' ... EOF)"`). An agent reading the section finds both the template content and the exact quoting pattern needed to use it. + +**Key interfaces:** + +- The PR body template section in `references/runner-output-formats.md` — gains a "how to use" note after the fenced code block +- SKILL.md step 7 — unchanged; it already directs the agent to the references file + +**Acceptance criteria:** + +- [ ] The PR body template section in `references/runner-output-formats.md` includes a note explaining that the body must be passed via a bash heredoc +- [ ] The note shows a complete, correct `gh pr create` command with the heredoc wrapper +- [ ] SKILL.md step 7 is unchanged +- [ ] No other sections of `references/runner-output-formats.md` are modified + +**Out of scope:** + +- Changing the PR body template content itself +- Adding cross-platform alternatives to the heredoc (the cross-platform concern pre-dates this issue and is not being addressed here) +- Modifying SKILL.md diff --git a/docs/issues/feature-runner/13-tdd-prompt-template-in-references.md b/docs/issues/feature-runner/13-tdd-prompt-template-in-references.md new file mode 100644 index 0000000..34a5918 --- /dev/null +++ b/docs/issues/feature-runner/13-tdd-prompt-template-in-references.md @@ -0,0 +1,91 @@ +# Extract /tdd prompt template to references/tdd-prompt-template.md + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +The 23-line `/tdd` prompt template in step 4 of SKILL.md is the last large inline block. It is a structured template (not a verbatim output string) that the agent fills in dynamically at runtime, but its presence mid-step interrupts the prose flow and keeps SKILL.md at 177 lines — above the ≤160 acceptance criterion set in issue 10. + +Create `references/tdd-prompt-template.md` containing the full prompt template. Update step 4 of SKILL.md to reference the file by name instead of repeating the template inline. + +The template to extract (currently in step 4 of SKILL.md, inside a fenced code block): + +``` +You are running /tdd in AFK mode. The planning phase is complete — do not ask for confirmation. Use the acceptance criteria below as the pre-approved plan and proceed directly to the red→green→refactor loop. + +Working directory: .claude/worktrees/ + +--- ISSUE --- + + +--- PRD (parent context) --- +/PRD.md> + +--- SIBLING ISSUES --- + + +--- CONTEXT.md --- + + +--- ADRs --- + + +--- RECENT COMMITS (last 5) --- + +``` + +The new file should have a short heading, the template in a fenced code block, and a one-sentence note explaining that placeholders are filled at runtime. + +Step 4 in SKILL.md should replace the fenced block with: "Construct the prompt using the template in `references/tdd-prompt-template.md`, substituting all `` values at runtime." + +## Acceptance criteria + +- [ ] `references/tdd-prompt-template.md` exists in the `implement-feature` skill directory and contains the full prompt template +- [ ] The template in the new file is identical to the template currently in SKILL.md step 4 (no content changes) +- [ ] Step 4 of SKILL.md no longer contains the inline fenced block; it references `references/tdd-prompt-template.md` by name +- [ ] SKILL.md line count drops to approximately 154 lines or fewer +- [ ] No other steps in SKILL.md are modified + +## Blocked by + +`docs/issues/feature-runner/10-references-split.md` + +## Comments + +> _This was generated by AI during triage._ + +## Agent Brief + +**Category:** enhancement +**Summary:** Extract the 23-line `/tdd` prompt template from SKILL.md step 4 into `references/tdd-prompt-template.md` to bring SKILL.md under the ≤160 line target. + +**Current behavior:** +Step 4 of the `implement-feature` skill contains a 23-line fenced code block with the `/tdd` prompt template inline. SKILL.md is 177 lines — above the ≤160 target set in issue 10. The template is the last remaining large inline block. + +**Desired behavior:** +`references/tdd-prompt-template.md` exists alongside `references/runner-output-formats.md` in the `implement-feature` skill directory. It contains the prompt template (with a heading, fenced code block, and one-sentence note about runtime substitution). Step 4 of SKILL.md replaces the inline block with a single sentence referencing the file by name. SKILL.md drops to approximately 154 lines. + +**Key interfaces:** + +- `references/tdd-prompt-template.md` — new file; template content must be identical to what is currently inline in SKILL.md step 4 +- SKILL.md step 4 — loses the inline fenced block, gains a one-sentence reference to the new file; all other prose in step 4 is unchanged + +**Acceptance criteria:** + +- [ ] `references/tdd-prompt-template.md` exists and contains the full prompt template, identical to the current inline version +- [ ] Step 4 of SKILL.md references `references/tdd-prompt-template.md` by name instead of repeating the template +- [ ] SKILL.md line count drops to approximately 154 lines or fewer +- [ ] No other steps in SKILL.md are modified + +**Out of scope:** + +- Changing the prompt template content (structural move only) +- Modifying `references/runner-output-formats.md` +- Updating `docs/agents/feature-runner.md` or ADR 0029 diff --git a/docs/issues/feature-runner/14-smarter-auto-select.md b/docs/issues/feature-runner/14-smarter-auto-select.md new file mode 100644 index 0000000..99069b2 --- /dev/null +++ b/docs/issues/feature-runner/14-smarter-auto-select.md @@ -0,0 +1,62 @@ +# Smarter auto-select: resume partial features and reuse existing worktrees + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +Relax the auto-selection qualification rule so that partially-completed features (e.g. after a failure mid-run) are included alongside fresh ones. Update the worktree creation step to reuse an existing worktree rather than failing or recreating it. + +### Auto-select qualification (Step 0) + +**Current rule:** a feature qualifies only if every `NN-*.md` file is exactly `ready-for-agent`. + +**New rule:** a feature qualifies if: + +- Every `NN-*.md` file has a status in `{ready-for-agent, resolved, closed, rejected, ready-for-human}` +- At least one `NN-*.md` file has status `ready-for-agent` + +Any issue in `{needs-triage, needs-info, needs-specs}` (or any unrecognised state) disqualifies the whole feature — it is not fully prepped for autonomous execution. + +Alphabetical slug ordering is unchanged. + +### Worktree creation (Step 2) + +Before running `git worktree add`, check whether `.claude/worktrees/` already exists: + +``` +ls .claude/worktrees/ +``` + +- **Exists** (prior failed run) → skip `git worktree add` and reuse the existing worktree. The branch `feature/afk/` already contains the committed work from that run. +- **Does not exist** → create it as before: + +``` +git worktree add .claude/worktrees/ -b feature/afk/ develop +``` + +### Quick start note + +Add a note to the Quick start section explaining that a failed partial feature requires named invocation (`/implement-feature `) only if the loop is not running — if the loop is running, the new auto-select rule will pick it up automatically once the developer fixes the failing issue and leaves it at `ready-for-agent`. + +## Acceptance criteria + +- [ ] Auto-select picks up a feature where some issues are `resolved` and at least one is `ready-for-agent` +- [ ] Auto-select skips a feature that has any issue in `needs-triage`, `needs-info`, or `needs-specs` +- [ ] Auto-select still skips a feature where every issue is `resolved`, `closed`, or `rejected` (nothing left to run) +- [ ] Issues with status `ready-for-human` or `rejected` do not disqualify a feature +- [ ] When `.claude/worktrees/` already exists, the runner reuses it and does not call `git worktree add` +- [ ] When `.claude/worktrees/` does not exist, the runner creates it as before +- [ ] The execution queue (Step 3) continues to filter to `ready-for-agent` only — `resolved`, `closed`, `rejected`, and `ready-for-human` issues act only as satisfied dependency nodes +- [ ] `docs/agents/feature-runner.md` — Feature selection section updated to reflect the new qualification rule +- [ ] SKILL.md Quick start updated to note loop auto-resume behaviour after failure fix + +## Blocked by + +None — can start immediately. diff --git a/docs/issues/feature-runner/15-ready-for-human-unsatisfied-dependency.md b/docs/issues/feature-runner/15-ready-for-human-unsatisfied-dependency.md new file mode 100644 index 0000000..97dbba9 --- /dev/null +++ b/docs/issues/feature-runner/15-ready-for-human-unsatisfied-dependency.md @@ -0,0 +1,71 @@ +# Halt when a ready-for-agent issue depends on a ready-for-human blocker + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +Extend the step 3 dependency validation so that `ready-for-human` blockers are treated as **unsatisfied** dependencies. If a `ready-for-agent` issue in the execution queue has a `## Blocked by` dependency whose status is `ready-for-human`, the runner must halt before executing anything. + +Also clarify the "satisfied" language in step 3 and `docs/agents/feature-runner.md` to be explicit about which statuses satisfy dependencies and which do not. + +### Dependency satisfaction model + +| Status | Satisfies dependents? | Executed? | +| ----------------- | --------------------- | --------------------------------------------------------- | +| `resolved` | Yes | No | +| `closed` | Yes | No | +| `rejected` | Yes | No — rejection is a terminal decision; dependents proceed | +| `ready-for-human` | **No** | No — human work may not be done; dependents are blocked | +| `ready-for-agent` | — | Yes | + +### New validation check (Step 3) + +After building the topological execution order, before executing anything, add a second check: + +For each `ready-for-agent` issue in the execution queue, inspect its `## Blocked by` list. If any listed blocker has status `ready-for-human`, halt immediately with the **unsatisfied dependency error** (see `references/runner-output-formats.md`). + +### New error format (`references/runner-output-formats.md`) + +Add an **unsatisfied dependency error** section: + +``` +Feature Runner error: unsatisfied dependency. + Issue NN- is blocked by NN-, but NN- has status ready-for-human. + Complete issue NN- manually (or update its status) before re-running /implement-feature . +``` + +### Language fix (Step 3 and `docs/agents/feature-runner.md`) + +Replace: + +> "`resolved` and `closed` issues are already satisfied and act only as satisfied dependencies, not as items to execute." + +With: + +> "`resolved`, `closed`, and `rejected` issues are satisfied and act as satisfied dependency nodes — not items to execute. `ready-for-human` issues are not satisfied: if a `ready-for-agent` issue depends on one, the runner halts before executing anything." + +## Rationale + +`ready-for-human` means the work still needs to be done — just not by the runner. If a `ready-for-agent` issue depends on that work, executing it on an incomplete foundation risks a correct-but-wrong implementation. + +`rejected` is treated as satisfied because it is a terminal, intentional decision: the team consciously ruled the work out. A dependent issue proceeding without a rejected blocker is exactly what triage intended. (Typical scenario: team rejects issue 01 after initial triage; issue 02 which was blocked by 01 should still run.) + +## Acceptance criteria + +- [ ] Step 3 of SKILL.md checks each `ready-for-agent` issue's blockers for `ready-for-human` status before executing anything +- [ ] If any such blocker is found, the runner halts with the unsatisfied dependency error before executing any issue +- [ ] `references/runner-output-formats.md` contains the new unsatisfied dependency error format +- [ ] Step 3 "satisfied" language lists `resolved`, `closed`, and `rejected` as satisfied; notes `ready-for-human` as unsatisfied +- [ ] `docs/agents/feature-runner.md` updated to reflect the same model +- [ ] A `rejected` blocker does not halt the runner (treated as satisfied) + +## Blocked by + +None — can start immediately. diff --git a/docs/issues/feature-runner/16-explicit-tdd-skill-invocation.md b/docs/issues/feature-runner/16-explicit-tdd-skill-invocation.md new file mode 100644 index 0000000..ccc5846 --- /dev/null +++ b/docs/issues/feature-runner/16-explicit-tdd-skill-invocation.md @@ -0,0 +1,49 @@ +# Explicit `/tdd` skill invocation and pinned `subagent_type` in step 4 + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +The Feature Runner's step 4 invokes `/tdd` via the Agent tool but does not pin `subagent_type` and does not instruct the sub-agent to explicitly load the `/tdd` skill. The current prompt template frames the sub-agent as "running /tdd in AFK mode" and relies on trigger-phrase auto-discovery to load `/tdd`'s procedural guidance — an unreliable mechanism. + +ADR-0029 ratifies replacing `/tdd`'s _planning_ phase with acceptance criteria. The rest of the skill (anti-horizontal-slicing, tracer-bullet loop, refactor-only-when-green, deep-modules / interface-design / refactoring sub-references) is still load-bearing and must be loaded deterministically. + +### Changes + +**1. `SKILL.md` step 4** — pin the subagent type. + +Replace the current "invoke `/tdd` as a non-interactive sub-agent using the Agent tool" sentence with one that names the subagent type explicitly: + +> Invoke the sub-agent using the Agent tool with `subagent_type: general-purpose` — the only stock type with access to both the `Skill` tool (to load `/tdd`) and `Edit`/`Write` tools (to write code). + +**2. `references/tdd-prompt-template.md`** — prepend an explicit skill invocation instruction. + +At the top of the prompt body (before the AFK framing), add: + +``` +Begin by invoking the `tdd` skill via the Skill tool to load its full procedural guidance (red→green→refactor, vertical-slice rule, deep-modules / interface-design / refactoring sub-references). Then follow it using the acceptance criteria below as the pre-approved plan — the planning phase is complete; do not ask for confirmation. +``` + +**3. `docs/adr/0029-feature-runner-afk-invocation.md`** — amend the Consequences section. + +Append: + +> The Feature Runner's prompt template explicitly instructs the sub-agent to invoke the `tdd` skill via the Skill tool, rather than relying on auto-discovery via trigger phrases. This makes the procedural-guidance load deterministic across Claude Code surfaces. + +## Acceptance criteria + +- [ ] `SKILL.md` step 4 names `subagent_type: general-purpose` for the `Agent` call +- [ ] `references/tdd-prompt-template.md` opens with an explicit Skill-tool invocation instruction for `tdd`, placed before the AFK framing +- [ ] `docs/adr/0029-feature-runner-afk-invocation.md` Consequences section reflects the explicit-invocation policy +- [ ] The existing AFK framing ("The planning phase is complete — do not ask for confirmation") is preserved after the new instruction + +## Blocked by + +None — can start immediately. diff --git a/docs/issues/feature-runner/17-prd-title-extraction-step-1.md b/docs/issues/feature-runner/17-prd-title-extraction-step-1.md new file mode 100644 index 0000000..c027c7c --- /dev/null +++ b/docs/issues/feature-runner/17-prd-title-extraction-step-1.md @@ -0,0 +1,32 @@ +# Extract PRD `title:` frontmatter in step 1 + +**Status:** closed +**Category:** bug + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +Step 7 of SKILL.md says: _"Derive the PR title from the PRD's `title` frontmatter field (already read in step 1)"_. But step 1 only says to read the PRD and scan it for `apps/claude-code//` path references. It never says to extract the `title:` value from the YAML frontmatter. + +A re-implementer or a sub-agent following step 1 verbatim would complete the step without capturing the PRD title, then encounter the step 7 parenthetical with no value to substitute. + +### Change + +In `SKILL.md` step 1, under "**Read the PRD:**", add one explicit instruction after the plugin-path scan: + +> Also extract the `title:` field from the PRD's YAML frontmatter (the value between the opening `---` and closing `---` at the top of the file). Retain it for use in step 7's PR title derivation. + +## Acceptance criteria + +- [ ] Step 1 of SKILL.md explicitly instructs the runner to extract the PRD's `title:` YAML frontmatter value +- [ ] Step 7's parenthetical "(already read in step 1)" remains accurate — the PRD title is gathered in step 1, not on-demand in step 7 +- [ ] No other changes to SKILL.md + +## Blocked by + +None — can start immediately. diff --git a/docs/issues/feature-runner/18-failure-loop-protection.md b/docs/issues/feature-runner/18-failure-loop-protection.md new file mode 100644 index 0000000..6636297 --- /dev/null +++ b/docs/issues/feature-runner/18-failure-loop-protection.md @@ -0,0 +1,73 @@ +# Protect `/loop` from re-picking a failed Feature (flip status to `needs-info`) + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +When `/tdd` fails on an issue, the Feature Runner appends a failure note and stops — but leaves the failing issue at `ready-for-agent`. Under `/loop /implement-feature`, auto-select then re-picks the same Feature on the next iteration (because it still has a `ready-for-agent` issue) and retries the same failing issue, burning loop iterations indefinitely. + +The fix: on `/tdd` failure, flip the failing issue's `**Status:**` line from `ready-for-agent` to `needs-info`. `needs-info` is an existing triage label ("waiting on reporter for more information") and is already in the auto-select disqualification list at step 0 — no vocabulary expansion is required. + +### Changes + +**1. `SKILL.md` step 4 "On failure"** + +After appending the failure note, add a second edit: + +> Using the Edit tool, change the `**Status:** ready-for-agent` line to `**Status:** needs-info`. This prevents auto-select from picking up this Feature on subsequent loop iterations. + +**2. `references/runner-output-formats.md` — failure note** + +Update the failure note to mention the status flip and the recovery procedure: + +```markdown +## Comments + +> _This was generated by AI during triage._ + +**Feature Runner failure** — `/tdd` could not complete this issue. Status has been set to `needs-info`. + +The worktree at `.claude/worktrees/` has been left in place for inspection. Once the issue is resolved manually, restore `**Status:** ready-for-agent` and re-run `/implement-feature ` to resume. Alternatively, close or reject the issue if it should not be retried. +``` + +**3. `docs/agents/feature-runner.md` — "Failure behaviour" section** + +Update step 2 of the numbered list: + +Replace: + +> The issue remains at `Status: ready-for-agent`. + +With: + +> The issue status is changed to `needs-info`. This prevents the auto-selection path from picking up this Feature on subsequent `/loop` iterations until the developer investigates and restores `ready-for-agent` (or closes/rejects the issue). + +**4. `SKILL.md` Quick start — "Safe to interrupt" bullet** + +Distinguish Ctrl+C from `/tdd` failure explicitly. Update the bullet: + +> **Safe to interrupt** — Ctrl+C during any issue leaves that issue at `ready-for-agent`; re-running resumes from the first unresolved issue. Note: a **/tdd failure** (as opposed to a Ctrl+C interrupt) sets the failing issue to `needs-info`, preventing auto-select from retrying until the developer intervenes. + +## Rationale + +The overnight-loop use case is the primary composition for the Feature Runner. An auto-select that retries the same broken Feature undermines the purpose of `/loop`. `needs-info` already signals "blocked on a human" in the triage vocabulary — exactly the right state for a Feature Runner failure. + +## Acceptance criteria + +- [ ] On `/tdd` failure, the failing issue's `**Status:**` is changed to `needs-info` (not left at `ready-for-agent`) +- [ ] Subsequent auto-select runs skip the Feature (because `needs-info` already disqualifies a Feature in step 0) +- [ ] The failure note in `references/runner-output-formats.md` documents the `needs-info` flip and the recovery procedure (restore `ready-for-agent` and re-run, or close/reject) +- [ ] `docs/agents/feature-runner.md` "Failure behaviour" section reflects the status flip to `needs-info` +- [ ] SKILL.md Quick start distinguishes Ctrl+C (leaves `ready-for-agent`) from `/tdd` failure (sets `needs-info`) +- [ ] Manual interrupt (`Ctrl+C`) behaviour is unchanged: issue stays `ready-for-agent` + +## Blocked by + +None — can start immediately. diff --git a/docs/issues/feature-runner/19-skill-agents-doc-crosslink.md b/docs/issues/feature-runner/19-skill-agents-doc-crosslink.md new file mode 100644 index 0000000..0597a54 --- /dev/null +++ b/docs/issues/feature-runner/19-skill-agents-doc-crosslink.md @@ -0,0 +1,39 @@ +# Add cross-link from SKILL.md to `docs/agents/feature-runner.md` + +**Status:** closed +**Category:** documentation + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +`SKILL.md` and `docs/agents/feature-runner.md` serve distinct audiences and should remain separate: + +- `SKILL.md` — runtime guidance for the orchestrating Claude instance. +- `docs/agents/feature-runner.md` — human-facing reference: lifecycle diagram, dependency-satisfaction matrix, ADR scope, historical cleanup convention. + +Currently there is no link from SKILL.md to the agents doc. A human reading the skill (or a developer extending it) has no signal that a fuller reference exists. + +### Change + +In `SKILL.md` Supporting Documentation section, add one line pointing to the agents doc: + +```markdown +- **`docs/agents/feature-runner.md`** — human-facing reference: lifecycle diagram, dependency-satisfaction matrix, ADR scope, historical cleanup convention. Maintained in parallel; consult for broader context, not for runtime instructions. +``` + +No prose is removed from either file. + +## Acceptance criteria + +- [ ] `SKILL.md` Supporting Documentation section includes a pointer to `docs/agents/feature-runner.md` with a one-line description +- [ ] No prose is removed from SKILL.md or `docs/agents/feature-runner.md` +- [ ] SKILL.md word count remains under 2,000 words after the addition + +## Blocked by + +None — can start immediately. diff --git a/docs/issues/feature-runner/20-prompt-template-and-step-cleanup.md b/docs/issues/feature-runner/20-prompt-template-and-step-cleanup.md new file mode 100644 index 0000000..127c0e3 --- /dev/null +++ b/docs/issues/feature-runner/20-prompt-template-and-step-cleanup.md @@ -0,0 +1,60 @@ +# Prompt-template deduplication and step-1/step-4 wording cleanup + +**Status:** closed +**Category:** documentation + +> _This was generated by AI during triage._ + +## Parent + +`docs/issues/feature-runner/PRD.md` + +## What to build + +Three small cleanups surfaced in the second-pass review of issues 16–19. + +### 20.a — Deduplicate prompt-template framing + +`references/tdd-prompt-template.md` currently opens with two paragraphs that both say "the planning phase is complete; do not ask for confirmation" and "use the acceptance criteria as the pre-approved plan." This is a side-effect of issue 16 prepending the explicit `tdd` Skill-tool-load instruction while preserving the original AFK framing verbatim. Every `/tdd` sub-agent invocation receives the duplicate. + +Replace both paragraphs with one consolidated paragraph that retains all unique signal: AFK identity, planning-phase-complete, explicit `tdd` Skill tool load, acceptance-criteria-as-plan, red→green→refactor loop entry. + +**Replacement** (everything before the `Working directory:` line): + +``` +You are running `/tdd` in AFK mode. The interactive planning phase is complete — do not ask for confirmation. Begin by invoking the `tdd` skill via the Skill tool to load its full procedural guidance (red→green→refactor, vertical-slice rule, deep-modules / interface-design / refactoring sub-references), then follow it using the acceptance criteria below as the pre-approved plan and proceed directly to the red→green→refactor loop. +``` + +### 20.b — Tighten step 1 closing line + +`SKILL.md` step 1 closing line (currently "These four items (PRD, CONTEXT.md, ADRs, recent commits) are static") was not updated when issue 17 added the PRD `title:` frontmatter parse. The PRD entry should reflect that both content and title are captured. + +**Replacement:** + +> These items (PRD content + title, CONTEXT.md, ADRs, recent commits) are static — gather them once before the issue loop begins. + +### 20.c — Reorder step 4 "On failure" sub-steps + +`SKILL.md` step 4 "On failure" currently appends the failure note (whose text says "Status has been set to `needs-info`") _before_ flipping the status to `needs-info`. This briefly leaves the file self-contradicting: the note says the status is `needs-info` while the status line still reads `ready-for-agent`. + +Swap the order so the status flip happens first: + +1. Using the Edit tool, change the `**Status:** ready-for-agent` line to `**Status:** needs-info`. This prevents auto-select from picking up this Feature on subsequent loop iterations. +2. Append the **failure note** (see `references/runner-output-formats.md`) to the issue file using the Edit tool, substituting ``. +3. Stop the runner immediately. Do not execute any subsequent issues — they may depend on a foundation this issue was meant to lay. +4. Report to the user: which issue failed, that the worktree is at `.claude/worktrees/` on branch `feature/afk/`, and that no subsequent issues were run. + +`docs/agents/feature-runner.md` "Failure behaviour" should reflect the same numbered order (status flip is step 2 in the list, failure note is step 1 currently — update to match). + +## Acceptance criteria + +- [ ] `references/tdd-prompt-template.md` contains exactly one paragraph framing the AFK invocation, before the `Working directory:` line +- [ ] The single paragraph mentions: AFK mode, planning-phase-complete, `tdd` Skill tool load via Skill tool, acceptance criteria as pre-approved plan, red→green→refactor loop +- [ ] No information from the previous two-paragraph version is lost +- [ ] `SKILL.md` step 1 closing line names "PRD content + title" (or equivalent) alongside CONTEXT.md, ADRs, and recent commits +- [ ] `SKILL.md` step 4 "On failure" performs the status flip (sub-step 1) before appending the failure note (sub-step 2) +- [ ] `docs/agents/feature-runner.md` "Failure behaviour" numbered list reflects the same ordering (status flip before note) + +## Blocked by + +None — can start immediately. diff --git a/docs/issues/feature-runner/PRD.md b/docs/issues/feature-runner/PRD.md new file mode 100644 index 0000000..1a8082f --- /dev/null +++ b/docs/issues/feature-runner/PRD.md @@ -0,0 +1,170 @@ +--- +title: Feature Runner — issue queue runner for the AI-development cycle +created: 2026-05-09 +--- + +**Status:** closed +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Problem Statement + +The AI-development cycle in this repo follows the workflow: +`/grill-with-docs` → `/to-prd` → `/to-issues` → `/triage` → implementation. + +The final step has no automation. Once issues reach `ready-for-agent`, each one must be manually handed to `/tdd` one at a time. There is no skill that picks up a feature's issues, works through them sequentially in an isolated branch, and opens a PR when done. The queue drains only as fast as a human feeds it. + +By contrast, the Spec Runner (`pnpm ralph`) fully automates the `docs/plans/` workflow end-to-end. The `docs/issues/` workflow has no equivalent. + +A secondary consequence: when work was completed via the Spec Runner in parallel with issues being tracked in `docs/issues/`, those issues were never marked resolved. There is no documented convention for keeping the two systems in sync. + +## Solution + +Introduce a **Feature Runner** — a new Claude Code skill (`/implement-feature`) that automates the implementation side of the AI-development cycle. + +The Feature Runner takes a feature slug (or auto-selects the next `ready-for-agent` feature), creates an isolated git worktree and branch, works through the feature's numbered issues in order using `/tdd`, marks each issue `resolved` on completion, and opens a pull request targeting `develop` when all issues are done. On failure it stops and surfaces the error to the user. + +The skill is invocable both interactively (user picks a feature) and autonomously (no argument — auto-selects from the queue), making it composable with `/loop` for overnight queue draining. + +Alongside the skill, the domain vocabulary is extended ("Feature", "Feature Runner") and a new agent reference document records the Feature Runner lifecycle and the historical cleanup convention. + +## User Stories + +1. As a developer, I want to run `/implement-feature pr-review-orchestrator-split` so that the entire feature is implemented in one isolated branch without me manually coordinating each issue. +2. As a developer, I want to run `/implement-feature` with no argument so that the runner auto-selects the next `ready-for-agent` feature and starts working on it. +3. As a developer, I want to compose `/loop /implement-feature` so that the runner continuously drains the issue queue overnight without supervision. +4. As a developer, I want each feature to be implemented in its own git worktree and branch so that parallel features (if run in separate terminals) do not conflict. +5. As a developer, I want the runner to work through a feature's issues in dependency order (derived from `## Blocked by`) so that sequential dependencies between issues are always respected. +6. As a developer, I want each issue to be handed to `/tdd` with a full context bundle (issue file, PRD, sibling issues, CONTEXT.md, scoped ADRs, recent commits) so that the agent has the shared vision and architectural constraints without needing interactive clarification. +7. As a developer, I want each issue file to be marked `resolved` as it is completed so that I can see progress at a glance in `docs/issues/`. +8. As a developer, I want the runner to open a pull request targeting `develop` automatically when all issues in a feature are done so that I can review the work without extra steps. +9. As a developer, I want the runner to stop and surface the failure to me if `/tdd` cannot complete an issue so that I can intervene rather than having subsequent issues run on a broken foundation. +10. As a developer, I want the feature's issues to be marked `closed` when the PR is merged so that the issue tracker accurately reflects completed work. +11. As a developer, I want the runner to skip features that have no `ready-for-agent` issues so that I never get an error when the queue is empty. +12. As a developer, I want the runner to work on macOS, Linux, and Windows so that team members on any OS can use it without workarounds. +13. As a developer, I want to understand the relationship between Features and Specs using precise vocabulary so that I can discuss the two workflows without confusion. +14. As a developer, I want a documented convention for marking `docs/issues/` files `closed` when the corresponding Spec is marked `done` so that historical drift does not accumulate. +15. As a developer, I want the runner to tell me which issue it is working on and what the overall progress is (e.g. "issue 2 of 5") so that I can judge when to check back. +16. As a developer, I want to be able to interrupt a running Feature Runner without corrupting the issue state so that a partial run can be resumed safely. + +## Implementation Decisions + +### New skill: `/implement-feature` + +- Implemented as a Claude Code skill at `.claude/skills/implement-feature/SKILL.md`. +- No Node.js code — the skill uses Claude's built-in tools (file reads/writes, Bash for git and gh CLI, Agent tool for `/tdd` sub-invocations). +- Invocation: `/implement-feature [slug]`. With a slug, targets that feature directly and verifies that the directory exists, contains at least one `NN-*.md` file, and has at least one `ready-for-agent` issue — stopping with a specific error if any check fails. Without a slug, scans `docs/issues/` for features that **qualify** (at least one issue at `ready-for-agent` and every other issue in `{resolved, closed, rejected, ready-for-human}`) and picks the first alphabetically. The full qualification rule lives in `.claude/skills/implement-feature/SKILL.md` step 0. +- When invoked with no argument and the queue is empty, the skill outputs `LOOP_COMPLETE` before exiting. This is the configured `completion_promise` in `ralph.yml` and is the signal that both `/loop` (the Claude Code skill) and `ralph-orchestrator` use to stop the loop. The skill must emit this string on a line of its own so loop drivers can detect it reliably. +- Cross-platform: all git and file operations expressed as Claude tool calls, not shell scripts or POSIX paths. + +### Feature isolation + +- One git worktree per feature, created from the current `develop` branch. +- Branch naming: `feature/afk/`. +- The worktree is created at the start and removed after the PR is opened (or on failure, left for inspection). + +### Issue sequencing + +- Issues are discovered by reading `docs/issues//` and collecting files matching `NN-*.md`. +- Only files with `Status: ready-for-agent` are executed. Files already `resolved`, `closed`, or `rejected` are kept in the dependency graph as satisfied nodes but skipped for execution (supports resuming a partially completed feature). +- **`## Blocked by` is the canonical dependency signal, not numerical filename order.** Numerical ordering is a UX convenience produced by `to-issues` (it publishes blockers first so numbers usually match), but it is not an execution contract. The runner builds a topological order from `## Blocked by` references before executing. If `## Blocked by` references conflict with numerical order, the runner halts with an error rather than proceeding in the wrong order. If a `## Blocked by` reference names a file that does not exist in the feature directory, the runner also halts before executing anything (missing-blocker error) — it does not silently treat the reference as satisfied. +- Each issue is handed to `/tdd` as a non-interactive sub-agent invocation with the full context bundle (see below). + +### Context bundle + +The runner assembles a context bundle for each `/tdd` sub-agent invocation. The bundle contains: + +- **Issue file** — the `## What to build` and `## Acceptance criteria` that replace `/tdd`'s interactive planning phase (see AFK invocation below). +- **PRD** — read from `docs/issues//PRD.md` (slug-derived; the `## Parent` link on issue files is informational for human readers only). Carries the "why" and the shared vision from the grilling session; without it, `/tdd` reasons from a vertical slice with no broader context, risking a correct-but-wrong implementation. +- **Sibling issue files** — all other issues in the feature directory. Provides dependency awareness and "what is already resolved" signal without relying on the runner to summarise prior work. +- **CONTEXT.md** — the domain glossary scoped to the feature (see ADR scoping below). Ensures test names and interface vocabulary match the project's language. +- **Scoped ADRs** — the architectural decisions that constrain the implementation (see ADR scoping below). +- **Recent commits** — the last 5 git commits. The grilling and PRD process often produces changes to CONTEXT.md and ADRs; those changes land in commits before the Feature Runner runs. Commits carry the ideation trail that informed the PRD. + +### AFK invocation + +`/tdd` is invoked non-interactively via the Agent tool (no TTY). In interactive mode, `/tdd`'s planning phase asks the user to confirm the interface changes and prioritise which behaviours to test before writing any code. In AFK mode there is no user to ask. The issue's `## Acceptance criteria` serves as the pre-answered plan: it replaces the planning conversation and gives `/tdd` a concrete, human-reviewed definition of done to work toward. This mirrors how Matt Pocock's AFK loop (`afk.sh`) passes issue files as the implicit plan to a non-interactive Claude invocation. + +### ADR scoping + +ADRs injected into the context bundle are scoped to the domain of the feature, not the monorepo: + +- **Plugin feature** (PRD references paths under `apps/claude-code//`) → inject that plugin's `docs/adr/` and `CONTEXT.md`. +- **Repo/tooling feature** (PRD references paths under `.claude/`, `docs/`, `packages/`, etc.) → inject the root `docs/adr/` and root `CONTEXT.md`. + +Scope is inferred by scanning the PRD for `apps/claude-code/` path references. Root ADRs (versioning, tagging, CI tooling) are noise for plugin implementation work and must not be injected into plugin feature runs. + +### State transitions + +- Issue file: `ready-for-agent` → `resolved` after `/tdd` completes successfully. +- Feature (all issues): after PR is opened, the runner does not automatically mark issues `closed` — that happens when the PR is merged (manual or via a future hook). +- On `/tdd` failure: the failing issue is flipped to `needs-info` with a failure note appended; the runner stops. This prevents `/loop /implement-feature` from re-picking the same feature until a developer triages the failure. A Ctrl+C interrupt (as opposed to a `/tdd` failure) leaves the issue at `ready-for-agent`. + +### PR creation + +- PR is opened automatically using the `gh` CLI, targeting `develop`. +- PR title: derived from the feature slug and PRD title. +- PR body: references the feature PRD and lists the resolved issues. +- If `git push` or `gh pr create` fails, the runner stops without removing the worktree and reports the failure to the user; all issues remain at `resolved` since the failure is post-implementation. + +### Auto-selection heuristic + +- When invoked with no argument, the runner selects the feature with the earliest alphabetical slug that **qualifies**: at least one issue at `ready-for-agent`, and every other issue in `{resolved, closed, rejected, ready-for-human}`. Any issue at `needs-triage`, `needs-info`, or `needs-specs` disqualifies the whole feature. Features where everything is already `resolved`/`closed`/`rejected` (nothing left to run) are also skipped. Partial features (mix of completed and `ready-for-agent` issues) are picked up — this enables resuming after a failure fix. +- If no qualifying feature exists, the runner outputs `LOOP_COMPLETE` and exits. This is the stop signal that the `/loop` skill catches to terminate an overnight draining run cleanly. It mirrors the Spec Runner's `completion_promise: LOOP_COMPLETE` in `ralph.yml`. + +### CONTEXT.md vocabulary + +- **Feature**: a self-contained unit of work tracked as a directory under `docs/issues//`, containing a PRD and numbered implementation issues. The atomic input to the Feature Runner. + - _Avoid_: ticket, epic, story +- **Feature Runner**: the skill that implements a Feature's issues end-to-end in one worktree, branch, and PR. Currently backed by the `/implement-feature` skill. + - _Avoid_: issue runner, queue runner +- Relationship added: "A **Feature** drives one **Feature Runner** execution. A **Feature** is to the Feature Runner what a **Spec** is to the Spec Runner." + +### `docs/agents/feature-runner.md` + +- Documents the Feature Runner lifecycle: feature selected → worktree created → issues implemented in order → PR opened → issues closed on merge. +- Documents the historical cleanup convention: when a Spec in `docs/plans/` is marked `done` and a corresponding `docs/issues//` folder exists, manually mark all issue files in that folder `closed` and append a note referencing the Spec. +- This file is repo-specific and must not be placed in `docs/agents/issue-tracker.md` (that file is overwritten by `setup-matt-pocock-skills`). + +## Testing Decisions + +The Feature Runner is a Claude Code skill (markdown). There is no Node.js code to unit test. + +Acceptance is verified behaviorally: + +- Run `/implement-feature ` on a real feature with `ready-for-agent` issues and verify the full lifecycle (worktree, branch, issues resolved, PR opened). +- Run `/implement-feature` with no argument on a queue that has one ready feature and verify auto-selection. +- Run `/implement-feature` with no argument on an empty queue and verify clean exit. +- Simulate a `/tdd` failure mid-feature and verify the runner stops, surfaces the error, and leaves issue state intact. + +No prior test art applies (skills are not currently covered by `node:test`). + +## Out of Scope + +- **Cross-platform replacement for `ralph-orchestrator`**: the Spec Runner (`pnpm ralph`) has a Windows incompatibility and a different TUI model. Addressing that is a separate, larger effort. +- **Parallel issue execution within a feature**: the dependency graph built from `## Blocked by` references reveals which issues have no blockers and could run concurrently. The Feature Runner serialises all execution regardless — one issue at a time. This is an explicit decision, not an omission: single-developer AFK throughput does not require parallelism, and serialisation makes failure diagnosis straightforward. +- **Parallel feature execution**: running multiple features simultaneously in separate worktrees. Not needed for one-developer AFK overnight throughput; can be added later. +- **Merging `docs/plans/` and `docs/issues/` into one system**: the two workflows serve different purposes (Spec Runner for sequential batch work, Feature Runner for issue-tracker-driven work). Unification is out of scope. +- **Automatic `closed` transition on PR merge**: marking issues `closed` when a PR merges requires a git hook or CI step. Out of scope for this PRD; the runner stops at `resolved`. +- **Priority ordering across features**: the auto-select heuristic is alphabetical. A richer priority model (e.g. dependency graph across features) is not needed now. + +## Further Notes + +The Feature Runner completes the AI-development cycle. With it in place, the full loop is: + +``` +/grill-with-docs → /to-prd → /to-issues → /triage → /implement-feature +``` + +The historical drift between `docs/plans/` (Spec Runner) and `docs/issues/` (Feature Runner) is a one-time cleanup problem, not a structural gap. The stale `docs/issues/` folders that correspond to already-completed Specs should be manually marked `closed` before the Feature Runner is introduced, to avoid the runner attempting to implement already-done work. + +## Comments + +> _This was generated by AI during triage._ + +**2026-05-09 — Triage:** Marked `resolved`. The PRD has fulfilled its purpose — it defined the feature and spawned 15 implementation issues (13 `resolved`, 1 `rejected`, 1 `ready-for-agent` → subsequently `resolved`). No changes to the spec are needed. PRD will move to `closed` once all child issues are `closed`. + +**2026-05-09 — Skill review follow-up:** Surfaced four refinement issues during a post-resolution skill review. Filed as `16-explicit-tdd-skill-invocation.md`, `17-prd-title-extraction-step-1.md`, `18-failure-loop-protection.md`, `19-skill-agents-doc-crosslink.md`. PRD remains `resolved` — these are scope refinements, not a respec. + +**2026-05-09 — Second-pass review:** Issues 16–19 resolved. Second-pass surfaced one follow-up: `20-prompt-template-and-step-cleanup.md` — deduplicates the prompt-template AFK framing (side-effect of issue 16) and tightens two minor wording nits in SKILL.md step 1 and step 4. diff --git a/docs/issues/github-copilot-config/PRD.md b/docs/issues/github-copilot-config/PRD.md new file mode 100644 index 0000000..8df2e3c --- /dev/null +++ b/docs/issues/github-copilot-config/PRD.md @@ -0,0 +1,42 @@ +--- +title: Add GitHub Copilot config to exclude generated and AI-tooling files from review +created: 2026-05-12 +--- + +**Status:** ready-for-agent +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Problem Statement + +GitHub Copilot code review treats every file in the repo equally — including +AI-generated artefacts, raw capture files, and tooling configuration that has no +business logic to review: + +- `docs/inbox/` — unstructured raw ideas; not code, not worth reviewing. +- `docs/issues/` — triage markdown files generated during planning sessions. +- `.claude/` — Claude Code config, prompts, and skill definitions. +- `.out-of-scope/` — rejection records written by triage agents. +- `docs/conversations/`, `docs/research/` — raw session transcripts. + +Copilot reviews on these files produce noise, waste review quota, and can surface +misleading suggestions on content that is intentionally informal or machine-generated. + +## Solution + +Add a `.github/copilot-instructions.md` file (or the appropriate Copilot config +mechanism for code review exclusions) that instructs Copilot to skip the paths above. + +Research the current GitHub Copilot configuration surface for code review +(`copilot-instructions.md`, `.copilotignore`, or repo-settings API) and pick the +mechanism that applies to PR review specifically. Document the chosen approach in a +comment inside the config file. + +## Acceptance Criteria + +- A config file exists under `.github/` that tells Copilot to skip at minimum: + `docs/inbox/`, `docs/issues/`, `.claude/`, `.out-of-scope/`, `docs/conversations/`, `docs/research/`. +- Opening a PR that only touches files in those directories does not trigger a Copilot + code review comment. +- The config file itself is excluded from Copilot review (self-referential exclusion). diff --git a/docs/issues/inbox-collision-check/01-fix-exit-code.md b/docs/issues/inbox-collision-check/01-fix-exit-code.md index d3905f4..7c76305 100644 --- a/docs/issues/inbox-collision-check/01-fix-exit-code.md +++ b/docs/issues/inbox-collision-check/01-fix-exit-code.md @@ -1,6 +1,6 @@ # Fix inverted exit code in `/inbox` collision check -**Status:** resolved +**Status:** closed **Category:** bug > _This was generated by AI during triage._ diff --git a/docs/issues/plugin-unic-prefix/PRD.md b/docs/issues/plugin-unic-prefix/PRD.md new file mode 100644 index 0000000..3f873fe --- /dev/null +++ b/docs/issues/plugin-unic-prefix/PRD.md @@ -0,0 +1,53 @@ +--- +title: Add unic- prefix to all plugin names for consistent namespacing +created: 2026-05-12 +--- + +**Status:** ready-for-agent +**Category:** enhancement + +> _This was generated by AI during triage._ + +## Problem Statement + +Plugin names are inconsistent across the monorepo: + +| Plugin | Current name | Command prefix | +| ---------------------------------- | ----------------- | ---------------------- | +| `apps/claude-code/unic-confluence` | `unic-confluence` | `/unic-confluence:…` ✓ | +| `apps/claude-code/pr-review` | `pr-review` | `/pr-review:…` ✗ | +| `apps/claude-code/auto-format` | `auto-format` | `/auto-format:…` ✗ | + +The `unic-confluence` plugin already follows the desired `unic-` pattern. +`pr-review` and `auto-format` do not, making it visually ambiguous whether a command +belongs to a Unic plugin or to a third-party one. + +## Solution + +Rename the `"name"` field in `.claude-plugin/plugin.json` (and `marketplace.json` +where applicable) for the two affected plugins: + +- `pr-review` → `unic-pr-review` +- `auto-format` → `unic-auto-format` + +Claude Code assembles the slash command from `:`, so +the rename automatically updates all command prefixes: + +- `/pr-review:review-pr` → `/unic-pr-review:review-pr` +- `/auto-format:…` → `/unic-auto-format:…` + +Search the entire repo for references to the old plugin names (README files, CLAUDE.md, +skill files, docs, workflow configs, `enabledPlugins` examples) and update them all. + +**Breaking change note:** any existing install that has `"pr-review@unic": true` in its +`enabledPlugins` will need to update to `"unic-pr-review@unic": true`. This is safe to +do now — the plugins are not yet widely distributed. + +## Acceptance Criteria + +- `plugin.json` `"name"` field reads `unic-pr-review` and `unic-auto-format`. +- `marketplace.json` is regenerated via `pnpm --filter bump` (or equivalent) to + reflect the new names. +- All internal references to the old names are updated. +- A note in each plugin's `CHANGELOG.md` documents the rename as a breaking change. +- Existing CI checks pass after the rename. diff --git a/docs/issues/pr-review-ado-fetcher-reliability/PRD.md b/docs/issues/pr-review-ado-fetcher-reliability/PRD.md new file mode 100644 index 0000000..8456eff --- /dev/null +++ b/docs/issues/pr-review-ado-fetcher-reliability/PRD.md @@ -0,0 +1,206 @@ +# PRD: pr-review — ADO Fetcher reliability + +**Status:** needs-triage +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` + +--- + +## Problem Statement + +When the ADO Fetcher's Azure DevOps reads fail — iterations endpoint down, work-item fetch denied by auth, prior commit missing for an incremental diff — the failures are currently invisible. The bot keeps running and produces output that looks like a normal Review, but signed `Iteration ` (empty), or with `WORK_ITEM_IDS=[]` (indistinguishable from "no work items linked"), or classifying prior threads against the wrong diff range. The reviewer reading the PR has no way to tell that the Review was produced on degraded inputs; the next re-review can be corrupted permanently because Bot Signature drift breaks re-review detection. This is the most consequential class of silent failure surfaced by the PR #29 review. + +## Solution + +Introduce a four-state Notice Tier doctrine across the plugin and apply it first to the ADO Fetcher. Every Fetcher read terminates in one of four tiers — OK, EMPTY-BY-DESIGN, DEGRADED, ABORTED — and emits a structured Notice when the tier is non-OK (with one carve-out: EMPTY-BY-DESIGN is silent except for the Doc Context family). Notices flow from the Fetcher's structured result block through the orchestrator into the Review Summary, where the reviewer sees them. A mandatory end-of-run Trailer line printed in the Claude interface also reports notice counts, so the user invoking the command sees outcome status without opening the PR. + +Failure classification moves to pure JavaScript helpers under `scripts/ado/`, refining the architecture documented in ADR 0013. Three new deep helpers (`classify-http-error`, `notices`, plus per-fetch wrappers) replace the inline bash-and-Node heredocs that today implicitly swallow exit codes. The discriminated-union return shape distinguishes EMPTY-BY-DESIGN from DEGRADED at the helper API level, removing the conflation that today defaults `parseIterations([])` to `{ latestIterationId: 1 }` (silently violating CLAUDE.md's "iterationId=1 is never used" rule). + +The diff-range fallback that the Fetcher already performs (when the prior iteration's commit is unreachable, falling back to the full PR diff) gets a `DIFF_RANGE: full | incremental` sentinel field in `ADO_FETCHER_RESULT` and a DEGRADED Notice. PRD B will consume the sentinel; PRD A only emits it. + +## User Stories + +1. As a PR reviewer, I want a banner at the top of the Review Summary listing any platform failures that occurred during the Review, so that I can tell whether the bot's findings are based on complete or degraded context. +2. As a developer invoking `/pr-review:review-pr`, I want a single end-of-run Trailer line in my Claude interface reporting findings counts, notice counts, and the PR URL, so that I can scan outcome status across many AFK invocations without opening each PR. +3. As a PR reviewer, I want to know when the Review was produced without business context (no work items linked, or work-item fetch failed), so that I can decide whether to re-run with a linked work item or accept the review-without-context. +4. As a Plugin maintainer, I want the Bot Signature to never carry an empty or fabricated Iteration ID, so that re-review detection on the next run is not silently corrupted. +5. As a Plugin maintainer, I want auth or permission failures on the Azure DevOps iterations endpoint to abort the run with a clear stderr message naming `az devops login` as the remedy, so that the user is not left wondering why subsequent re-reviews behave oddly. +6. As a PR reviewer in re-review mode, I want the bot to tell me when it classified prior threads against the full PR diff instead of the incremental diff, so that I can interpret an unexpected `pending` verdict as conservative rather than definitive. +7. As a Plugin maintainer, I want failure classification logic to live in pure JS helpers with unit tests, rather than in bash-and-Node heredocs inside agent prompts, so that I can verify the doctrine is applied consistently without running an end-to-end ADO smoke test. +8. As a developer reading the codebase, I want every ADO write call site to consult one canonical helper that maps HTTP status codes to Notice Tiers, so that 401 means the same thing in every code path and a future contributor cannot accidentally invent a divergent mapping. +9. As a developer running `/pr-review:review-pr` in Pre-PR mode, I want any Fetcher-related infrastructure changes to be invisible to me, because Pre-PR mode does not run the Fetcher. +10. As a Plugin maintainer, I want a Notice that is emitted by multiple agents for the same root cause (e.g. Fetcher and Doc Context Orchestrator both noticing a Confluence outage) to be deduplicated in the orchestrator's merge step, so that the Summary does not list the same problem twice. +11. As a developer maintaining a Re-review on an ADO PR that was merged before the Review completed, I want the Fetcher to still return a usable iteration list (because comments are still useful as a review record), so that the merged-but-reviewable workflow ADR 0013 acknowledges keeps working. +12. As a Plugin maintainer, I want the discriminated-union refactor of `parseIterations` and `parseWorkItemIds` to be purely internal, so that no other plugin or release-tool depends on the old return shape. +13. As a developer running Pre-PR mode, I want the Doc Context EMPTY-BY-DESIGN informational Notice to be emitted only for the Doc Context family (no linked work items), so that other inherently-empty states (first-review having no prior threads, a clean PR having no findings) do not pollute the Summary with redundant `ℹ️` lines. +14. As a Plugin maintainer, I want the ADRs that record the new doctrine (helper-layer split from ADR 0013, canonical HTTP-tier mapping, γ-downgrade rule for diff-range) to be in place before PRD B's consumers start arriving, so that PRD B can reference them rather than re-litigate the decisions. +15. As a CI engineer, I want every new deep helper module to come with `node:test` unit tests in the prior-art style of `packages/release-tools/scripts/verify-changelog.test.mjs`, so that the helpers can be verified without an ADO PR and without Azure CLI installed. + +## Implementation Decisions + +### Notice Tier doctrine + +A four-state classification of every Review operation outcome — **OK**, **EMPTY-BY-DESIGN**, **DEGRADED**, **ABORTED** — captured in `CONTEXT.md` under "Platform-failure handling". The tier choice is the gating decision; there is no fifth ASK tier. AFK invocations never block on user input. Failure modes that tempt an ASK tier are reclassified as ABORTED. + +EMPTY-BY-DESIGN is silent for most states. The Doc Context family is the one exception: when `WORK_ITEM_IDS=[]` the orchestrator emits an `info`-severity Notice in the Summary, because the reviewer cannot tell from the PR alone whether the bot considered linked business context. + +### Notice flow + +Each orchestration agent emits a `NOTICES` JSON array as a new field in its structured result block. The orchestrator parses, merges (with `kind`-based deduplication), and passes the merged array to the ADO Writer alongside `FINDINGS`. The ADO Writer renders a `## Notices` block above the findings in the Review Summary content. The heading stays bare (no emoji) so a mixed `info` + `warning` Notices list does not require the heading emoji to misrepresent one of the tiers; each list item carries its own per-Notice emoji prefix (`ℹ️` for `info`, `⚠` for `warning`). + +Notice shape: `{ severity: "info" | "warning", kind: , message: string }`. `kind` is a small enum (`doc-context`, `diff-range`, `work-items`, `iterations`, `default-branch`, `partial-run-check`, `thread-match`, `thread-classify`, `inline-post`, `summary-post`, `patch-to-fixed`, `diff-parse`); rejected: free-form strings, severity-coded numerics. ABORTED never reaches the Notice channel — its surface is stderr + the Trailer. + +### End-of-run Trailer + +The orchestrator prints a mandatory single-line Trailer to the Claude interface at end-of-run, regardless of mode or outcome: + +- ADO modes: `✅ Review posted: findings ( critical, important) · warning notices · info notices → ` +- Pre-PR mode: `✅ Pre-PR review complete: findings ( critical, important) · warning notices` +- Aborted: `❌ Review aborted: ` + +Designed for AFK skim: the invoker sees outcome status without opening the PR. Same `NOTICES` array drives both the Summary rendering and the Trailer counts. + +### Helper layer (ADR 0014) + +Failure classification moves from inline bash-and-Node heredocs to pure JS helpers under `scripts/ado/`. Agent prompts shrink to "import, call, branch on `result.ok`". This refines ADR 0013 — orchestration still lives in agent prompts, but **failure classification** lives in helpers. + +New helper modules: + +- **`scripts/ado/classify-http-error.mjs`** — pure function taking an HTTP status code, response body excerpt, and process exit code. Returns `{ tier: 'ok' | 'degraded' | 'aborted', kind, message }`. Encodes the canonical HTTP-tier mapping. Consumed by PRD B too. +- **`scripts/ado/notices.mjs`** — pure helpers `createNotice`, `mergeNotices` (dedupe by `kind`), `formatNoticesAsSummaryBlock`, `formatNoticesAsPrePrPreamble`, `formatTrailer`. +- **`scripts/ado/fetch-iterations.mjs`** — wraps the iterations fetch and parse; returns `{ ok: true, latestIterationId, latestCommitSha } | { ok: false, reason }`. Subsumes the existing `parseIterations` helper, refactored to the discriminated-union shape. Empty `value` array on a real PR → `{ ok: false, reason: 'empty-iterations' }` → ABORTED. +- **`scripts/ado/fetch-work-items.mjs`** — wraps the work-items fetch and parse; returns `{ ok: true, ids } | { ok: false, reason }`. Subsumes `parseWorkItemIds`. Empty array (legitimate "no work items linked") → `{ ok: true, ids: [] }`; fetch failure (auth, 5xx, network) → `{ ok: false }`. + +### Canonical HTTP-tier mapping (ADR 0015) + +| HTTP outcome | Tier | Notes | +| --------------------- | -------- | ---------------------------------------------------------- | +| 200 / 201 | OK | No Notice. | +| 404 | OK | Domain "the thing is already gone." | +| 409 | OK | Domain "state already changed." | +| 401 | ABORTED | Token expired or revoked; all subsequent writes will fail. | +| 403 | ABORTED | Permission revoked; same. | +| 5xx | DEGRADED | Transient backend; emit Notice; continue. | +| Other 4xx (400 / 422) | DEGRADED | Malformed request bug; Notice includes body excerpt. | +| Network error | DEGRADED | Treat as 5xx. | + +No retries in v1. Retries add latency, complexity, and a new failure mode (retry storm). The doctrine produces correct behaviour without them; retries can be added later behind the same Notice surface. + +### DIFF_RANGE sentinel and ADR 0004 amendment + +The ADO Fetcher's existing fallback from incremental to full diff (when the prior iteration's commit is unreachable) is currently silent. PRD A introduces: + +- A new `DIFF_RANGE: full | incremental` line in `ADO_FETCHER_RESULT_START/END`. +- A DEGRADED Notice (`kind: diff-range`, message: "Incremental diff unavailable — Coordinator will classify against the full PR diff with conservative downgrades.") when the fallback fires. + +The PRD B Coordinator changes (γ-downgrade rule that remaps `addressed` / `obsolete` to `pending` when `DIFF_RANGE=full`) consume this sentinel. PRD A only emits it. + +ADR 0004 ("incremental diff baseline") is amended in-place with a "Degraded baseline" subsection covering this rule. + +### Agent and orchestrator changes + +- **`.agents/ado-fetcher.md`** — three inline bash heredocs (Steps 2, 4a/work-items, 4-diff) replaced with `await import` calls to the three new helpers. `ADO_FETCHER_RESULT` output block grows two fields: `DIFF_RANGE` and `NOTICES`. +- **`.agents/ado-writer.md`** — accepts a new `NOTICES` input; renders the `## Notices` block above the existing severity-grouped findings in the Summary content. No changes to write call sites in PRD A (those land in PRD B). +- **`commands/review-pr.md`** — parses `NOTICES` and `DIFF_RANGE` from `ADO_FETCHER_RESULT`; merges Notices via the `notices` helper; passes merged Notices to the ADO Writer prompt; emits Doc-Context EMPTY-BY-DESIGN info Notice when `WORK_ITEM_IDS=[]`; prints the mandatory end-of-run Trailer line. The 200-line cap from PRD-orchestrator-split is preserved by leaning on the new helpers (the bash side becomes uniform `if [ "$RESULT_OK" != "true" ]; then ...`). + +### Existing helpers, breaking-change check + +`parseIterations`, `parseWorkItemIds`, and any other affected helpers are verified to have zero consumers outside the `pr-review` plugin (`grep` across `apps/`, `packages/`, `docs/` returns no matches outside `apps/claude-code/pr-review/`). The discriminated-union refactor is therefore safe to land without a deprecation period. + +## Testing Decisions + +### What makes a good test + +Tests assert the external behaviour of each helper given controlled inputs — no implementation-detail inspection, no internal-branching tests. Inputs are plain JavaScript objects or short JSON fixtures. A test reads as a sentence: "given an HTTP 401, classifyHttpError returns the aborted tier." `node:test` built-in, `node:assert/strict`, no external deps. + +### Modules under test + +**New deep helpers (full unit-test coverage):** + +- `scripts/ado/classify-http-error.mjs` — one test per row of the canonical mapping (200, 201, 401, 403, 404, 409, 5xx, 400, 422, network/exit-code paths) plus the case where the body excerpt is malformed JSON. +- `scripts/ado/notices.mjs` — `createNotice` shape, `mergeNotices` dedup behaviour across multiple sources, the three `format…` renderers producing expected markdown / line shapes, `formatTrailer` for first-review / re-review / pre-pr / aborted modes. +- `scripts/ado/fetch-iterations.mjs` — happy path with one iteration, multiple iterations (returns the max), empty `value` array → `{ ok: false, reason: 'empty-iterations' }`, missing `value` key, malformed JSON, an ADO error response. +- `scripts/ado/fetch-work-items.mjs` — empty PR-work-item links → `{ ok: true, ids: [] }`, populated links, dedup of duplicate IDs (existing parseWorkItemIds invariant), null/missing response → `{ ok: false }`, ADO error response. + +The existing test files for `parseIterations` and `parseWorkItemIds` are subsumed — the fetch helpers replace them and inherit their fixtures. + +### Modules NOT under test in PRD A + +- Agent prompt content (`.agents/*.md`, `commands/review-pr.md`): no new string-match assertions. The existing pattern is flagged as brittle in `docs/inbox/pr-review-prompt-content-tests-brittleness.md` and behaviour is verified by integration smoke test against a real ADO PR after merge, per ADR 0013's testing posture. + +### Prior art + +`packages/release-tools/scripts/verify-changelog.test.mjs`, `packages/release-tools/scripts/bump-version.test.mjs`, and `apps/claude-code/pr-review/tests/parse-diff-hunks.test.mjs` (added in PR #29). Same style throughout. + +## Out of Scope + +- Coordinator and Writer changes (DIFF_RANGE consumption, γ-downgrade rule applied, HTTP-tier mapping applied to every write call site, `*.err` retention policy, `parseAdoWriterResult` discriminated-union refactor) — those land in **PRD B**. +- Pre-PR mode changes (`parseChangedFilesFromDiff` suspicious-shape Notice, default-branch fallback chain + Notice) — those land in **PRD B**. +- The integration smoke test against a real ADO PR — verification is manual, post-merge. +- Retries on transient HTTP errors — out of scope per the doctrine. Re-evaluate if 5xx Notices prove painful in practice. +- A canonical thread shape spanning ADO and GitHub — deferred per ADR 0013 until a second platform consumer exists. +- Lifting any helper from `scripts/ado/` to `pr-review-toolkit` — none of these helpers are platform-shared yet. + +## Further Notes + +**ADR 0014** (`apps/claude-code/pr-review/docs/adr/0014-failure-classification-helpers.md`) records the helper-layer refinement to ADR 0013. + +**ADR 0015** (`apps/claude-code/pr-review/docs/adr/0015-canonical-http-tier-mapping.md`) records the HTTP-tier mapping, the 401/403 abort rule, and the no-retries-in-v1 stance. + +**ADR 0004** (`apps/claude-code/pr-review/docs/adr/0004-incremental-diff-baseline.md`) is amended in-place with a "Degraded baseline" subsection covering the γ-downgrade rule that PRD B will implement on the consumer side. + +**`CONTEXT.md`** is already updated with the new terms (Notice, Notice Tier and its four states, Trailer). + +**Source:** the deferred items from the PR #29 multi-agent review, grilled against the domain doctrine over the conversation captured in this session. The originating inbox file (`docs/inbox/pr-review-ado-error-hardening-pass.md`) is removed once PRD A and PRD B are published. + +--- + +## Agent Brief + +> _This was generated by AI during triage._ + +**Category:** enhancement +**Summary:** Apply the four-tier Notice doctrine to the ADO Fetcher. Introduce three new deep helpers in `scripts/ado/` (`classify-http-error`, `notices`, plus `fetch-iterations` and `fetch-work-items` as discriminated-union refactors of the existing parsers). The Fetcher emits a `NOTICES` array and a `DIFF_RANGE` sentinel in its structured result block; the orchestrator merges Notices, passes them to the ADO Writer, prints a mandatory end-of-run Trailer line. The ADO Writer renders a `## Notices` block above findings in the Review Summary. + +**Current behavior:** +ADO Fetcher reads silently swallow exit codes. An iterations-fetch failure produces `LATEST_ITERATION_ID=''`, drifting the Bot Signature to `Iteration ` (empty) and breaking re-review detection forever afterward. A work-item-fetch failure is indistinguishable from "no work items linked" — both produce `WORK_ITEM_IDS=[]`. A diff-range fallback to the full PR diff happens silently, causing the Coordinator to classify prior threads against the wrong range. None of these surface to the reviewer or the invoker. + +**Desired behavior:** +Every Fetcher operation terminates in one of four Notice Tiers (OK, EMPTY-BY-DESIGN, DEGRADED, ABORTED). Tier choice is the gating decision — no user prompts, AFK-friendly. Failures route to: + +- **ABORTED** for state-corrupting failures (empty iterations on a real PR, 401/403 on iteration fetch). Process exits non-zero with stderr message + Trailer aborted line. +- **DEGRADED** for failures the Review can still complete around (work-item fetch failed, diff-range fallback to full diff, 5xx on any read). Emits a `warning` Notice surfaced in the Review Summary. +- **EMPTY-BY-DESIGN** for legitimate empty states. Silent except for the Doc Context family (`WORK_ITEM_IDS=[]` → `info` Notice in the Summary). +- **OK** for normal completion. + +The four new helpers under `scripts/ado/` own the classification logic. Agent prompts shrink to `await import` + branch on `result.ok`. The Bot Signature is never signed with an empty Iteration ID again. + +**Key interfaces:** + +- `classifyHttpError({ status, body, exitCode }) → { tier, kind, message }` — canonical HTTP-tier mapping; consumed by PRD B. +- `createNotice / mergeNotices / formatNoticesAsSummaryBlock / formatNoticesAsPrePrPreamble / formatTrailer` from `scripts/ado/notices.mjs`. +- `fetchIterations(...) → { ok: true, latestIterationId, latestCommitSha } | { ok: false, reason }`. +- `fetchWorkItems(...) → { ok: true, ids } | { ok: false, reason }`. +- `ADO_FETCHER_RESULT` grows `DIFF_RANGE: full | incremental` and `NOTICES: [{severity, kind, message}, …]` fields. + +**Acceptance criteria:** + +- [ ] The four new helpers under `scripts/ado/` exist and pass their unit tests (`pnpm --filter pr-review test`). +- [ ] `parseIterations` / `parseWorkItemIds` are gone — the new fetch helpers fully subsume them; no consumer outside `pr-review` is broken (verified by `grep`). +- [ ] An iterations fetch that returns `value: []` aborts the run with a clear stderr message and a Trailer `❌ Review aborted: empty-iterations — …` line. +- [ ] A work-item fetch that fails with auth/5xx/network emits a DEGRADED Notice (`kind: work-items`); a work-item fetch that returns an empty list emits an `info` Notice (`kind: doc-context`). +- [ ] A diff-range fallback emits `DIFF_RANGE: full` in `ADO_FETCHER_RESULT` and a DEGRADED Notice (`kind: diff-range`). +- [ ] The orchestrator merges Notices, dedupes by `kind`, and passes them to the ADO Writer. +- [ ] The ADO Writer renders a `## Notices` block above findings in first-review and re-review Summaries. +- [ ] Every successful run ends with a Trailer line in the Claude interface listing findings, notices, and the PR URL (ADO modes) or finding counts (Pre-PR mode). +- [ ] `commands/review-pr.md` remains ≤ 200 lines. +- [ ] ADR 0014 (helper layer), ADR 0015 (HTTP-tier mapping), and the in-place ADR 0004 amendment exist. +- [ ] `pnpm test` passes; `pnpm format` produces no diff; `pnpm check` reports zero warnings. + +**Out of scope:** + +- Coordinator and Writer changes — PRD B. +- Pre-PR mode changes — PRD B. +- Retries on transient HTTP errors. +- Integration smoke test (manual, post-merge). +- Lifting helpers to `pr-review-toolkit`. diff --git a/docs/issues/pr-review-ado-fetcher-reliability/done/01-end-to-end-notice-pipeline.md b/docs/issues/pr-review-ado-fetcher-reliability/done/01-end-to-end-notice-pipeline.md new file mode 100644 index 0000000..2808159 --- /dev/null +++ b/docs/issues/pr-review-ado-fetcher-reliability/done/01-end-to-end-notice-pipeline.md @@ -0,0 +1,57 @@ +# A1. End-to-end Notice pipeline via Doc-Context info Notice + ADR-0014 + +**Status:** resolved +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-ado-fetcher-reliability/PRD.md` + +## What to build + +Deliver the foundational Notice pipeline end-to-end with one concrete emission site: the Doc-Context `info` Notice when the PR has no linked work items. + +Implementation cuts through every layer: + +- **New helper** `scripts/ado/notices.mjs` — pure functions `createNotice`, `mergeNotices` (dedupe by `kind`), `formatNoticesAsSummaryBlock`, `formatNoticesAsPrePrPreamble`, `formatTrailer`. With unit tests. +- **ADO Fetcher prompt** — `ADO_FETCHER_RESULT_START/END` block gains a `NOTICES: [...]` field. When `WORK_ITEM_IDS=[]`, the Fetcher appends an `info` Notice (`kind: doc-context`, message: "Reviewed without business context — no work items linked to this PR."). +- **Orchestrator** — parses `NOTICES` from the Fetcher result, merges via the new helper (no-op at this stage but the merge wiring must exist), passes a merged `NOTICES_JSON` to the ADO Writer prompt. +- **ADO Writer prompt** — accepts the new `NOTICES_JSON` input; renders a `## Notices` block (with `ℹ️` for `info`, `⚠` for `warning`) above the severity-grouped findings in the Summary content. +- **Trailer** — orchestrator prints a mandatory end-of-run line in the Claude interface for every run (ADO modes, Pre-PR mode, aborted). Carries findings counts, notice counts, and (for ADO modes) the PR URL. +- **ADR-0014** — new `apps/claude-code/pr-review/docs/adr/0014-notice-tier-doctrine-and-failure-classification-helpers.md`. Records the four-tier doctrine (OK / EMPTY-BY-DESIGN / DEGRADED / ABORTED), the no-fifth-ASK-tier rule, and the JS-helper-layer refinement to ADR 0013. +- **CHANGELOG** — `[Unreleased]` Added entries for the new helper and ADR; Changed entries for the Fetcher result block, the orchestrator merge wiring, and the Summary rendering. +- **`commands/review-pr.md`** stays ≤ 200 lines. + +End-to-end demoable: invoke `/pr-review:review-pr` against an ADO PR with no linked work items. The Summary opens with `ℹ️ Reviewed without business context — no work items linked to this PR.` followed by the findings. The Claude interface ends with `✅ Review posted: findings · 0 warning notices · 1 info notice → `. + +## Acceptance criteria + +- [ ] `scripts/ado/notices.mjs` exists with all five exported functions and passes `pnpm --filter pr-review test`. +- [ ] `ADO_FETCHER_RESULT_START/END` block emits a `NOTICES` field (JSON array; empty array if no notices). +- [ ] Orchestrator parses, merges, and passes `NOTICES` to the ADO Writer prompt. +- [ ] ADO Writer Summary content renders a `## Notices` block above findings when `NOTICES` is non-empty; no block when empty. +- [ ] Doc-Context info Notice appears on a real ADO PR with no linked work items. +- [ ] End-of-run Trailer line is printed in the Claude interface for every run (success, abort, pre-PR). +- [ ] ADR-0014 exists at the documented path. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, and `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +None — can start immediately. + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q2 four-tier doctrine (OK / EMPTY-BY-DESIGN / DEGRADED / ABORTED, no fifth ASK tier), Q3 Doc-Context info-Notice carve-out, Q4 Option A Notice flow (per-agent `NOTICES` array, orchestrator merges with `kind`-based dedup, Writer renders `## Notices` block above findings), and the mandatory Trailer convention. ADR-0014 content is fully specified in PRD A's "Implementation Decisions → Helper layer". No outstanding questions; ready for an AFK agent to implement. + +## Deviations + +- **Version bump = minor (1.0.0 → 1.1.0), not patch.** The issue text doesn't pin the bump size; chose `minor` because the slice introduces two user-visible additions (the `## Notices` block in the Review Summary and the mandatory Trailer line in the Claude interface) — per the project's CHANGELOG convention, "Added — new flag, subcommand, or user-visible feature, anything minor." +- **Bumped `plugin.json` + `marketplace.json` + CHANGELOG by hand, not via `pnpm --filter pr-review bump minor`.** The sandbox environment has Node 20 only and the pnpm `useNodeVersion: 24.15.0` lock forced an outbound `nodejs.org` fetch that the firewall blocks. Manual edits match the bump tool's contract exactly: both manifest version fields updated, `[Unreleased]` entries moved to a dated `[1.1.0] — 2026-05-13` section, fresh empty `[Unreleased]` placeholders restored. +- **Feedback loops run manually, not via `pnpm`.** All 158 `node --test` cases pass under Node 20 (including 14 new `tests/notices.test.mjs` cases). `prettier --check` clean across all `*.md`. `tsc --noEmit --checkJs` clean across modified `.mjs` files. Biome could not run (only `@biomejs/cli-darwin-arm64` is installed; the sandbox is Linux ARM64); flagging here so CI is the source of truth for the Biome layer. The "≤ 200 lines" cap for `commands/review-pr.md` is preserved exactly at 200 lines (`Constants` section folded into the lead paragraph to make room for the new `## Step 8 — End-of-run Trailer` and `NOTICES_JSON` input to the Writer prompt). diff --git a/docs/issues/pr-review-ado-fetcher-reliability/done/02-classify-http-error-and-work-items.md b/docs/issues/pr-review-ado-fetcher-reliability/done/02-classify-http-error-and-work-items.md new file mode 100644 index 0000000..fcbb439 --- /dev/null +++ b/docs/issues/pr-review-ado-fetcher-reliability/done/02-classify-http-error-and-work-items.md @@ -0,0 +1,49 @@ +# A2. `classify-http-error` + `fetch-work-items` refactor → DEGRADED tier + ADR-0015 + +**Status:** resolved +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-ado-fetcher-reliability/PRD.md` + +## What to build + +Wire the DEGRADED tier end-to-end by introducing the canonical HTTP-tier mapping and applying it to the ADO Fetcher's work-item fetch. + +Implementation cuts through every layer: + +- **New helper** `scripts/ado/classify-http-error.mjs` — pure function `({ status, body, exitCode }) → { tier: 'ok' | 'degraded' | 'aborted', kind, message }`. Encodes the canonical mapping (200/201/404/409 → ok; 401/403 → aborted; 5xx/network/4xx → degraded). With unit tests covering every row of the mapping table from PRD A's Implementation Decisions, plus malformed-body and network-exit-code paths. +- **New helper** `scripts/ado/fetch-work-items.mjs` — wraps the work-items fetch (currently inline in the Fetcher prompt). Returns `{ ok: true, ids } | { ok: false, reason, message }`. Subsumes today's `parseWorkItemIds`. Empty array on successful fetch → `{ ok: true, ids: [] }` (legitimate EMPTY-BY-DESIGN). Auth/5xx/network → `{ ok: false }` with a kind and message ready for Notice creation. With unit tests covering the discriminated-union behaviour. +- **ADO Fetcher prompt** — Step 5 (work-item fetch) refactored to `await import(...)` the new helper. On `{ ok: false }`, emit a DEGRADED Notice (`kind: work-items`, message from the helper) into the Fetcher's `NOTICES` array (the channel wired by A1). The Doc-Context info Notice from A1 still fires when `{ ok: true, ids: [] }`. +- **ADR-0015** — new `apps/claude-code/pr-review/docs/adr/0015-canonical-http-tier-mapping.md`. Records the HTTP-tier mapping, the 401/403 abort rule, and the no-retries-in-v1 stance. +- **CHANGELOG** — `[Unreleased]` Added entries for the two new helpers and ADR-0015; Changed entry for the Fetcher prompt's work-item step. + +Subsumption: today's `parseWorkItemIds` is removed (subsumed into `fetch-work-items.mjs`), and its existing test file is replaced by the new helper's tests. + +End-to-end demoable: invoke `/pr-review:review-pr` against an ADO PR while the local Azure CLI is unauthenticated (e.g. revoke the token). The Summary renders `## Notices` containing `⚠ work-items: Failed to fetch linked work items (auth error). Review proceeded without business context.` The Trailer reports `· 1 warning notice`. Restore auth and the same PR shows the Doc-Context info Notice from A1 (or no Notice at all if work items are populated). + +## Acceptance criteria + +- [ ] `scripts/ado/classify-http-error.mjs` exists with full HTTP-tier-mapping unit tests (≥ 10 cases). +- [ ] `scripts/ado/fetch-work-items.mjs` exists with discriminated-union return shape and unit tests; subsumes `parseWorkItemIds`. +- [ ] Old `parseWorkItemIds` exports and tests are removed. +- [ ] ADO Fetcher prompt's work-item step calls the new helper via `await import(...)`. +- [ ] A simulated work-item-fetch failure (e.g. wrong PROJECT name, revoked auth) produces a DEGRADED Notice with `kind: work-items` in the Summary. +- [ ] ADR-0015 exists at the documented path. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/01-end-to-end-notice-pipeline.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q7 canonical HTTP-tier mapping (200/201/404/409 → OK; 401/403 → ABORTED; 5xx/network/4xx → DEGRADED; no retries in v1). Helper-API discriminated-union refactor for `parseWorkItemIds` verified breaking-change-free (zero consumers outside the plugin, confirmed by `grep` across `apps/`, `packages/`, `docs/`). ADR-0015 content is fully specified in PRD A's "Implementation Decisions → Canonical HTTP-tier mapping". No outstanding questions. diff --git a/docs/issues/pr-review-ado-fetcher-reliability/done/03-fetch-iterations-aborted-tier.md b/docs/issues/pr-review-ado-fetcher-reliability/done/03-fetch-iterations-aborted-tier.md new file mode 100644 index 0000000..5e2b86d --- /dev/null +++ b/docs/issues/pr-review-ado-fetcher-reliability/done/03-fetch-iterations-aborted-tier.md @@ -0,0 +1,49 @@ +# A3. `fetch-iterations` refactor → ABORTED tier + +**Status:** resolved +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-ado-fetcher-reliability/PRD.md` + +## What to build + +Wire the ABORTED tier end-to-end by refactoring the iterations fetch and removing the `iterationId=1` default that today silently violates CLAUDE.md. + +Implementation cuts through every layer: + +- **New helper** `scripts/ado/fetch-iterations.mjs` — wraps the iterations fetch and parse. Returns `{ ok: true, latestIterationId, latestCommitSha } | { ok: false, reason: 'empty-iterations' | 'auth' | 'transient' | 'malformed', message }`. Uses `classify-http-error` (from A2) for HTTP failures. Empty `value` array on a real PR → `{ ok: false, reason: 'empty-iterations' }`. 401/403 on the fetch → `{ ok: false, reason: 'auth' }`. With unit tests covering each branch. +- **ADO Fetcher prompt** — Step 2 (iterations fetch) refactored to `await import(...)` the new helper. On `{ ok: false }` of any kind, the prompt exits non-zero with a clear stderr message ("ERROR: . Try `az devops login` to re-authenticate." for auth; "ERROR: iterations endpoint returned empty value array. Cannot sign Review with a valid Iteration ID." for empty). +- **Orchestrator** — recognises a Fetcher non-zero exit and prints the Trailer aborted line in the Claude interface: `❌ Review aborted: `. No Summary is composed (the run never reaches the Writer). +- **Removed:** today's `parseIterations` helper exports and tests are subsumed into `fetch-iterations.mjs`. +- **CHANGELOG** — `[Unreleased]` Added entry for the new helper; Changed entry for the Fetcher prompt's iterations step; Fixed entry calling out the `iterationId=1` default removal. + +Subsumption: the existing `parseIterations` test fixtures move into the new helper's tests. The empty-input case is reclassified from "returns iterationId=1" to "returns `{ ok: false, reason: 'empty-iterations' }`" per the grilling decision. + +End-to-end demoable: invoke `/pr-review:review-pr` against a PR while the local `az devops login` is expired. The Claude interface ends with `❌ Review aborted: auth — Failed to fetch iterations (HTTP 401). Try \`az devops login\` to re-authenticate.` No Summary is posted. Restore auth and the same PR signs comments with the correct latest iteration. + +## Acceptance criteria + +- [ ] `scripts/ado/fetch-iterations.mjs` exists with discriminated-union return shape and unit tests (≥ 6 cases including empty value, 401, 5xx, malformed JSON, single iteration, multiple iterations). +- [ ] Old `parseIterations` exports and tests are removed. +- [ ] ADO Fetcher prompt's iterations step calls the new helper via `await import(...)`. +- [ ] A simulated empty-iterations response causes the run to ABORT with a clear stderr message and a Trailer aborted line. +- [ ] A simulated 401 on the iterations endpoint causes the same abort with `reason: auth` in the Trailer. +- [ ] No comment is ever signed with `Iteration ` (empty) or `Iteration 1` due to the empty-default path. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/02-classify-http-error-and-work-items.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q5 (empty-iterations reclassification — `value: []` on a real PR is ABORTED, including for merged PRs per the explicit "record-keeping reviews are not supported on PRs with missing iteration history" caveat). The `iterationId=1` default that currently violates CLAUDE.md is removed. `parseIterations` is subsumed; verified breaking-change-free. No outstanding questions. diff --git a/docs/issues/pr-review-ado-fetcher-reliability/done/04-diff-range-sentinel.md b/docs/issues/pr-review-ado-fetcher-reliability/done/04-diff-range-sentinel.md new file mode 100644 index 0000000..6a03097 --- /dev/null +++ b/docs/issues/pr-review-ado-fetcher-reliability/done/04-diff-range-sentinel.md @@ -0,0 +1,45 @@ +# A4. `DIFF_RANGE` sentinel + ADR-0004 amendment + +**Status:** resolved +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-ado-fetcher-reliability/PRD.md` + +## What to build + +Emit the `DIFF_RANGE` sentinel and the corresponding Notice when the Fetcher's existing diff-range fallback fires, and amend ADR 0004 in-place with the γ-downgrade rule that PRD B's Coordinator will consume. + +Implementation cuts through every layer: + +- **ADO Fetcher prompt** — Step 4 (raw diff) updated to emit `DIFF_RANGE: full | incremental` as a new field in the `ADO_FETCHER_RESULT_START/END` block. The value reflects which diff range was actually computed: `incremental` when the prior iteration's commit was reachable and the diff ran against `${PRIOR_COMMIT_SHA}..${LATEST_COMMIT_SHA}`; `full` when any fallback fired and the diff ran against `origin/${TARGET_BRANCH}...HEAD`. When `full`, the prompt also appends a DEGRADED Notice (`kind: diff-range`, message: "Incremental diff unavailable — Coordinator will classify against the full PR diff with conservative downgrades.") to the Fetcher's `NOTICES` array. +- **Orchestrator** — parses the new `DIFF_RANGE` field alongside the other Fetcher result fields. PRD A does not yet consume the value; PRD B (issue B3) will. +- **ADR 0004 amendment** — `apps/claude-code/pr-review/docs/adr/0004-incremental-diff-baseline.md` gets a new "Degraded baseline" subsection (in-place, not a separate ADR) documenting the rule: when `DIFF_RANGE=full`, the Coordinator MAY classify against the full diff but MUST downgrade `addressed` / `obsolete` outputs to `pending` and emit a DEGRADED Notice. Status of ADR 0004 stays `Accepted`; the amendment is additive. +- **CHANGELOG** — `[Unreleased]` Changed entry for the Fetcher result-block extension; Fixed entry for the diff-range fallback no longer being silent. + +End-to-end demoable: invoke `/pr-review:review-pr` against a PR where the prior iteration's commit has been force-pushed away (so the Fetcher's `git fetch origin "$PRIOR_COMMIT_SHA"` fails). The Summary opens with `⚠ diff-range: Incremental diff unavailable — Coordinator will classify against the full PR diff with conservative downgrades.` The Trailer reports `· 1 warning notice`. (Without PRD B's B3 landed, the Coordinator does not yet downgrade — that's B3's verification surface.) + +## Acceptance criteria + +- [ ] `ADO_FETCHER_RESULT_START/END` block emits a `DIFF_RANGE: full | incremental` field. +- [ ] When the diff-range fallback fires, the Fetcher's `NOTICES` array contains a `warning`-severity entry with `kind: diff-range`. +- [ ] When the incremental diff succeeds, `DIFF_RANGE=incremental` and no diff-range Notice is emitted. +- [ ] Orchestrator parses the new field (does not yet consume it — PRD B will). +- [ ] ADR 0004 has the "Degraded baseline" subsection appended in-place. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/01-end-to-end-notice-pipeline.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q6 (sentinel naming `DIFF_RANGE: full | incremental` chosen over the boolean alternative for forward-compat with future range types; in-place amendment to ADR 0004 rather than a new ADR-0015a — the amendment is additive). Option γ (the γ-downgrade rule) is implemented in PRD B issue B3, not here; A4 only emits the sentinel and Notice. No outstanding questions. diff --git a/docs/issues/pr-review-doc-context-enrichment/01-confluence-page-client.md b/docs/issues/pr-review-doc-context-enrichment/01-confluence-page-client.md index cd457d2..205c159 100644 --- a/docs/issues/pr-review-doc-context-enrichment/01-confluence-page-client.md +++ b/docs/issues/pr-review-doc-context-enrichment/01-confluence-page-client.md @@ -1,6 +1,6 @@ # Confluence page client script + tests -**Status:** resolved +**Status:** closed **Category:** enhancement ## Parent diff --git a/docs/issues/pr-review-doc-context-enrichment/02-work-item-doc-context-enrichment.md b/docs/issues/pr-review-doc-context-enrichment/02-work-item-doc-context-enrichment.md index a4068a3..abc1850 100644 --- a/docs/issues/pr-review-doc-context-enrichment/02-work-item-doc-context-enrichment.md +++ b/docs/issues/pr-review-doc-context-enrichment/02-work-item-doc-context-enrichment.md @@ -1,6 +1,6 @@ # Work item Doc Context enrichment -**Status:** resolved +**Status:** closed **Category:** enhancement ## Parent diff --git a/docs/issues/pr-review-doc-context-enrichment/03-confluence-page-doc-context-enrichment.md b/docs/issues/pr-review-doc-context-enrichment/03-confluence-page-doc-context-enrichment.md index 5ab325e..a015291 100644 --- a/docs/issues/pr-review-doc-context-enrichment/03-confluence-page-doc-context-enrichment.md +++ b/docs/issues/pr-review-doc-context-enrichment/03-confluence-page-doc-context-enrichment.md @@ -1,6 +1,6 @@ # Confluence page Doc Context enrichment -**Status:** resolved +**Status:** closed **Category:** enhancement ## Parent diff --git a/docs/issues/pr-review-doc-context-enrichment/04-version-bump-and-docs.md b/docs/issues/pr-review-doc-context-enrichment/04-version-bump-and-docs.md index 257be7b..88d3a18 100644 --- a/docs/issues/pr-review-doc-context-enrichment/04-version-bump-and-docs.md +++ b/docs/issues/pr-review-doc-context-enrichment/04-version-bump-and-docs.md @@ -1,6 +1,6 @@ # Version bump + CHANGELOG + docs -**Status:** resolved +**Status:** closed **Category:** enhancement ## Parent diff --git a/docs/issues/pr-review-doc-context-spawn-reliability/01-adr-and-synthesizer-agent.md b/docs/issues/pr-review-doc-context-spawn-reliability/01-adr-and-synthesizer-agent.md index ed5f6e8..307c0a7 100644 --- a/docs/issues/pr-review-doc-context-spawn-reliability/01-adr-and-synthesizer-agent.md +++ b/docs/issues/pr-review-doc-context-spawn-reliability/01-adr-and-synthesizer-agent.md @@ -1,6 +1,6 @@ # ADR-0012 + Doc Context Synthesizer agent -**Status:** resolved +**Status:** closed **Category:** enhancement **Type:** AFK diff --git a/docs/issues/pr-review-doc-context-spawn-reliability/02-orchestrator-agent.md b/docs/issues/pr-review-doc-context-spawn-reliability/02-orchestrator-agent.md index a2a8d0c..c267efa 100644 --- a/docs/issues/pr-review-doc-context-spawn-reliability/02-orchestrator-agent.md +++ b/docs/issues/pr-review-doc-context-spawn-reliability/02-orchestrator-agent.md @@ -1,6 +1,6 @@ # Doc Context Orchestrator agent -**Status:** resolved +**Status:** closed **Category:** enhancement **Type:** AFK diff --git a/docs/issues/pr-review-doc-context-spawn-reliability/03-wire-up-and-housekeeping.md b/docs/issues/pr-review-doc-context-spawn-reliability/03-wire-up-and-housekeeping.md index 15e365f..0d5f78d 100644 --- a/docs/issues/pr-review-doc-context-spawn-reliability/03-wire-up-and-housekeeping.md +++ b/docs/issues/pr-review-doc-context-spawn-reliability/03-wire-up-and-housekeeping.md @@ -1,6 +1,6 @@ # Wire-up: step 4a rewrite + README + CHANGELOG -**Status:** resolved +**Status:** closed **Category:** bug **Type:** AFK diff --git a/docs/issues/pr-review-orchestrator-split/01-create-ado-fetcher-agent.md b/docs/issues/pr-review-orchestrator-split/01-create-ado-fetcher-agent.md new file mode 100644 index 0000000..6e6e2cb --- /dev/null +++ b/docs/issues/pr-review-orchestrator-split/01-create-ado-fetcher-agent.md @@ -0,0 +1,65 @@ +# Create ADO Fetcher agent + +**Status:** resolved +**Category:** enhancement + +## Parent + +`docs/issues/pr-review-orchestrator-split/PRD.md` + +## What to build + +Create a new plugin agent (`pr-review:ado-fetcher`) that encapsulates all Azure DevOps read operations required for a PR review. The agent receives a PR URL (org, project, PR ID) and an optional prior iteration ID (passed in for re-review, empty string for first-review), and returns a structured context block containing: PR metadata (title, description, source/target branches, repo ID), latest iteration ID and its commit SHA, changed files list, raw diff, and work-item IDs linked to the PR. + +This agent replaces the inline ADO shell commands currently scattered across Steps 2–5 of the `review-pr` command. It is invoked by first-review and re-review modes; pre-PR mode never calls it. + +The ADO Fetcher is a prerequisite for the Doc Context Orchestrator — the Fetcher must complete first because its output (work-item IDs) is the input the Doc Context Orchestrator needs. Once the Fetcher returns, the Doc Context Orchestrator and review aspect agents launch concurrently with each other. + +## Acceptance criteria + +- [x] The agent accepts PR URL components (org URL, project, PR ID) and returns a structured context block +- [x] The context block includes PR metadata, latest iteration ID, latest commit SHA, changed files list, and raw diff +- [x] The context block includes the work-item IDs linked to the PR (empty list if none) +- [x] The agent handles the case where no iterations are returned (defaults gracefully) +- [x] The agent handles PRs that are already merged (continues without error) +- [x] The agent contains no write operations — it is purely a read agent + +## Blocked by + +None — can start immediately. + +--- + +## Agent Brief + +> _This was generated by AI during triage._ + +**Category:** enhancement +**Summary:** Create the `pr-review:ado-fetcher` agent that encapsulates all ADO read operations for a PR review. + +**Current behavior:** +ADO read operations (PR metadata, iterations, changed files, diff, work-item IDs) are scattered as inline `az devops invoke` shell commands across multiple steps of the `review-pr` command. There is no dedicated agent for this concern. + +**Desired behavior:** +A new plugin agent (`pr-review:ado-fetcher`) accepts PR URL components and returns a single structured context block. All other agents and the orchestrator consume this block rather than making their own ADO calls. The agent is purely read-only — it performs no write operations. + +**Key interfaces:** + +- Input: org URL, project, PR ID, optional prior iteration ID (passed in for re-review) +- Output: structured context block — PR metadata, latest iteration ID, latest commit SHA, changed files list, raw diff, work-item IDs list +- The agent must handle zero-iteration PRs and already-merged PRs gracefully + +**Acceptance criteria:** + +- [x] The agent accepts PR URL components and returns a structured context block +- [x] The context block includes PR metadata, latest iteration ID, latest commit SHA, changed files list, and raw diff +- [x] The context block includes the work-item IDs linked to the PR (empty list if none) +- [x] The agent handles the case where no iterations are returned (defaults gracefully) +- [x] The agent handles PRs that are already merged (continues without error) +- [x] The agent contains no write operations — it is purely a read agent + +**Out of scope:** + +- Any ADO write operations +- Doc Context fetching (that stays with the Doc Context Orchestrator) +- GitHub or GitLab platform support diff --git a/docs/issues/pr-review-orchestrator-split/02-create-ado-writer-agent.md b/docs/issues/pr-review-orchestrator-split/02-create-ado-writer-agent.md new file mode 100644 index 0000000..ac545a3 --- /dev/null +++ b/docs/issues/pr-review-orchestrator-split/02-create-ado-writer-agent.md @@ -0,0 +1,72 @@ +# Create ADO Writer agent + +**Status:** resolved +**Category:** enhancement + +## Parent + +`docs/issues/pr-review-orchestrator-split/PRD.md` + +## What to build + +Create a new plugin agent (`pr-review:ado-writer`) that encapsulates all Azure DevOps write-back operations for a PR review. The agent receives: PR context (org URL, project, repo ID, PR ID, latest iteration ID, summary thread ID), a list of compact findings, and a mode flag (first-review or re-review). + +For each finding it posts a new Inline Comment thread to ADO. After all findings are posted it posts the Review Summary on first-review, or a delta reply to the existing summary thread on re-review. As its final action it posts the completion marker reply to the summary thread. + +The compact finding schema the agent accepts: `{ severity, filePath, startLine, endLine, title, body }`. Every comment posted must end with the canonical Bot Signature trailer `---\n🤖 *Reviewed by Claude Code* — Iteration N`. + +This agent is used by both first-review and re-review modes. It is not invoked in pre-PR mode. + +## Acceptance criteria + +- [x] The agent posts one Inline Comment thread per finding at the correct file path and line range +- [x] Each posted comment ends with the canonical Bot Signature including the iteration number +- [x] On first-review, the agent posts a full Review Summary as a new general thread +- [x] On re-review with at least one new finding, the agent posts a delta reply to the existing summary thread +- [x] On re-review with zero new findings, the agent skips the summary reply +- [x] The agent posts a completion marker reply (`✅ Review complete — Iteration N`) to the summary thread as its final action +- [x] If `threadContext` is rejected by ADO (file not in diff), the agent retries without `threadContext` (general comment fallback) +- [x] The agent returns the final `SUMMARY_THREAD_ID` and `FINDINGS_POSTED` count to the caller + +## Blocked by + +None — can start immediately. + +--- + +## Agent Brief + +> _This was generated by AI during triage._ + +**Category:** enhancement +**Summary:** Create the `pr-review:ado-writer` agent that encapsulates all ADO write-back operations for a PR review. + +**Current behavior:** +ADO write operations (posting Inline Comment threads, patching thread status, posting the Review Summary, posting the completion marker) are inline shell commands in `review-pr.md`. There is no dedicated agent for write-back. + +**Desired behavior:** +A new plugin agent (`pr-review:ado-writer`) receives a PR context block, a compact findings list, a mode flag, and an optional existing summary thread ID. It posts all Inline Comments, the Review Summary (or delta reply on re-review), and the completion marker. Every comment ends with the canonical Bot Signature. + +**Key interfaces:** + +- Input: PR context (org URL, project, repo ID, PR ID, latest iteration ID, summary thread ID), findings list as `{ severity, filePath, startLine, endLine, title, body }[]`, mode (`first-review` | `re-review`) +- Output: `{ summaryThreadId, findingsPosted }` returned to the caller +- Bot Signature constant: `🤖 *Reviewed by Claude Code*` prefix — must not change +- On `threadContext` rejection by ADO, retries without `threadContext` (general comment fallback) + +**Acceptance criteria:** + +- [ ] The agent posts one Inline Comment thread per finding at the correct file path and line range +- [ ] Each posted comment ends with the canonical Bot Signature including the iteration number +- [ ] On first-review, the agent posts a full Review Summary as a new general thread +- [ ] On re-review with at least one new finding, the agent posts a delta reply to the existing summary thread +- [ ] On re-review with zero new findings, the agent skips the summary reply +- [ ] The agent posts a completion marker reply to the summary thread as its final action +- [ ] If `threadContext` is rejected by ADO, the agent retries without `threadContext` +- [ ] The agent returns the final `summaryThreadId` and `findingsPosted` count + +**Out of scope:** + +- Reply posting for re-review classified threads (that is the Re-review Coordinator's responsibility) +- Pre-PR mode (no ADO calls in that mode) +- Reading or fetching any ADO data diff --git a/docs/issues/pr-review-orchestrator-split/03-create-re-review-coordinator-agent.md b/docs/issues/pr-review-orchestrator-split/03-create-re-review-coordinator-agent.md new file mode 100644 index 0000000..35ed752 --- /dev/null +++ b/docs/issues/pr-review-orchestrator-split/03-create-re-review-coordinator-agent.md @@ -0,0 +1,81 @@ +# Create Re-review Coordinator agent + +**Status:** resolved +**Category:** enhancement + +## Parent + +`docs/issues/pr-review-orchestrator-split/PRD.md` + +## What to build + +Create a new plugin agent (`pr-review:re-review-coordinator`) that owns the full re-review state machine. The agent receives the ADO Fetcher context block (which includes the raw diff) and the raw full PR threads JSON (the unfiltered ADO thread list). It parses the raw diff into diff hunks internally before calling `classify-thread` — the hunks file is a temp artefact managed inside the agent, not an input from the orchestrator. + +It performs in order: + +1. Calls the `detect-prior-review` Node.js module to identify prior bot threads and locate the summary thread. +2. Runs the partial-run check (looks for the completion marker for the prior iteration in the summary thread). Falls back to first-review mode if the marker is absent. +3. If no new revisions exist since the prior review (prior iteration ID equals latest iteration ID), prints outstanding pending threads to the console and exits early — no ADO writes. +4. Calls `classify-thread` on each prior thread against the diff hunks. +5. For each new finding passed in, calls `match-finding` to look for a matching prior thread. +6. Based on classification, posts replies to prior threads: acknowledges disputes, confirms resolutions (and PATCHes thread status to fixed), adds new evidence to pending threads with new information, skips pending threads with no new evidence, ignores obsolete threads. +7. Returns the classification counts (addressed, disputed, pending, obsolete), the fresh findings list (`freshFindings[]` — only unmatched findings; matched findings are consumed and not returned), and an `earlyExit` flag. `earlyExit` is `true` only on the no-new-revisions path (step 3); it is `false` on all other paths including normal completion with zero fresh findings. + +The four Node.js modules (`detect-prior-review`, `classify-thread`, `match-finding`, `parse-signature`) remain in `scripts/re-review/` unchanged. This agent calls them via `node --input-type=module` inline scripts, exactly as the current `review-pr.md` does. + +## Acceptance criteria + +- [ ] The agent correctly detects prior bot threads using the `detect-prior-review` module +- [ ] The agent falls back to first-review mode when no completion marker is found for the prior iteration +- [ ] The agent exits early (console output only, no ADO writes) when prior and latest iteration IDs are identical, and returns `earlyExit: true` +- [ ] The agent classifies all prior threads using the `classify-thread` module +- [ ] The agent matches new findings to prior threads using the `match-finding` module with ±3-line drift tolerance +- [ ] The agent posts a dispute acknowledgement reply to disputed threads including the ADO nudge +- [ ] The agent posts a resolution confirmation reply and PATCHes status to fixed for addressed threads +- [ ] The agent posts a new-evidence reply to pending threads that have new analysis; skips pending threads with no new evidence +- [ ] The agent returns classification counts (addressed, disputed, pending, obsolete), the unmatched (fresh) findings list, and the `earlyExit` flag +- [ ] The existing re-review module unit tests (`detect-prior-review`, `classify-thread`, `match-finding`, `parse-signature`) pass unchanged + +## Blocked by + +None — can start immediately. + +--- + +## Agent Brief + +> _This was generated by AI during triage._ + +**Category:** enhancement +**Summary:** Create the `pr-review:re-review-coordinator` agent that owns the full re-review state machine. + +**Current behavior:** +The re-review state machine (prior thread detection, partial-run check, Thread Classification, finding matching, reply/resolution posting) lives inline in `review-pr.md` across Steps 3.5–10-Path-B. It is loaded on every invocation regardless of mode. + +**Desired behavior:** +A new plugin agent (`pr-review:re-review-coordinator`) receives the ADO Fetcher context block (which includes the raw diff) and the raw full PR threads JSON (unfiltered). It calls `detect-prior-review` internally to identify bot threads, parses the raw diff into diff hunks internally, then runs the full re-review state machine, posts classified replies directly to ADO, and returns classification counts plus the list of unmatched (fresh) findings for the ADO Writer to post as new threads. The four existing Node.js modules (`detect-prior-review`, `classify-thread`, `match-finding`, `parse-signature`) are called from this agent unchanged. + +**Key interfaces:** + +- Input: ADO Fetcher context block (includes raw diff), raw full PR threads JSON (captured by the orchestrator during mode detection via `az repos pr thread list` — not re-fetched; `detect-prior-review` filters this list inside the Coordinator), new findings list, Bot Signature prefix constant +- The Coordinator parses the raw diff into diff hunks internally; this is not an orchestrator concern +- Output: `{ addressed, disputed, pending, obsolete, freshFindings[], earlyExit }` — fresh findings are those with no matching prior thread; `earlyExit: true` signals the no-new-revisions path to the orchestrator +- The agent calls the four Node.js modules via `node --input-type=module` inline scripts (same pattern as current `review-pr.md`) +- Early-exit path: when prior iteration ID equals latest iteration ID, prints pending threads to console and returns `{ earlyExit: true, freshFindings: [], addressed: 0, disputed: 0, pending: N, obsolete: 0 }` — all count fields are always present; the orchestrator must skip the ADO Writer entirely when `earlyExit: true` + +**Acceptance criteria:** + +- [ ] The agent correctly detects prior bot threads using the `detect-prior-review` module +- [ ] The agent falls back to first-review mode when no completion marker is found for the prior iteration +- [ ] The agent exits early when prior and latest iteration IDs are identical (console output only, no ADO writes), returning `earlyExit: true` +- [ ] The agent classifies all prior threads using the `classify-thread` module +- [ ] The agent matches new findings to prior threads using `match-finding` with ±3-line drift tolerance +- [ ] The agent posts dispute acknowledgement, resolution confirmation, and new-evidence replies appropriately +- [ ] The agent returns classification counts (addressed, disputed, pending, obsolete), the unmatched fresh findings list, and the `earlyExit` flag +- [ ] The four re-review module unit tests pass unchanged after this issue is implemented + +**Out of scope:** + +- Posting new Inline Comment threads for fresh findings (ADO Writer's responsibility) +- Posting the Review Summary or completion marker (ADO Writer's responsibility) +- First-review or pre-PR mode logic diff --git a/docs/issues/pr-review-orchestrator-split/04-refactor-orchestrator.md b/docs/issues/pr-review-orchestrator-split/04-refactor-orchestrator.md new file mode 100644 index 0000000..3a150b7 --- /dev/null +++ b/docs/issues/pr-review-orchestrator-split/04-refactor-orchestrator.md @@ -0,0 +1,83 @@ +# Refactor review-pr.md to thin orchestrator + +**Status:** resolved +**Category:** enhancement + +## Parent + +`docs/issues/pr-review-orchestrator-split/PRD.md` + +## What to build + +Refactor `review-pr.md` into a thin orchestrator of approximately 200 lines. The orchestrator: + +1. Validates prerequisites in a mode-aware way: always checks `git` availability and `pr-review-toolkit`; checks Azure CLI and `azure-devops` extension only when a PR URL is present (Pre-PR mode requires no ADO credentials). +2. Parses `$ARGUMENTS` for a PR URL. If absent, sets mode to Pre-PR; if present, proceeds to detection. +3. For PR URL cases: makes a lightweight ADO thread-list call directly (not via the ADO Fetcher) using `az repos pr thread list` (not `az devops invoke`) to check for a prior Bot Signature, determining mode and extracting the prior iteration ID if found. The full thread list from this call is captured and passed forward to the Re-review Coordinator in step 6 — no second ADO thread-list call is made. +4. Logs the detected mode clearly before delegating. +5. For First-review: invokes the ADO Fetcher agent (passing org URL, project, PR ID), then runs Doc Context Orchestrator + review aspect agents in parallel, collects compact findings, delegates write-back to the ADO Writer agent. +6. For Re-review: invokes the ADO Fetcher agent (passing org URL, project, PR ID, and prior iteration ID), then runs Doc Context Orchestrator + review aspect agents in parallel. Once all review aspect agents return their findings, passes the complete findings list and prior-thread data to the Re-review Coordinator agent (which handles replies). If the Coordinator returns `earlyExit: true` (no new revisions), the orchestrator stops — ADO Writer is not called. Otherwise passes fresh findings to the ADO Writer agent. +7. Pre-PR mode is a stub at this slice — it detects the mode and prints a "Pre-PR mode not yet implemented" message. Full Pre-PR behaviour is delivered in issue 05. + +The `review-pr.md` file must contain no `az devops invoke` shell commands after this refactor — the three focused agents own all data-fetch and write-back ADO operations. The one allowed inline ADO call is the mode-detection `az repos pr thread list` in step 3, which is an orchestration concern, not a data-fetch or write-back operation. The Bot Signature constants and detection prefix are unchanged. All existing re-review module unit tests must pass. + +## Acceptance criteria + +- [ ] `review-pr.md` is ≤ 200 lines and contains no `az devops invoke` calls +- [ ] Prerequisite checks are mode-aware: Azure CLI and `azure-devops` extension are not required in Pre-PR mode (no PR URL provided) +- [ ] The orchestrator logs the detected mode (Pre-PR / First-review / Re-review) before delegating +- [ ] First-review mode produces the same ADO comment output as the pre-refactor command (full Review Summary + Inline Comments + completion marker) +- [ ] Re-review mode produces the same ADO comment output as the pre-refactor command (classified replies + fresh findings + delta summary + completion marker) +- [ ] Pre-PR mode prints a clear "not yet implemented" message and exits cleanly +- [ ] The ADO Fetcher and Doc Context Orchestrator still run in the correct order (Fetcher first, then both Doc Context and review agents can overlap) +- [ ] The Bot Signature format and detection prefix are unchanged +- [ ] `pnpm test` passes (all re-review module unit tests green) +- [ ] `pnpm format` produces no diff + +## Blocked by + +- `docs/issues/pr-review-orchestrator-split/01-create-ado-fetcher-agent.md` +- `docs/issues/pr-review-orchestrator-split/02-create-ado-writer-agent.md` +- `docs/issues/pr-review-orchestrator-split/03-create-re-review-coordinator-agent.md` + +--- + +## Agent Brief + +> _This was generated by AI during triage._ + +**Category:** enhancement +**Summary:** Refactor `review-pr.md` into a thin orchestrator (~200 lines) that delegates to the three focused agents. + +**Current behavior:** +`review-pr.md` is ~1000 lines, mixing orchestration logic, ADO shell commands, re-review state machine, and write-back in a single file. Every invocation loads the entire file into context. + +**Desired behavior:** +`review-pr.md` shrinks to ~200 lines containing: mode-aware prerequisite validation (ADO tooling skipped for Pre-PR), argument parsing, mode detection (Pre-PR / First-review / Re-review), and delegation calls to the ADO Fetcher, Re-review Coordinator, and ADO Writer agents. The file contains no `az devops invoke` calls. Pre-PR mode is a stub that prints "not yet implemented" — full implementation is in issue 05. + +**Key interfaces:** + +- Mode detection sequence: no URL → Pre-PR; URL → orchestrator calls `az repos pr thread list` (not `az devops invoke`) → no Bot Signature → First-review; Bot Signature found → extract prior iteration ID → Re-review +- Bot Signature detection prefix: `🤖 *Reviewed by Claude Code*` — must not change +- ADO Fetcher agent invocation: passes org URL, project, PR ID (plus prior iteration ID in re-review) +- Re-review Coordinator agent invocation (re-review only): called after all review aspect agents complete; passes ADO Fetcher context + full PR threads JSON (captured from mode detection in step 3, not re-fetched) + new findings list; returns `{ earlyExit, freshFindings[], addressed, disputed, pending, obsolete }` +- If Coordinator returns `earlyExit: true`, orchestrator stops — ADO Writer is not called +- ADO Writer agent invocation: passes PR context + fresh findings list + mode (`"first-review"` | `"re-review"`); the mode determines whether ADO Writer posts a new Review Summary (first-review) or a delta reply (re-review) + +**Acceptance criteria:** + +- [ ] `review-pr.md` is ≤ 200 lines and contains no `az devops invoke` calls +- [ ] Prerequisite checks are mode-aware: Azure CLI and `azure-devops` extension are not required in Pre-PR mode +- [ ] The orchestrator logs the detected mode before delegating +- [ ] First-review produces the same ADO output as pre-refactor +- [ ] Re-review produces the same ADO output as pre-refactor +- [ ] Pre-PR mode prints "not yet implemented" and exits cleanly +- [ ] ADO Fetcher runs before Doc Context Orchestrator (Fetcher provides work-item IDs); Doc Context and review agents may overlap +- [ ] Bot Signature format and detection prefix unchanged +- [ ] `pnpm test` passes; `pnpm format` produces no diff + +**Out of scope:** + +- Full Pre-PR mode implementation (issue 05) +- Compact sub-agent output guidance (issue 06) +- Version bump (issue 07) diff --git a/docs/issues/pr-review-orchestrator-split/05-add-pre-pr-mode.md b/docs/issues/pr-review-orchestrator-split/05-add-pre-pr-mode.md new file mode 100644 index 0000000..318c345 --- /dev/null +++ b/docs/issues/pr-review-orchestrator-split/05-add-pre-pr-mode.md @@ -0,0 +1,73 @@ +# Add Pre-PR mode + +**Status:** resolved +**Category:** enhancement + +## Parent + +`docs/issues/pr-review-orchestrator-split/PRD.md` + +## What to build + +Implement the Pre-PR operating mode in the orchestrator. When `/pr-review:review-pr` is invoked without a PR URL, the command: + +1. Diffs the current local branch against its upstream target (e.g. `git diff origin/...HEAD`). +2. Reads key changed files (same skip-list as today: generated files, serialization YAMLs, etc.). +3. Launches the same `pr-review-toolkit` review aspect agents as the ADO modes, passing the local diff and file contents. Doc Context is skipped (no work items or Confluence pages to fetch without a PR). +4. Aggregates findings and presents them in the Claude interface as a structured list (severity, file, line, title, body) — no ADO calls are made. +5. Prints a clear completion message when done. + +No ADO credentials are required and no ADO calls are made in this mode. The pre-PR Review uses the same review aspect agent selection logic as ADO modes (aspect filter from `$ARGUMENTS` applies). + +## Acceptance criteria + +- [x] Running the command without a URL enters Pre-PR mode with a console message confirming the mode +- [x] The diff used is the local branch diff against its upstream target +- [x] Review aspect agents receive the local diff and changed file contents +- [x] Findings are presented in the Claude interface with severity, file path, line range, title, and body +- [x] No ADO API calls are made in this mode +- [x] The aspect filter argument (e.g. `code`, `errors`, `all`) is respected in pre-PR mode +- [x] `pnpm test` passes; `pnpm format` produces no diff + +## Blocked by + +- `docs/issues/pr-review-orchestrator-split/04-refactor-orchestrator.md` + +--- + +## Agent Brief + +> _This was generated by AI during triage._ + +**Category:** enhancement +**Summary:** Implement Pre-PR mode — review local branch diff without a PR URL, no ADO write-back. + +**Current behavior:** +The orchestrator stub from issue 04 prints "Pre-PR mode not yet implemented" and exits. No review occurs without a PR URL. + +**Desired behavior:** +When the command is invoked without a URL, it diffs the local branch against its upstream target, runs the same `pr-review-toolkit` review aspect agents as ADO modes, and presents compact structured findings in the Claude interface. No ADO credentials are required or used. Doc Context gathering is skipped (no work items to fetch). The aspect filter argument applies. + +**Key interfaces:** + +- Diff source: `git diff origin/...HEAD` +- File skip-list: same as current (generated files, serialization YAMLs, `*.g.cs`, `swagger.md`) +- Review aspect agents: same selection logic as ADO modes; aspect filter from `$ARGUMENTS` applies +- Finding presentation: compact structured list — severity, file path, line range, title, body — in the Claude interface +- No ADO Fetcher, Re-review Coordinator, or ADO Writer agents are invoked + +**Acceptance criteria:** + +- [ ] Running the command without a URL enters Pre-PR mode with a console message confirming the mode +- [ ] The diff used is the local branch diff against its upstream target +- [ ] Review aspect agents receive the local diff and changed file contents +- [ ] Findings are presented in the Claude interface with severity, file path, line range, title, and body +- [ ] No ADO API calls are made in this mode +- [ ] The aspect filter argument is respected +- [ ] `pnpm test` passes; `pnpm format` produces no diff + +**Out of scope:** + +- Posting findings to ADO +- Doc Context gathering in pre-PR mode +- Any change to first-review or re-review behaviour diff --git a/docs/issues/pr-review-orchestrator-split/06-compact-subagent-output.md b/docs/issues/pr-review-orchestrator-split/06-compact-subagent-output.md new file mode 100644 index 0000000..7eb4292 --- /dev/null +++ b/docs/issues/pr-review-orchestrator-split/06-compact-subagent-output.md @@ -0,0 +1,65 @@ +# Add compact sub-agent output guidance to the review-agent launch step + +**Status:** resolved +**Category:** enhancement + +## Parent + +`docs/issues/pr-review-orchestrator-split/PRD.md` + +## What to build + +Update the step in the thin orchestrator that launches `pr-review-toolkit` review aspect agents to instruct them to return compact structured findings rather than prose with embedded code quotes. (The thin orchestrator produced by issue 04 will have a different step numbering than the pre-refactor monolith — find the step by its purpose: the one that spawns the parallel review agents.) + +The prompt addition instructs each agent to return a JSON array where each element has: `severity` (critical / important / minor), `filePath` (leading `/`, forward slashes), `startLine` (integer), `endLine` (integer), `title` (one line, ≤ 80 chars), `body` (one paragraph — the exact text to post as the ADO or local-interface comment, no code quotes, no repeated context). The reasoning and supporting analysis should stay inside the agent's own context, not appear in the return value. + +No changes are made to `pr-review-toolkit` agent definitions — this guidance lives only in the orchestrator's prompt to the agents. + +## Acceptance criteria + +- [ ] The review-agent launch step explicitly requests structured JSON findings with the six required fields +- [ ] The prompt instructs agents to omit code quotes and prose reasoning from the return value +- [ ] The ADO Writer agent correctly receives and processes the structured finding schema +- [ ] Pre-PR mode findings are also presented using the same structured schema +- [ ] No `pr-review-toolkit` agent definition files are modified +- [ ] `pnpm format` produces no diff + +## Blocked by + +- `docs/issues/pr-review-orchestrator-split/04-refactor-orchestrator.md` + +--- + +## Agent Brief + +> _This was generated by AI during triage._ + +**Category:** enhancement +**Summary:** Update the review-agent launch step in the thin orchestrator to request compact structured findings from review aspect agents. + +**Current behavior:** +Review aspect agents return prose findings with embedded code quotes and explanatory text. The full prose flows back into the parent context as tool call results, contributing to token budget pressure. + +**Desired behavior:** +The step in the thin orchestrator that spawns parallel review agents explicitly instructs each `pr-review-toolkit` review aspect agent to return a JSON array of findings. Locate this step by purpose — it is the one that launches the review agents in parallel — not by number (step numbering changed after the issue 04 refactor). Each element: `severity` (critical / important / minor), `filePath`, `startLine`, `endLine`, `title` (≤ 80 chars), `body` (one paragraph, no code quotes). Reasoning stays inside the agent's own context. No `pr-review-toolkit` agent definition files are modified. + +**Key interfaces:** + +- The structured finding schema: `{ severity, filePath, startLine, endLine, title, body }` +- Guidance location: the review-agent launch step in the orchestrator only — not in toolkit agent definitions +- Both ADO modes and Pre-PR mode use this schema + +**Acceptance criteria:** + +- [ ] The review-agent launch step requests structured JSON findings with all six required fields +- [ ] The prompt instructs agents to omit code quotes and prose reasoning from the return value +- [ ] The ADO Writer agent correctly receives and processes the structured schema +- [ ] Pre-PR mode findings are presented using the same schema +- [ ] No `pr-review-toolkit` agent definition files are modified +- [ ] `pnpm format` produces no diff + +**Out of scope:** + +- Enforcing schema validation on agent output +- Changing the toolkit agent definitions +- Token-budget monitoring or benchmarking diff --git a/docs/issues/pr-review-orchestrator-split/07-version-bump-and-release.md b/docs/issues/pr-review-orchestrator-split/07-version-bump-and-release.md new file mode 100644 index 0000000..44a5b8b --- /dev/null +++ b/docs/issues/pr-review-orchestrator-split/07-version-bump-and-release.md @@ -0,0 +1,63 @@ +# Version bump and CHANGELOG + +**Status:** resolved +**Category:** enhancement + +## Parent + +`docs/issues/pr-review-orchestrator-split/PRD.md` + +## What to build + +Bump the `pr-review` plugin version (minor bump — new features added) and add a dated CHANGELOG entry covering the orchestrator split, the three new agents, pre-PR mode, and compact sub-agent output. + +Run `pnpm --filter pr-review bump minor` to update both `plugin.json` and `marketplace.json`. Add a `[Unreleased]` → versioned entry to `CHANGELOG.md` following the existing format. Run `pnpm --filter pr-review verify:changelog` to confirm the entry passes validation. + +Update `CLAUDE.md` to reflect the new architecture: remove the claim that "the entire behaviour of the plugin lives in `commands/review-pr.md`", add the `.agents/` directory to the repository layout, and update the command conventions section to note that ADO calls now live in the three focused agents (ADO Fetcher, Re-review Coordinator, ADO Writer) rather than inline in the command. + +## Acceptance criteria + +- [ ] `plugin.json` and `marketplace.json` both reflect the new minor version +- [ ] `CHANGELOG.md` has a dated entry for the new version describing the orchestrator split, three new agents, pre-PR mode, and compact output guidance +- [ ] `pnpm --filter pr-review verify:changelog` passes +- [ ] `CLAUDE.md` updated: "entire behaviour lives in `commands/review-pr.md`" claim removed, `.agents/` directory added to layout, command conventions updated to reflect ADO calls live in the focused agents +- [ ] `pnpm format` produces no diff + +## Blocked by + +- `docs/issues/pr-review-orchestrator-split/05-add-pre-pr-mode.md` +- `docs/issues/pr-review-orchestrator-split/06-compact-subagent-output.md` + +--- + +## Agent Brief + +> _This was generated by AI during triage._ + +**Category:** enhancement +**Summary:** Bump `pr-review` to the next minor version and add a CHANGELOG entry for the orchestrator split. + +**Current behavior:** +`plugin.json` and `marketplace.json` carry the current version. `CHANGELOG.md` has no entry for this feature set. + +**Desired behavior:** +Run `pnpm --filter pr-review bump minor` to update both version files atomically. Add a dated entry to `CHANGELOG.md` under the new version number describing: orchestrator split, three new focused agents (ADO Fetcher, Re-review Coordinator, ADO Writer), pre-PR mode, and compact sub-agent output guidance. Verify with `pnpm --filter pr-review verify:changelog`. + +**Key interfaces:** + +- `pnpm --filter pr-review bump minor` — updates `plugin.json` and `marketplace.json` +- `CHANGELOG.md` entry format — must match the existing dated em-dash format enforced by `verify:changelog` +- `pnpm --filter pr-review verify:changelog` — must pass + +**Acceptance criteria:** + +- [ ] `plugin.json` and `marketplace.json` both reflect the new minor version +- [ ] `CHANGELOG.md` has a dated entry for the new version covering all four feature areas +- [ ] `pnpm --filter pr-review verify:changelog` passes +- [ ] `CLAUDE.md` updated to reflect the three-agent architecture +- [ ] `pnpm format` produces no diff + +**Out of scope:** + +- Publishing to the marketplace (manual step) +- Creating the git tag (handled by the release workflow on `main`) diff --git a/docs/issues/pr-review-orchestrator-split/PRD.md b/docs/issues/pr-review-orchestrator-split/PRD.md new file mode 100644 index 0000000..220545e --- /dev/null +++ b/docs/issues/pr-review-orchestrator-split/PRD.md @@ -0,0 +1,179 @@ +# PRD: pr-review — Orchestrator Split + +**Status:** ready-for-agent +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` + +--- + +## Problem Statement + +The `review-pr` command has grown into a ~1000-line monolith that conflates three distinct concerns: orchestration (which operating mode?), ADO platform integration (fetch metadata, post comments), and re-review state management (classify threads, match findings, reply). As a result, every invocation loads the full command file into context, combined with parallel review-agent results flowing back, pushing average PR reviews past 100 K parent-context tokens. Adding the pre-PR mode that developers are requesting would push the file to ~1300 lines and compound the problem further. + +## Solution + +Refactor `review-pr.md` into a thin orchestrator of ~200 lines that detects the operating mode and immediately delegates to focused agents. The three focused agents — ADO Fetcher, Re-review Coordinator, and ADO Writer — live in the plugin's own agent directory and only load when their mode is active. Pre-PR runs never touch ADO at all. Review aspect agents are also asked to return compact structured findings rather than prose, keeping what flows back into the parent context small. + +## User Stories + +1. As a developer running `/pr-review:review-pr` on a first-review PR, I want the command to execute without loading re-review state-machine logic, so that the parent context is not burdened by code paths that do not apply. + +2. As a developer running `/pr-review:review-pr` on a re-review PR, I want the Re-review Coordinator to own all prior-thread detection and classification, so that the orchestrator stays short and readable. + +3. As a developer who wants to review code before opening a PR, I want to run `/pr-review:review-pr` without a PR URL and receive findings in the Claude interface, so that I can catch issues before the PR is even created. + +4. As a developer running a pre-PR Review, I want no comments posted to ADO, so that draft feedback does not pollute the eventual PR conversation. + +5. As a developer, I want the orchestrator to tell me clearly which mode it is entering (Pre-PR, First-review, or Re-review), so that I can understand what will happen before it starts. + +6. As a developer on a large PR, I want review-agent findings returned as compact structured records rather than prose with embedded code quotes, so that the parent context stays within budget. + +7. As a developer, I want the structured finding to include severity, file path, start line, end line, a short title, and one-paragraph comment body, so that the ADO Writer has everything it needs to post the Inline Comment without re-querying the agent. + +8. As a developer, I want the ADO Fetcher to encapsulate all ADO API calls needed to retrieve PR metadata, iterations, changed files, and the raw diff, so that the orchestrator does not contain any platform-specific shell commands. + +9. As a developer, I want the ADO Writer to encapsulate all ADO write-back operations — posting Inline Comments, patching Thread status, and posting the Review Summary or delta reply — so that those operations are not scattered across the orchestrator. + +10. As a developer, I want the Re-review Coordinator to own the partial-run check, so that the orchestrator does not need to know about completion markers or fallback logic. + +11. As a developer on a re-review PR with no new commits, I want the Re-review Coordinator to exit early and list outstanding pending threads in the console, so that no ADO comments are posted unnecessarily. + +12. As a developer, I want adding a future operating mode (e.g. post-merge audit) to require only a new agent and a small branch in the orchestrator, so that the monolith problem does not recur. + +13. As a plugin operator, I want all four re-review Node.js modules (detect-prior-review, classify-thread, match-finding, parse-signature) to remain in the plugin's scripts directory unchanged, so that the split does not alter tested behaviour. + +14. As a plugin operator, I want the orchestrator to validate prerequisites (Azure CLI, `azure-devops` extension, `pr-review-toolkit` availability) before entering any mode, so that failures are surfaced early and consistently. + +15. As a plugin operator, I want the Bot Signature format and detection prefix to remain unchanged after the split, so that existing Review Threads on live PRs are still recognised correctly. + +16. As a developer reading the codebase, I want each agent to have a single clearly named responsibility, so that I know exactly which file to open when debugging an ADO write error versus a thread-classification error. + +17. As a developer running a first-review, I want the ADO Fetcher to complete first (providing work-item IDs), then the Doc Context Orchestrator and review aspect agents to run concurrently with each other, so that the split does not increase wall-clock time. + +18. As a developer, I want the guidance for compact review-agent output to live in the orchestrator's review-agent launch step rather than in the `pr-review-toolkit` agent definitions, so that the toolkit remains an unmodified read-only dependency. + +19. As a plugin operator, I want the existing test suite for the four re-review modules to continue passing after the split with no changes, so that I have confidence the refactor is behaviour-preserving. + +20. As a developer, I want the pre-PR mode to use the same `pr-review-toolkit` review aspect agents as the ADO modes, so that review quality is consistent regardless of whether a PR URL is provided. + +## Implementation Decisions + +### Operating modes + +The orchestrator detects one of three modes on startup: + +- **Pre-PR mode** — no PR URL provided; targets the local branch diff; no ADO write-back. +- **First-review mode** — PR URL provided; no prior Bot Signature found in the PR's threads. +- **Re-review mode** — PR URL provided; prior Bot Signature detected. + +Mode detection happens within the first ~50 lines of the orchestrator. Once detected, the orchestrator delegates entirely. + +### Focused agents + +Three new agents live in the plugin's `.agents/` directory: + +**ADO Fetcher** — encapsulates all ADO read operations: PR metadata, iterations, changed files list, and raw diff. Returns a structured context block consumed by the orchestrator for passing to review agents and the writer. Used by first-review and re-review modes only. + +**Re-review Coordinator** — owns everything in the current re-review path: prior thread detection (calling `detect-prior-review`), partial-run check, early exit for no new commits, Thread Classification (calling `classify-thread`), finding matching (calling `match-finding`), and reply posting to classified threads. The four Node.js modules remain in `scripts/re-review/` and are called from this agent, not inlined. Used only in re-review mode. + +**ADO Writer** — owns all ADO write-back: posting new Inline Comment threads for fresh findings, patching Thread status to fixed for addressed findings, posting reply comments for disputed and pending findings with new evidence, posting the Review Summary on first-review, posting the delta reply on re-review, and posting the completion marker. Used by first-review and re-review modes. + +### Compact sub-agent output contract + +Review aspect agents (`pr-review-toolkit:code-reviewer`, `silent-failure-hunter`, etc.) are instructed via the orchestrator's prompt to return findings as a structured list. Each finding carries: severity, file path, start line, end line, title (one line), and body (one paragraph — the text posted as the ADO comment). No prose reasoning, no code quotes in the return value. This guidance is in the orchestrator's prompt only; the toolkit agent definitions are not modified. + +### pr-review-toolkit as read-only dependency + +No files in `pr-review-toolkit` are created or modified. All new agents live in the `pr-review` plugin's own `.agents/` directory. + +### Re-review module ownership + +The four Node.js modules in `scripts/re-review/` remain in the plugin. Lifting them to `pr-review-toolkit` as a shared library is deferred until a second write-back platform (GitHub) is built, at which point a canonical thread shape can be defined from real constraints. This is documented in ADR 0013 (`apps/claude-code/pr-review/docs/adr/0013-orchestrator-split-for-review-pr.md`). + +### Doc Context integration + +The Doc Context Orchestrator agent and its pipeline (ADO Fetcher fetches work-item IDs, Orchestrator spawns sub-agents, Synthesizer produces `DOC_CONTEXT`) are unchanged. The ADO Fetcher agent absorbs the work-item ID fetch that currently lives inline in Step 4a. + +## Testing Decisions + +### What makes a good test + +Tests assert the external behaviour of each module given controlled inputs — no implementation detail inspection, no internal branching tests. Inputs are plain JavaScript objects or JSON fixtures. A test reads as a sentence: "given a findings list with two items, the writer posts two inline threads." + +### Modules under test + +The four existing re-review modules (`detect-prior-review`, `classify-thread`, `match-finding`, `parse-signature`) already have a test suite and must continue passing unchanged. No new unit tests are required for the three new agents — their behaviour is best verified by integration against a real ADO PR (smoke test). If a new pure function is extracted during the refactor (e.g. mode detection logic), a unit test for that function is appropriate. + +### Prior art + +The existing test structure mirrors `packages/release-tools/scripts/verify-changelog.test.mjs` and `bump-version.test.mjs` — `node:test` built-in, no external deps, fixtures as imported JSON, assertions via `node:assert/strict`. + +## Out of Scope + +- GitHub write-back support (separate future feature). +- Normalising re-review modules to a canonical cross-platform thread shape (deferred + until GitHub write-back is built — see ADR 0013). +- Changes to `pr-review-toolkit` agent definitions. +- Token-budget monitoring or automatic truncation of large diffs. +- Any change to the Bot Signature format or detection prefix. +- Changes to the four re-review Node.js module interfaces. +- Automated performance benchmarking of parent context token usage. + +## Further Notes + +**ADR 0013** (`apps/claude-code/pr-review/docs/adr/0013-orchestrator-split-for-review-pr.md`) records the full rationale and alternatives considered for this decision. + +**CONTEXT.md** has already been updated with the three operating modes, three orchestration agent terms, and their relationships. + +--- + +## Agent Brief + +> _This was generated by AI during triage._ + +**Category:** enhancement +**Summary:** Refactor the `review-pr` command into a thin orchestrator that delegates to three focused agents — ADO Fetcher, Re-review Coordinator, and ADO Writer — and add a pre-PR operating mode. + +**Current behavior:** +`review-pr.md` is a ~1000-line monolith that handles orchestration, ADO platform integration, and re-review state management in a single command file. Every invocation loads the full file into context, and parallel review-agent results flowing back push average PR reviews past 100 K parent-context tokens. There is no mode for reviewing code before a PR exists. + +**Desired behavior:** +`review-pr.md` becomes a thin orchestrator of approximately 200 lines. On startup it detects one of three operating modes: + +- **Pre-PR mode** (no PR URL): diffs the local branch, runs review aspect agents from `pr-review-toolkit`, and presents findings in the Claude interface. No ADO calls are made. +- **First-review mode** (PR URL, no prior Bot Signature detected): delegates ADO reads to the ADO Fetcher agent, runs review aspect agents, delegates all ADO writes to the ADO Writer agent. +- **Re-review mode** (PR URL, prior Bot Signature detected): same as first-review, but additionally invokes the Re-review Coordinator agent to handle prior-thread classification, finding matching, and reply posting to classified threads before the ADO Writer runs. + +Each of the three new agents lives in the plugin's own `.agents/` directory. `pr-review-toolkit` is not modified (it is a read-only dependency). The four existing re-review Node.js modules (`detect-prior-review`, `classify-thread`, `match-finding`, `parse-signature`) remain in the plugin's `scripts/re-review/` directory and are called from the Re-review Coordinator agent. + +Review aspect agents are instructed via the review-agent launch step in the orchestrator to return compact structured findings (severity, file path, start line, end line, one-line title, one-paragraph body) rather than prose with embedded code quotes. This guidance lives in the orchestrator prompt only. + +**Key interfaces:** + +- `review-pr` command orchestrator — validates prerequisites, detects mode within first ~50 lines, delegates entirely; carries no ADO shell commands +- ADO Fetcher agent — accepts org URL, project, PR ID, and optional prior iteration ID (re-review only); returns a structured context block: PR metadata, latest iteration ID, changed files list, raw diff, and work-item IDs for Doc Context +- Re-review Coordinator agent — receives the ADO Fetcher context and prior-threads data; produces classified thread list and executes reply/resolution actions; delegates to `detect-prior-review`, `classify-thread`, and `match-finding` modules; returns `{ addressed, disputed, pending, obsolete, freshFindings[], earlyExit }` — when `earlyExit: true`, the orchestrator skips the ADO Writer entirely +- ADO Writer agent — receives the findings list and PR context; posts all Inline Comment threads, patches thread statuses, posts the Review Summary or delta reply, posts the completion marker +- Compact finding schema: `{ severity, filePath, startLine, endLine, title, body }` +- Bot Signature constant: `🤖 *Reviewed by Claude Code*` prefix — must remain unchanged + +**Acceptance criteria:** + +- [ ] The `review-pr` command file is ≤ 200 lines and contains no `az devops invoke` calls +- [ ] Running the command without a URL enters Pre-PR mode; findings appear in the Claude interface; no ADO threads are posted +- [ ] Running with a URL where no prior Bot Signature exists enters First-review mode and posts a full Review Summary and Inline Comments to ADO +- [ ] Running with a URL where prior Bot Signature exists enters Re-review mode; the Re-review Coordinator correctly classifies threads and posts replies +- [ ] The orchestrator logs the detected mode (Pre-PR / First-review / Re-review) before delegating +- [ ] The four existing re-review module unit tests pass unchanged after the refactor +- [ ] The ADO Fetcher completes before the Doc Context Orchestrator is launched; the Doc Context Orchestrator and review aspect agents then run concurrently with each other (no wall-clock regression for first-review) +- [ ] The Bot Signature format and detection prefix are unchanged +- [ ] `pnpm test` passes; `pnpm format` produces no diff + +**Out of scope:** + +- GitHub write-back support +- Normalising re-review modules to a canonical cross-platform shape (deferred per ADR 0013) +- Any changes to `pr-review-toolkit` agent definitions +- Token-budget monitoring or automatic diff truncation +- Changing the Bot Signature format or detection prefix +- Changing the four re-review Node.js module interfaces diff --git a/docs/issues/pr-review-platform-failure-handling/PRD.md b/docs/issues/pr-review-platform-failure-handling/PRD.md new file mode 100644 index 0000000..57810d5 --- /dev/null +++ b/docs/issues/pr-review-platform-failure-handling/PRD.md @@ -0,0 +1,202 @@ +# PRD: pr-review — platform-failure handling + +**Status:** needs-triage +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Depends on:** `docs/issues/pr-review-ado-fetcher-reliability/PRD.md` (PRD A — must land first) + +--- + +## Problem Statement + +After PRD A lands, the four-tier Notice doctrine is established and the ADO Fetcher is correct, but the rest of the plugin still swallows platform failures in three places. The Re-review Coordinator silently downgrades to first-review mode when its match-finding helper throws (duplicating every prior thread); its PATCH-to-fixed call has a catch-all that only special-cases HTTP 409 (silently letting 401/403/5xx through as 200-char "warnings" no one reads). The ADO Writer's inline POST path captures stderr to `*.err` files but only logs them on cleanup — the actual failure text never reaches the user, and 401/403 are treated like recoverable per-finding failures. Pre-PR mode's diff parser returns `[]` for both empty diffs and malformed inputs, so a broken pipeline looks like a clean Review. The default-branch fallback chain still hardcodes `main` even though most Unic projects use Gitflow. + +These are not isolated bugs — they are the same doctrine PRD A established, not yet applied across the remaining surfaces. PRD B finishes the job by routing every ADO write call site through PRD A's canonical helpers, extending the Coordinator's classification helpers to honour `DIFF_RANGE`, and giving Pre-PR mode the same Notice surface that the ADO modes get. + +## Solution + +Apply the Notice Tier doctrine + helper-layer doctrine (both from PRD A) to the Re-review Coordinator, ADO Writer, and Pre-PR mode. The shared helpers (`classify-http-error`, `notices`, `parse-write-response`) own all classification; the agent prompts shrink to "call the helper, branch on `result.ok`". + +The Coordinator consumes the `DIFF_RANGE` sentinel PRD A emits: when `full`, the existing `classify-thread` helper downgrades verdicts that depend on diff position (`addressed`, `obsolete`) to the safer `pending`; `disputed` is unaffected. A DEGRADED Notice surfaces the downgrade. The Coordinator's per-finding match call wraps `match-finding` in try/catch and emits a DEGRADED Notice on throw, instead of silently treating the parse failure as "no match" and duplicating the thread. PATCH-to-fixed routes every response through the canonical HTTP-tier mapping — 401/403 abort the whole re-review, 5xx/network/other-4xx emit per-thread DEGRADED Notices. + +The ADO Writer routes every `az devops invoke` POST/PATCH through `parse-write-response` (a new pure helper composing PRD A's `classify-http-error` with response-`id` parsing). The H1 path (inline POST) inherits the canonical mapping retroactively — auth failures no longer log-and-continue, they abort. `*.err` files stream their content to stderr at the moment of failure (so the failure text is adjacent to the Notice that references it), then are unconditionally cleaned up. The `parseAdoWriterResult` helper is refactored to the discriminated-union shape, so the orchestrator can distinguish a missing result block (Writer crashed before printing) from a parsed-with-zero-findings outcome. + +Pre-PR mode gets the same Notice surface that PRD A wired for ADO modes. `parseChangedFilesFromDiff` detects a suspicious shape (non-empty input with `diff --git` headers but zero parsed paths) and emits a DEGRADED Notice via `buildPrePrContext`. The default-branch detection becomes a fallback chain (`git remote show origin` → `origin/develop` → `origin/main` → `origin/master` → ABORTED) implemented in a pure helper `scripts/pre-pr/detect-default-branch.mjs`, with a Notice that names the actually-used branch when any fallback level fires. + +## User Stories + +1. As a PR reviewer in re-review mode, I want the bot to never silently re-post a thread it already opened on a prior iteration, so that my PR's thread list does not accumulate duplicates. +2. As a PR reviewer, I want any HTTP 401 / 403 error from Azure DevOps during write-back to abort the Review with a clear stderr message naming `az devops login` as the remedy, so that the run does not silently complete with most threads missing. +3. As a PR reviewer, I want a per-thread DEGRADED Notice in the Review Summary listing every thread the bot tried to mark as fixed but couldn't (because of a 5xx or network blip), so that I can manually mark them fixed if appropriate. +4. As a PR reviewer in re-review mode with no incremental diff available, I want prior threads that would have been classified `addressed` or `obsolete` to instead be classified `pending`, so that I am never told a comment is resolved when the bot wasn't actually able to verify it. +5. As a developer running Pre-PR mode in a Gitflow-style project, I want the default-branch detection to try `origin/develop` before `origin/main`, so that the local diff is computed against the actual integration branch most of the time. +6. As a developer running Pre-PR mode, I want a Notice telling me which branch the bot diffed against when default-branch detection fell back, so that I can spot the case where it picked the wrong branch. +7. As a Plugin maintainer, I want the ADO Writer's existing H1 inline-POST path (which today logs auth failures and continues) to inherit the canonical HTTP-tier mapping introduced in PRD A, so that 401/403 abort the writer consistently with every other ADO write. +8. As a Plugin maintainer, I want `*.err` file contents to be visible at the moment of failure, not buried in a cleanup step, so that diagnosing a partial-success run does not require reaching for temp files that may have been deleted. +9. As a Plugin maintainer, I want the `parseAdoWriterResult` helper to distinguish "result block missing" (Writer crashed mid-run) from "result block parsed with zero findings" (legitimate zero outcome), so that the orchestrator can fail loud on the first case instead of silently reporting success. +10. As a PR reviewer, I want a Notice telling me when Pre-PR mode's diff parser detected `diff --git` headers but produced zero file paths, so that I can tell the "no files changed" message apart from "the pipeline broke". +11. As a Plugin maintainer, I want the Coordinator's match-finding error path to emit a DEGRADED Notice (`kind: thread-match`) when the helper throws on a parse error, so that the reviewer sees one warning instead of one silent duplicate posting. +12. As a developer reading the codebase, I want every ADO write call site (inline POST, summary POST, delta reply, completion marker, PATCH-to-fixed) to route through one shared helper, so that adding a new write call type in the future inherits the same HTTP-tier mapping for free. +13. As a Plugin maintainer, I want the existing classify-thread and match-finding tests to be extended with the new branches (diffRange parameter; throw on parse error), so that the new behaviour is verified at the helper boundary even though the agent prompts are not unit-tested. +14. As a developer running re-review mode, I want the partial-run check from H4 (already landed in PR #29) to keep its exit-code contract (`0` = found, `1` = not found, `2` = crash); PRD B does not modify that path. + +## Implementation Decisions + +### Foundation (from PRD A) + +All shared helpers (`scripts/ado/classify-http-error.mjs`, `scripts/ado/notices.mjs`) and the Notice flow + Trailer printing in the orchestrator are assumed in place. PRD B only adds consumers and one new shared helper (`parse-write-response.mjs`). The Notice tier doctrine, the canonical HTTP-tier mapping, the four-state classification, and the no-fifth-ASK-tier rule are all documented in PRD A's ADRs (0014, 0015) and the in-place ADR 0004 amendment. + +### New helpers + +- **`scripts/ado/parse-write-response.mjs`** — pure function composing PRD A's `classify-http-error` with response-`id` parsing. Returns `{ ok: true, id } | { ok: false, tier, kind, message }`. Consumed by every ADO write call site (inline POST, threadContext fallback, summary POST, delta reply, completion marker, PATCH-to-fixed). One shape, one classifier. +- **`scripts/pre-pr/detect-default-branch.mjs`** — pure function over an injectable `branchExists(name) → bool` tester and a `remoteHeadBranch` argument (a string parsed by the bash side from the `HEAD branch:` line of `git remote show origin` output, or empty string on failure). Walks the fallback chain `remoteHeadBranch` → `origin/develop` → `origin/main` → `origin/master` → `{ branch: null }`. Returns `{ branch, source: 'remote-show' | 'develop-fallback' | 'main-fallback' | 'master-fallback' | 'none', notice?: Notice }`. The bash side parses `HEAD branch:` from `git remote show origin 2>/dev/null` and wires the `branchExists` tester to `git rev-parse --verify --quiet`. ABORTED when all four fallback levels fail. + +### Modified helpers + +- **`scripts/re-review/classify-thread.mjs`** — adds a `diffRange: 'full' | 'incremental'` parameter. When `diffRange === 'full'`, outputs that would be `addressed` or `obsolete` are remapped to `pending`; `disputed` is unaffected. Default `diffRange === 'incremental'` preserves today's behaviour. Single new branch, ~3 lines. +- **`scripts/re-review/match-finding.mjs`** — today returns `null` on no match. New contract: `null` continues to mean "legitimate no-match"; a thrown `Error` distinguishes a parse failure in the input. The Coordinator's per-finding call wraps in try/catch. +- **`scripts/ado-writer.mjs` (`parseAdoWriterResult`)** — discriminated-union refactor: `{ ok: true, summaryThreadId, findingsPosted } | { ok: false, reason: 'missing-block' | 'malformed' }`. Subsumes today's `null` return. +- **`scripts/pre-pr.mjs` (`buildPrePrContext`, `parseChangedFilesFromDiff`)** — `buildPrePrContext` return shape extends to `{ rawDiff, changedFiles, filteredFiles, notices: Notice[] }`. `parseChangedFilesFromDiff` detects suspicious shape (non-empty input with ≥ 1 `diff --git` header but zero parsed paths) and emits a DEGRADED Notice (`kind: diff-parse`). + +### Agent and orchestrator changes + +- **`.agents/re-review-coordinator.md`**: + - Consume `DIFF_RANGE` from `ADO_FETCHER_RESULT`; pass it to `classify-thread`. + - Wrap per-finding `match-finding` call in try/catch; on throw, push a DEGRADED Notice and continue to the next finding (do NOT add the unclassified prior thread to `freshFindings` — let it fall through naturally to a duplicate posting, but with a Notice surfacing the cause). + - Route PATCH-to-fixed responses through `parse-write-response`. Tier `aborted` → exit non-zero with the abort kind. Tier `degraded` → push a Notice (`kind: patch-to-fixed`) and continue to the next thread. + - Emit `NOTICES` array in `RE_REVIEW_COORDINATOR_RESULT_START/END` for the orchestrator to merge. +- **`.agents/ado-writer.md`**: + - Route every `az devops invoke` POST/PATCH (inline POST, threadContext-fallback, summary POST, delta reply, completion marker) through `parse-write-response`. + - H1 retroactive fix: the inline POST path inherits the canonical mapping — 401/403 abort the writer immediately, 5xx/network/other-4xx push a `warning` Notice and continue. + - Stream the `*.err` file content to stderr at the moment of failure, then unconditional `rm -f` in cleanup. No conditional retention. + - Emit `NOTICES` array in `ADO_WRITER_RESULT_START/END`. +- **`commands/review-pr.md` (Pre-PR mode)**: + - Wire `detect-default-branch.mjs` (via the existing helper-import pattern). On `branch: null`, abort with stderr message and Trailer aborted line. + - Use `buildPrePrContext().notices` to prepend the pre-findings Notices block in the Claude interface. + - Trailer line includes Pre-PR notice counts (already mandatory per PRD A). + - 200-line cap preserved. + +### Test-scope choice + +The user chose "NEW deep modules only" in the test-scope question during the grilling session. Reconciling that choice with the slice acceptance criteria: + +- **New deep helpers** (`parse-write-response`, `detect-default-branch`) receive full unit-test coverage — every documented return-shape branch is asserted. +- **MODIFY helpers** (`classify-thread`, `match-finding`, `parseAdoWriterResult`, `pre-pr.mjs`) receive **minimal branch-verification** test cases — 2–3 cases per helper, each pinning a single new branch introduced by this PRD (the `diffRange='full'` downgrade, the throw-on-parse-error path, the `{ ok: false, reason }` variants, the suspicious-shape Notice). No full new test suites; the existing fixture conventions in those files are reused. +- **Agent prompt content** (`.agents/*.md`, `commands/review-pr.md`) gets no new tests of any kind. End-to-end behaviour change is verified by the integration smoke test against a real ADO PR, per ADR 0013's stated testing posture. + +The intent of the user's "NEW deep modules only" choice was to avoid writing wholly new test files and full coverage for MODIFY helpers; the minimal branch-verification cases above are necessary to confirm the new behaviour landed and do not constitute new test suites. + +## Testing Decisions + +### What makes a good test + +Same as PRD A: tests assert the external behaviour of each helper given controlled inputs. Plain JS object or short JSON fixtures, sentence-shaped test names, `node:test` + `node:assert/strict`, no external deps. + +### Modules under test + +**New deep helpers (full unit-test coverage):** + +- `scripts/ado/parse-write-response.mjs` — happy path (`{ id: 12345 }` response), 401 → `{ ok: false, tier: 'aborted', kind: 'auth' }`, 5xx → `{ ok: false, tier: 'degraded' }`, 404 → `{ ok: true }` (domain-OK), 409 → `{ ok: true }`, malformed JSON body, network exit-code path, missing `id` field on otherwise-200 response. +- `scripts/pre-pr/detect-default-branch.mjs` — non-empty `remoteHeadBranch` (e.g. `'develop'`) → no fallback Notice, empty `remoteHeadBranch` and `branchExists('develop')=true` → `develop-fallback` with Notice, empty `remoteHeadBranch` and only `main` exists → `main-fallback` with Notice, only `master` exists → `master-fallback` with Notice, nothing exists → `{ branch: null, source: 'none' }` (no notice — Trailer carries the abort), `branchExists` thrown exception → propagated. + +### MODIFY helpers — minimal branch-verification cases + +Each of these receives 2–3 new test cases in its existing test file, pinning the new branch introduced by this PRD. No full new coverage. + +- `classify-thread.mjs` — `diffRange='full'` downgrade of `addressed`/`obsolete` → `pending`; `disputed` unaffected (3 cases). +- `match-finding.mjs` — legitimate no-match still returns `null`; malformed input throws (3 cases). +- `parseAdoWriterResult` — `{ ok: true, ... }` for valid block; `{ ok: false, reason: 'missing-block' }`; `{ ok: false, reason: 'malformed' }` (3 cases). +- `pre-pr.mjs` — suspicious-shape detection emits a `kind: diff-parse` Notice through `buildPrePrContext` (2 cases). + +### Modules NOT under test in PRD B + +- All agent prompt content (`.agents/*.md`, `commands/review-pr.md`) — verified end-to-end by the integration smoke test against a real ADO PR, per ADR 0013. + +### Prior art + +Same as PRD A: `packages/release-tools/scripts/verify-changelog.test.mjs`, `bump-version.test.mjs`, `apps/claude-code/pr-review/tests/parse-diff-hunks.test.mjs`. No external deps, no spawnSync, fixtures as inline JS objects. + +## Out of Scope + +- Anything PRD A delivers (helper layer, canonical HTTP mapping, ADRs, Fetcher fixes, orchestrator Notice merging + Trailer). +- **Full new test suites** for MODIFY-only helpers — only the minimal 2–3 branch-verification cases listed in Testing Decisions are in scope. +- Unit tests for agent prompt content. +- Retries on transient HTTP errors. +- The integration smoke test (manual, post-merge). +- A canonical thread shape spanning ADO and GitHub — deferred per ADR 0013. +- Changes to the four pre-existing re-review modules' interfaces (`detect-prior-review`, `parse-signature`) — only `classify-thread` and `match-finding` are modified, and only additively (new parameter / new throw path). +- Pre-PR mode informational notices for inherently-empty states beyond the Doc Context family — PRD A already capped that. + +## Further Notes + +**Dependency on PRD A:** PRD B cannot land before PRD A. The helper imports (`classify-http-error`, `notices`, `formatTrailer`), the orchestrator's Notice-merge step, the ADO Writer's `## Notices` block rendering, and the Trailer line are all PRD A deliverables that PRD B's new consumers and modified call sites rely on. The two PRDs ship together as a coherent "platform-failure handling" feature; PRD A is the foundation, PRD B is the rollout. + +**Inbox file removal:** the originating `docs/inbox/pr-review-ado-error-hardening-pass.md` is deleted once PRD A and PRD B are published (per the inbox graduation flow documented in `docs/inbox/README.md`). + +**Source:** same grilling session as PRD A. See PRD A's "Further Notes" for the doctrine, ADR cross-references, and `CONTEXT.md` term additions. + +--- + +## Agent Brief + +> _This was generated by AI during triage._ + +**Category:** enhancement +**Summary:** Apply PRD A's four-tier Notice doctrine + helper-layer architecture to the remaining surfaces: Re-review Coordinator, ADO Writer (every write call site, including H1 retroactively), and Pre-PR mode. Adds two new deep helpers (`parse-write-response`, `detect-default-branch`), extends two re-review classifier helpers (`classify-thread` gets a `diffRange` parameter for γ-downgrade; `match-finding` throws instead of returning null on parse error), and refactors `parseAdoWriterResult` to the discriminated-union shape. Default-branch detection becomes a Gitflow-aware fallback chain (`develop` → `main` → `master`) with a Notice naming the actually-used branch. + +**Current behavior (after PRD A lands):** + +- Coordinator's per-finding `match-finding` call falls back to "no match" on Node parse error, silently duplicating prior threads. +- Coordinator's PATCH-to-fixed catch-all only special-cases HTTP 409; auth, 5xx, and network failures become 200-char `process.stdout.write` warnings that nothing reads. Threads stay open, the user is not told. +- ADO Writer's inline POST path (H1, from PR #29) treats 401/403 as recoverable per-finding failures — every subsequent inline POST in the same run also fails, but the user only sees "N findings posted" with N=0 or partial. +- ADO Writer's `*.err` files are unconditionally cleaned up at the end, destroying the only diagnostic for partial-success runs. +- `parseAdoWriterResult` returns `null` for both "Writer never printed a result block" and "Writer parsed but block was malformed", conflating crash with empty-success. +- Pre-PR `parseChangedFilesFromDiff` returns `[]` for both empty input and `diff --git`-bearing input that fails to parse — broken pipelines look like clean reviews. +- Pre-PR default-branch detection hardcodes `main` as the fallback, computing the diff against the wrong base on every Gitflow project. +- Coordinator and Writer ignore the new `DIFF_RANGE` sentinel PRD A emits. + +**Desired behavior:** + +- All five ADO write call sites in the Writer and Coordinator route through `parse-write-response` (composing PRD A's `classify-http-error` with response-`id` parsing). One canonical HTTP-tier mapping across the plugin. +- 401/403 anywhere in a Writer or Coordinator run aborts that run with a single stderr message + Trailer aborted line. +- Every per-thread / per-finding write failure that the canonical mapping classifies as DEGRADED pushes a `warning` Notice (`kind` = `inline-post` / `summary-post` / `patch-to-fixed`) that the orchestrator merges and the Writer renders in the Summary. +- Coordinator consumes `DIFF_RANGE: full | incremental`. When `full`, `classify-thread` downgrades `addressed` / `obsolete` outputs to `pending`; `disputed` is unaffected; a DEGRADED Notice (`kind: diff-range` is emitted by the Fetcher, so the Coordinator only consumes — the Notice is already in the merged array). +- Coordinator `match-finding` calls wrap in try/catch; on throw, push DEGRADED Notice (`kind: thread-match`) and let the finding fall through naturally — the reviewer sees one duplicate-and-Notice instead of one silent duplicate. +- Writer streams `*.err` content to stderr at the moment of failure; unconditional cleanup follows. +- `parseAdoWriterResult` returns the discriminated union; orchestrator fails-loud on `{ ok: false, reason: 'missing-block' }`. +- Pre-PR `parseChangedFilesFromDiff` detects suspicious-shape and emits a DEGRADED Notice (`kind: diff-parse`); `buildPrePrContext` returns the Notice array. +- Pre-PR `detect-default-branch.mjs` walks the parsed `HEAD branch:` line from `git remote show origin` → `origin/develop` → `origin/main` → `origin/master`; emits a Notice naming the actually-used branch; aborts when none exists. + +**Key interfaces:** + +- `parseWriteResponse({ httpExit, responseText, errStream }) → { ok: true, id } | { ok: false, tier, kind, message }`. +- `detectDefaultBranch({ branchExists }) → { branch, source, notice? }`. +- `classifyThread({ ..., diffRange }) → { classification }` — new optional parameter with default `'incremental'`. +- `matchFinding(...) → { classification, threadId } | null` — now throws on parse error. +- `parseAdoWriterResult(...) → { ok: true, summaryThreadId, findingsPosted } | { ok: false, reason }`. +- `buildPrePrContext(rawDiff) → { rawDiff, changedFiles, filteredFiles, notices: Notice[] }`. + +**Acceptance criteria:** + +- [ ] PRD A is merged before PRD B starts. +- [ ] Every `az devops invoke` POST/PATCH in `.agents/ado-writer.md` and `.agents/re-review-coordinator.md` is routed through `parse-write-response`. +- [ ] 401 or 403 from any Writer or Coordinator HTTP call aborts the run with a clear stderr message and a Trailer aborted line. +- [ ] 5xx / network / other-4xx from any write call emits a DEGRADED Notice and continues; the Notice appears in the Review Summary. +- [ ] `classify-thread` accepts a `diffRange` parameter; when `'full'`, `addressed` / `obsolete` are remapped to `pending`; `disputed` unaffected. +- [ ] `match-finding` throws on parse error; the Coordinator's call site catches the throw and emits a DEGRADED Notice (`kind: thread-match`). +- [ ] `parseAdoWriterResult` returns a discriminated union; the orchestrator surfaces `{ ok: false, reason: 'missing-block' }` as an ABORTED run. +- [ ] `buildPrePrContext` returns a `notices: Notice[]` field; suspicious-shape diffs emit a DEGRADED Notice (`kind: diff-parse`). +- [ ] `detect-default-branch.mjs` exists, has unit tests covering the four fallback levels + the abort case, and the orchestrator wires it via injectable `branchExists` plus a `remoteHeadBranch` argument parsed from `git remote show origin`'s `HEAD branch:` line. +- [ ] Pre-PR mode aborts with a clear stderr message when none of `develop`, `main`, `master` exist. +- [ ] `*.err` content is streamed to stderr at the moment of failure; cleanup is unconditional. +- [ ] `commands/review-pr.md` remains ≤ 200 lines. +- [ ] `pnpm test` passes; `pnpm format` produces no diff; `pnpm check` reports zero warnings. +- [ ] `docs/inbox/pr-review-ado-error-hardening-pass.md` is removed. + +**Out of scope:** + +- Anything PRD A delivers. +- Retries on transient HTTP errors. +- Integration smoke test (manual, post-merge). +- Lifting helpers to `pr-review-toolkit`. +- Full new test suites for MODIFY helpers — only minimal branch-verification cases (per PRD Testing Decisions) are in scope. diff --git a/docs/issues/pr-review-platform-failure-handling/done/01-writer-http-tier-mapping.md b/docs/issues/pr-review-platform-failure-handling/done/01-writer-http-tier-mapping.md new file mode 100644 index 0000000..bfa411e --- /dev/null +++ b/docs/issues/pr-review-platform-failure-handling/done/01-writer-http-tier-mapping.md @@ -0,0 +1,62 @@ +# B1. `parse-write-response` helper + ADO Writer applies HTTP-tier mapping to all writes + `*.err` streaming + +**Status:** resolved +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-platform-failure-handling/PRD.md` + +## What to build + +Route every ADO write call site in the ADO Writer through one canonical helper. Apply the HTTP-tier mapping consistently. Fix the H1 retroactive auth gap and the `*.err` retention policy. + +Implementation cuts through every layer: + +- **New helper** `scripts/ado/parse-write-response.mjs` — pure function `({ httpExit, responseText, errStream }) → { ok: true, id } | { ok: false, tier, kind, message }`. Composes `classify-http-error` (from A2) with response-`id` parsing. Used by every ADO write call site. With unit tests covering happy path, 200/201 with valid `id`, 401, 403, 404, 409, 5xx, network exit-code, malformed JSON body, missing `id` field on 200 response. +- **ADO Writer prompt** — every `az devops invoke` POST/PATCH call site routed through the new helper: + - inline POST (Step 1) — including the threadContext-fallback path + - summary POST (Step 2 first-review) + - delta reply POST (Step 2 re-review) + - completion marker POST (Step 3) +- **Tier handling per call site:** + - `ok: true` → record the `id`, increment counters, continue (today's H1 behaviour, now formalised through the helper). + - `ok: false, tier: 'aborted'` (401/403) → emit stderr message ("ERROR: . Try `az devops login` to re-authenticate.") and exit non-zero. The orchestrator surfaces the abort in the Trailer. + - `ok: false, tier: 'degraded'` (5xx/network/4xx) → push a Notice (`kind: inline-post | summary-post | patch-to-fixed`-equivalent for delta/completion marker) to the Writer's `NOTICES` array, continue to next call site. +- **`*.err` retention policy** — at the moment of failure, stream the contents of the per-call `*.err` file to the Writer's stderr (so the failure text is adjacent to the Notice that references it). Cleanup step at the end of the Writer is unconditional — no retention based on counts. +- **Writer result block** — `ADO_WRITER_RESULT_START/END` gains a `NOTICES: [...]` array so the orchestrator can merge Writer-emitted notices with Fetcher-emitted notices for the Summary. +- **Orchestrator** — merges the Writer's `NOTICES` into the combined array passed to subsequent rendering steps if needed; the Trailer notice counts already reflect all merged notices per A1. +- **CHANGELOG** — `[Unreleased]` Added entry for `parse-write-response.mjs`; Changed entries for the Writer call sites; Fixed entry retroactively covering the H1 inline-POST auth gap and the `*.err` streaming policy. + +End-to-end demoable: invoke `/pr-review:review-pr` against a PR while the local `az devops login` token is revoked. The Claude interface ends with `❌ Review aborted: auth — ` after the first failing write. Restore auth, simulate a 5xx (e.g. malformed REPO_ID), and the Summary renders `## Notices` with `⚠ inline-post: Failed to post inline comment at /src/foo.ts:42 (HTTP 503).` plus the `*.err` content visible in stderr above the Notice. + +## Acceptance criteria + +- [x] `scripts/ado/parse-write-response.mjs` exists with full unit-test coverage (≥ 10 cases). +- [x] Every `az devops invoke` POST/PATCH in `.agents/ado-writer.md` routes through the new helper. +- [x] 401 or 403 from any write call aborts the Writer with a clear stderr message; the orchestrator's Trailer line reads `❌ Review aborted: auth — ...`. +- [x] 5xx, network, and other-4xx from any write call emits a DEGRADED Notice; the Writer continues to the next call site. +- [x] `*.err` file content is streamed to stderr at the moment of failure; cleanup at the end is unconditional. +- [x] `ADO_WRITER_RESULT_START/END` emits a `NOTICES` array. +- [x] The H1 inline-POST path (from PR #29) inherits the canonical mapping — auth failures no longer log-and-continue. +- [x] `commands/review-pr.md` is ≤ 200 lines. +- [x] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/02-classify-http-error-and-work-items.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q7 canonical HTTP-tier mapping applied to every Writer call site; Q8(b) `*.err` retention policy (stream-to-stderr at moment of failure, unconditional cleanup, rejected the conditional-retention alternative). H1's per-finding LOG-AND-CONTINUE pattern (from PR #29) is preserved for the per-thread cases but extended with the 401/403 → ABORT escalation — the inbox flagged this consistency gap and grilling confirmed the escalation. No outstanding questions. + +## Deviations + +- Version bumped manually (unic-bump requires a clean working tree; sandbox does not support committing mid-step). `plugin.json` and `marketplace.json` updated by hand to 1.2.3. +- The completion marker POST (Step 3) in `ado-writer.md` is described with inline comments rather than full bash code, as it follows the identical `parseWriteResponse` pattern shown earlier in the same prompt. The agent can fill in the details from the established pattern. diff --git a/docs/issues/pr-review-platform-failure-handling/done/02-parse-ado-writer-result-discriminated-union.md b/docs/issues/pr-review-platform-failure-handling/done/02-parse-ado-writer-result-discriminated-union.md new file mode 100644 index 0000000..7db4784 --- /dev/null +++ b/docs/issues/pr-review-platform-failure-handling/done/02-parse-ado-writer-result-discriminated-union.md @@ -0,0 +1,46 @@ +# B2. `parseAdoWriterResult` discriminated-union refactor + +**Status:** resolved +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-platform-failure-handling/PRD.md` + +## What to build + +Refactor `parseAdoWriterResult` to a discriminated-union return shape so the orchestrator can distinguish "Writer crashed before printing its result block" from "Writer parsed successfully with zero findings posted." + +Implementation cuts through every layer: + +- **Refactor** `scripts/ado-writer.mjs` (`parseAdoWriterResult`) — return type becomes `{ ok: true, summaryThreadId, findingsPosted } | { ok: false, reason: 'missing-block' | 'malformed' }`. Today's `null` return is subsumed: missing `ADO_WRITER_RESULT_START` / `_END` → `{ ok: false, reason: 'missing-block' }`; present block with garbage inside → `{ ok: false, reason: 'malformed' }`; valid block → `{ ok: true, ... }`. +- **Update existing tests** — the existing `ado-writer.test.mjs` cases that asserted `null` are updated to the new return shape. Add cases for the two `ok: false` reasons explicitly. +- **Orchestrator** — `commands/review-pr.md` parses the Writer's result block via the refactored helper. On `{ ok: false }`, the orchestrator emits a stderr abort message ("ERROR: Writer did not return a valid result block (). The Summary may or may not have been posted; verify on ADO.") and prints a Trailer aborted line. On `{ ok: true }`, behaviour unchanged. +- **ADO Writer prompt** — the existing round-trip validation step (added in PR #29's H2 fix) is updated to assert against the new `{ ok: true }` shape rather than the old "non-null" shape. +- **CHANGELOG** — `[Unreleased]` Changed entry for the helper API; Fixed entry covering the silent-success bug on Writer crash. + +End-to-end demoable: inject a fault that crashes the Writer mid-Summary-post (e.g. corrupt `LATEST_ITERATION_ID` env var). The Claude interface ends with `❌ Review aborted: writer-missing-block — Writer did not return a valid result block. The Summary may or may not have been posted; verify on ADO.` instead of the misleading success Trailer that would have appeared with the old `null` behaviour. + +## Acceptance criteria + +- [ ] `parseAdoWriterResult` returns the discriminated-union shape; existing test file's `null` assertions are migrated to `{ ok: false }` assertions. +- [ ] At least two new test cases cover `reason: 'missing-block'` and `reason: 'malformed'`. +- [ ] Orchestrator branches on `result.ok` and emits an abort message + Trailer aborted line on `{ ok: false }`. +- [ ] ADO Writer prompt's round-trip validation step works against the new shape. +- [ ] No code path in the orchestrator relies on the old `null` return. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/01-end-to-end-notice-pipeline.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q8 (mechanical batch — discriminated-union refactor distinguishes Writer-crash from zero-success; `null` return was conflating the two cases). Verified breaking-change-free: zero consumers outside `apps/claude-code/pr-review/`. No outstanding questions. diff --git a/docs/issues/pr-review-platform-failure-handling/done/03-coordinator-diff-range-gamma-downgrade.md b/docs/issues/pr-review-platform-failure-handling/done/03-coordinator-diff-range-gamma-downgrade.md new file mode 100644 index 0000000..db70b40 --- /dev/null +++ b/docs/issues/pr-review-platform-failure-handling/done/03-coordinator-diff-range-gamma-downgrade.md @@ -0,0 +1,46 @@ +# B3. Coordinator consumes `DIFF_RANGE` → γ-downgrade in `classify-thread` + +**Status:** resolved +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-platform-failure-handling/PRD.md` + +## What to build + +Extend `classify-thread` with the γ-downgrade rule and have the Re-review Coordinator pass the `DIFF_RANGE` sentinel (emitted by A4) into every thread classification call. + +Implementation cuts through every layer: + +- **`scripts/re-review/classify-thread.mjs`** — adds a `diffRange: 'full' | 'incremental'` parameter (default `'incremental'`, preserving today's behaviour). When `diffRange === 'full'`, the function remaps `addressed` → `pending` and `obsolete` → `pending`; `disputed` is unaffected (its derivation is reviewer-reply-based, not diff-position-based). The downgrade is a single new branch at the end of the existing classification flow. +- **Existing tests** — `scripts/re-review/classify-thread.test.mjs` gets new cases (the user-confirmed test scope for PRD B is "NEW deep modules only", but this is a behaviour change to a MODIFY module that ships with the slice — the new cases are minimal additions, not full new test files). +- **Re-review Coordinator prompt** — parses `DIFF_RANGE` from `ADO_FETCHER_RESULT` (which A4 already emits). Threads the value into every `classify-thread` invocation in Step 5 of the Coordinator. The Notice surfacing the downgrade is already emitted by the Fetcher in A4; the Coordinator does not emit a duplicate. +- **CHANGELOG** — `[Unreleased]` Changed entry for the classify-thread parameter; Fixed entry covering the previously-silent classification against a full-diff fallback. + +End-to-end demoable: trigger A4's diff-range fallback (force-push away the prior iteration's commit on a re-review). The Summary opens with `⚠ diff-range: Incremental diff unavailable...` (emitted by A4), and the thread classifications visibly downgrade — what would have been `addressed` or `obsolete` is now `pending`. The reviewer sees one Notice + one consistently-conservative classification, instead of false-confidence verdicts. + +## Acceptance criteria + +- [ ] `classify-thread` accepts a `diffRange` parameter; default is `'incremental'`. +- [ ] When `diffRange === 'full'`, outputs `addressed` and `obsolete` are remapped to `pending`; `disputed` is unaffected. +- [ ] At least two new test cases in `classify-thread.test.mjs` cover the downgrade branches. +- [ ] Re-review Coordinator parses `DIFF_RANGE` from `ADO_FETCHER_RESULT` and passes it to every classify-thread call. +- [ ] On a synthetic full-diff fallback, no thread is classified as `addressed` or `obsolete` purely from diff position. +- [ ] No duplicate diff-range Notice is emitted by the Coordinator (the Fetcher's Notice from A4 is the only one). +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/04-diff-range-sentinel.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q6 Option γ — when `DIFF_RANGE=full`, `addressed` and `obsolete` outputs from `classify-thread` are remapped to `pending`; `disputed` is unaffected (its derivation is reviewer-reply-based, not diff-position-based). Option α (silent continuation) and Option β (skip classification entirely) were both rejected — γ preserves classifications the Coordinator can still make confidently while defaulting diff-position-derived verdicts to the safer state. No outstanding questions. diff --git a/docs/issues/pr-review-platform-failure-handling/done/04-coordinator-match-finding-throw.md b/docs/issues/pr-review-platform-failure-handling/done/04-coordinator-match-finding-throw.md new file mode 100644 index 0000000..bc98b2b --- /dev/null +++ b/docs/issues/pr-review-platform-failure-handling/done/04-coordinator-match-finding-throw.md @@ -0,0 +1,47 @@ +# B4. Coordinator `match-finding` throws + DEGRADED Notice on catch + +**Status:** resolved +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-platform-failure-handling/PRD.md` + +## What to build + +Change `match-finding` to throw on parse errors (distinguishable from `null` = legitimate no-match), and have the Re-review Coordinator catch the throw and surface a DEGRADED Notice instead of silently treating it as no-match (which today causes the same finding to be re-posted as a duplicate inline thread). + +Implementation cuts through every layer: + +- **`scripts/re-review/match-finding.mjs`** — today returns `null` on no match. New contract: `null` continues to mean "legitimate no-match"; a thrown `Error` distinguishes a parse failure in the input. Internal `JSON.parse` calls and helper-call boundaries that previously swallowed errors now propagate them. +- **Existing tests** — `scripts/re-review/match-finding.test.mjs` gets new cases asserting that malformed input throws (rather than returns `null`). Existing "no match → null" cases unchanged. +- **Re-review Coordinator prompt** — Step 6a (per-finding match) wraps the `match-finding` call in try/catch. On throw, append a DEGRADED Notice to the Coordinator's `NOTICES` array (`kind: thread-match`, message: "Could not classify finding at : — falling back to no-match.") and let the finding fall through naturally to the unclassified path (which today causes duplicate posting — but now with the Notice surfacing the cause to the reviewer). +- **Coordinator result block** — `RE_REVIEW_COORDINATOR_RESULT_START/END` gains a `NOTICES: [...]` field (similar to the Fetcher's and the Writer's). The orchestrator merges Coordinator notices with Fetcher and Writer notices into the combined Summary block. +- **CHANGELOG** — `[Unreleased]` Changed entry for match-finding's new throw semantics; Fixed entry covering the silent-duplicate-posting bug. + +End-to-end demoable: inject a synthetic match-finding parse failure (e.g. corrupt the prior threads JSON for one specific thread). The reviewer sees the finding re-posted as a new inline thread (today's behaviour) AND a `⚠ thread-match: Could not classify finding at /src/foo.ts:42 — falling back to no-match.` Notice in the Summary, instead of just the silent duplicate. + +## Acceptance criteria + +- [ ] `match-finding` throws on input parse errors; returns `null` only for legitimate no-match. +- [ ] At least two new test cases cover the throw paths. +- [ ] Coordinator wraps the per-finding match call in try/catch and emits a DEGRADED Notice (`kind: thread-match`) on throw. +- [ ] Coordinator's result block emits a `NOTICES` array. +- [ ] Orchestrator merges Coordinator notices into the combined Notice block. +- [ ] On a synthetic match-finding parse failure, both the duplicate posting and the Notice appear in the Summary. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/01-end-to-end-notice-pipeline.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q8(a) — throw-on-parse-error in `match-finding`; try/catch in the Coordinator; on throw the finding falls through to natural duplicate posting but with a DEGRADED Notice (`kind: thread-match`) surfacing the cause. Aborting the whole Coordinator was considered and rejected because parse errors are local to one finding-thread pair. No outstanding questions. diff --git a/docs/issues/pr-review-platform-failure-handling/done/05-coordinator-patch-to-fixed-mapping.md b/docs/issues/pr-review-platform-failure-handling/done/05-coordinator-patch-to-fixed-mapping.md new file mode 100644 index 0000000..0609e56 --- /dev/null +++ b/docs/issues/pr-review-platform-failure-handling/done/05-coordinator-patch-to-fixed-mapping.md @@ -0,0 +1,48 @@ +# B5. Coordinator PATCH-to-fixed routed through `parse-write-response` + +**Status:** resolved +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-platform-failure-handling/PRD.md` + +## What to build + +Apply the canonical HTTP-tier mapping to the Re-review Coordinator's PATCH-to-fixed call site, replacing the existing 409-only catch-all with the same uniform error handling every other ADO write surface now uses. + +Implementation cuts through every layer: + +- **Re-review Coordinator prompt** — Step 5's PATCH-to-fixed call site (where the Coordinator marks an `addressed` thread as fixed by PATCHing its `status`) is refactored to capture exit code, response body, and stderr, then route them through `parse-write-response.mjs` (from B1). +- **Tier handling for PATCH-to-fixed:** + - `ok: true` → thread successfully marked fixed; continue. + - `ok: false, tier: 'aborted'` (401/403) → emit stderr message and exit the Coordinator non-zero. The orchestrator surfaces the abort in the Trailer. + - `ok: false, tier: 'degraded'` (5xx/network/other-4xx) → push a per-thread Notice (`kind: patch-to-fixed`, message: "Could not mark thread as fixed (HTTP ). Thread remains active and will be re-evaluated on next re-review.") to the Coordinator's `NOTICES` array, continue to the next thread. +- **Special-cases preserved by the canonical mapping** — 404 (thread deleted) and 409 (state already changed) both map to `ok: true` in `classify-http-error`, so the Coordinator continues silently for those, matching today's behaviour and the user's intent. +- **CHANGELOG** — `[Unreleased]` Changed entry for the Coordinator's PATCH-to-fixed call site; Fixed entry covering the silent-failure auth gap (401/403 used to be a "PATCH warning" string on stdout that nothing read). + +End-to-end demoable: run a re-review against a PR whose threads include at least one `addressed` candidate, while the local `az devops login` is revoked. The Claude interface ends with the Trailer aborted line naming the auth failure ("Could not mark thread N as fixed (HTTP 401). Try `az devops login` to re-authenticate."). Restore auth and simulate a 5xx (e.g. throttling), and the Summary's `## Notices` block contains an entry like "⚠ patch-to-fixed: Could not mark thread N as fixed (HTTP 503). Thread remains active and will be re-evaluated on next re-review." — and the rest of the re-review completes normally. + +## Acceptance criteria + +- [ ] Coordinator's PATCH-to-fixed call routes through `parse-write-response.mjs`. +- [ ] 401 or 403 from any PATCH-to-fixed aborts the Coordinator with a clear stderr message; the orchestrator's Trailer line reads `❌ Review aborted: auth — ...`. +- [ ] 5xx / network / other-4xx from any PATCH-to-fixed emits a per-thread DEGRADED Notice; the Coordinator continues to the next thread. +- [ ] 404 and 409 continue silently (canonical OK tier). +- [ ] The old 409-only catch-all is removed from the Coordinator prompt. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-platform-failure-handling/01-writer-http-tier-mapping.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q7 canonical HTTP-tier mapping applied to the Coordinator's PATCH-to-fixed (the only HTTP write surface in the Coordinator). 404 and 409 stay OK per the canonical mapping (today's 409-only catch-all behaviour is preserved; 404 is now also OK because a deleted thread is a domain success). 401/403 abort the Coordinator with the same stderr+Trailer contract as the Writer. No outstanding questions. diff --git a/docs/issues/pr-review-platform-failure-handling/done/06-pre-pr-notice-surface.md b/docs/issues/pr-review-platform-failure-handling/done/06-pre-pr-notice-surface.md new file mode 100644 index 0000000..27a8f55 --- /dev/null +++ b/docs/issues/pr-review-platform-failure-handling/done/06-pre-pr-notice-surface.md @@ -0,0 +1,48 @@ +# B6. Pre-PR Notice surface: suspicious-shape Notice + Gitflow-aware default-branch fallback + +**Status:** resolved +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Type:** AFK + +## Parent + +`docs/issues/pr-review-platform-failure-handling/PRD.md` + +## What to build + +Give Pre-PR mode the same Notice surface as ADO modes. Detect malformed diff inputs. Replace the hardcoded `main` fallback with a Gitflow-aware fallback chain. + +Implementation cuts through two helpers and the orchestrator's Pre-PR steps: + +- **`scripts/pre-pr.mjs` (`buildPrePrContext` + `parseChangedFilesFromDiff`)** — return shape of `buildPrePrContext` extended to `{ rawDiff, changedFiles, filteredFiles, notices: Notice[] }`. `parseChangedFilesFromDiff` detects suspicious shape: non-empty input that contains ≥ 1 `diff --git` header but produces zero parsed paths. When detected, `buildPrePrContext` pushes a DEGRADED Notice (`kind: diff-parse`, message: "Pre-PR diff parsed to zero files but contained diff headers — input may be malformed.") to the returned `notices` array. Existing test file extended with cases covering the suspicious-shape detection. +- **New helper** `scripts/pre-pr/detect-default-branch.mjs` — pure function `({ branchExists, remoteHeadBranch }) → { branch: string | null, source: 'remote-show' | 'develop-fallback' | 'main-fallback' | 'master-fallback' | 'none', notice?: Notice }`. The function tries (in order): the `remoteHeadBranch` argument (parsed by the bash side from the `HEAD branch:` line of `git remote show origin` output) if non-empty; then `origin/develop`; then `origin/main`; then `origin/master`. Returns `{ branch: null, source: 'none' }` when nothing exists. Emits a `warning` Notice (`kind: default-branch`, message: `"Default branch not detected via remote-show; computed diff against origin/ ()."`) when any fallback level fires. With unit tests covering each branch. +- **`commands/review-pr.md` (Pre-PR Step A)** — wires `detect-default-branch.mjs` via the same `await import(...)` pattern as other helpers. The bash side first runs `git remote show origin 2>/dev/null` and parses the `HEAD branch:` line out of its output (passing the resulting branch name, or empty string on failure, as `remoteHeadBranch`); it also passes an injected `branchExists(name)` implementation that runs `git rev-parse --verify --quiet "refs/remotes/origin/$name"`. On `branch: null`, the orchestrator emits a stderr message and prints the Trailer aborted line; on any fallback level, the Notice is pushed into the Pre-PR `notices` array. +- **`commands/review-pr.md` (Pre-PR Step B + E)** — Step B uses `buildPrePrContext().notices` and merges them with the default-branch Notice. Step E prints all Notices before findings (per PRD A's pre-PR contract), then the findings, then the Trailer line (which already includes notice counts). +- **CHANGELOG** — `[Unreleased]` Added entry for `detect-default-branch.mjs`; Changed entry for `buildPrePrContext` return shape; Fixed entries for the suspicious-shape detection and the Gitflow-aware fallback. + +End-to-end demoable: run `/pr-review:review-pr` (no URL) in a Gitflow project where `origin/develop` exists but the local `git remote show origin` is offline (e.g. break the remote). The Claude interface prints `⚠ default-branch: Default branch not detected via remote-show; computed diff against origin/develop (develop-fallback).` then the findings, then the Trailer. Repeat in a trunk-only project (no develop branch) → fallback message names `main`. Repeat in a project with no `develop`, `main`, or `master` → `❌ Review aborted: default-branch — No detectable default branch on origin (tried develop, main, master). Specify a base manually.` + +## Acceptance criteria + +- [ ] `buildPrePrContext` returns a `notices: Notice[]` field. +- [ ] `parseChangedFilesFromDiff` suspicious-shape detection emits a DEGRADED Notice (`kind: diff-parse`) via `buildPrePrContext`. +- [ ] `scripts/pre-pr/detect-default-branch.mjs` exists with full unit-test coverage (≥ 6 cases). +- [ ] The fallback chain order is `remote-show` → `origin/develop` → `origin/main` → `origin/master` → `none`. +- [ ] Any fallback level fires a `warning` Notice (`kind: default-branch`) that names the actually-used branch. +- [ ] `none` aborts the run with a clear stderr message and a Trailer aborted line. +- [ ] Pre-PR mode prints all Notices before findings. +- [ ] `commands/review-pr.md` is ≤ 200 lines. +- [ ] `pnpm format`, `pnpm check`, `pnpm --filter pr-review test`, `pnpm --filter pr-review verify:changelog` all pass. + +## Blocked by + +`docs/issues/pr-review-ado-fetcher-reliability/01-end-to-end-notice-pipeline.md` + +--- + +## Triage Notes + +> _This was generated by AI during triage._ + +Locked during the `/grill-with-docs` session of 2026-05-13: Q8(c) suspicious-shape doctrine (non-empty input with `diff --git` headers but zero parsed paths → DEGRADED Notice); Q8(d) + user follow-up — Gitflow-aware fallback chain (`remote-show` → `origin/develop` → `origin/main` → `origin/master` → ABORT) replacing the hardcoded `main` fallback, with a Notice naming the actually-used branch. `detect-default-branch.mjs` takes an injectable `branchExists` tester for cross-platform unit testability. No outstanding questions. diff --git a/docs/issues/pr-review-pre-pr-default-branch-override/01-env-var-override.md b/docs/issues/pr-review-pre-pr-default-branch-override/01-env-var-override.md new file mode 100644 index 0000000..aec8718 --- /dev/null +++ b/docs/issues/pr-review-pre-pr-default-branch-override/01-env-var-override.md @@ -0,0 +1,25 @@ +--- +title: pre-pr: env var override for default-branch fallback chain +created: 2026-05-14 +--- + +**Status:** needs-triage +**Category:** enhancement +**Plugin:** `apps/claude-code/pr-review` +**Depends on:** orchestrator-split PR merged + +## Problem Statement + +`detect-default-branch.mjs` uses a Gitflow-aware fallback chain (`remote-show → develop → main → master → none`) when the remote is unreachable. Repos that intentionally use `main` as their integration branch but have a stale `origin/develop` ref will silently diff against `develop` with no escape hatch — the user has no way to override the choice. + +## Solution + +Add an environment variable override (e.g. `PR_REVIEW_DEFAULT_BRANCH`) that, when set, short-circuits the fallback chain and uses the specified branch directly. A Notice should still be emitted if the override bypasses a `remote-show` that would have returned a different result. + +## Acceptance criteria + +- `PR_REVIEW_DEFAULT_BRANCH=main` forces the pre-PR diff to use `main` regardless of what `remote-show` or the fallback chain would have returned +- When the env var is set, the fallback chain is not consulted +- An info-level Notice is emitted when the env var is used, so the invoker knows the override is active +- Existing behavior (no env var set) is unchanged +- Unit tests cover the override path diff --git a/docs/issues/pr-review-rereview/01-normalize-bot-signature.md b/docs/issues/pr-review-rereview/01-normalize-bot-signature.md index c214bea..1d473a3 100644 --- a/docs/issues/pr-review-rereview/01-normalize-bot-signature.md +++ b/docs/issues/pr-review-rereview/01-normalize-bot-signature.md @@ -1,6 +1,6 @@ # Normalize bot signature -**Status:** resolved +**Status:** closed **Category:** enhancement ## Parent diff --git a/docs/issues/pr-review-rereview/02-detect-prior-review.md b/docs/issues/pr-review-rereview/02-detect-prior-review.md index 2857bc3..85aa13f 100644 --- a/docs/issues/pr-review-rereview/02-detect-prior-review.md +++ b/docs/issues/pr-review-rereview/02-detect-prior-review.md @@ -1,6 +1,6 @@ # Detect prior review + extract parse-signature and detect-prior-review modules -**Status:** resolved +**Status:** closed **Category:** enhancement ## Parent diff --git a/docs/issues/pr-review-rereview/03-target-latest-iteration.md b/docs/issues/pr-review-rereview/03-target-latest-iteration.md index 0a75d1e..c07aa73 100644 --- a/docs/issues/pr-review-rereview/03-target-latest-iteration.md +++ b/docs/issues/pr-review-rereview/03-target-latest-iteration.md @@ -1,6 +1,6 @@ # Target latest PR iteration -**Status:** resolved +**Status:** closed **Category:** enhancement ## Parent diff --git a/docs/issues/pr-review-rereview/04-incremental-diff-baseline.md b/docs/issues/pr-review-rereview/04-incremental-diff-baseline.md index 4302b85..d4504c2 100644 --- a/docs/issues/pr-review-rereview/04-incremental-diff-baseline.md +++ b/docs/issues/pr-review-rereview/04-incremental-diff-baseline.md @@ -1,6 +1,6 @@ # Incremental diff baseline -**Status:** resolved +**Status:** closed **Category:** enhancement ## Parent diff --git a/docs/issues/pr-review-rereview/05-classify-existing-threads.md b/docs/issues/pr-review-rereview/05-classify-existing-threads.md index d4564ef..95b2a67 100644 --- a/docs/issues/pr-review-rereview/05-classify-existing-threads.md +++ b/docs/issues/pr-review-rereview/05-classify-existing-threads.md @@ -1,6 +1,6 @@ # Classify existing threads + extract classify-thread module -**Status:** resolved +**Status:** closed **Category:** enhancement ## Parent diff --git a/docs/issues/pr-review-rereview/06-reply-to-threads.md b/docs/issues/pr-review-rereview/06-reply-to-threads.md index a05f0d9..71efb80 100644 --- a/docs/issues/pr-review-rereview/06-reply-to-threads.md +++ b/docs/issues/pr-review-rereview/06-reply-to-threads.md @@ -1,6 +1,6 @@ # Reply to threads + extract match-finding module + completion marker -**Status:** resolved +**Status:** closed **Category:** enhancement ## Parent diff --git a/docs/issues/pr-review-rereview/07-summary-comment-policy.md b/docs/issues/pr-review-rereview/07-summary-comment-policy.md index e744178..4a99b88 100644 --- a/docs/issues/pr-review-rereview/07-summary-comment-policy.md +++ b/docs/issues/pr-review-rereview/07-summary-comment-policy.md @@ -1,6 +1,6 @@ # Summary comment policy on re-review -**Status:** resolved +**Status:** closed **Category:** enhancement ## Parent diff --git a/docs/issues/pr-review-rereview/08-test-fixture-suite.md b/docs/issues/pr-review-rereview/08-test-fixture-suite.md index 82e332e..cff6e5e 100644 --- a/docs/issues/pr-review-rereview/08-test-fixture-suite.md +++ b/docs/issues/pr-review-rereview/08-test-fixture-suite.md @@ -1,6 +1,6 @@ # Complete test fixture suite -**Status:** resolved +**Status:** closed **Category:** enhancement ## Parent diff --git a/docs/issues/pr-review-rereview/09-version-bump-and-release.md b/docs/issues/pr-review-rereview/09-version-bump-and-release.md index 63cc22e..b178d89 100644 --- a/docs/issues/pr-review-rereview/09-version-bump-and-release.md +++ b/docs/issues/pr-review-rereview/09-version-bump-and-release.md @@ -1,6 +1,6 @@ # Version bump, README, CLAUDE.md, ADR 0009 -**Status:** resolved +**Status:** closed **Category:** enhancement ## Parent diff --git a/docs/issues/pr-review-suppress-addressed-reply/PRD.md b/docs/issues/pr-review-suppress-addressed-reply/PRD.md new file mode 100644 index 0000000..417f42d --- /dev/null +++ b/docs/issues/pr-review-suppress-addressed-reply/PRD.md @@ -0,0 +1,119 @@ +# PRD: pr-review — suppress cosmetic reply on addressed threads + +**Status:** ready-for-agent +**Plugin:** `apps/claude-code/pr-review` +**Category:** enhancement +**Created:** 2026-05-08 + +--- + +> _This was generated by AI during triage._ + +## Problem Statement + +When a re-review classifies an existing Review Thread as `addressed`, the plugin currently posts a "Resolved as of Iteration N — thanks!" Reply before PATCHing the thread status to `fixed` in Azure DevOps. This Reply is purely cosmetic — it carries no information that the system reads or acts on. + +Developers experience two concrete problems: + +1. **PR conversation noise** — every resolved thread accumulates an extra bot comment that adds no information beyond what the status change already communicates. +2. **Notification spam** — ADO sends an email notification to all thread participants whenever a new comment is added. Developers who already resolved a thread themselves (by marking it `fixed` in ADO) receive an additional notification from the bot commenting on a thread they already closed. + +The second problem is more acute than it appears: the `addressed` classification fires when the ADO thread status is already `fixed`, `wontFix`, `closed`, or `byDesign` — which covers all cases where a developer resolved the thread manually. In practice, most threads are resolved by developers, not by code changes the bot detects. The bot then adds a reply on top of something the human already handled, creating a notification for no reason. + +## Solution + +Remove the Reply POST from the `addressed` branch of the re-review flow. The thread status PATCH to `fixed` (status 2) remains — that is the functional signal the system relies on for subsequent re-reviews. The `ADDRESSED_COUNT` counter also remains so the delta summary in the Review Summary continues to report resolved threads correctly. + +The `disputed` Reply is explicitly kept: it serves a functional purpose (acknowledging the author's perspective and providing the ADO workflow nudge), fires only when a human has actively engaged in the thread, and is out of scope for this change. + +`apps/claude-code/pr-review/docs/adr/0006-reply-not-duplicate-auto-resolve.md` (ADR 0006), which currently mandates a reply for `addressed` threads, is revised to remove that requirement. + +## User Stories + +1. As a developer with an open PR, I want the bot to silently close addressed threads rather than adding a comment, so that my ADO notification feed is not flooded with "thanks" messages. +2. As a developer who already marked a thread as fixed myself, I want the bot to not reply to that thread on re-review, so that I do not receive a redundant notification for something I already handled. +3. As a PR author, I want my PR conversation to show only meaningful comments, so that I can find and act on genuine findings quickly. +4. As a PR reviewer, I want addressed threads to automatically close without noise, so that the PR thread list reflects the real state of the review without clutter. +5. As a plugin maintainer, I want `apps/claude-code/pr-review/docs/adr/0006-reply-not-duplicate-auto-resolve.md` (ADR 0006) to accurately reflect the current behavior of the addressed-thread branch, so that future contributors do not misread the design intent. +6. As an AFK agent implementing a re-review, I want the `addressed` branch to skip the Reply POST entirely, so that only the PATCH and counter increments are executed for resolved threads. +7. As a developer, I want the Review Summary delta ("N resolved") to still reflect how many threads were addressed, so that I have an accurate high-level picture of re-review progress without individual thread noise. + +## Implementation Decisions + +- **One change site**: the `addressed` branch in Step 10 of `commands/review-pr.md`. Remove the `# 1. Post reply` block (the JSON heredoc and the `az devops invoke … pullRequestThreadComments` POST call). The `# 2. PATCH thread status to fixed` block, the `FINDINGS_POSTED` increment, and the `ADDRESSED_COUNT` increment are all unchanged. +- **Section heading update**: rename the `addressed` section heading from "confirm resolution and mark thread fixed" to "mark thread fixed" to reflect the removed step. +- **ADR 0006 revision** (`apps/claude-code/pr-review/docs/adr/0006-reply-not-duplicate-auto-resolve.md`): update the addressed-thread rule from "post a reply confirming the fix and resolve the thread" to "resolve the thread silently (PATCH only)". Add a revision note with the date and reasoning (notification spam; developers self-resolve most threads). +- **No new modules**: this is a behavior removal, not an addition. No extraction or new abstractions are needed. +- **`disputed` branch untouched**: the disputed Reply is functional (ADO nudge + acknowledgement) and is not part of this change. +- **`ADDRESSED_COUNT` still flows into the delta summary**: the Step 11 summary reply ("N resolved") continues to report addressed threads correctly because the counter increment is preserved. + +## Testing Decisions + +Good tests for this change verify observable behavior from the outside — what comments appear (or do not appear) in ADO — not internal implementation details like which JSON file was written to `/tmp`. + +The command file (`review-pr.md`) is a markdown instruction set and cannot be unit tested in isolation. Verification is therefore integration-level: + +- Trigger a re-review against a PR where at least one Review Thread is in `addressed` state (either by ADO status or by code change at those lines). +- Assert: no new Reply comment appears on the addressed thread. +- Assert: the thread status in ADO is `fixed` (status 2). +- Assert: the Review Summary delta correctly reports `ADDRESSED_COUNT ≥ 1`. + +The `classify-thread.mjs` module and its test suite are unaffected — this change does not touch classification logic. + +## Out of Scope + +- `disputed`, `pending`, and `obsolete` reply behavior. +- Any change to how `classifyThread()` determines the `addressed` state (ADO status codes or line-range intersection logic). +- GitHub PR support. +- Suppressing the completion marker reply or the delta summary reply. + +## Further Notes + +The inbox item (`docs/inbox/pr-review-supress-thanks-comments-on-addressed-threads.md`) can be retired once this PRD is actioned. + +The `disputed` reply was evaluated and explicitly kept in scope during grilling. It carries the ADO workflow nudge ("If you consider this resolved, please mark the thread as fixed in Azure DevOps") which is genuinely useful to developers unfamiliar with the ADO review UI. + +## Comments + +> _This was generated by AI during triage._ + +## Agent Brief + +**Category:** enhancement +**Summary:** Remove the cosmetic Reply posted to `addressed` Review Threads during re-review; keep the thread status PATCH. + +**Current behavior:** +When a re-review classifies a Review Thread as `addressed`, the plugin executes two actions: + +1. POSTs a Reply with the text "Resolved as of Iteration N — thanks!" (plus the Bot Signature) to the thread. +2. PATCHes the thread status to `fixed` (ADO status 2). + +The Reply generates an ADO email notification for all thread participants and adds a bot comment to threads that are often already closed by the developer. It carries no information the system reads or acts on. + +**Desired behavior:** +The `addressed` branch executes only the PATCH (status 2). No Reply is posted. The `FINDINGS_POSTED` and `ADDRESSED_COUNT` counters continue to be incremented so the Step 11 delta summary ("N resolved") remains accurate. Every other branch (`pending`, `disputed`, `obsolete`) is unchanged. + +`apps/claude-code/pr-review/docs/adr/0006-reply-not-duplicate-auto-resolve.md` (ADR 0006) is updated to remove the requirement to post a reply for `addressed` threads, with a revision note explaining the reason (notification spam; developers self-resolve most threads). + +**Key interfaces:** + +- The `addressed` branch inside the re-review reply flow in the main review command — find the block that handles `addressed` Thread Classification and remove only the Reply POST, leaving the PATCH block and counter increments intact. +- `apps/claude-code/pr-review/docs/adr/0006-reply-not-duplicate-auto-resolve.md` — revise the addressed-thread rule from "post a reply confirming the fix and resolve the thread" to "resolve the thread silently via PATCH only"; add a `**Revised:**` note with date and reasoning. +- The section heading for the `addressed` branch — update from "confirm resolution and mark thread fixed" to "mark thread fixed". + +**Acceptance criteria:** + +- [ ] During a re-review, no Reply comment is posted to threads classified as `addressed`. +- [ ] During a re-review, `addressed` threads are still PATCHed to `fixed` (status 2) in ADO. +- [ ] `ADDRESSED_COUNT` is still incremented for each `addressed` thread and reflected correctly in the Step 11 delta summary. +- [ ] `FINDINGS_POSTED` is still incremented for each `addressed` thread. +- [ ] `disputed`, `pending`, and `obsolete` branch behavior is unchanged. +- [ ] ADR 0006 no longer states that a Reply is required for `addressed` threads. +- [ ] The `addressed` branch section heading no longer references "confirm resolution". + +**Out of scope:** + +- `disputed`, `pending`, and `obsolete` reply behavior. +- Any change to `classifyThread()` logic or classification criteria. +- GitHub PR support. +- The completion marker reply or the delta summary reply in Step 11. diff --git a/docs/issues/pr-review-suppress-addressed-reply/done/01-remove-addressed-reply.md b/docs/issues/pr-review-suppress-addressed-reply/done/01-remove-addressed-reply.md new file mode 100644 index 0000000..39fc266 --- /dev/null +++ b/docs/issues/pr-review-suppress-addressed-reply/done/01-remove-addressed-reply.md @@ -0,0 +1,70 @@ +# Remove addressed-thread Reply + revise ADR 0006 + +**Status:** resolved +**Category:** enhancement +**Type:** AFK + +## Parent + +`docs/issues/pr-review-suppress-addressed-reply/PRD.md` + +## What to build + +Remove the Reply POST from the `addressed` branch of the re-review flow in the main review command. The thread status PATCH to `fixed` (status 2), the `FINDINGS_POSTED` increment, and the `ADDRESSED_COUNT` increment must all remain untouched. + +Update the `addressed` branch section heading to no longer reference "confirm resolution". + +Revise `apps/claude-code/pr-review/docs/adr/0006-reply-not-duplicate-auto-resolve.md` (ADR 0006) to remove the requirement to post a Reply for `addressed` threads. Add a `**Revised:**` note with the date and the reason: notification spam; developers self-resolve most threads, causing the bot to comment on already-closed threads. + +## Acceptance criteria + +- [ ] During a re-review, no Reply comment is posted to threads classified as `addressed`. +- [ ] During a re-review, `addressed` threads are still PATCHed to `fixed` (status 2) in ADO. +- [ ] `ADDRESSED_COUNT` is still incremented for each `addressed` thread and reflected correctly in the Step 11 delta summary. +- [ ] `FINDINGS_POSTED` is still incremented for each `addressed` thread. +- [ ] `disputed`, `pending`, and `obsolete` branch behavior is unchanged. +- [ ] `apps/claude-code/pr-review/docs/adr/0006-reply-not-duplicate-auto-resolve.md` no longer states that a Reply is required for `addressed` threads. +- [ ] The `addressed` branch section heading no longer references "confirm resolution". + +## Blocked by + +None — can start immediately. + +## Comments + +> _This was generated by AI during triage._ + +## Agent Brief + +**Category:** enhancement +**Summary:** Remove the Reply POST from the `addressed` branch of the re-review flow; revise ADR 0006 to match. + +**Current behavior:** +When a re-review classifies a Review Thread as `addressed`, the plugin executes two steps: (1) POSTs a Reply with "Resolved as of Iteration N — thanks!" plus the Bot Signature, then (2) PATCHes the thread status to `fixed` (ADO status 2). The Reply generates an ADO notification for all thread participants and adds a bot comment to threads that developers often already closed themselves. + +**Desired behavior:** +The `addressed` branch executes only the PATCH (status 2). No Reply is posted. The `FINDINGS_POSTED` and `ADDRESSED_COUNT` counters are still incremented so the Step 11 delta summary ("N resolved") remains accurate. The section heading for the `addressed` branch is updated to no longer reference "confirm resolution". Every other branch (`pending`, `disputed`, `obsolete`) is unchanged. + +ADR `0006-reply-not-duplicate-auto-resolve.md` is revised: the addressed-thread rule changes from "post a reply confirming the fix and resolve the thread" to "resolve the thread silently via PATCH only". A `**Revised:**` note is added with the date (2026-05-08) and the reason: notification spam; developers self-resolve most threads, so the bot was commenting on already-closed threads. + +**Key interfaces:** + +- The `addressed` branch inside the re-review reply flow in `commands/review-pr.md` — locate the block under the `addressed` classification label; remove only the Reply POST heredoc and the `az devops invoke … pullRequestThreadComments` call that follows it; leave the PATCH block and both counter increments intact. +- The section heading for the `addressed` branch — remove "confirm resolution and" from the heading. +- `apps/claude-code/pr-review/docs/adr/0006-reply-not-duplicate-auto-resolve.md` — update the bullet for `addressed` threads under the Decision section; append a Revised note. + +**Acceptance criteria:** + +- [ ] During a re-review, no Reply comment is posted to threads classified as `addressed`. +- [ ] During a re-review, `addressed` threads are still PATCHed to `fixed` (status 2) in ADO. +- [ ] `ADDRESSED_COUNT` is still incremented for each `addressed` thread and reflected correctly in the delta summary. +- [ ] `FINDINGS_POSTED` is still incremented for each `addressed` thread. +- [ ] `disputed`, `pending`, and `obsolete` branch behavior is unchanged. +- [ ] `apps/claude-code/pr-review/docs/adr/0006-reply-not-duplicate-auto-resolve.md` no longer states that a Reply is required for `addressed` threads, and includes a Revised note. +- [ ] The `addressed` branch section heading no longer references "confirm resolution". + +**Out of scope:** + +- `disputed`, `pending`, and `obsolete` reply behavior. +- Any change to `classifyThread()` logic or classification criteria. +- Version bump and CHANGELOG (covered by issue 02). diff --git a/docs/issues/pr-review-suppress-addressed-reply/done/02-version-bump.md b/docs/issues/pr-review-suppress-addressed-reply/done/02-version-bump.md new file mode 100644 index 0000000..6690c56 --- /dev/null +++ b/docs/issues/pr-review-suppress-addressed-reply/done/02-version-bump.md @@ -0,0 +1,60 @@ +# Version bump + CHANGELOG + +**Status:** resolved +**Category:** enhancement +**Type:** AFK + +## Parent + +`docs/issues/pr-review-suppress-addressed-reply/PRD.md` + +## What to build + +Bump the `pr-review` plugin version by a patch increment and add a dated CHANGELOG entry describing the removal of the cosmetic "thanks" Reply on addressed threads. + +The version must be updated in both `plugin.json` and `marketplace.json`. Use the existing `bump` release-tools command rather than hand-editing. + +## Acceptance criteria + +- [ ] `plugin.json` version is incremented by one patch. +- [ ] `marketplace.json` version matches `plugin.json`. +- [ ] `CHANGELOG.md` has a new dated entry under the new version describing the change (addressed threads are now silently resolved — no Reply comment is posted). + +## Blocked by + +`docs/issues/pr-review-suppress-addressed-reply/01-remove-addressed-reply.md` + +## Comments + +> _This was generated by AI during triage._ + +## Agent Brief + +**Category:** enhancement +**Summary:** Patch-bump the `pr-review` plugin version and add a CHANGELOG entry for the addressed-thread reply removal. + +**Current behavior:** +The plugin version in `plugin.json` and `marketplace.json` reflects the state before the addressed-thread Reply was removed. + +**Desired behavior:** +The plugin version is incremented by one patch. `CHANGELOG.md` has a new dated entry under the new version stating that `addressed` threads are now silently resolved — no Reply comment is posted, only the thread status PATCH. + +Use the `pnpm --filter pr-review bump patch` release-tools command to update the version; do not hand-edit version fields. + +**Key interfaces:** + +- `plugin.json` `version` field — incremented by one patch via the bump command. +- `marketplace.json` `plugins[0].version` field — kept in sync by the bump command. +- `CHANGELOG.md` — new dated entry under the new version number. + +**Acceptance criteria:** + +- [ ] `plugin.json` version is one patch higher than the version at the time issue 01 was completed. +- [ ] `marketplace.json` version matches `plugin.json`. +- [ ] `CHANGELOG.md` has a new dated entry under the new version describing the removal of the cosmetic Reply on `addressed` threads. +- [ ] No other files are modified. + +**Out of scope:** + +- Any code or documentation changes (covered by issue 01). +- Minor or major version bumps. diff --git a/docs/process/ai-development.md b/docs/process/ai-development.md new file mode 100644 index 0000000..a9fb516 --- /dev/null +++ b/docs/process/ai-development.md @@ -0,0 +1,181 @@ +# AI Development in This Repo — Deep Guide + +This guide explains the mental model behind the AI-development workflow, the architectural decisions that make it reliable, and the failure modes to watch for. Read `docs/process/development-workflow.md` first for the quick-reference steps. This document explains the why. + +--- + +## 1. Two runners, not one + +The most important thing to understand is that this repo has two distinct execution loops, and they are not interchangeable. + +| | Spec Runner | Feature Runner | +| --------------------- | --------------------------------------------------------------------------------- | ---------------------------------------------------------- | +| **Input** | `docs/plans/NN-*.md` Spec | `docs/issues//NN-*.md` Issue | +| **Invocation** | `pnpm ralph` | `/implement-feature` | +| **Format** | Prescriptive: before/after snapshots, shell verification commands, explicit steps | Descriptive: `## What to build` + `## Acceptance criteria` | +| **Worker** | Agent follows spec as recipe (or `/tdd` for behavioral specs) | `/tdd` in non-interactive AFK mode | +| **Completion marker** | `**Status: done**` in spec file | `Status: resolved` in issue file | +| **Branch** | Current branch | `feature/afk/` worktree | + +**When to use which:** The Spec Runner is for building and evolving the repo itself — release tooling, CI configuration, monorepo infrastructure. The Feature Runner is for product work on top of a stable system — new plugin capabilities, improvements to existing features. A rough heuristic: if the work would change something under `packages/` or `.github/`, it belongs in a Spec. If it changes something under `apps/claude-code//`, it belongs in a Feature. + +Both runners are backed by the same agent; the difference is in what inputs they receive and how much the agent is expected to figure out on its own. + +--- + +## 2. The pipeline and its quality gates + +Every piece of work passes through a pipeline before an agent executes it. Each stage has a human-review checkpoint: + +``` +docs/inbox/ ← /inbox: raw capture, no review required + ↓ +/grill-with-docs ← human reviews every branch of the design tree + ↓ +/to-prd ← human reviews the synthesized PRD + ↓ +/to-issues ← human reviews the vertical slice breakdown + ↓ +/triage ← human moves issues to ready-for-agent + ↓ +/implement-feature ← agent executes, no human present +``` + +The pipeline is load-bearing. The quality of the autonomous execution at the bottom depends entirely on the quality of the decisions captured at each stage above it. A vague acceptance criterion that slips through triage will produce a vague implementation — and there is no human in the loop to catch it until the PR review. + +--- + +## 3. Why context quality determines AFK quality + +When `/tdd` runs inside the Feature Runner, it runs non-interactively. In a normal interactive session, `/tdd`'s planning phase asks the user to confirm interface changes and approve which behaviours to test before writing any code. In AFK mode there is no user to ask. + +The issue's `## Acceptance criteria` replaces that conversation. The planning phase is not skipped — it was completed during the grilling and issue-writing stages. The Feature Runner simply does not repeat it at runtime. + +This means there is a direct line between **grilling quality → PRD quality → issue acceptance criteria quality → implementation correctness**. If any link in that chain is weak, the agent produces a _correct-but-wrong_ implementation: code that satisfies the literal issue description but diverges from what you actually intended. + +The grilling session (`/grill-with-docs`) is where that chain is forged. It is not a formality — it is the point at which ambiguity is eliminated and architectural constraints are identified. Skipping or shortcutting it shifts the cost downstream, where it is much more expensive to recover from. + +--- + +## 4. The context bundle + +When the Feature Runner invokes `/tdd` for an issue, it does not pass only the issue file. It assembles a **context bundle** from six sources: + +| Input | Why it matters | +| --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | +| **Issue file** | The `## What to build` and `## Acceptance criteria` — the pre-answered plan | +| **PRD** | The "why" behind the feature; the shared vision from grilling. Without it, the agent reasons from a vertical slice with no broader context | +| **Sibling issue files** | Dependency awareness; "what is already resolved" without the runner summarising | +| **Scoped CONTEXT.md** | Domain glossary — ensures test names and interfaces match the project's vocabulary | +| **Scoped ADRs** | Architectural constraints the implementation must respect | +| **Recent commits (last 5)** | The ideation trail — grilling sessions typically modify CONTEXT.md and ADRs, and those changes land in commits before the runner executes | + +### ADR scoping + +Not all ADRs are relevant to all work. Root `docs/adr/` covers monorepo concerns (versioning, tagging, CI) — those are noise for plugin implementation. Per-plugin `docs/adr/` covers domain-specific decisions that directly constrain what the agent builds. + +The runner infers scope from the PRD: if it references paths under `apps/claude-code//`, inject that plugin's ADRs and CONTEXT.md. If it references paths outside `apps/` (`.claude/`, `docs/`, `packages/`), inject the root ADRs and CONTEXT.md. + +This is why CONTEXT.md and ADRs must be kept current. They are not documentation artifacts — they are runtime inputs to every AFK agent execution. + +--- + +## 5. Writing issues that AFK agents can execute + +An issue's `## Acceptance criteria` is doing two jobs: it is the definition of done for the human reviewer, and it is the planning conversation substitute for the AFK agent. It must be specific enough for both audiences. + +**Good acceptance criteria** are checkable without ambiguity: + +```markdown +## Acceptance criteria + +- [ ] `grep -nF '🤖 *Reviewed by Claude Code*' commands/review-pr.md` → matches at every signature location +- [ ] `pnpm --filter pr-review test` passes +- [ ] `pnpm typecheck` passes +``` + +**Bad acceptance criteria** leave the agent to interpret intent: + +```markdown +## Acceptance criteria + +- [ ] The feature works correctly +- [ ] Tests pass +``` + +The `to-issues` skill produces acceptance criteria — but an agent authors them. They are then reviewed by you before the issue reaches `ready-for-agent`. That review is the last human checkpoint before AFK execution. Use it. + +If an issue's acceptance criteria are too vague to verify without judgment, the issue is not `ready-for-agent`. Send it back to `needs-specs`. + +--- + +## 6. Dependency ordering + +Issues produced by `to-issues` are named with a numeric prefix (`01-`, `02-`, etc.) for human readability. The numbers usually reflect dependency order because `to-issues` publishes blockers first. But **numerical order is not the execution contract**. + +The `## Blocked by` field in each issue is the canonical dependency signal. The Feature Runner builds a topological order from `## Blocked by` references before executing anything. If `## Blocked by` and numerical order conflict, the runner halts rather than proceeding silently in the wrong order — because a wrong execution order means downstream issues inherit a broken foundation. + +When writing or reviewing issues: always fill in `## Blocked by` accurately. "None — can start immediately" is a valid and important signal. A missing or incorrect `## Blocked by` is more dangerous than a missing acceptance criterion, because the sequencing error compounds silently across every subsequent issue. + +The dependency graph also reveals which issues are parallelisable (those with no blockers and no dependents). The Feature Runner serialises all execution by design — parallel issue execution is explicitly out of scope — but understanding which issues are independent helps when manually intervening in a failed run. + +--- + +## 7. Running overnight + +The Feature Runner is designed to be composable with `/loop` for unattended overnight execution: + +``` +/loop /implement-feature +``` + +When the queue empties (no qualifying feature exists — see `.claude/skills/implement-feature/SKILL.md` step 0 for the full qualification rule), the runner outputs `LOOP_COMPLETE` and the loop terminates cleanly. This mirrors the Spec Runner's `completion_promise: LOOP_COMPLETE` in `ralph.yml`. + +For overnight runs to succeed, the queue must be in good shape before you start: each target feature must qualify (see SKILL.md step 0) — every issue in `{ready-for-agent, resolved, closed, rejected, ready-for-human}`, no `needs-*` states, no conflicts between `## Blocked by` and numerical order, acceptance criteria specific enough to verify without judgment. A single malformed issue will halt the runner and leave the remainder of the queue unexecuted. + +If a `/tdd` invocation fails mid-feature, the failing issue is flipped to `needs-info` with a failure note appended. The runner stops. Subsequent issues in the same feature do not run — they could inherit a broken foundation. The `needs-info` flip prevents `/loop /implement-feature` from auto-selecting the same feature again until a developer triages the failure. Inspect the failure note, fix the issue or the codebase, set the issue back to `ready-for-agent`, and re-run. (Note: a Ctrl+C interrupt — as opposed to a `/tdd` failure — leaves the issue at `ready-for-agent` so a simple re-run resumes it.) + +--- + +## 8. CONTEXT.md and ADRs as living constraints + +`CONTEXT.md` and `docs/adr/` are not documentation you write once and forget. They are the vocabulary and constraint layer that every agent reads before writing code. Their quality directly affects the quality of every AFK execution. + +**Update CONTEXT.md** when a new domain term is introduced or an existing term is redefined. `/grill-with-docs` does this automatically during a grilling session — terms resolved during grilling are written into CONTEXT.md inline. If a term surfaces outside a grilling session, add it manually. + +**Write an ADR** when a decision is: (a) hard to reverse, (b) surprising without context, and (c) the result of a real trade-off with considered alternatives. An ADR that just restates the obvious adds noise and dilutes the ones that matter. + +**Never let ADRs drift.** An ADR that no longer reflects the codebase is worse than no ADR — it misdirects the agent. If a decision is superseded, update the original ADR's status to `Superseded by ADR-NNNN` and write the new one. + +The commits from your grilling sessions carry this context forward. The Feature Runner injects the last 5 commits into every `/tdd` invocation specifically because grilling sessions modify CONTEXT.md and ADRs — those changes land in commits and the agent needs the ideation trail, not just the final file state. + +--- + +## 9. Keeping docs/plans/ and docs/issues/ in sync + +The Spec Runner and Feature Runner evolved independently. Work that was implemented via the Spec Runner (i.e. a Spec in `docs/plans/` was marked `done`) may have a corresponding directory in `docs/issues//` that was never updated. The Feature Runner will attempt to implement those stale issues if they have `ready-for-agent` status. + +The convention: when a Spec is marked `**Status: done**`, check for a corresponding `docs/issues//` directory. If it exists, mark all issue files in it `closed` and append a note: + +```markdown +## Comments + +> _Closed 2026-05-09 — implemented via Spec Runner (docs/plans/NN-.md marked done)._ +``` + +This is a manual step. There is no automation for it. The `docs/agents/feature-runner.md` reference document records this convention for agents that need to be briefed on it. + +--- + +## Related + +- `docs/process/development-workflow.md` — the 8-phase quick reference +- `docs/process/ralph-loop-guide.md` — Spec Runner invocation and resumption detail +- `docs/process/spec-template.md` — spec file format +- `docs/agents/issue-tracker.md` — issue file conventions +- `docs/agents/triage-labels.md` — 8-state triage vocabulary +- `docs/adr/0023-spec-template-format.md` — why specs are prescriptive +- `docs/adr/0026-tdd-dispatch-by-version-impact.md` — when the Spec Runner uses /tdd +- `docs/adr/0027-feature-runner-context-bundle.md` — what /tdd receives per invocation +- `docs/adr/0028-blocked-by-canonical-sequencing.md` — why ## Blocked by beats filename order +- `docs/adr/0029-feature-runner-afk-invocation.md` — how AFK invocation works diff --git a/docs/process/development-workflow.md b/docs/process/development-workflow.md index 4116b2c..bddc319 100644 --- a/docs/process/development-workflow.md +++ b/docs/process/development-workflow.md @@ -1,6 +1,6 @@ # Development Workflow -This repo follows an adapted 8-phase version of Matt Pocock's 7-phase workflow. version of Matt Pocock's 7-phase AI development workflow. The phases move from raw idea capture through AFK execution to QA, using the tools already available here. +This repo follows an adapted 8-phase version of Matt Pocock's 7-phase AI development workflow. The phases move from raw idea capture through AFK execution to QA, using the tools already available here. Not every phase is required for every piece of work. A typo fix can go straight to execution. A major feature will touch every phase. @@ -68,26 +68,45 @@ Turn the PRD into independently-executable tickets: This creates `docs/issues//-.md` files — vertical slices that cut through all integration layers. Each ticket should be small enough to fit in a single agent context window. -Use the triage labels (`needs-triage` → `ready-for-agent` / `ready-for-human`) to track state. See `docs/agents/triage-labels.md`. +Use the triage labels to track state — see `docs/agents/triage-labels.md` for the full 8-state vocabulary (`needs-triage` → `needs-info` → `needs-specs` → `ready-for-agent` / `ready-for-human` → `resolved` → `closed` / `rejected`). ## Phase 7 — Execute -Work through the tickets. For agent-ready tickets: +There are two execution paths depending on the type of work. Choose based on where the work item lives, not on personal preference — the two runners are not interchangeable. + +### Spec Runner — for `docs/plans/` specs + +Use the Spec Runner when implementing infrastructure, tooling, or repo-level changes captured as Specs in `docs/plans/`: + +``` +pnpm ralph # root specs +pnpm --filter ralph # plugin-specific specs +``` + +Specs follow a prescriptive format (before/after snapshots, shell verification commands, acceptance criteria). The Spec Runner implements one Spec per iteration, commits, and stops. See `docs/process/ralph-loop-guide.md`. + +### Feature Runner — for `docs/issues/` features + +Use the Feature Runner when implementing product features tracked as Issues in `docs/issues//`. Once a feature has at least one `ready-for-agent` issue and no unprepped issues (`needs-triage`, `needs-info`, `needs-specs`): ``` -pnpm ralph # runs the Spec Runner (currently ralph-orchestrator) +/implement-feature # target a specific feature +/implement-feature # auto-select next ready feature ``` -Or for plugin-specific work: +The Feature Runner builds a dependency graph from `## Blocked by` references, invokes `/tdd` non-interactively for each issue in topological order, marks each issue `resolved` on completion, and opens a PR targeting `develop` when all issues are done. + +Compose with `/loop` for overnight queue draining: ``` -cd apps/claude-code/ -pnpm ralph +/loop /implement-feature ``` -For test-driven work, use the `/tdd` skill to enforce a red-green-refactor loop. +The runner outputs `LOOP_COMPLETE` when the queue is empty, which terminates the loop cleanly. + +### Human execution -Tickets that require human judgment (`ready-for-human`) are done by hand following the same steps. +Tickets marked `ready-for-human` require judgment that cannot be delegated to an agent. Work through them by hand, following the same red-green-refactor discipline as `/tdd`. Mark the issue `resolved` when done. ## Phase 8 — QA @@ -99,16 +118,17 @@ Human QA often surfaces new issues or improvement ideas — add them back to the ## Quick reference -| Phase | When | Tool | -| ------------ | -------------------------------- | --------------------------------------------- | -| 1. Capture | Idea surfaces mid-task | `/inbox ` | -| 2. Grill | Before any PRD or spec | `/grill-with-docs` or `/grill-me` | -| 3. Research | Unfamiliar external dependencies | `research.md` (ad hoc) | -| 4. Prototype | Uncertain design or UX | Ad hoc throwaway route | -| 5. PRD | After grilling | `/to-prd` → `docs/issues//PRD.md` | -| 6. Issues | After PRD | `/to-issues` → `docs/issues//-*.md` | -| 7. Execute | Tickets are `ready-for-agent` | `pnpm ralph` or `/tdd` | -| 8. QA | After execution | QA plan (agent-generated, human-verified) | +| Phase | When | Tool | +| --------------------- | ---------------------------------------------- | --------------------------------------------- | +| 1. Capture | Idea surfaces mid-task | `/inbox ` | +| 2. Grill | Before any PRD or spec | `/grill-with-docs` or `/grill-me` | +| 3. Research | Unfamiliar external dependencies | `research.md` (ad hoc) | +| 4. Prototype | Uncertain design or UX | Ad hoc throwaway route | +| 5. PRD | After grilling | `/to-prd` → `docs/issues//PRD.md` | +| 6. Issues | After PRD | `/to-issues` → `docs/issues//-*.md` | +| 7a. Execute (Spec) | Specs in `docs/plans/` are ready | `pnpm ralph` (Spec Runner) | +| 7b. Execute (Feature) | Issues in `docs/issues/` are `ready-for-agent` | `/implement-feature` (Feature Runner) | +| 8. QA | After execution | QA plan (agent-generated, human-verified) | ## Related @@ -117,3 +137,4 @@ Human QA often surfaces new issues or improvement ideas — add them back to the - `docs/agents/triage-labels.md` — 8-state triage vocabulary - `docs/process/ralph-loop-guide.md` — Spec Runner detail - `docs/process/spec-template.md` — spec file format +- `docs/process/ai-development.md` — deep guide: mental model, context quality, AFK trust chain, key decisions diff --git a/package.json b/package.json index 001e43b..fbda189 100644 --- a/package.json +++ b/package.json @@ -5,7 +5,7 @@ "type": "module", "packageManager": "pnpm@10.33.2", "engines": { - "node": ">=24", + "node": ">=22", "pnpm": ">=10" }, "scripts": { diff --git a/pnpm-workspace.yaml b/pnpm-workspace.yaml index 40da5b9..b4f0e0f 100644 --- a/pnpm-workspace.yaml +++ b/pnpm-workspace.yaml @@ -24,4 +24,13 @@ strictDepBuilds: true trustPolicy: no-downgrade -useNodeVersion: 24.15.0 +supportedArchitectures: + os: + - current + - linux + - darwin + - win32 + cpu: + - current + - x64 + - arm64