diff --git a/CHANGELOG.md b/CHANGELOG.md index e7160d0908..4a030e1c2d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,7 +3,13 @@ ## [Unreleased] ### Added -- **New skill `/ghcp-handoff`.** Bounded delegation to the GitHub Copilot CLI (`copilot`). Generates structured prompts with hard boundaries, NOT-in-scope lists, stream-scoped deliverables, and PR contracts, then invokes `copilot -p` non-interactively and verifies the resulting PR stayed within bounds. Three modes: guided (interactive question flow), `--from-plan ` (extract a `## The GHCP prompt` section from a plan file and wrap it with worktree + metadata), and `verify ` (diff-vs-NOT-in-scope glob check + dependency deviation check + PR body contract check). Sibling to `/codex`: where `/codex` pulls a second opinion in, `/ghcp-handoff` pushes bounded mechanical work out. Verification writes a `gstack-review-log` JSONL entry so results surface in the Plan Status Footer. Adds `yaml` and `minimatch` as dependencies. +- **New skill `/ghcp-handoff`.** Bounded delegation to the GitHub Copilot CLI (`copilot`). Generates structured prompts with hard boundaries, NOT-in-scope lists, stream-scoped deliverables, and PR contracts, then invokes `copilot -p` non-interactively and verifies the resulting PR stayed within bounds. Three modes: guided (interactive question flow), `--from-plan ` (extract a `## The GHCP prompt` section from a plan file and wrap it with worktree + metadata), and `verify ` (diff-vs-NOT-in-scope glob check + dependency deviation check + PR body contract check). Sibling to `/codex`: where `/codex` pulls a second opinion in, `/ghcp-handoff` pushes bounded mechanical work out. Verification writes a `gstack-review-log` JSONL entry so results surface in the Plan Status Footer. Listed in the gstack skill routing rules, README power-tools table, and install-instructions skill list. Adds `yaml` and `minimatch` as dependencies. + +### Fixed +- **`/ghcp-handoff verify` no longer greenlights PRs that only mention required sections in prose.** The PR body check now matches Markdown headings and bold labels (e.g. `## Summary`, `**Summary:**`, `**Summary**:`) instead of any substring — so sentences like "this PR includes a summary of changes" no longer satisfy the contract. +- **Dependency-deviation checks stop flagging package.json metadata as new dependencies.** The manifest diff now parses the actual before/after file contents via `git show` instead of regex-matching `+` diff lines. Top-level `package.json` keys (`author`, `repository`, `engines`, `keywords`, etc.), `pyproject.toml` keys under `[project]` / `[tool.poetry]`, and `Cargo.toml` `[package]` keys are no longer misclassified as deps. Only keys under `dependencies` / `devDependencies` / `peerDependencies` / `optionalDependencies` (JSON), `[*dependencies]` / `[tool.poetry.*dependencies]` / `[tool.poetry.group.*.dependencies]` (TOML), `require` blocks (go.mod), and `gem "..."` lines (Gemfile) count. +- **`ghcp-detect.sh` comment matches behavior.** Removed the misleading "try `gh copilot` as fallback" claim — the extension doesn't support the `-p --allow-all-tools --output-format json --no-ask-user` flags the handoff requires, so only the standalone `copilot` CLI is detected. +- **Prompt template no longer leaks placeholders.** Removed unused `{{project_name}}`, `{{repo_abs_path}}`, `{{user_name}}`, and `{{forbidden_branches}}` variables that the guided-mode flow never collected, so they could render literally in the prompt sent to Copilot. The guided-mode Render step now lists the exact placeholder set and adds a "grep for remaining double-brace" pre-flight check. ## [0.18.3.0] - 2026-04-17 diff --git a/CLAUDE.md b/CLAUDE.md index 074b61221e..4f8a4b8f78 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -97,6 +97,7 @@ gstack/ ├── benchmark/ # /benchmark skill (performance regression detection) ├── canary/ # /canary skill (post-deploy monitoring loop) ├── codex/ # /codex skill (multi-AI second opinion via OpenAI Codex CLI) +├── ghcp-handoff/ # /ghcp-handoff skill (bounded delegation to GitHub Copilot CLI) ├── land-and-deploy/ # /land-and-deploy skill (merge → deploy → canary verify) ├── office-hours/ # /office-hours skill (YC Office Hours — startup diagnostic + builder brainstorm) ├── investigate/ # /investigate skill (systematic root-cause debugging) diff --git a/README.md b/README.md index d0065930ee..f3f89ff5ab 100644 --- a/README.md +++ b/README.md @@ -46,7 +46,7 @@ Fork it. Improve it. Make it yours. And if you want to hate on free open source Open Claude Code and paste this. Claude does the rest. -> Install gstack: run **`git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`** then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /design-shotgun, /design-html, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /connect-chrome, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex, /cso, /autoplan, /plan-devex-review, /devex-review, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, /learn. Then ask the user if they also want to add gstack to the current project so teammates get it. +> Install gstack: run **`git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`** then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /design-shotgun, /design-html, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /connect-chrome, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex, /ghcp-handoff, /cso, /autoplan, /plan-devex-review, /devex-review, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, /learn. Then ask the user if they also want to add gstack to the current project so teammates get it. ### Step 2: Team mode — auto-update for shared repos (recommended) @@ -228,6 +228,7 @@ Each skill feeds into the next. `/office-hours` writes a design doc that `/plan- | Skill | What it does | |-------|-------------| | `/codex` | **Second Opinion** — independent code review from OpenAI Codex CLI. Three modes: review (pass/fail gate), adversarial challenge, and open consultation. Cross-model analysis when both `/review` and `/codex` have run. | +| `/ghcp-handoff` | **Bounded Copilot Handoff** — delegate scaffolds, CI wiring, and mechanical refactors to the GitHub Copilot CLI. Generates structured prompts with hard boundaries, NOT-in-scope paths, stream-scoped deliverables, and a PR contract, then verifies the resulting PR stayed within bounds. Sibling to `/codex`: where `/codex` pulls a second opinion in, `/ghcp-handoff` pushes bounded work out. | | `/careful` | **Safety Guardrails** — warns before destructive commands (rm -rf, DROP TABLE, force-push). Say "be careful" to activate. Override any warning. | | `/freeze` | **Edit Lock** — restrict file edits to one directory. Prevents accidental changes outside scope while debugging. | | `/guard` | **Full Safety** — `/careful` + `/freeze` in one command. Maximum safety for prod work. | diff --git a/SKILL.md b/SKILL.md index 70d576cdc1..2cf0a3a10f 100644 --- a/SKILL.md +++ b/SKILL.md @@ -451,6 +451,7 @@ quality gates that produce better results than answering inline. - User asks to update docs after shipping → invoke `/document-release` - User asks for a weekly retro, what did we ship → invoke `/retro` - User asks for a second opinion, codex review → invoke `/codex` +- User asks to hand off to Copilot, delegate scaffold/mechanical work, or says "ghcp-handoff" → invoke `/ghcp-handoff` - User asks for safety mode, careful mode → invoke `/careful` or `/guard` - User asks to restrict edits to a directory → invoke `/freeze` or `/unfreeze` - User asks to upgrade gstack → invoke `/gstack-upgrade` diff --git a/SKILL.md.tmpl b/SKILL.md.tmpl index 3709c97c54..c0f1d6b17a 100644 --- a/SKILL.md.tmpl +++ b/SKILL.md.tmpl @@ -45,6 +45,7 @@ quality gates that produce better results than answering inline. - User asks to update docs after shipping → invoke `/document-release` - User asks for a weekly retro, what did we ship → invoke `/retro` - User asks for a second opinion, codex review → invoke `/codex` +- User asks to hand off to Copilot, delegate scaffold/mechanical work, or says "ghcp-handoff" → invoke `/ghcp-handoff` - User asks for safety mode, careful mode → invoke `/careful` or `/guard` - User asks to restrict edits to a directory → invoke `/freeze` or `/unfreeze` - User asks to upgrade gstack → invoke `/gstack-upgrade` diff --git a/ghcp-handoff/SKILL.md b/ghcp-handoff/SKILL.md index 8aaccd3e2d..e3fb50afa3 100644 --- a/ghcp-handoff/SKILL.md +++ b/ghcp-handoff/SKILL.md @@ -763,10 +763,22 @@ AskUserQuestion: "PR description sections — accept defaults or customize?" ### Render After all answers: 1. Read `${CLAUDE_SKILL_DIR}/templates/prompt-template.md`. -2. Fill all double-brace-style template variables (e.g. `task_summary`, `target_branch`, `stream_blocks`) from collected answers. -3. Write to `.gstack/ghcp-handoff/.md` + `.meta.json`. -4. Show the rendered prompt in a fenced block. -5. Proceed to **Step 2-exec**. +2. Substitute every double-brace placeholder in that file from the collected + answers. The full placeholder set is: + - `task_summary` — answer to Q1 + - `phase_note` + `stub_policy` — answers to Q4 + - `target_branch` — answer to Q2 + - `context_files_bullets` — rendered as `- ` per line from Q3 + - `numbered_boundaries` — one numbered line per entry from Q6 + - `stream_blocks` — each stream rendered as the `STREAM N — ` block (Q5) + - `not_in_scope_paths` / `not_in_scope_reasons` — YAML list items from Q7, + each prefixed with `- ` and indented 4 spaces to sit under the `paths:` / `reasons:` keys + - `forbidden_deps_list` — YAML list items from Q8 indented 2 spaces (or `[]` if none) +3. Before writing, grep the rendered output for any remaining double-brace + placeholders — if found, stop and tell the user which placeholder failed to substitute. +4. Write to `.gstack/ghcp-handoff/.md` + `.meta.json`. +5. Show the rendered prompt in a fenced block. +6. Proceed to **Step 2-exec**. --- diff --git a/ghcp-handoff/SKILL.md.tmpl b/ghcp-handoff/SKILL.md.tmpl index 156e299bbc..e7f69ad4af 100644 --- a/ghcp-handoff/SKILL.md.tmpl +++ b/ghcp-handoff/SKILL.md.tmpl @@ -182,10 +182,22 @@ AskUserQuestion: "PR description sections — accept defaults or customize?" ### Render After all answers: 1. Read `${CLAUDE_SKILL_DIR}/templates/prompt-template.md`. -2. Fill all double-brace-style template variables (e.g. `task_summary`, `target_branch`, `stream_blocks`) from collected answers. -3. Write to `.gstack/ghcp-handoff/.md` + `.meta.json`. -4. Show the rendered prompt in a fenced block. -5. Proceed to **Step 2-exec**. +2. Substitute every double-brace placeholder in that file from the collected + answers. The full placeholder set is: + - `task_summary` — answer to Q1 + - `phase_note` + `stub_policy` — answers to Q4 + - `target_branch` — answer to Q2 + - `context_files_bullets` — rendered as `- ` per line from Q3 + - `numbered_boundaries` — one numbered line per entry from Q6 + - `stream_blocks` — each stream rendered as the `STREAM N — ` block (Q5) + - `not_in_scope_paths` / `not_in_scope_reasons` — YAML list items from Q7, + each prefixed with `- ` and indented 4 spaces to sit under the `paths:` / `reasons:` keys + - `forbidden_deps_list` — YAML list items from Q8 indented 2 spaces (or `[]` if none) +3. Before writing, grep the rendered output for any remaining double-brace + placeholders — if found, stop and tell the user which placeholder failed to substitute. +4. Write to `.gstack/ghcp-handoff/.md` + `.meta.json`. +5. Show the rendered prompt in a fenced block. +6. Proceed to **Step 2-exec**. --- diff --git a/ghcp-handoff/bin/ghcp-detect.sh b/ghcp-handoff/bin/ghcp-detect.sh index 7c70bc60dc..2625317ed7 100644 --- a/ghcp-handoff/bin/ghcp-detect.sh +++ b/ghcp-handoff/bin/ghcp-detect.sh @@ -1,15 +1,18 @@ #!/usr/bin/env bash -# ghcp-detect.sh — Detect copilot CLI binary and print version. -# Exit 0 with version on stdout if found; exit 1 with empty stdout if not. +# ghcp-detect.sh — Detect the standalone GitHub Copilot CLI (`copilot`) and +# print its version. Exits 0 with version on stdout if found; exit 1 with +# empty stdout if not. +# +# Note: the `gh copilot` extension is intentionally NOT a fallback. The +# handoff workflow invokes `copilot -p ... --allow-all-tools --output-format +# json --no-ask-user`, flags that only the standalone CLI supports. set -euo pipefail -# Try 'copilot' first, then 'gh copilot' as fallback if command -v copilot &>/dev/null; then VERSION=$(copilot --version 2>/dev/null | head -1 || echo "unknown") echo "$VERSION" exit 0 fi -# Not found exit 1 diff --git a/ghcp-handoff/bin/ghcp-verify-boundaries.ts b/ghcp-handoff/bin/ghcp-verify-boundaries.ts index 311d48cf3b..25324a670e 100644 --- a/ghcp-handoff/bin/ghcp-verify-boundaries.ts +++ b/ghcp-handoff/bin/ghcp-verify-boundaries.ts @@ -107,55 +107,115 @@ export function checkBoundaries( } /** - * Simple dep manifest diff parser. - * Returns list of added dependency names from a unified diff of a manifest file. + * Dep-section names for each supported manifest. Keys outside these + * sections (e.g. `name`, `version`, `author`, `repository` in package.json; + * `[tool.poetry]` metadata in pyproject.toml) must NOT be flagged as deps. */ -export function extractAddedDeps( - diff: string, - manifestName: string -): string[] { - const added: string[] = []; - - // package.json: lines like + "foo": "^1.0.0" - if (manifestName === "package.json") { - const matches = diff.matchAll(/^\+\s+"([^"]+)":\s*"/gm); - for (const m of matches) { - // Skip metadata keys - if ( - !["name", "version", "private", "scripts", "description"].includes( - m[1] - ) - ) { - added.push(m[1]); +const PACKAGE_JSON_DEP_SECTIONS = [ + "dependencies", + "devDependencies", + "peerDependencies", + "optionalDependencies", +] as const; + +const CARGO_DEP_SECTION_RE = /^(?:dependencies|dev-dependencies|build-dependencies|target\..+?\.dependencies)$/; +const PYPROJECT_DEP_SECTION_RE = /^(?:dependencies|tool\.poetry\.dependencies|tool\.poetry\.dev-dependencies|tool\.poetry\.group\..+?\.dependencies)$/; + +function parsePackageJsonDeps(content: string): Set { + const deps = new Set(); + try { + const parsed = JSON.parse(content); + for (const section of PACKAGE_JSON_DEP_SECTIONS) { + const block = parsed?.[section]; + if (block && typeof block === "object") { + for (const name of Object.keys(block)) deps.add(name); } } + } catch { + // malformed JSON — nothing we can flag with confidence } + return deps; +} - // pyproject.toml: lines like +foo = ">=1.0" - if (manifestName === "pyproject.toml") { - const matches = diff.matchAll(/^\+([a-zA-Z0-9_-]+)\s*=/gm); - for (const m of matches) added.push(m[1]); +/** + * Walk TOML-ish content line by line tracking the active `[section]` header + * and collect `key = ...` entries only when the section matches a dep regex. + * This is intentionally minimal (no full TOML parse) — good enough to catch + * the common case without a new dependency. + */ +function parseTomlDeps(content: string, sectionRe: RegExp): Set { + const deps = new Set(); + let section = ""; + for (const raw of content.split("\n")) { + const line = raw.trim(); + if (!line || line.startsWith("#")) continue; + const header = line.match(/^\[(.+?)\]$/); + if (header) { + section = header[1].trim(); + continue; + } + if (!sectionRe.test(section)) continue; + const kv = line.match(/^([A-Za-z0-9_-]+)\s*=/); + if (kv) deps.add(kv[1]); } + return deps; +} - // go.mod: lines like +\trequire foo v1.0.0 - if (manifestName === "go.mod") { - const matches = diff.matchAll(/^\+\t([^\s]+)\s+v/gm); - for (const m of matches) added.push(m[1]); +function parseGoModDeps(content: string): Set { + const deps = new Set(); + // Matches both `require module v...` (single) and lines inside `require ( ... )` blocks. + for (const raw of content.split("\n")) { + const line = raw.trim(); + const m = line.match(/^(?:require\s+)?([^\s()]+)\s+v\d/); + if (m && m[1] !== "require") deps.add(m[1]); } + return deps; +} - // Cargo.toml: like package.json pattern - if (manifestName === "Cargo.toml") { - const matches = diff.matchAll(/^\+([a-zA-Z0-9_-]+)\s*=/gm); - for (const m of matches) added.push(m[1]); +function parseGemfileDeps(content: string): Set { + const deps = new Set(); + for (const m of content.matchAll(/^\s*gem\s+['"]([^'"]+)['"]/gm)) { + deps.add(m[1]); } + return deps; +} - // Gemfile: lines like +gem "foo" - if (manifestName === "Gemfile") { - const matches = diff.matchAll(/^\+gem\s+"([^"]+)"/gm); - for (const m of matches) added.push(m[1]); +/** + * Parse a manifest's current dependency names. Returns an empty set for + * unsupported manifests (callers should gate on the supported list). + */ +export function parseManifestDeps( + content: string, + manifestName: string +): Set { + switch (manifestName) { + case "package.json": + return parsePackageJsonDeps(content); + case "pyproject.toml": + return parseTomlDeps(content, PYPROJECT_DEP_SECTION_RE); + case "Cargo.toml": + return parseTomlDeps(content, CARGO_DEP_SECTION_RE); + case "go.mod": + return parseGoModDeps(content); + case "Gemfile": + return parseGemfileDeps(content); + default: + return new Set(); } +} - return added; +/** + * Return dependency names present in `after` but not in `before`. + * `before` may be empty string (file added on this branch). + */ +export function diffManifestDeps( + before: string, + after: string, + manifestName: string +): string[] { + const beforeDeps = parseManifestDeps(before, manifestName); + const afterDeps = parseManifestDeps(after, manifestName); + return [...afterDeps].filter((d) => !beforeDeps.has(d)); } /** @@ -235,19 +295,25 @@ if (import.meta.main) { const allAddedDeps: { name: string; manifest: string }[] = []; for (const manifest of manifests) { - if (changedFiles.includes(manifest)) { - const mDiff = Bun.spawnSync([ + if (!changedFiles.includes(manifest)) continue; + + const fetchAt = (ref: string): string => { + const proc = Bun.spawnSync([ "git", - "diff", - `${baseBranch}...${prBranch}`, - "--", - manifest, + "show", + `${ref}:${manifest}`, ]); - const diffText = new TextDecoder().decode(mDiff.stdout); - const added = extractAddedDeps(diffText, manifest); - for (const name of added) { - allAddedDeps.push({ name, manifest }); - } + // Exit code is non-zero when the file did not exist at that ref. + return proc.exitCode === 0 + ? new TextDecoder().decode(proc.stdout) + : ""; + }; + + const before = fetchAt(baseBranch); + const after = fetchAt(prBranch); + const added = diffManifestDeps(before, after, manifest); + for (const name of added) { + allAddedDeps.push({ name, manifest }); } } diff --git a/ghcp-handoff/bin/ghcp-verify-pr-body.ts b/ghcp-handoff/bin/ghcp-verify-pr-body.ts index da1c0686f3..810fd4124f 100644 --- a/ghcp-handoff/bin/ghcp-verify-pr-body.ts +++ b/ghcp-handoff/bin/ghcp-verify-pr-body.ts @@ -26,22 +26,37 @@ export interface PrBodyResult { /** * Check the PR body for required sections. - * Searches for each section name as a case-insensitive substring. + * + * A section counts as present only when its name appears as a Markdown + * heading (`#`, `##`, …) or as a bolded label at the start of a line + * (`**Summary**` or `**Summary:**`). Matching is case-insensitive. Plain + * prose mentions like "this PR includes a summary of changes" do NOT count. */ export function checkPrBody( prBody: string, requiredSections: string[] ): PrBodyResult { - const bodyLower = prBody.toLowerCase(); + const lines = prBody.split("\n"); const missing: string[] = []; const present: string[] = []; for (const section of requiredSections) { - if (bodyLower.includes(section.toLowerCase())) { - present.push(section); - } else { - missing.push(section); - } + const sectionLower = section.toLowerCase(); + const found = lines.some((line) => { + const trimmed = line.trim().toLowerCase(); + // Markdown heading: ^#{1,6}\s+
[optional trailing punctuation]$ + const headingMatch = trimmed.match(/^#{1,6}\s+(.+?)[:.\s]*$/); + if (headingMatch && headingMatch[1].trim() === sectionLower) return true; + // Bold label: ^\*\*
[:]?\*\*[:]? (e.g. **Summary**, **Summary:**, **Summary**:) + const boldMatch = trimmed.match(/^\*\*(.+?)\*\*/); + if (boldMatch) { + const label = boldMatch[1].replace(/[:.\s]+$/, "").trim(); + if (label === sectionLower) return true; + } + return false; + }); + if (found) present.push(section); + else missing.push(section); } return { diff --git a/ghcp-handoff/templates/prompt-template.md b/ghcp-handoff/templates/prompt-template.md index 0d2c6a05b1..689e58fc49 100644 --- a/ghcp-handoff/templates/prompt-template.md +++ b/ghcp-handoff/templates/prompt-template.md @@ -1,11 +1,11 @@ -You are implementing {{task_summary}} for {{project_name}}. {{phase_note}}. +You are implementing {{task_summary}}. {{phase_note}}. DO NOT implement anything beyond {{stub_policy}}. IMPORTANT: Work in a clean git worktree on the branch `{{target_branch}}`. Create the branch from the repo's default branch if it doesn't exist. Do not work directly in the main worktree — other agents may be active there. -Context files to read first (in the repo at {{repo_abs_path}}): +Context files to read first (in the repo root): {{context_files_bullets}} Hard boundaries (IF YOU CROSS THESE, THE PR WILL BE REJECTED): @@ -17,12 +17,12 @@ What to deliver in one PR against branch `{{target_branch}}`: PR description must include: - Summary of what shipped. -- Manual followups required from {{user_name}}. +- Manual followups required from the reviewer. - Any dependency decisions that deviated from the plan. - Any interface signatures you were unsure about (flag them for review). Open the PR against `{{target_branch}}` (create the branch if it doesn't exist). -DO NOT merge to {{forbidden_branches}}. +DO NOT merge to main, master, dev, develop, production, or staging.