diff --git a/API.md b/API.md
index 3814bb8..0eade36 100644
--- a/API.md
+++ b/API.md
@@ -733,7 +733,7 @@ interfacectl summarize-generation-session --session-dir

 ### `compare-generation-sessions`

-Compares one unguided baseline session against one prepared guided session for the same implementation brief.
+Compares two tracked generation sessions for the same implementation brief.

 **Synopsis:**
 ```bash
@@ -742,7 +742,7 @@ interfacectl compare-generation-sessions --baseline-session-dir --guided-

 **Description:**
 - Requires both sessions to target the same surface, use the same tool, and freeze the same brief file.
-- Requires `guidanceMode=unguided` for the baseline session and `guidanceMode=prepared` for the guided session.
+- Works with any valid guidance-strategy pair; the output records each session’s concrete strategy.
 - Computes first-attempt finding deltas, attempts-to-acceptable-outcome delta, rubric deltas, and goal checks.
 - Writes `comparison.json` and `comparison.md`.
 - Canonical schema lives at `packages/interfacectl-cli/schemas/generation-session-comparison.schema.json`.
@@ -809,7 +809,7 @@ interfacectl summarize-generation-benchmark --comparisons [--su
 ```

 **Description:**
-- Summarizes whether guided sessions reduced first-attempt blocking findings, reached acceptable outcomes no later, and improved rubric dimensions.
+- Summarizes whether the compared candidate sessions reduced first-attempt blocking findings, reached acceptable outcomes no later, and improved rubric dimensions.
 - Aggregates accepted/rejected/proposed suggestion counts across surfaces.
 - Writes `benchmark-report.json` and `benchmark-report.md`.
 - Canonical schema lives at `packages/interfacectl-cli/schemas/generation-benchmark-report.schema.json`.
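Review note: the API.md change above says `compare-generation-sessions` now works with any valid guidance-strategy pair and records each session's concrete strategy. The following is a minimal sketch of what that could look like, assuming the three strategy names and the legacy `prepared` alias that appear in the `generation-session.js` change in this same diff. The function names `normalizeGuidanceStrategy` and `describeStrategyPair` are hypothetical and are not part of the CLI.

```javascript
// Illustrative sketch only, not the CLI's actual implementation.
// Strategy names and the legacy "prepared" alias come from the
// generation-session.js change in this diff; function names are hypothetical.
const VALID_GUIDANCE_STRATEGIES = new Set(["prompt-summary", "json-primary", "unguided"]);

// Normalize one strategy value, defaulting to "prompt-summary" when it is absent.
function normalizeGuidanceStrategy(value) {
  const normalized = typeof value === "string" ? value.trim().toLowerCase() : "prompt-summary";
  // Sessions written with the older guidanceMode=prepared map to prompt-summary.
  const mapped = normalized === "prepared" ? "prompt-summary" : normalized;
  if (!VALID_GUIDANCE_STRATEGIES.has(mapped)) {
    throw new Error(`Invalid guidance strategy "${value}". Expected prompt-summary|json-primary|unguided.`);
  }
  return mapped;
}

// Hypothetical helper: record the concrete strategy on each side of a comparison.
function describeStrategyPair(baselineStrategy, candidateStrategy) {
  return {
    baseline: normalizeGuidanceStrategy(baselineStrategy),
    candidate: normalizeGuidanceStrategy(candidateStrategy),
  };
}

console.log(describeStrategyPair("prepared", "json-primary"));
```

Under these assumptions, any pair drawn from the valid set is accepted, so the old baseline-must-be-unguided restriction disappears while older session files that still carry `prepared` remain comparable.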
diff --git a/README.md b/README.md
index 6874337..a9ef865 100644
--- a/README.md
+++ b/README.md
@@ -199,7 +199,7 @@ interfacectl summarize-generation-session --session-dir

 ### `compare-generation-sessions`

-Compares one unguided baseline session against one guided prepared session for the same brief and writes deterministic comparison artifacts.
+Compares two tracked generation sessions for the same brief and writes deterministic comparison artifacts.

 ```bash
 interfacectl compare-generation-sessions --baseline-session-dir --guided-session-dir [--out-dir ]
diff --git a/docs/ai-generator-adapter-quickstart.md b/docs/ai-generator-adapter-quickstart.md
index 710a50c..45ac8af 100644
--- a/docs/ai-generator-adapter-quickstart.md
+++ b/docs/ai-generator-adapter-quickstart.md
@@ -6,12 +6,12 @@ Use this flow when a local agent or hosted generator needs contract-aware guidan

 1. Compile the contract into a generation bundle.
 2. For local agents, resolve the bundle into one agent-ready payload with `prepare-generation`.
-3. Freeze one tracked session with `init-generation-session` when you want iteration evidence or a guided-vs-unguided benchmark.
+3. Freeze one tracked session with `init-generation-session` when you want iteration evidence or a strategy benchmark.
 4. Generate or edit UI.
 5. Run `record-generation-attempt` for each attempt.
 6. Optionally run `review-generation-attempt` when a `warn` result is explicitly acceptable.
 7. Run `summarize-generation-session` to aggregate progress.
-8. Use `compare-generation-sessions`, `suggest-contract-deltas`, and `summarize-generation-benchmark` when you are proving guided-vs-unguided outcomes.
+8. Use `compare-generation-sessions`, `suggest-contract-deltas`, and `summarize-generation-benchmark` when you are proving one guidance strategy against another.
 9. Use `validate-generation` directly when you need an ad hoc post-generation check without a tracked session.

 ## Step 1: compile the bundle
@@ -82,7 +82,7 @@ interfacectl init-generation-session \
   --bundle-root ./artifacts/generation-bundles/surfaces-web \
   --surface surfaces-web \
   --workspace-root . \
-  --guidance-mode prepared \
+  --guidance-strategy prompt-summary \
   --brief-file ./artifacts/generation-briefs/surfaces-web.md

 interfacectl record-generation-attempt \
@@ -100,15 +100,15 @@ interfacectl summarize-generation-session \

 This loop freezes the bundle revision, records each assessment, and emits canonical run artifacts for downstream consumers.

-For an A/B proof loop, run one session with `--guidance-mode unguided` and the same `--brief-file`, run another with `--guidance-mode prepared`, then compare them:
+For an A/B proof loop, run two sessions with the same `--brief-file`, then compare the strategies you want to evaluate. For example, compare `prompt-summary` against `json-primary`:

 ```bash
 interfacectl compare-generation-sessions \
-  --baseline-session-dir ./artifacts/generation-sessions/surfaces-web/baseline-unguided \
-  --guided-session-dir ./artifacts/generation-sessions/surfaces-web/guided-prepared
+  --baseline-session-dir ./artifacts/generation-sessions/surfaces-web/prompt-summary \
+  --guided-session-dir ./artifacts/generation-sessions/surfaces-web/json-primary

 interfacectl suggest-contract-deltas \
-  --session-dir ./artifacts/generation-sessions/surfaces-web/guided-prepared
+  --session-dir ./artifacts/generation-sessions/surfaces-web/json-primary
 ```

 ## HTTP mode
diff --git a/docs/generator-consumption.md b/docs/generator-consumption.md
index 6abb7ef..95f4e18 100644
--- a/docs/generator-consumption.md
+++ b/docs/generator-consumption.md
@@ -17,7 +17,7 @@ For workspace agents:

 1. Run `interfacectl compile --contract --out `.
 2. Run `interfacectl prepare-generation --bundle-root --surface `.
-3. Optionally run `interfacectl init-generation-session --bundle-root --surface --workspace-root --guidance-mode prepared --brief-file ` when you want tracked iteration evidence or a benchmark-ready guided session.
+3. Optionally run `interfacectl init-generation-session --bundle-root --surface --workspace-root --guidance-strategy --brief-file ` when you want tracked iteration evidence or a benchmark-ready session.
 4. Feed the resulting prepared JSON into the agent.
 5. Generate only inside the surface-owned boundary.
 6. Either run `interfacectl validate-generation --mode workspace` directly, or run `interfacectl record-generation-attempt` for a tracked session.
@@ -80,7 +80,7 @@ When you need auditable iteration history, use the canonical session commands ra
 3. `interfacectl review-generation-attempt` when a warning is explicitly acceptable
 4. `interfacectl summarize-generation-session`

-For the guided-vs-unguided proof loop, compare two sessions that froze the same `--brief-file`:
+For the strategy-benchmark loop, compare two sessions that froze the same `--brief-file`, such as `prompt-summary` vs `json-primary` or `unguided` vs a guided strategy:

 1. `interfacectl compare-generation-sessions`
 2. `interfacectl suggest-contract-deltas`
diff --git a/packages/interfacectl-cli/dist/commands/generation-session.d.ts b/packages/interfacectl-cli/dist/commands/generation-session.d.ts
index ac58bca..7d0149d 100644
--- a/packages/interfacectl-cli/dist/commands/generation-session.d.ts
+++ b/packages/interfacectl-cli/dist/commands/generation-session.d.ts
@@ -5,9 +5,18 @@ export interface InitGenerationSessionCommandOptions {
     tool?: string;
     sessionId?: string;
     artifactsRoot?: string;
+    guidanceStrategy?: string;
     guidanceMode?: string;
     briefFile?: string;
 }
+export interface PrepareGenerationHandoffCommandOptions {
+    sessionDir?: string;
+    guidanceStrategy?: string;
+    acceptedSuggestionsFile?: string;
+    designerNotesFile?: string;
+    findingCodes?: string;
+    outPath?: string;
+}
 export interface RecordGenerationAttemptCommandOptions {
     sessionDir?: string;
     assessmentFile?: string;
@@ -47,6 +56,7 @@ export interface SummarizeGenerationBenchmarkCommandOptions {
     outDir?: string;
 }
 export declare function runInitGenerationSessionCommand(options: InitGenerationSessionCommandOptions): Promise;
+export declare function runPrepareGenerationHandoffCommand(options: PrepareGenerationHandoffCommandOptions): Promise;
 export declare function runRecordGenerationAttemptCommand(options: RecordGenerationAttemptCommandOptions): Promise;
 export declare function runCaptureGenerationPreviewCommand(options: CaptureGenerationPreviewCommandOptions): Promise;
 export declare function runReviewGenerationAttemptCommand(options: ReviewGenerationAttemptCommandOptions): Promise;
diff --git a/packages/interfacectl-cli/dist/commands/generation-session.d.ts.map b/packages/interfacectl-cli/dist/commands/generation-session.d.ts.map
index 7431227..714f39a 100644
--- a/packages/interfacectl-cli/dist/commands/generation-session.d.ts.map
+++ b/packages/interfacectl-cli/dist/commands/generation-session.d.ts.map
@@ -1 +1 @@
-{"version":3,"file":"generation-session.d.ts","sourceRoot":"","sources":["../../src/commands/generation-session.ts"],"names":[],"mappings":"AA8BA,MAAM,WAAW,mCAAmC;IAClD,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,aAAa,CAAC,EAAE,MAAM,CAAC;IACvB,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,aAAa,CAAC,EAAE,MAAM,CAAC;IACvB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,SAAS,CAAC,EAAE,MAAM,CAAC;CACpB;AAED,MAAM,WAAW,qCAAqC;IACpD,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,cAAc,CAAC,EAAE,MAAM,CAAC;CACzB;AAED,MAAM,WAAW,sCAAsC;IACrD,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,aAAa,CAAC,EAAE,MAAM,GAAG,MAAM,CAAC;IAChC,GAAG,CAAC,EAAE,MAAM,CAAC;IACb,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,gBAAgB,CAAC,EAAE,MAAM,CAAC;CAC3B;AAED,MAAM,WAAW,qCAAqC;IACpD,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,aAAa,CAAC,EAAE,MAAM,GAAG,MAAM,CAAC;IAChC,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,MAAM,WAAW,wCAAwC;IACvD,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,MAAM,WAAW,uCAAuC;IACtD,kBAAkB,CAAC,EAAE,MAAM,CAAC;IAC5B,gBAAgB,CAAC,EAAE,MAAM,CAAC;IAC1B,MAAM,CAAC,EAAE,MAAM,CAAC;CACjB;AAED,MAAM,WAAW,mCAAmC;IAClD,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,OAAO,CAAC,EAAE,MAAM,CAAC;CAClB;AAED,MAAM,WAAW,4CAA4C;IAC3D,eAAe,CAAC,EAAE,MAAM,CAAC;IACzB,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,OAAO,CAAC,EAAE,MAAM,CAAC;CAClB;AAED,MAAM,WAAW,0CAA0C;IACzD,eAAe,CAAC,EAAE,MAAM,CAAC;IACzB,eAAe,CAAC,EAAE,MAAM,CAAC;IACzB,MAAM,CAAC,EAAE,MAAM,CAAC;CACjB;AA6/CD,wBAAsB,+BAA+B,CACnD,OAAO,EAAE,mCAAmC,GAC3C,OAAO,CAAC,MAAM,CAAC,CAyEjB;AAED,wBAAsB,iCAAiC,CACrD,OAAO,EAAE,qCAAqC,GAC7C,OAAO,CAAC,MAAM,CAAC,CA8FjB;AAED,wBAAsB,kCAAkC,CACtD,OAAO,EAAE,sCAAsC,GAC9C,OAAO,CAAC,MAAM,CAAC,CA0GjB;AAED,wBAAsB,iCAAiC,CACrD,OAAO,EAAE,qCAAqC,GAC7C,OAAO,CAAC,MAAM,CAAC,CAuEjB;AAED,wBAAsB,oCAAoC,CACxD,OAAO,EAAE,wCAAwC,GAChD,OAAO,CAAC,MAAM,CAAC,CAmBjB;AAED,wBAAsB,mCAAmC,CACvD,OAAO,EAAE,uCAAuC,GAC/C,OAAO,CAAC,MAAM,CAAC,CA4CjB;AAED,wBAAsB,+BAA+B,CACnD,OAAO,EAAE,mCAAmC,GAC3C,OAAO,CAAC,MAAM,CAAC,CAuCjB;AAED,wBAAsB,wCAAwC,CAC5D,OAAO,EAAE,4CAA4C,GACpD,OAAO,CAAC,MAAM,CAAC,CA8EjB;AAED,wBAAsB,sCAAsC,CAC1D,OAAO,EAAE,0CAA0C,GAClD,OA
AO,CAAC,MAAM,CAAC,CA4FjB"} \ No newline at end of file +{"version":3,"file":"generation-session.d.ts","sourceRoot":"","sources":["../../src/commands/generation-session.ts"],"names":[],"mappings":"AA8BA,MAAM,WAAW,mCAAmC;IAClD,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,aAAa,CAAC,EAAE,MAAM,CAAC;IACvB,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,aAAa,CAAC,EAAE,MAAM,CAAC;IACvB,gBAAgB,CAAC,EAAE,MAAM,CAAC;IAC1B,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,SAAS,CAAC,EAAE,MAAM,CAAC;CACpB;AAED,MAAM,WAAW,sCAAsC;IACrD,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,gBAAgB,CAAC,EAAE,MAAM,CAAC;IAC1B,uBAAuB,CAAC,EAAE,MAAM,CAAC;IACjC,iBAAiB,CAAC,EAAE,MAAM,CAAC;IAC3B,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,OAAO,CAAC,EAAE,MAAM,CAAC;CAClB;AAED,MAAM,WAAW,qCAAqC;IACpD,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,cAAc,CAAC,EAAE,MAAM,CAAC;CACzB;AAED,MAAM,WAAW,sCAAsC;IACrD,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,aAAa,CAAC,EAAE,MAAM,GAAG,MAAM,CAAC;IAChC,GAAG,CAAC,EAAE,MAAM,CAAC;IACb,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,gBAAgB,CAAC,EAAE,MAAM,CAAC;CAC3B;AAED,MAAM,WAAW,qCAAqC;IACpD,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,aAAa,CAAC,EAAE,MAAM,GAAG,MAAM,CAAC;IAChC,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,MAAM,WAAW,wCAAwC;IACvD,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,MAAM,WAAW,uCAAuC;IACtD,kBAAkB,CAAC,EAAE,MAAM,CAAC;IAC5B,gBAAgB,CAAC,EAAE,MAAM,CAAC;IAC1B,MAAM,CAAC,EAAE,MAAM,CAAC;CACjB;AAED,MAAM,WAAW,mCAAmC;IAClD,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,OAAO,CAAC,EAAE,MAAM,CAAC;CAClB;AAED,MAAM,WAAW,4CAA4C;IAC3D,eAAe,CAAC,EAAE,MAAM,CAAC;IACzB,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,OAAO,CAAC,EAAE,MAAM,CAAC;CAClB;AAED,MAAM,WAAW,0CAA0C;IACzD,eAAe,CAAC,EAAE,MAAM,CAAC;IACzB,eAAe,CAAC,EAAE,MAAM,CAAC;IACzB,MAAM,CAAC,EAAE,MAAM,CAAC;CACjB;AAohED,wBAAsB,+BAA+B,CACnD,OAAO,EAAE,mCAAmC,GAC3C,OAAO,CAAC,MAAM,CAAC,CA8EjB;AAED,wBAAsB,kCAAkC,CACtD,OAAO,EAAE,sCAAsC,GAC9C,OAAO,CAAC,MAAM,CAAC,CA6DjB;AAED,wBAAsB,iCAAiC,CACrD,OAAO,EAAE,qCAAqC,GAC7C,OAAO,CAAC,MAAM,CAAC,CA+FjB;AAED,wBAAsB,kCAAkC,CACtD,OAAO,EAAE,sCAAsC,GAC9C,OAAO,CAAC,MAAM,CAAC,CA0GjB;AAED,wBAAsB,iCAAiC,CACrD,OAAO,EAAE,qCAAqC,
GAC7C,OAAO,CAAC,MAAM,CAAC,CAuEjB;AAED,wBAAsB,oCAAoC,CACxD,OAAO,EAAE,wCAAwC,GAChD,OAAO,CAAC,MAAM,CAAC,CAmBjB;AAED,wBAAsB,mCAAmC,CACvD,OAAO,EAAE,uCAAuC,GAC/C,OAAO,CAAC,MAAM,CAAC,CA4CjB;AAED,wBAAsB,+BAA+B,CACnD,OAAO,EAAE,mCAAmC,GAC3C,OAAO,CAAC,MAAM,CAAC,CAuCjB;AAED,wBAAsB,wCAAwC,CAC5D,OAAO,EAAE,4CAA4C,GACpD,OAAO,CAAC,MAAM,CAAC,CA8EjB;AAED,wBAAsB,sCAAsC,CAC1D,OAAO,EAAE,0CAA0C,GAClD,OAAO,CAAC,MAAM,CAAC,CAuIjB"} \ No newline at end of file diff --git a/packages/interfacectl-cli/dist/commands/generation-session.js b/packages/interfacectl-cli/dist/commands/generation-session.js index 851abf7..0be0b39 100644 --- a/packages/interfacectl-cli/dist/commands/generation-session.js +++ b/packages/interfacectl-cli/dist/commands/generation-session.js @@ -9,7 +9,7 @@ import { emitContractRunArtifact, } from "../utils/run-artifacts.js"; import { writeDeterministicJsonSync } from "../utils/deterministic-json.js"; const VALID_TOOLS = new Set(["codex", "cursor", "local-llm"]); const VALID_GRADES = new Set(["strong", "partial", "weak"]); -const VALID_GUIDANCE_MODES = new Set(["prepared", "unguided"]); +const VALID_GUIDANCE_STRATEGIES = new Set(["prompt-summary", "json-primary", "unguided"]); const VALID_REVIEW_STATUSES = new Set(["accepted", "rejected"]); const VALID_SUGGESTION_STATUSES = new Set(["proposed", "accepted", "rejected"]); const VALID_SUCCESS_RULES = new Set(["pass", "pass-or-reviewed-warn"]); @@ -75,12 +75,13 @@ function ensureSessionTool(tool) { } return normalized; } -function ensureGuidanceMode(guidanceMode) { - const normalized = typeof guidanceMode === "string" ? guidanceMode.trim().toLowerCase() : "prepared"; - if (!VALID_GUIDANCE_MODES.has(normalized)) { - throw new SessionInputError(`Invalid --guidance-mode value "${guidanceMode ?? ""}". Expected prepared|unguided.`); +function ensureGuidanceStrategy(guidanceStrategy) { + const normalized = typeof guidanceStrategy === "string" ? 
guidanceStrategy.trim().toLowerCase() : "prompt-summary"; + const mapped = normalized === "prepared" ? "prompt-summary" : normalized; + if (!VALID_GUIDANCE_STRATEGIES.has(mapped)) { + throw new SessionInputError(`Invalid guidance strategy "${guidanceStrategy ?? ""}". Expected prompt-summary|json-primary|unguided.`); } - return normalized; + return mapped; } function buildDefaultSessionId() { return new Date().toISOString().replace(/[-:]/g, "").replace(/\.\d{3}Z$/, "Z"); @@ -97,6 +98,7 @@ function getSessionPaths(sessionDir) { sessionPath: path.join(sessionDir, "session.json"), bundleRoot: path.join(sessionDir, "bundle"), preparedInputPath: path.join(sessionDir, "prepared-input.json"), + guidanceHandoffPath: path.join(sessionDir, "guidance-handoff.json"), attemptsDir: path.join(sessionDir, "attempts"), summaryJsonPath: path.join(sessionDir, "summary.json"), summaryMarkdownPath: path.join(sessionDir, "summary.md"), @@ -140,6 +142,33 @@ function normalizeAssessment(payload, filePath, options = {}) { .map((entry) => (typeof entry === "string" ? entry.trim() : "")) .filter(Boolean))].sort((left, right) => left.localeCompare(right)); } + let heuristics; + if (payload.heuristics !== undefined) { + const candidate = asRecord(payload.heuristics); + heuristics = {}; + const numericField = (key, allowNull = false) => { + const value = candidate[key]; + if (value === undefined) { + return; + } + if (value === null && allowNull) { + heuristics[key] = null; + return; + } + if (typeof value !== "number" || !Number.isFinite(value)) { + throw new SessionInputError(`Assessment heuristic "${String(key)}" must be a finite number${allowNull ? 
" or null" : ""}: ${filePath}.`); + } + heuristics[key] = value; + }; + numericField("unresolvedAcceptedSuggestionCount"); + numericField("unresolvedAcceptedSuggestionRate", true); + numericField("noChangesAfterEditFailureCount"); + numericField("recoverableToolErrorCount"); + numericField("touchedFilesPerResolvedFinding", true); + if (Object.keys(heuristics).length === 0) { + heuristics = undefined; + } + } return { structure: grade("structure"), components: grade("components"), @@ -148,6 +177,7 @@ function normalizeAssessment(payload, filePath, options = {}) { responsiveness: grade("responsiveness"), notes, ...(touchedFiles && touchedFiles.length > 0 ? { touchedFiles } : {}), + ...(heuristics ? { heuristics } : {}), }; } function loadAssessment(assessmentPath) { @@ -272,11 +302,11 @@ function loadSession(sessionDirInput) { } const payload = readJsonFile(paths.sessionPath, "generation session"); const schemaVersion = Number(payload.schemaVersion ?? 1); - if (schemaVersion !== 1 && schemaVersion !== 2) { + if (schemaVersion !== 1 && schemaVersion !== 2 && schemaVersion !== 3) { throw new SessionInputError(`Unsupported generation session schemaVersion "${String(payload.schemaVersion ?? "unknown")}".`); } const tool = ensureSessionTool(asString(payload.tool)); - const guidanceMode = ensureGuidanceMode(asString(payload.guidanceMode) ?? "prepared"); + const guidanceStrategy = ensureGuidanceStrategy(asString(payload.guidanceStrategy) ?? asString(payload.guidanceMode) ?? "prompt-summary"); const finalStatus = asString(asRecord(payload.successRule).finalStatus) ?? 
"pass"; if (!VALID_SUCCESS_RULES.has(finalStatus)) { throw new SessionInputError(`Unsupported session successRule.finalStatus "${finalStatus}".`); @@ -284,12 +314,13 @@ function loadSession(sessionDirInput) { const briefRecord = asRecord(payload.brief); const briefPath = asString(briefRecord.path); const briefSha256 = asString(briefRecord.sha256); + const guidanceArtifacts = asRecord(payload.guidanceArtifacts); const session = { - schemaVersion: 2, + schemaVersion: 3, surfaceId: asString(payload.surfaceId) ?? "", sessionId: asString(payload.sessionId) ?? "", tool, - guidanceMode, + guidanceStrategy, workspaceRoot: asString(payload.workspaceRoot) ?? "", sourceBundleRoot: asString(payload.sourceBundleRoot) ?? "", sessionDir: asString(payload.sessionDir) ?? sessionDir, @@ -297,13 +328,26 @@ function loadSession(sessionDirInput) { preparedInputPath: typeof payload.preparedInputPath === "string" ? payload.preparedInputPath : null, contractPath: asString(payload.contractPath) ?? "", repairMapPath: asString(payload.repairMapPath) ?? "", + guidanceArtifacts: { + baseHandoffPath: typeof guidanceArtifacts.baseHandoffPath === "string" + ? guidanceArtifacts.baseHandoffPath + : fs.existsSync(paths.guidanceHandoffPath) + ? paths.guidanceHandoffPath + : null, + }, startedAt: asString(payload.startedAt) ?? "", ...(briefPath && briefSha256 ? 
{ brief: { path: briefPath, sha256: briefSha256 } } : {}), successRule: { finalStatus: finalStatus, }, }; - if (!session.surfaceId || !session.sessionId || !session.workspaceRoot || !session.bundleRoot || !session.contractPath || !session.repairMapPath || !session.startedAt) { + if (!session.surfaceId + || !session.sessionId + || !session.workspaceRoot + || !session.bundleRoot + || !session.contractPath + || !session.repairMapPath + || !session.startedAt) { throw new SessionInputError(`Generation session is missing required fields: ${paths.sessionPath}.`); } return { @@ -388,6 +432,31 @@ function buildRecurringCounts(values) { .map(([code, count]) => ({ code, count })) .sort((left, right) => right.count - left.count || left.code.localeCompare(right.code)); } +function repeatedFindingCarryoverCount(recurringFindingCodes) { + return recurringFindingCodes.reduce((total, entry) => total + Math.max(0, entry.count - 1), 0); +} +function rerunsToAcceptableOutcome(firstAcceptableAttempt) { + if (firstAcceptableAttempt === null) { + return null; + } + return Math.max(0, firstAcceptableAttempt - 1); +} +function numericHeuristicDelta(baseline, guided) { + if (baseline === null || baseline === undefined || guided === null || guided === undefined) { + return null; + } + return guided - baseline; +} +function countHeuristicImprovement(values) { + return values.reduce((total, value) => total + (typeof value === "number" && value < 0 ? 
1 : 0), 0); +} +function averageNullable(values) { + const filtered = values.filter((value) => typeof value === "number" && Number.isFinite(value)); + if (filtered.length === 0) { + return null; + } + return Math.round((filtered.reduce((sum, value) => sum + value, 0) / filtered.length) * 1000) / 1000; +} function renderSummaryMarkdown(summary) { const lines = [ "# Generation Session Summary", @@ -395,7 +464,7 @@ function renderSummaryMarkdown(summary) { `Surface: ${summary.surfaceId}`, `Session: ${summary.sessionId}`, `Tool: ${summary.tool}`, - `Guidance mode: ${summary.guidanceMode}`, + `Guidance strategy: ${summary.guidanceStrategy}`, `Latest status: ${summary.latestStatus}`, `Latest outcome: ${summary.latestOutcome}`, `Attempts: ${summary.attemptCount}`, @@ -431,10 +500,28 @@ function renderSummaryMarkdown(summary) { if (summary.latestAssessment?.touchedFiles?.length) { lines.push(`- touched files: ${summary.latestAssessment.touchedFiles.join(", ")}`); } + if (summary.latestAssessment?.heuristics) { + if (typeof summary.latestAssessment.heuristics.unresolvedAcceptedSuggestionRate === "number") { + lines.push(`- unresolved accepted suggestion rate: ${summary.latestAssessment.heuristics.unresolvedAcceptedSuggestionRate}`); + } + if (typeof summary.latestAssessment.heuristics.noChangesAfterEditFailureCount === "number") { + lines.push(`- noChanges-after-edit failures: ${summary.latestAssessment.heuristics.noChangesAfterEditFailureCount}`); + } + if (typeof summary.latestAssessment.heuristics.recoverableToolErrorCount === "number") { + lines.push(`- recoverable tool errors: ${summary.latestAssessment.heuristics.recoverableToolErrorCount}`); + } + if (typeof summary.latestAssessment.heuristics.touchedFilesPerResolvedFinding === "number") { + lines.push(`- touched files per resolved finding: ${summary.latestAssessment.heuristics.touchedFilesPerResolvedFinding}`); + } + } if (summary.latestReview) { lines.push(`- latest review: ${summary.latestReview.status} 
(${summary.latestReview.findingCodes.join(", ")})`); lines.push(`- review rationale: ${summary.latestReview.rationale}`); } + lines.push("", "## Heuristics"); + lines.push(`- repeated finding carryover count: ${summary.heuristics.repeatedFindingCarryoverCount}`); + lines.push(`- reruns to acceptable outcome: ${summary.heuristics.rerunsToAcceptableOutcome ?? "n/a"}`); + lines.push(`- base guidance handoff: ${summary.paths.guidanceHandoffPath ?? "none"}`); return `${lines.join("\n")}\n`; } function getSuccessOutcome(status, review, findingCodes, successRule) { @@ -527,12 +614,17 @@ function buildGenerationSessionSummary(sessionDirInput) { throw new SessionInputError(`Unsupported validate status "${String(latestStatus)}" in ${latestAttempt.validatePath}.`); } const latestOutcome = getSuccessOutcome(latestStatus, latestAttempt.review, parseFindingCodes(latestAttempt.validate), session.successRule.finalStatus); + const heuristics = { + latestAttempt: latestAssessment.heuristics ?? {}, + repeatedFindingCarryoverCount: repeatedFindingCarryoverCount(recurringFindingCodes), + rerunsToAcceptableOutcome: rerunsToAcceptableOutcome(firstAcceptableAttempt), + }; const summary = { - schemaVersion: 3, + schemaVersion: 4, surfaceId: session.surfaceId, sessionId: session.sessionId, tool: session.tool, - guidanceMode: session.guidanceMode, + guidanceStrategy: session.guidanceStrategy, attemptCount: attempts.length, firstPassAttempt, firstAcceptableAttempt, @@ -542,12 +634,14 @@ function buildGenerationSessionSummary(sessionDirInput) { recurringRepairCodes, latestAssessment, latestReview: latestAttempt.review, + heuristics, ...(session.brief ? 
{ brief: session.brief } : {}), successRule: session.successRule, paths: { sessionPath: paths.sessionPath, bundleRoot: session.bundleRoot, preparedInputPath: session.preparedInputPath, + guidanceHandoffPath: session.guidanceArtifacts.baseHandoffPath, }, attempts: attempts.map((attempt) => { const status = attempt.validate.status; @@ -628,19 +722,19 @@ function renderComparisonMarkdown(comparison) { "", `Surface: ${comparison.surfaceId}`, `Tool: ${comparison.tool}`, - `Baseline session: ${comparison.baseline.sessionId}`, - `Guided session: ${comparison.guided.sessionId}`, + `Baseline session: ${comparison.baseline.sessionId} (${comparison.baseline.guidanceStrategy})`, + `Candidate session: ${comparison.guided.sessionId} (${comparison.guided.guidanceStrategy})`, `Meets goal: ${comparison.checks.meetsGoal ? "yes" : "no"}`, "", "## First attempt", `- baseline outcome: ${comparison.baseline.firstAttempt.outcome}`, - `- guided outcome: ${comparison.guided.firstAttempt.outcome}`, + `- candidate outcome: ${comparison.guided.firstAttempt.outcome}`, `- blocking finding delta: ${comparison.delta.firstAttemptBlockingFindingCountDelta}`, `- warning finding delta: ${comparison.delta.firstAttemptWarningFindingCountDelta}`, "", "## Convergence", `- baseline first acceptable attempt: ${comparison.baseline.firstAcceptableAttempt ?? "not reached"}`, - `- guided first acceptable attempt: ${comparison.guided.firstAcceptableAttempt ?? "not reached"}`, + `- candidate first acceptable attempt: ${comparison.guided.firstAcceptableAttempt ?? "not reached"}`, `- attempts-to-acceptable delta: ${comparison.delta.attemptsToAcceptableOutcome.delta ?? 
"n/a"}`, "", "## Rubric delta", @@ -650,8 +744,15 @@ function renderComparisonMarkdown(comparison) { lines.push(`- ${dimension}: ${rubric.baseline} -> ${rubric.guided} (${rubric.delta})`); } if (comparison.checks.guidedRubricBetterDimensions.length > 0) { - lines.push("", `Guided improved dimensions: ${comparison.checks.guidedRubricBetterDimensions.join(", ")}`); - } + lines.push("", `Candidate improved dimensions: ${comparison.checks.guidedRubricBetterDimensions.join(", ")}`); + } + lines.push("", "## Heuristics"); + lines.push(`- unresolved accepted suggestion rate delta: ${comparison.heuristics.delta.unresolvedAcceptedSuggestionRate ?? "n/a"}`); + lines.push(`- noChanges-after-edit failure delta: ${comparison.heuristics.delta.noChangesAfterEditFailureCount}`); + lines.push(`- recoverable tool error delta: ${comparison.heuristics.delta.recoverableToolErrorCount}`); + lines.push(`- touched files per resolved finding delta: ${comparison.heuristics.delta.touchedFilesPerResolvedFinding ?? "n/a"}`); + lines.push(`- repeated finding carryover delta: ${comparison.heuristics.delta.repeatedFindingCarryoverCount}`); + lines.push(`- reruns to acceptable delta: ${comparison.heuristics.delta.rerunsToAcceptableOutcome ?? 
"n/a"}`); return `${lines.join("\n")}\n`; } function renderSuggestionsMarkdown(artifact) { @@ -661,7 +762,7 @@ function renderSuggestionsMarkdown(artifact) { `Surface: ${artifact.surfaceId}`, `Session: ${artifact.sessionId}`, `Tool: ${artifact.tool}`, - `Guidance mode: ${artifact.guidanceMode}`, + `Guidance strategy: ${artifact.guidanceStrategy}`, "", ]; if (artifact.suggestions.length === 0) { @@ -691,18 +792,25 @@ function renderBenchmarkReportMarkdown(report) { `Generated at: ${report.generatedAt}`, `Surfaces: ${report.overall.surfaceCount}`, `Surfaces meeting goal: ${report.overall.surfacesMeetingGoal}`, - `Guided fewer first-attempt blocking findings: ${report.overall.guidedFewerFirstAttemptBlockingFindings}`, - `Guided reached acceptable no later: ${report.overall.guidedReachedAcceptableNoLater}`, + `Candidate fewer first-attempt blocking findings: ${report.overall.guidedFewerFirstAttemptBlockingFindings}`, + `Candidate reached acceptable no later: ${report.overall.guidedReachedAcceptableNoLater}`, "", "## Comparisons", ]; for (const comparison of report.comparisons) { - lines.push(`- ${comparison.surfaceId}: meetsGoal=${comparison.meetsGoal}, improved dimensions=${comparison.guidedRubricBetterDimensions.join(", ") || "none"}`); + lines.push(`- ${comparison.surfaceId}: baseline=${comparison.baselineGuidanceStrategy}, candidate=${comparison.guidedGuidanceStrategy}, meetsGoal=${comparison.meetsGoal}, improved dimensions=${comparison.guidedRubricBetterDimensions.join(", ") || "none"}`); } lines.push("", "## Suggestion decisions"); for (const suggestion of report.suggestions) { lines.push(`- ${suggestion.surfaceId}: proposed=${suggestion.proposedCount}, accepted=${suggestion.acceptedCount}, rejected=${suggestion.rejectedCount}`); } + lines.push("", "## Heuristic improvements"); + lines.push(`- lower unresolved accepted suggestion rate: ${report.overall.heuristics.lowerUnresolvedAcceptedSuggestionRate}`); + lines.push(`- lower noChanges-after-edit failures: 
${report.overall.heuristics.lowerNoChangesAfterEditFailureCount}`); + lines.push(`- lower recoverable tool errors: ${report.overall.heuristics.lowerRecoverableToolErrorCount}`); + lines.push(`- lower touched files per resolved finding: ${report.overall.heuristics.lowerTouchedFilesPerResolvedFinding}`); + lines.push(`- lower repeated finding carryover count: ${report.overall.heuristics.lowerRepeatedFindingCarryoverCount}`); + lines.push(`- lower reruns to acceptable outcome: ${report.overall.heuristics.lowerRerunsToAcceptableOutcome}`); return `${lines.join("\n")}\n`; } function freezeBriefFile(sessionDir, briefFile) { @@ -726,6 +834,253 @@ function defaultBenchmarkReportDir(comparisonPaths) { const firstPath = comparisonPaths[0]; return path.join(path.dirname(path.dirname(firstPath)), "report"); } +function extractRepairEntries(repairMap) { + if (Array.isArray(repairMap)) { + return repairMap.filter((entry) => isRecord(entry)); + } + const record = asRecord(repairMap); + const repairs = Array.isArray(record.repairs) ? record.repairs : []; + return repairs.filter((entry) => isRecord(entry)); +} +function summarizeContractForSurface(contractPath, surfaceId) { + const payload = readJsonFile(contractPath, "generation session contract"); + const surfaces = Array.isArray(payload.surfaces) ? payload.surfaces.filter((entry) => isRecord(entry)) : []; + const surface = surfaces.find((entry) => asString(entry.id) === surfaceId) ?? surfaces[0] ?? {}; + const sections = Array.isArray(payload.sections) ? payload.sections.filter((entry) => isRecord(entry)) : []; + const color = asRecord(payload.color); + const layout = asRecord(surface.layout); + const requiredSections = asStringArray(surface.requiredSections ?? sections.map((entry) => entry.id)); + const allowedFonts = asStringArray(surface.allowedFonts); + const allowedColors = asStringArray(color.allowedValues); + const maxContentWidth = typeof layout.maxContentWidth === "number" ? 
layout.maxContentWidth : null; + return [ + `${asString(payload.contractId) ?? surfaceId} v${asString(payload.version) ?? "0.0.0"}`, + asString(payload.description) ?? "Working contract for generation guidance.", + `Required sections: ${requiredSections.join(", ") || "none recorded"}`, + `Fonts: ${allowedFonts.join(", ") || "none recorded"}`, + `Max content width: ${maxContentWidth ?? "not specified"}`, + `Color policy: ${asString(color.policy) ?? "off"}`, + `Allowed colors: ${allowedColors.join(", ") || "none recorded"}`, + ].join("\n"); +} +function buildPreparedPromptSummary(preparedPayload) { + const generation = asRecord(preparedPayload.generation); + const structure = asRecord(generation.structure); + const layout = asRecord(generation.layout); + const visual = asRecord(generation.visual); + const guidance = asRecord(generation.guidance); + const constraints = asRecord(preparedPayload.constraints); + const color = asRecord(constraints.color); + const motion = asRecord(constraints.motion); + const sections = Array.isArray(preparedPayload.sections) ? preparedPayload.sections : []; + const repairs = extractRepairEntries(preparedPayload.repairMap); + const requiredSections = asStringArray(structure.requiredSectionIds); + const focusOrder = asStringArray(guidance.generationFocusOrder); + const allowedFonts = asStringArray(visual.allowedFonts); + const requiredContainers = asStringArray(layout.requiredContainers); + const topRepairs = repairs + .slice(0, 5) + .map((entry) => { + const code = asString(entry.code) ?? "unknown"; + const summary = asString(entry.summary) ?? ""; + return summary ? `${code}: ${summary}` : code; + }); + return [ + `Contract: ${asString(preparedPayload.contract.id) ?? "unknown"} v${asString(preparedPayload.contract.version) ?? 
"0.0.0"}`, + `Focus order: ${focusOrder.join(", ") || "none"}`, + `Required sections: ${requiredSections.join(", ") || "none"}`, + `Section count: ${sections.length}`, + `Allowed fonts: ${allowedFonts.join(", ") || "none"}`, + `Max content width: ${typeof layout.maxContentWidth === "number" ? `${layout.maxContentWidth}px` : "unspecified"}`, + `Required containers: ${requiredContainers.join(", ") || "none"}`, + `Color policy: ${asString(color.policy) ?? "off"}`, + `Motion durations: ${Array.isArray(motion.allowedDurationsMs) + ? motion.allowedDurationsMs.map((value) => `${String(value)}ms`).join(", ") + : "none"}`, + `Top repair priorities: ${topRepairs.join(", ") || "none"}`, + ].join("\n"); +} +function selectRelevantComponents(preparedPayload) { + const sections = Array.isArray(preparedPayload.sections) + ? preparedPayload.sections.filter((entry) => isRecord(entry)) + : []; + const components = Array.isArray(preparedPayload.components) + ? preparedPayload.components.filter((entry) => isRecord(entry)) + : []; + const referencedIds = new Set(); + for (const section of sections) { + const anatomy = asRecord(section.anatomy); + const defaultComponentId = asString(anatomy.defaultComponentId); + if (defaultComponentId) { + referencedIds.add(defaultComponentId); + } + for (const componentId of asStringArray(anatomy.allowedComponentIds)) { + referencedIds.add(componentId); + } + const slots = Array.isArray(anatomy.slots) ? anatomy.slots : []; + for (const slot of slots) { + const slotRecord = asRecord(slot); + for (const componentId of asStringArray(slotRecord.acceptsComponentIds)) { + referencedIds.add(componentId); + } + } + } + if (referencedIds.size === 0) { + return components.slice(0, 12); + } + return components.filter((component) => referencedIds.has(asString(component.id) ?? 
"")); +} +function loadPreparedPayloadForSession(session) { + if (session.preparedInputPath && fs.existsSync(session.preparedInputPath)) { + return readJsonFile(session.preparedInputPath, "prepared generation payload"); + } + const bundle = loadCompiledSurfaceBundle(session.bundleRoot, session.surfaceId, process.cwd()); + return buildPreparedGenerationPayload(bundle); +} +function loadRuntimeAcceptedSuggestions(filePath) { + if (!filePath) { + return []; + } + const resolvedPath = path.resolve(filePath); + if (!fs.existsSync(resolvedPath)) { + throw new SessionInputError(`Accepted suggestions file not found at ${resolvedPath}.`); + } + let payload; + try { + payload = JSON.parse(fs.readFileSync(resolvedPath, "utf8")); + } + catch (error) { + throw new SessionInputError(`Accepted suggestions file is not valid JSON: ${resolvedPath} (${error instanceof Error ? error.message : String(error)}).`); + } + const payloadRecord = asRecord(payload); + const suggestions = Array.isArray(payload) + ? payload + : Array.isArray(payloadRecord.suggestions) + ? payloadRecord.suggestions + : []; + return suggestions + .filter((entry) => isRecord(entry)) + .map((entry) => { + const findingCode = asString(entry.findingCode); + const findingMessage = asString(entry.findingMessage); + const summary = asString(entry.summary); + const suggestedPath = asString(entry.suggestedPath); + const rationale = asString(entry.rationale); + if (!findingCode || !findingMessage || !summary || !suggestedPath) { + throw new SessionInputError(`Accepted suggestion entries must include findingCode, findingMessage, summary, and suggestedPath: ${resolvedPath}.`); + } + return { + findingCode, + findingMessage, + summary, + suggestedPath, + ...(rationale ? 
{ rationale } : {}), + }; + }); +} +function loadRuntimeDesignerNotes(filePath) { + if (!filePath) { + return []; + } + const resolvedPath = path.resolve(filePath); + if (!fs.existsSync(resolvedPath)) { + throw new SessionInputError(`Designer notes file not found at ${resolvedPath}.`); + } + let payload; + try { + payload = JSON.parse(fs.readFileSync(resolvedPath, "utf8")); + } + catch (error) { + throw new SessionInputError(`Designer notes file is not valid JSON: ${resolvedPath} (${error instanceof Error ? error.message : String(error)}).`); + } + const payloadRecord = asRecord(payload); + const rawNotes = Array.isArray(payload) + ? payload + : Array.isArray(payloadRecord.designerNotes) + ? payloadRecord.designerNotes + : Array.isArray(payloadRecord.notes) + ? payloadRecord.notes + : []; + return [...new Set(rawNotes + .map((entry) => { + if (typeof entry === "string") { + return entry.trim(); + } + if (isRecord(entry)) { + return asString(entry.content) ?? ""; + } + return ""; + }) + .filter(Boolean))]; +} +function parseRuntimeFindingCodes(value) { + if (!value) { + return []; + } + return [...new Set(value + .split(",") + .map((entry) => entry.trim()) + .filter(Boolean))].sort((left, right) => left.localeCompare(right)); +} +function buildGuidanceHandoff(session, paths, guidanceStrategy, options = {}) { + const acceptedSuggestions = options.acceptedSuggestions ?? []; + const designerNotes = options.designerNotes ?? []; + const findingCodes = [...new Set([ + ...(options.findingCodes ?? []), + ...acceptedSuggestions.map((entry) => entry.findingCode), + ])].sort((left, right) => left.localeCompare(right)); + const preparedPayload = guidanceStrategy === "unguided" ? null : loadPreparedPayloadForSession(session); + const repairMap = preparedPayload ? extractRepairEntries(preparedPayload.repairMap) : []; + const matchedRepairs = repairMap.filter((entry) => findingCodes.includes(asString(entry.code) ?? 
"")); + const brief = session.brief && fs.existsSync(session.brief.path) + ? { + ...session.brief, + text: fs.readFileSync(session.brief.path, "utf8").trim(), + } + : null; + return { + schemaVersion: 1, + surfaceId: session.surfaceId, + sessionId: session.sessionId, + tool: session.tool, + guidanceStrategy, + generatedAt: session.startedAt, + brief, + session: { + sessionPath: paths.sessionPath, + preparedInputPath: session.preparedInputPath, + contractPath: session.contractPath, + repairMapPath: session.repairMapPath, + }, + runtimeGuidance: { + findingCodes, + matchedRepairCodes: matchedRepairs.map((entry) => asString(entry.code) ?? "").filter(Boolean), + acceptedSuggestions, + designerNotes, + }, + promptSummary: guidanceStrategy === "prompt-summary" + ? { + effectiveContractSummary: summarizeContractForSurface(session.contractPath, session.surfaceId), + preparedGuidanceSummary: buildPreparedPromptSummary(preparedPayload), + } + : null, + jsonPrimary: guidanceStrategy === "json-primary" + ? { + surface: asRecord(preparedPayload.surface), + contract: asRecord(preparedPayload.contract), + summary: asRecord(preparedPayload.summary), + generation: asRecord(preparedPayload.generation), + constraints: asRecord(preparedPayload.constraints), + sections: Array.isArray(preparedPayload.sections) + ? 
preparedPayload.sections.filter((entry) => isRecord(entry)) + : [], + components: selectRelevantComponents(preparedPayload), + repairMap, + matchedRepairs, + } + : null, + }; +} function parseCsvPaths(value) { if (!value) return []; @@ -744,12 +1099,6 @@ function buildComparisonArtifact(baselineSessionDir, guidedSessionDir) { if (baselineBuilt.session.tool !== guidedBuilt.session.tool) { throw new SessionInputError("Baseline and guided sessions must use the same tool."); } - if (baselineBuilt.session.guidanceMode !== "unguided") { - throw new SessionInputError("Baseline session must use guidanceMode=unguided."); - } - if (guidedBuilt.session.guidanceMode !== "prepared") { - throw new SessionInputError("Guided session must use guidanceMode=prepared."); - } if (!baselineBuilt.session.brief || !guidedBuilt.session.brief) { throw new SessionInputError("Both sessions must freeze the same implementation brief before comparison."); } @@ -779,8 +1128,23 @@ function buildComparisonArtifact(baselineSessionDir, guidedSessionDir) { : guidedBuilt.summary.firstAcceptableAttempt !== null && guidedBuilt.summary.firstAcceptableAttempt <= baselineBuilt.summary.firstAcceptableAttempt; const guidedFewerFirstAttemptBlockingFindings = guidedFirstAttempt.blockingFindingCount < baselineFirstAttempt.blockingFindingCount; + const heuristics = { + baseline: baselineBuilt.summary.heuristics, + guided: guidedBuilt.summary.heuristics, + delta: { + unresolvedAcceptedSuggestionRate: numericHeuristicDelta(baselineBuilt.summary.heuristics.latestAttempt.unresolvedAcceptedSuggestionRate, guidedBuilt.summary.heuristics.latestAttempt.unresolvedAcceptedSuggestionRate), + noChangesAfterEditFailureCount: (guidedBuilt.summary.heuristics.latestAttempt.noChangesAfterEditFailureCount ?? 0) - + (baselineBuilt.summary.heuristics.latestAttempt.noChangesAfterEditFailureCount ?? 0), + recoverableToolErrorCount: (guidedBuilt.summary.heuristics.latestAttempt.recoverableToolErrorCount ?? 
0) - + (baselineBuilt.summary.heuristics.latestAttempt.recoverableToolErrorCount ?? 0), + touchedFilesPerResolvedFinding: numericHeuristicDelta(baselineBuilt.summary.heuristics.latestAttempt.touchedFilesPerResolvedFinding, guidedBuilt.summary.heuristics.latestAttempt.touchedFilesPerResolvedFinding), + repeatedFindingCarryoverCount: guidedBuilt.summary.heuristics.repeatedFindingCarryoverCount - + baselineBuilt.summary.heuristics.repeatedFindingCarryoverCount, + rerunsToAcceptableOutcome: numericHeuristicDelta(baselineBuilt.summary.heuristics.rerunsToAcceptableOutcome, guidedBuilt.summary.heuristics.rerunsToAcceptableOutcome), + }, + }; return { - schemaVersion: 2, + schemaVersion: 3, surfaceId: baselineBuilt.session.surfaceId, tool: baselineBuilt.session.tool, brief: { @@ -791,7 +1155,7 @@ function buildComparisonArtifact(baselineSessionDir, guidedSessionDir) { baseline: { sessionId: baselineBuilt.session.sessionId, sessionDir: baselineBuilt.session.sessionDir, - guidanceMode: baselineBuilt.session.guidanceMode, + guidanceStrategy: baselineBuilt.session.guidanceStrategy, attemptCount: baselineBuilt.summary.attemptCount, firstAcceptableAttempt: baselineBuilt.summary.firstAcceptableAttempt, latestOutcome: baselineBuilt.summary.latestOutcome, @@ -799,11 +1163,12 @@ function buildComparisonArtifact(baselineSessionDir, guidedSessionDir) { latestAttempt: baselineLatestAttempt, recurringFindingCodes: baselineBuilt.summary.recurringFindingCodes, recurringRepairCodes: baselineBuilt.summary.recurringRepairCodes, + heuristics: baselineBuilt.summary.heuristics, }, guided: { sessionId: guidedBuilt.session.sessionId, sessionDir: guidedBuilt.session.sessionDir, - guidanceMode: guidedBuilt.session.guidanceMode, + guidanceStrategy: guidedBuilt.session.guidanceStrategy, attemptCount: guidedBuilt.summary.attemptCount, firstAcceptableAttempt: guidedBuilt.summary.firstAcceptableAttempt, latestOutcome: guidedBuilt.summary.latestOutcome, @@ -811,6 +1176,7 @@ function 
buildComparisonArtifact(baselineSessionDir, guidedSessionDir) { latestAttempt: guidedLatestAttempt, recurringFindingCodes: guidedBuilt.summary.recurringFindingCodes, recurringRepairCodes: guidedBuilt.summary.recurringRepairCodes, + heuristics: guidedBuilt.summary.heuristics, }, delta: { firstAttemptVerdict: { @@ -831,6 +1197,7 @@ function buildComparisonArtifact(baselineSessionDir, guidedSessionDir) { }, rubric, }, + heuristics, checks: { guidedFewerFirstAttemptBlockingFindings, guidedReachedAcceptableNoLater, @@ -917,8 +1284,8 @@ function getSuggestionSortKey(left, right) { } function buildSuggestionArtifact(sessionDir) { const built = buildGenerationSessionSummary(sessionDir); - if (built.session.guidanceMode !== "prepared") { - throw new SessionInputError("Contract delta suggestions require a guided prepared session."); + if (built.session.guidanceStrategy === "unguided") { + throw new SessionInputError("Contract delta suggestions require a guided session."); } const repairMapDoc = readJsonFile(built.session.repairMapPath, "repair map"); const repairs = Array.isArray(repairMapDoc.repairs) ? repairMapDoc.repairs : []; @@ -981,11 +1348,11 @@ function buildSuggestionArtifact(sessionDir) { }; }).sort(getSuggestionSortKey); return { - schemaVersion: 1, + schemaVersion: 2, surfaceId: built.session.surfaceId, sessionId: built.session.sessionId, tool: built.session.tool, - guidanceMode: built.session.guidanceMode, + guidanceStrategy: built.session.guidanceStrategy, generatedAt: asString(latestAttempt.metadata.createdAt) ?? asString(latestAttempt.validate.provenance && asRecord(latestAttempt.validate.provenance).evaluatedAt) ?? 
built.session.startedAt, @@ -1042,7 +1409,7 @@ export async function runInitGenerationSessionCommand(options) { throw new SessionInputError("--workspace-root is required."); } const tool = ensureSessionTool(options.tool); - const guidanceMode = ensureGuidanceMode(options.guidanceMode); + const guidanceStrategy = ensureGuidanceStrategy(options.guidanceStrategy ?? options.guidanceMode); const workspaceRoot = path.resolve(options.workspaceRoot); if (!fs.existsSync(workspaceRoot) || !fs.statSync(workspaceRoot).isDirectory()) { throw new SessionInputError(`Workspace root directory not found at ${workspaceRoot}.`); @@ -1059,17 +1426,17 @@ export async function runInitGenerationSessionCommand(options) { fs.cpSync(loadedBundle.root, paths.bundleRoot, { recursive: true }); const sessionBundle = loadCompiledSurfaceBundle(paths.bundleRoot, options.surfaceId, process.cwd()); let preparedInputPath = null; - if (guidanceMode === "prepared") { + if (guidanceStrategy !== "unguided") { const preparedPayload = buildPreparedGenerationPayload(sessionBundle); writeDeterministicJsonSync(paths.preparedInputPath, preparedPayload); preparedInputPath = paths.preparedInputPath; } const session = { - schemaVersion: 2, + schemaVersion: 3, surfaceId: options.surfaceId, sessionId, tool, - guidanceMode, + guidanceStrategy, workspaceRoot, sourceBundleRoot: loadedBundle.root, sessionDir: paths.sessionDir, @@ -1077,14 +1444,72 @@ export async function runInitGenerationSessionCommand(options) { preparedInputPath, contractPath: sessionBundle.contract.path, repairMapPath: sessionBundle.surface.repairMap.path, + guidanceArtifacts: { + baseHandoffPath: paths.guidanceHandoffPath, + }, startedAt: new Date().toISOString(), ...(options.briefFile ? 
{ brief: freezeBriefFile(paths.sessionDir, options.briefFile) } : {}), successRule: { finalStatus: "pass-or-reviewed-warn", }, }; + const handoff = buildGuidanceHandoff(session, paths, guidanceStrategy); + writeDeterministicJsonSync(paths.guidanceHandoffPath, handoff); writeDeterministicJsonSync(paths.sessionPath, session); - process.stdout.write(`${JSON.stringify({ ok: true, session, paths }, null, 2)}\n`); + process.stdout.write(`${JSON.stringify({ ok: true, session, handoff, paths }, null, 2)}\n`); + return 0; + } + catch (error) { + if (error instanceof SessionInputError || error instanceof AdapterInputError) { + writeError(error, error.code); + return 10; + } + writeError(error instanceof Error ? error : new Error(String(error)), "generation-session.internal"); + return 1; + } +} +export async function runPrepareGenerationHandoffCommand(options) { + try { + if (!options.sessionDir) { + throw new SessionInputError("--session-dir is required."); + } + const { session, paths } = loadSession(options.sessionDir); + const guidanceStrategy = ensureGuidanceStrategy(options.guidanceStrategy ?? session.guidanceStrategy); + let preparedInputPath = session.preparedInputPath; + if (guidanceStrategy !== "unguided" && !preparedInputPath) { + const bundle = loadCompiledSurfaceBundle(session.bundleRoot, session.surfaceId, process.cwd()); + const preparedPayload = buildPreparedGenerationPayload(bundle); + writeDeterministicJsonSync(paths.preparedInputPath, preparedPayload); + preparedInputPath = paths.preparedInputPath; + } + const sessionForHandoff = { + ...session, + guidanceStrategy, + preparedInputPath, + guidanceArtifacts: { + baseHandoffPath: options.outPath ? 
path.resolve(options.outPath) : paths.guidanceHandoffPath, + }, + }; + const handoff = buildGuidanceHandoff(sessionForHandoff, paths, guidanceStrategy, { + acceptedSuggestions: loadRuntimeAcceptedSuggestions(options.acceptedSuggestionsFile), + designerNotes: loadRuntimeDesignerNotes(options.designerNotesFile), + findingCodes: parseRuntimeFindingCodes(options.findingCodes), + }); + const handoffPath = sessionForHandoff.guidanceArtifacts.baseHandoffPath ?? paths.guidanceHandoffPath; + writeDeterministicJsonSync(handoffPath, handoff); + const updatedSession = { + ...sessionForHandoff, + }; + writeDeterministicJsonSync(paths.sessionPath, updatedSession); + process.stdout.write(`${JSON.stringify({ + ok: true, + handoff, + session: updatedSession, + paths: { + handoffPath, + sessionPath: paths.sessionPath, + }, + }, null, 2)}\n`); return 0; } catch (error) { @@ -1138,12 +1563,12 @@ export async function runRecordGenerationAttemptCommand(options) { idempotencyKey: `${session.surfaceId}:${session.sessionId}:${formatAttemptNumber(attemptNumber)}`, }); const metadata = { - schemaVersion: 2, + schemaVersion: 3, surfaceId: session.surfaceId, sessionId: session.sessionId, attemptNumber, tool: session.tool, - guidanceMode: session.guidanceMode, + guidanceStrategy: session.guidanceStrategy, createdAt: new Date().toISOString(), validateStatus: response.status, validateExitCode: response.status === "block" ? 30 : 0, @@ -1151,6 +1576,7 @@ export async function runRecordGenerationAttemptCommand(options) { assessmentPath: attemptPaths.assessmentPath, validatePath: attemptPaths.validatePath, touchedFiles: assessment.touchedFiles ?? 
[], + guidanceHandoffPath: session.guidanceArtifacts.baseHandoffPath, contractRun, }; writeDeterministicJsonSync(attemptPaths.metadataPath, metadata); @@ -1512,16 +1938,19 @@ export async function runSummarizeGenerationBenchmarkCommand(options) { value: readJsonFile(suggestionPath, "contract delta suggestions artifact"), })); const report = { - schemaVersion: 1, + schemaVersion: 2, generatedAt: new Date().toISOString(), comparisons: comparisons.map(({ path: comparisonPath, value }) => ({ surfaceId: value.surfaceId, tool: value.tool, comparisonPath, meetsGoal: value.checks.meetsGoal, + baselineGuidanceStrategy: value.baseline.guidanceStrategy, + guidedGuidanceStrategy: value.guided.guidanceStrategy, guidedFewerFirstAttemptBlockingFindings: value.checks.guidedFewerFirstAttemptBlockingFindings, guidedReachedAcceptableNoLater: value.checks.guidedReachedAcceptableNoLater, guidedRubricBetterDimensions: value.checks.guidedRubricBetterDimensions, + heuristics: value.heuristics.delta, })), suggestions: suggestions.map(({ path: suggestionsPath, value }) => ({ surfaceId: value.surfaceId, @@ -1539,6 +1968,22 @@ export async function runSummarizeGenerationBenchmarkCommand(options) { acceptedSuggestionCount: suggestions.reduce((total, entry) => total + entry.value.suggestions.filter((suggestion) => suggestion.status === "accepted").length, 0), rejectedSuggestionCount: suggestions.reduce((total, entry) => total + entry.value.suggestions.filter((suggestion) => suggestion.status === "rejected").length, 0), proposedSuggestionCount: suggestions.reduce((total, entry) => total + entry.value.suggestions.filter((suggestion) => suggestion.status === "proposed").length, 0), + heuristics: { + lowerUnresolvedAcceptedSuggestionRate: countHeuristicImprovement(comparisons.map(({ value }) => value.heuristics.delta.unresolvedAcceptedSuggestionRate)), + lowerNoChangesAfterEditFailureCount: comparisons.filter(({ value }) => value.heuristics.delta.noChangesAfterEditFailureCount < 0).length, + 
lowerRecoverableToolErrorCount: comparisons.filter(({ value }) => value.heuristics.delta.recoverableToolErrorCount < 0).length, + lowerTouchedFilesPerResolvedFinding: countHeuristicImprovement(comparisons.map(({ value }) => value.heuristics.delta.touchedFilesPerResolvedFinding)), + lowerRepeatedFindingCarryoverCount: comparisons.filter(({ value }) => value.heuristics.delta.repeatedFindingCarryoverCount < 0).length, + lowerRerunsToAcceptableOutcome: countHeuristicImprovement(comparisons.map(({ value }) => value.heuristics.delta.rerunsToAcceptableOutcome)), + averageDelta: { + unresolvedAcceptedSuggestionRate: averageNullable(comparisons.map(({ value }) => value.heuristics.delta.unresolvedAcceptedSuggestionRate)), + noChangesAfterEditFailureCount: averageNullable(comparisons.map(({ value }) => value.heuristics.delta.noChangesAfterEditFailureCount)), + recoverableToolErrorCount: averageNullable(comparisons.map(({ value }) => value.heuristics.delta.recoverableToolErrorCount)), + touchedFilesPerResolvedFinding: averageNullable(comparisons.map(({ value }) => value.heuristics.delta.touchedFilesPerResolvedFinding)), + repeatedFindingCarryoverCount: averageNullable(comparisons.map(({ value }) => value.heuristics.delta.repeatedFindingCarryoverCount)), + rerunsToAcceptableOutcome: averageNullable(comparisons.map(({ value }) => value.heuristics.delta.rerunsToAcceptableOutcome)), + }, + }, }, }; const outDir = options.outDir diff --git a/packages/interfacectl-cli/dist/index.js b/packages/interfacectl-cli/dist/index.js index 73932f2..8fd2efc 100755 --- a/packages/interfacectl-cli/dist/index.js +++ b/packages/interfacectl-cli/dist/index.js @@ -13,7 +13,7 @@ import { runPrepareGenerationCommand } from "./commands/prepare-generation.js"; import { runValidateGenerationCommand } from "./commands/validate-generation.js"; import { runServeGenerationAdapterCommand } from "./commands/serve-generation-adapter.js"; import { runEmitRunArtifactCommand } from 
"./commands/emit-run-artifact.js"; -import { runCaptureGenerationPreviewCommand, runCompareGenerationSessionsCommand, runInitGenerationSessionCommand, runRecordGenerationAttemptCommand, runReviewContractDeltaSuggestionsCommand, runReviewGenerationAttemptCommand, runSuggestContractDeltasCommand, runSummarizeGenerationSessionCommand, runSummarizeGenerationBenchmarkCommand, } from "./commands/generation-session.js"; +import { runCaptureGenerationPreviewCommand, runCompareGenerationSessionsCommand, runInitGenerationSessionCommand, runPrepareGenerationHandoffCommand, runRecordGenerationAttemptCommand, runReviewContractDeltaSuggestionsCommand, runReviewGenerationAttemptCommand, runSuggestContractDeltasCommand, runSummarizeGenerationSessionCommand, runSummarizeGenerationBenchmarkCommand, } from "./commands/generation-session.js"; import { runInitCommand } from "./commands/init.js"; import { runAnalyzeCommand } from "./commands/analyze.js"; import { runAuthCaptureCommand, runAuthClearCommand, runAuthListCommandWithOptions, runAuthTestCommand, } from "./commands/auth.js"; @@ -214,7 +214,8 @@ program .requiredOption("--surface ", "Surface identifier") .requiredOption("--workspace-root ", "Workspace root for emitted run artifacts") .option("--tool ", "Generation tool identifier (codex|cursor|local-llm)") - .option("--guidance-mode ", "Session guidance mode (prepared|unguided)") + .option("--guidance-strategy ", "Session guidance strategy (prompt-summary|json-primary|unguided)") + .option("--guidance-mode ", "Legacy alias for --guidance-strategy (prepared|unguided)") .option("--brief-file ", "Optional implementation brief file to freeze into the session") .option("--session ", "Optional session identifier") .option("--artifacts-root ", "Optional session artifacts root (defaults under workspaceRoot/artifacts/generation-sessions)") @@ -224,12 +225,32 @@ program surfaceId: options.surface, workspaceRoot: options.workspaceRoot, tool: options.tool, + guidanceStrategy: 
options.guidanceStrategy, guidanceMode: options.guidanceMode, briefFile: options.briefFile, sessionId: options.session, artifactsRoot: options.artifactsRoot, }); }); +program + .command("prepare-generation-handoff") + .description("Build one canonical strategy-aware guidance handoff artifact for a tracked generation session") + .requiredOption("--session-dir ", "Path to the generation session directory") + .option("--guidance-strategy ", "Optional guidance strategy override (prompt-summary|json-primary|unguided)") + .option("--accepted-suggestions ", "Optional accepted suggestions JSON file") + .option("--designer-notes ", "Optional designer notes JSON file") + .option("--finding-codes ", "Optional comma-separated finding codes to match against repair guidance") + .option("--out ", "Write the handoff JSON to the provided file") + .action(async (options) => { + process.exitCode = await runPrepareGenerationHandoffCommand({ + sessionDir: options.sessionDir, + guidanceStrategy: options.guidanceStrategy, + acceptedSuggestionsFile: options.acceptedSuggestions, + designerNotesFile: options.designerNotes, + findingCodes: options.findingCodes, + outPath: options.out, + }); +}); program .command("record-generation-attempt") .description("Validate and record one generation attempt for a tracked session") @@ -282,9 +303,9 @@ program }); program .command("compare-generation-sessions") - .description("Compare one unguided session against one prepared guided session") - .requiredOption("--baseline-session-dir ", "Path to the unguided baseline session directory") - .requiredOption("--guided-session-dir ", "Path to the prepared guided session directory") + .description("Compare two generation sessions for the same frozen brief") + .requiredOption("--baseline-session-dir ", "Path to the baseline generation session directory") + .requiredOption("--guided-session-dir ", "Path to the candidate generation session directory") .option("--out-dir ", "Output directory for comparison 
artifacts") .action(async (options) => { process.exitCode = await runCompareGenerationSessionsCommand({ @@ -295,7 +316,7 @@ program }); program .command("suggest-contract-deltas") - .description("Generate evidence-backed contract refinement suggestions from one guided session") + .description("Generate evidence-backed contract refinement suggestions from one guided generation session") .requiredOption("--session-dir ", "Path to the guided generation session directory") .option("--out ", "Write suggestion JSON to the provided file") .action(async (options) => { diff --git a/packages/interfacectl-cli/schemas/contract-delta-suggestions.schema.json b/packages/interfacectl-cli/schemas/contract-delta-suggestions.schema.json index 0327ceb..6ae46f3 100644 --- a/packages/interfacectl-cli/schemas/contract-delta-suggestions.schema.json +++ b/packages/interfacectl-cli/schemas/contract-delta-suggestions.schema.json @@ -4,11 +4,11 @@ "title": "ContractDeltaSuggestionsArtifact", "type": "object", "additionalProperties": false, - "required": ["schemaVersion", "surfaceId", "sessionId", "tool", "guidanceMode", "generatedAt", "contract", "session", "suggestions"], + "required": ["schemaVersion", "surfaceId", "sessionId", "tool", "guidanceStrategy", "generatedAt", "contract", "session", "suggestions"], "properties": { "schemaVersion": { "type": "number", - "const": 1 + "const": 2 }, "surfaceId": { "type": "string", @@ -22,9 +22,9 @@ "type": "string", "enum": ["codex", "cursor", "local-llm"] }, - "guidanceMode": { + "guidanceStrategy": { "type": "string", - "enum": ["prepared", "unguided"] + "enum": ["prompt-summary", "json-primary", "unguided"] }, "generatedAt": { "type": "string", diff --git a/packages/interfacectl-cli/schemas/generation-assessment.schema.json b/packages/interfacectl-cli/schemas/generation-assessment.schema.json index 508d1e0..b9cf311 100644 --- a/packages/interfacectl-cli/schemas/generation-assessment.schema.json +++ 
b/packages/interfacectl-cli/schemas/generation-assessment.schema.json @@ -36,6 +36,32 @@ "type": "string", "minLength": 1 } + }, + "heuristics": { + "type": "object", + "additionalProperties": false, + "properties": { + "unresolvedAcceptedSuggestionCount": { + "type": "number", + "minimum": 0 + }, + "unresolvedAcceptedSuggestionRate": { + "type": ["number", "null"], + "minimum": 0 + }, + "noChangesAfterEditFailureCount": { + "type": "number", + "minimum": 0 + }, + "recoverableToolErrorCount": { + "type": "number", + "minimum": 0 + }, + "touchedFilesPerResolvedFinding": { + "type": ["number", "null"], + "minimum": 0 + } + } } } } diff --git a/packages/interfacectl-cli/schemas/generation-benchmark-report.schema.json b/packages/interfacectl-cli/schemas/generation-benchmark-report.schema.json index 248a0ae..dc5a84a 100644 --- a/packages/interfacectl-cli/schemas/generation-benchmark-report.schema.json +++ b/packages/interfacectl-cli/schemas/generation-benchmark-report.schema.json @@ -8,7 +8,7 @@ "properties": { "schemaVersion": { "type": "number", - "const": 1 + "const": 2 }, "generatedAt": { "type": "string", @@ -19,7 +19,18 @@ "items": { "type": "object", "additionalProperties": false, - "required": ["surfaceId", "tool", "comparisonPath", "meetsGoal", "guidedFewerFirstAttemptBlockingFindings", "guidedReachedAcceptableNoLater", "guidedRubricBetterDimensions"], + "required": [ + "surfaceId", + "tool", + "comparisonPath", + "meetsGoal", + "baselineGuidanceStrategy", + "guidedGuidanceStrategy", + "guidedFewerFirstAttemptBlockingFindings", + "guidedReachedAcceptableNoLater", + "guidedRubricBetterDimensions", + "heuristics" + ], "properties": { "surfaceId": { "type": "string", @@ -36,6 +47,14 @@ "meetsGoal": { "type": "boolean" }, + "baselineGuidanceStrategy": { + "type": "string", + "enum": ["prompt-summary", "json-primary", "unguided"] + }, + "guidedGuidanceStrategy": { + "type": "string", + "enum": ["prompt-summary", "json-primary", "unguided"] + }, 
"guidedFewerFirstAttemptBlockingFindings": { "type": "boolean" }, @@ -48,6 +67,9 @@ "type": "string", "enum": ["structure", "components", "boundary", "visual", "responsiveness"] } + }, + "heuristics": { + "$ref": "#/$defs/comparisonHeuristicDelta" } } } @@ -89,7 +111,16 @@ "overall": { "type": "object", "additionalProperties": false, - "required": ["surfaceCount", "surfacesMeetingGoal", "guidedFewerFirstAttemptBlockingFindings", "guidedReachedAcceptableNoLater", "acceptedSuggestionCount", "rejectedSuggestionCount", "proposedSuggestionCount"], + "required": [ + "surfaceCount", + "surfacesMeetingGoal", + "guidedFewerFirstAttemptBlockingFindings", + "guidedReachedAcceptableNoLater", + "acceptedSuggestionCount", + "rejectedSuggestionCount", + "proposedSuggestionCount", + "heuristics" + ], "properties": { "surfaceCount": { "type": "number", @@ -118,6 +149,117 @@ "proposedSuggestionCount": { "type": "number", "minimum": 0 + }, + "heuristics": { + "$ref": "#/$defs/benchmarkHeuristicsSummary" + } + } + } + }, + "$defs": { + "comparisonHeuristicDelta": { + "type": "object", + "additionalProperties": false, + "required": [ + "unresolvedAcceptedSuggestionRate", + "noChangesAfterEditFailureCount", + "recoverableToolErrorCount", + "touchedFilesPerResolvedFinding", + "repeatedFindingCarryoverCount", + "rerunsToAcceptableOutcome" + ], + "properties": { + "unresolvedAcceptedSuggestionRate": { + "type": ["number", "null"] + }, + "noChangesAfterEditFailureCount": { + "type": "number" + }, + "recoverableToolErrorCount": { + "type": "number" + }, + "touchedFilesPerResolvedFinding": { + "type": ["number", "null"] + }, + "repeatedFindingCarryoverCount": { + "type": "number" + }, + "rerunsToAcceptableOutcome": { + "type": ["number", "null"] + } + } + }, + "benchmarkAverageDelta": { + "type": "object", + "additionalProperties": false, + "required": [ + "unresolvedAcceptedSuggestionRate", + "noChangesAfterEditFailureCount", + "recoverableToolErrorCount", + "touchedFilesPerResolvedFinding", 
+ "repeatedFindingCarryoverCount", + "rerunsToAcceptableOutcome" + ], + "properties": { + "unresolvedAcceptedSuggestionRate": { + "type": ["number", "null"] + }, + "noChangesAfterEditFailureCount": { + "type": ["number", "null"] + }, + "recoverableToolErrorCount": { + "type": ["number", "null"] + }, + "touchedFilesPerResolvedFinding": { + "type": ["number", "null"] + }, + "repeatedFindingCarryoverCount": { + "type": ["number", "null"] + }, + "rerunsToAcceptableOutcome": { + "type": ["number", "null"] + } + } + }, + "benchmarkHeuristicsSummary": { + "type": "object", + "additionalProperties": false, + "required": [ + "lowerUnresolvedAcceptedSuggestionRate", + "lowerNoChangesAfterEditFailureCount", + "lowerRecoverableToolErrorCount", + "lowerTouchedFilesPerResolvedFinding", + "lowerRepeatedFindingCarryoverCount", + "lowerRerunsToAcceptableOutcome", + "averageDelta" + ], + "properties": { + "lowerUnresolvedAcceptedSuggestionRate": { + "type": "number", + "minimum": 0 + }, + "lowerNoChangesAfterEditFailureCount": { + "type": "number", + "minimum": 0 + }, + "lowerRecoverableToolErrorCount": { + "type": "number", + "minimum": 0 + }, + "lowerTouchedFilesPerResolvedFinding": { + "type": "number", + "minimum": 0 + }, + "lowerRepeatedFindingCarryoverCount": { + "type": "number", + "minimum": 0 + }, + "lowerRerunsToAcceptableOutcome": { + "type": "number", + "minimum": 0 + }, + "averageDelta": { + "$ref": "#/$defs/benchmarkAverageDelta" } } } diff --git a/packages/interfacectl-cli/schemas/generation-guidance-handoff.schema.json b/packages/interfacectl-cli/schemas/generation-guidance-handoff.schema.json new file mode 100644 index 0000000..38f447c --- /dev/null +++ b/packages/interfacectl-cli/schemas/generation-guidance-handoff.schema.json @@ -0,0 +1,225 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://surfaces.dev/schemas/generation-guidance-handoff.schema.json", + "title": "GenerationGuidanceHandoff", + "type": "object", + 
"additionalProperties": false, + "required": [ + "schemaVersion", + "surfaceId", + "sessionId", + "tool", + "guidanceStrategy", + "generatedAt", + "brief", + "session", + "runtimeGuidance", + "promptSummary", + "jsonPrimary" + ], + "properties": { + "schemaVersion": { + "type": "number", + "const": 1 + }, + "surfaceId": { + "type": "string", + "minLength": 1 + }, + "sessionId": { + "type": "string", + "minLength": 1 + }, + "tool": { + "type": "string", + "enum": ["codex", "cursor", "local-llm"] + }, + "guidanceStrategy": { + "type": "string", + "enum": ["prompt-summary", "json-primary", "unguided"] + }, + "generatedAt": { + "type": "string", + "format": "date-time" + }, + "brief": { + "oneOf": [ + { + "type": "null" + }, + { + "type": "object", + "additionalProperties": false, + "required": ["path", "sha256", "text"], + "properties": { + "path": { + "type": "string", + "minLength": 1 + }, + "sha256": { + "type": "string", + "minLength": 1 + }, + "text": { + "type": "string" + } + } + } + ] + }, + "session": { + "type": "object", + "additionalProperties": false, + "required": ["sessionPath", "preparedInputPath", "contractPath", "repairMapPath"], + "properties": { + "sessionPath": { + "type": "string", + "minLength": 1 + }, + "preparedInputPath": { + "type": ["string", "null"], + "minLength": 1 + }, + "contractPath": { + "type": "string", + "minLength": 1 + }, + "repairMapPath": { + "type": "string", + "minLength": 1 + } + } + }, + "runtimeGuidance": { + "type": "object", + "additionalProperties": false, + "required": ["findingCodes", "matchedRepairCodes", "acceptedSuggestions", "designerNotes"], + "properties": { + "findingCodes": { + "type": "array", + "items": { + "type": "string", + "minLength": 1 + } + }, + "matchedRepairCodes": { + "type": "array", + "items": { + "type": "string", + "minLength": 1 + } + }, + "acceptedSuggestions": { + "type": "array", + "items": { + "type": "object", + "additionalProperties": false, + "required": ["findingCode", 
"findingMessage", "summary", "suggestedPath"], + "properties": { + "findingCode": { + "type": "string", + "minLength": 1 + }, + "findingMessage": { + "type": "string", + "minLength": 1 + }, + "summary": { + "type": "string", + "minLength": 1 + }, + "suggestedPath": { + "type": "string", + "minLength": 1 + }, + "rationale": { + "type": "string", + "minLength": 1 + } + } + } + }, + "designerNotes": { + "type": "array", + "items": { + "type": "string", + "minLength": 1 + } + } + } + }, + "promptSummary": { + "oneOf": [ + { + "type": "null" + }, + { + "type": "object", + "additionalProperties": false, + "required": ["effectiveContractSummary", "preparedGuidanceSummary"], + "properties": { + "effectiveContractSummary": { + "type": "string" + }, + "preparedGuidanceSummary": { + "type": "string" + } + } + } + ] + }, + "jsonPrimary": { + "oneOf": [ + { + "type": "null" + }, + { + "type": "object", + "additionalProperties": false, + "required": ["surface", "contract", "summary", "generation", "constraints", "sections", "components", "repairMap", "matchedRepairs"], + "properties": { + "surface": { + "type": "object" + }, + "contract": { + "type": "object" + }, + "summary": { + "type": "object" + }, + "generation": { + "type": "object" + }, + "constraints": { + "type": "object" + }, + "sections": { + "type": "array", + "items": { + "type": "object" + } + }, + "components": { + "type": "array", + "items": { + "type": "object" + } + }, + "repairMap": { + "type": "array", + "items": { + "type": "object" + } + }, + "matchedRepairs": { + "type": "array", + "items": { + "type": "object" + } + } + } + } + ] + } + } +} diff --git a/packages/interfacectl-cli/schemas/generation-session-comparison.schema.json b/packages/interfacectl-cli/schemas/generation-session-comparison.schema.json index a181f6e..a5f807c 100644 --- a/packages/interfacectl-cli/schemas/generation-session-comparison.schema.json +++ b/packages/interfacectl-cli/schemas/generation-session-comparison.schema.json @@ -4,11 
+4,11 @@ "title": "GenerationSessionComparison", "type": "object", "additionalProperties": false, - "required": ["schemaVersion", "surfaceId", "tool", "brief", "baseline", "guided", "delta", "checks", "paths"], + "required": ["schemaVersion", "surfaceId", "tool", "brief", "baseline", "guided", "delta", "heuristics", "checks", "paths"], "properties": { "schemaVersion": { "type": "number", - "const": 2 + "const": 3 }, "surfaceId": { "type": "string", @@ -115,6 +115,9 @@ } } }, + "heuristics": { + "$ref": "#/$defs/comparisonHeuristics" + }, "checks": { "type": "object", "additionalProperties": false, @@ -158,6 +161,95 @@ } }, "$defs": { + "assessmentHeuristics": { + "type": "object", + "additionalProperties": false, + "properties": { + "unresolvedAcceptedSuggestionCount": { + "type": "number", + "minimum": 0 + }, + "unresolvedAcceptedSuggestionRate": { + "type": ["number", "null"], + "minimum": 0 + }, + "noChangesAfterEditFailureCount": { + "type": "number", + "minimum": 0 + }, + "recoverableToolErrorCount": { + "type": "number", + "minimum": 0 + }, + "touchedFilesPerResolvedFinding": { + "type": ["number", "null"], + "minimum": 0 + } + } + }, + "sessionHeuristics": { + "type": "object", + "additionalProperties": false, + "required": ["latestAttempt", "repeatedFindingCarryoverCount", "rerunsToAcceptableOutcome"], + "properties": { + "latestAttempt": { + "$ref": "#/$defs/assessmentHeuristics" + }, + "repeatedFindingCarryoverCount": { + "type": "number", + "minimum": 0 + }, + "rerunsToAcceptableOutcome": { + "type": ["number", "null"], + "minimum": 0 + } + } + }, + "comparisonHeuristics": { + "type": "object", + "additionalProperties": false, + "required": ["baseline", "guided", "delta"], + "properties": { + "baseline": { + "$ref": "#/$defs/sessionHeuristics" + }, + "guided": { + "$ref": "#/$defs/sessionHeuristics" + }, + "delta": { + "type": "object", + "additionalProperties": false, + "required": [ + "unresolvedAcceptedSuggestionRate", + 
"noChangesAfterEditFailureCount", + "recoverableToolErrorCount", + "touchedFilesPerResolvedFinding", + "repeatedFindingCarryoverCount", + "rerunsToAcceptableOutcome" + ], + "properties": { + "unresolvedAcceptedSuggestionRate": { + "type": ["number", "null"] + }, + "noChangesAfterEditFailureCount": { + "type": "number" + }, + "recoverableToolErrorCount": { + "type": "number" + }, + "touchedFilesPerResolvedFinding": { + "type": ["number", "null"] + }, + "repeatedFindingCarryoverCount": { + "type": "number" + }, + "rerunsToAcceptableOutcome": { + "type": ["number", "null"] + } + } + } + } + }, "assessment": { "type": "object", "additionalProperties": false, @@ -193,6 +285,9 @@ "type": "string", "minLength": 1 } + }, + "heuristics": { + "$ref": "#/$defs/assessmentHeuristics" } } }, @@ -243,7 +338,7 @@ "sessionSnapshot": { "type": "object", "additionalProperties": false, - "required": ["sessionId", "sessionDir", "guidanceMode", "attemptCount", "firstAcceptableAttempt", "latestOutcome", "firstAttempt", "latestAttempt", "recurringFindingCodes", "recurringRepairCodes"], + "required": ["sessionId", "sessionDir", "guidanceStrategy", "attemptCount", "firstAcceptableAttempt", "latestOutcome", "firstAttempt", "latestAttempt", "recurringFindingCodes", "recurringRepairCodes", "heuristics"], "properties": { "sessionId": { "type": "string", @@ -253,9 +348,9 @@ "type": "string", "minLength": 1 }, - "guidanceMode": { + "guidanceStrategy": { "type": "string", - "enum": ["prepared", "unguided"] + "enum": ["prompt-summary", "json-primary", "unguided"] }, "attemptCount": { "type": "number", @@ -277,50 +372,59 @@ "recurringFindingCodes": { "type": "array", "items": { - "type": "object", - "additionalProperties": false, - "required": ["code", "count"], - "properties": { - "code": { - "type": "string", - "minLength": 1 - }, - "count": { - "type": "number", - "minimum": 2 - } - } + "$ref": "#/$defs/recurringCount" } }, "recurringRepairCodes": { "type": "array", "items": { - "type": "object", 
- "additionalProperties": false, - "required": ["code", "count", "priority", "category", "actionType"], - "properties": { - "code": { - "type": "string", - "minLength": 1 - }, - "count": { - "type": "number", - "minimum": 2 - }, - "priority": { - "type": "string", - "minLength": 1 - }, - "category": { - "type": "string", - "minLength": 1 - }, - "actionType": { - "type": "string", - "minLength": 1 - } - } + "$ref": "#/$defs/recurringRepair" } + }, + "heuristics": { + "$ref": "#/$defs/sessionHeuristics" + } + } + }, + "recurringCount": { + "type": "object", + "additionalProperties": false, + "required": ["code", "count"], + "properties": { + "code": { + "type": "string", + "minLength": 1 + }, + "count": { + "type": "number", + "minimum": 2 + } + } + }, + "recurringRepair": { + "type": "object", + "additionalProperties": false, + "required": ["code", "count", "priority", "category", "actionType"], + "properties": { + "code": { + "type": "string", + "minLength": 1 + }, + "count": { + "type": "number", + "minimum": 2 + }, + "priority": { + "type": "string", + "minLength": 1 + }, + "category": { + "type": "string", + "minLength": 1 + }, + "actionType": { + "type": "string", + "minLength": 1 } } }, diff --git a/packages/interfacectl-cli/schemas/generation-session-summary.schema.json b/packages/interfacectl-cli/schemas/generation-session-summary.schema.json index cc67ea4..ea99d19 100644 --- a/packages/interfacectl-cli/schemas/generation-session-summary.schema.json +++ b/packages/interfacectl-cli/schemas/generation-session-summary.schema.json @@ -9,7 +9,7 @@ "surfaceId", "sessionId", "tool", - "guidanceMode", + "guidanceStrategy", "attemptCount", "firstPassAttempt", "firstAcceptableAttempt", @@ -19,6 +19,7 @@ "recurringRepairCodes", "latestAssessment", "latestReview", + "heuristics", "successRule", "paths", "attempts" @@ -26,7 +27,7 @@ "properties": { "schemaVersion": { "type": "number", - "const": 3 + "const": 4 }, "surfaceId": { "type": "string", @@ -40,9 +41,9 @@ "type": 
"string", "enum": ["codex", "cursor", "local-llm"] }, - "guidanceMode": { + "guidanceStrategy": { "type": "string", - "enum": ["prepared", "unguided"] + "enum": ["prompt-summary", "json-primary", "unguided"] }, "attemptCount": { "type": "number", @@ -65,49 +66,13 @@ "recurringFindingCodes": { "type": "array", "items": { - "type": "object", - "additionalProperties": false, - "required": ["code", "count"], - "properties": { - "code": { - "type": "string", - "minLength": 1 - }, - "count": { - "type": "number", - "minimum": 2 - } - } + "$ref": "#/$defs/recurringCount" } }, "recurringRepairCodes": { "type": "array", "items": { - "type": "object", - "additionalProperties": false, - "required": ["code", "count", "priority", "category", "actionType"], - "properties": { - "code": { - "type": "string", - "minLength": 1 - }, - "count": { - "type": "number", - "minimum": 2 - }, - "priority": { - "type": "string", - "minLength": 1 - }, - "category": { - "type": "string", - "minLength": 1 - }, - "actionType": { - "type": "string", - "minLength": 1 - } - } + "$ref": "#/$defs/recurringRepair" } }, "latestAssessment": { @@ -116,42 +81,7 @@ "type": "null" }, { - "type": "object", - "additionalProperties": false, - "required": ["structure", "components", "boundary", "visual", "responsiveness", "notes"], - "properties": { - "structure": { - "type": "string", - "enum": ["strong", "partial", "weak"] - }, - "components": { - "type": "string", - "enum": ["strong", "partial", "weak"] - }, - "boundary": { - "type": "string", - "enum": ["strong", "partial", "weak"] - }, - "visual": { - "type": "string", - "enum": ["strong", "partial", "weak"] - }, - "responsiveness": { - "type": "string", - "enum": ["strong", "partial", "weak"] - }, - "notes": { - "type": "string", - "minLength": 1 - }, - "touchedFiles": { - "type": "array", - "items": { - "type": "string", - "minLength": 1 - } - } - } + "$ref": "#/$defs/assessment" } ] }, @@ -161,79 +91,23 @@ "type": "null" }, { - "type": "object", - 
"additionalProperties": false, - "required": ["schemaVersion", "surfaceId", "sessionId", "attemptNumber", "status", "findingCodes", "rationale", "reviewedAt"], - "properties": { - "schemaVersion": { - "type": "number", - "const": 1 - }, - "surfaceId": { - "type": "string", - "minLength": 1 - }, - "sessionId": { - "type": "string", - "minLength": 1 - }, - "attemptNumber": { - "type": "number", - "minimum": 1 - }, - "status": { - "type": "string", - "enum": ["accepted", "rejected"] - }, - "findingCodes": { - "type": "array", - "items": { - "type": "string", - "minLength": 1 - } - }, - "rationale": { - "type": "string", - "minLength": 1 - }, - "reviewedAt": { - "type": "string", - "format": "date-time" - } - } + "$ref": "#/$defs/review" } ] }, + "heuristics": { + "$ref": "#/$defs/sessionHeuristics" + }, "brief": { - "type": "object", - "additionalProperties": false, - "required": ["path", "sha256"], - "properties": { - "path": { - "type": "string", - "minLength": 1 - }, - "sha256": { - "type": "string", - "minLength": 1 - } - } + "$ref": "#/$defs/brief" }, "successRule": { - "type": "object", - "additionalProperties": false, - "required": ["finalStatus"], - "properties": { - "finalStatus": { - "type": "string", - "const": "pass-or-reviewed-warn" - } - } + "$ref": "#/$defs/successRule" }, "paths": { "type": "object", "additionalProperties": false, - "required": ["sessionPath", "bundleRoot", "preparedInputPath"], + "required": ["sessionPath", "bundleRoot", "preparedInputPath", "guidanceHandoffPath"], "properties": { "sessionPath": { "type": "string", @@ -246,6 +120,10 @@ "preparedInputPath": { "type": ["string", "null"], "minLength": 1 + }, + "guidanceHandoffPath": { + "type": ["string", "null"], + "minLength": 1 } } }, @@ -316,6 +194,201 @@ } }, "$defs": { + "brief": { + "type": "object", + "additionalProperties": false, + "required": ["path", "sha256"], + "properties": { + "path": { + "type": "string", + "minLength": 1 + }, + "sha256": { + "type": "string", + 
"minLength": 1 + } + } + }, + "successRule": { + "type": "object", + "additionalProperties": false, + "required": ["finalStatus"], + "properties": { + "finalStatus": { + "type": "string", + "const": "pass-or-reviewed-warn" + } + } + }, + "assessmentHeuristics": { + "type": "object", + "additionalProperties": false, + "properties": { + "unresolvedAcceptedSuggestionCount": { + "type": "number", + "minimum": 0 + }, + "unresolvedAcceptedSuggestionRate": { + "type": ["number", "null"], + "minimum": 0 + }, + "noChangesAfterEditFailureCount": { + "type": "number", + "minimum": 0 + }, + "recoverableToolErrorCount": { + "type": "number", + "minimum": 0 + }, + "touchedFilesPerResolvedFinding": { + "type": ["number", "null"], + "minimum": 0 + } + } + }, + "assessment": { + "type": "object", + "additionalProperties": false, + "required": ["structure", "components", "boundary", "visual", "responsiveness", "notes"], + "properties": { + "structure": { + "type": "string", + "enum": ["strong", "partial", "weak"] + }, + "components": { + "type": "string", + "enum": ["strong", "partial", "weak"] + }, + "boundary": { + "type": "string", + "enum": ["strong", "partial", "weak"] + }, + "visual": { + "type": "string", + "enum": ["strong", "partial", "weak"] + }, + "responsiveness": { + "type": "string", + "enum": ["strong", "partial", "weak"] + }, + "notes": { + "type": "string", + "minLength": 1 + }, + "touchedFiles": { + "type": "array", + "items": { + "type": "string", + "minLength": 1 + } + }, + "heuristics": { + "$ref": "#/$defs/assessmentHeuristics" + } + } + }, + "review": { + "type": "object", + "additionalProperties": false, + "required": ["schemaVersion", "surfaceId", "sessionId", "attemptNumber", "status", "findingCodes", "rationale", "reviewedAt"], + "properties": { + "schemaVersion": { + "type": "number", + "const": 1 + }, + "surfaceId": { + "type": "string", + "minLength": 1 + }, + "sessionId": { + "type": "string", + "minLength": 1 + }, + "attemptNumber": { + "type": 
"number", + "minimum": 1 + }, + "status": { + "type": "string", + "enum": ["accepted", "rejected"] + }, + "findingCodes": { + "type": "array", + "items": { + "type": "string", + "minLength": 1 + } + }, + "rationale": { + "type": "string", + "minLength": 1 + }, + "reviewedAt": { + "type": "string", + "format": "date-time" + } + } + }, + "recurringCount": { + "type": "object", + "additionalProperties": false, + "required": ["code", "count"], + "properties": { + "code": { + "type": "string", + "minLength": 1 + }, + "count": { + "type": "number", + "minimum": 2 + } + } + }, + "recurringRepair": { + "type": "object", + "additionalProperties": false, + "required": ["code", "count", "priority", "category", "actionType"], + "properties": { + "code": { + "type": "string", + "minLength": 1 + }, + "count": { + "type": "number", + "minimum": 2 + }, + "priority": { + "type": "string", + "minLength": 1 + }, + "category": { + "type": "string", + "minLength": 1 + }, + "actionType": { + "type": "string", + "minLength": 1 + } + } + }, + "sessionHeuristics": { + "type": "object", + "additionalProperties": false, + "required": ["latestAttempt", "repeatedFindingCarryoverCount", "rerunsToAcceptableOutcome"], + "properties": { + "latestAttempt": { + "$ref": "#/$defs/assessmentHeuristics" + }, + "repeatedFindingCarryoverCount": { + "type": "number", + "minimum": 0 + }, + "rerunsToAcceptableOutcome": { + "type": ["number", "null"], + "minimum": 0 + } + } + }, "previewRef": { "type": "object", "additionalProperties": false, diff --git a/packages/interfacectl-cli/schemas/generation-session.schema.json b/packages/interfacectl-cli/schemas/generation-session.schema.json index a98367b..b73d7e9 100644 --- a/packages/interfacectl-cli/schemas/generation-session.schema.json +++ b/packages/interfacectl-cli/schemas/generation-session.schema.json @@ -9,7 +9,7 @@ "surfaceId", "sessionId", "tool", - "guidanceMode", + "guidanceStrategy", "workspaceRoot", "sourceBundleRoot", "sessionDir", @@ -17,13 +17,14 
@@ "preparedInputPath", "contractPath", "repairMapPath", + "guidanceArtifacts", "startedAt", "successRule" ], "properties": { "schemaVersion": { "type": "number", - "const": 2 + "const": 3 }, "surfaceId": { "type": "string", @@ -37,9 +38,9 @@ "type": "string", "enum": ["codex", "cursor", "local-llm"] }, - "guidanceMode": { + "guidanceStrategy": { "type": "string", - "enum": ["prepared", "unguided"] + "enum": ["prompt-summary", "json-primary", "unguided"] }, "workspaceRoot": { "type": "string", @@ -69,6 +70,17 @@ "type": "string", "minLength": 1 }, + "guidanceArtifacts": { + "type": "object", + "additionalProperties": false, + "required": ["baseHandoffPath"], + "properties": { + "baseHandoffPath": { + "type": ["string", "null"], + "minLength": 1 + } + } + }, "startedAt": { "type": "string", "format": "date-time" diff --git a/packages/interfacectl-cli/src/commands/generation-session.ts b/packages/interfacectl-cli/src/commands/generation-session.ts index d99971a..2793a46 100644 --- a/packages/interfacectl-cli/src/commands/generation-session.ts +++ b/packages/interfacectl-cli/src/commands/generation-session.ts @@ -20,7 +20,7 @@ import { stringifyDeterministicJson, writeDeterministicJsonSync } from "../utils type SessionTool = "codex" | "cursor" | "local-llm"; type AssessmentGrade = "strong" | "partial" | "weak"; type ValidateStatus = "pass" | "warn" | "block"; -type GuidanceMode = "prepared" | "unguided"; +type GuidanceStrategy = "prompt-summary" | "json-primary" | "unguided"; type SessionSuccessRule = "pass" | "pass-or-reviewed-warn"; type AttemptReviewStatus = "accepted" | "rejected"; type AttemptOutcome = ValidateStatus | "accepted-warn"; @@ -35,10 +35,20 @@ export interface InitGenerationSessionCommandOptions { tool?: string; sessionId?: string; artifactsRoot?: string; + guidanceStrategy?: string; guidanceMode?: string; briefFile?: string; } +export interface PrepareGenerationHandoffCommandOptions { + sessionDir?: string; + guidanceStrategy?: string; + 
acceptedSuggestionsFile?: string; + designerNotesFile?: string; + findingCodes?: string; + outPath?: string; +} + export interface RecordGenerationAttemptCommandOptions { sessionDir?: string; assessmentFile?: string; @@ -93,6 +103,7 @@ interface GenerationAssessment { responsiveness: AssessmentGrade; notes: string; touchedFiles?: string[]; + heuristics?: GenerationAssessmentHeuristics; } interface GenerationBrief { @@ -101,11 +112,11 @@ interface GenerationBrief { } interface GenerationSession { - schemaVersion: 2; + schemaVersion: 3; surfaceId: string; sessionId: string; tool: SessionTool; - guidanceMode: GuidanceMode; + guidanceStrategy: GuidanceStrategy; workspaceRoot: string; sourceBundleRoot: string; sessionDir: string; @@ -113,6 +124,9 @@ interface GenerationSession { preparedInputPath: string | null; contractPath: string; repairMapPath: string; + guidanceArtifacts: { + baseHandoffPath: string | null; + }; startedAt: string; brief?: GenerationBrief; successRule: { @@ -149,12 +163,12 @@ interface GenerationAttemptPreview { } interface GenerationSessionAttemptMetadata { - schemaVersion: 2; + schemaVersion: 3; surfaceId: string; sessionId: string; attemptNumber: number; tool: SessionTool; - guidanceMode: GuidanceMode; + guidanceStrategy: GuidanceStrategy; createdAt: string; validateStatus: ValidateStatus; validateExitCode: number; @@ -162,6 +176,7 @@ interface GenerationSessionAttemptMetadata { assessmentPath: string; validatePath: string; touchedFiles: string[]; + guidanceHandoffPath: string | null; contractRun: { deduped: boolean; runId: string; @@ -172,11 +187,11 @@ interface GenerationSessionAttemptMetadata { } interface GenerationSessionSummary { - schemaVersion: 3; + schemaVersion: 4; surfaceId: string; sessionId: string; tool: SessionTool; - guidanceMode: GuidanceMode; + guidanceStrategy: GuidanceStrategy; attemptCount: number; firstPassAttempt: number | null; firstAcceptableAttempt: number | null; @@ -192,6 +207,7 @@ interface GenerationSessionSummary { 
}>; latestAssessment: GenerationAssessment | null; latestReview: GenerationAttemptReview | null; + heuristics: GenerationSessionHeuristics; brief?: GenerationBrief; successRule: { finalStatus: SessionSuccessRule; @@ -200,6 +216,7 @@ interface GenerationSessionSummary { sessionPath: string; bundleRoot: string; preparedInputPath: string | null; + guidanceHandoffPath: string | null; }; attempts: Array<{ attemptNumber: number; @@ -226,7 +243,7 @@ interface GenerationSessionSummary { interface SessionComparisonSnapshot { sessionId: string; sessionDir: string; - guidanceMode: GuidanceMode; + guidanceStrategy: GuidanceStrategy; attemptCount: number; firstAcceptableAttempt: number | null; latestOutcome: AttemptOutcome; @@ -240,6 +257,7 @@ interface SessionComparisonSnapshot { category: string; actionType: string; }>; + heuristics: GenerationSessionHeuristics; } interface ComparisonAttemptSnapshot { @@ -262,7 +280,7 @@ interface ComparisonAttemptSnapshot { } interface GenerationSessionComparison { - schemaVersion: 2; + schemaVersion: 3; surfaceId: string; tool: SessionTool; brief: { @@ -286,12 +304,13 @@ interface GenerationSessionComparison { guided: number | null; delta: number | null; }; - rubric: Record; }; + heuristics: GenerationComparisonHeuristics; checks: { guidedFewerFirstAttemptBlockingFindings: boolean; guidedReachedAcceptableNoLater: boolean; @@ -331,11 +350,11 @@ interface ContractDeltaSuggestion { } interface ContractDeltaSuggestionsArtifact { - schemaVersion: 1; + schemaVersion: 2; surfaceId: string; sessionId: string; tool: SessionTool; - guidanceMode: GuidanceMode; + guidanceStrategy: GuidanceStrategy; generatedAt: string; contract: { path: string; @@ -349,16 +368,19 @@ interface ContractDeltaSuggestionsArtifact { } interface GenerationBenchmarkReport { - schemaVersion: 1; + schemaVersion: 2; generatedAt: string; comparisons: Array<{ surfaceId: string; tool: SessionTool; comparisonPath: string; meetsGoal: boolean; + baselineGuidanceStrategy: 
GuidanceStrategy; + guidedGuidanceStrategy: GuidanceStrategy; guidedFewerFirstAttemptBlockingFindings: boolean; guidedReachedAcceptableNoLater: boolean; guidedRubricBetterDimensions: AssessmentDimension[]; + heuristics: GenerationComparisonHeuristics["delta"]; }>; suggestions: Array<{ surfaceId: string; @@ -376,7 +398,99 @@ interface GenerationBenchmarkReport { acceptedSuggestionCount: number; rejectedSuggestionCount: number; proposedSuggestionCount: number; + heuristics: GenerationBenchmarkHeuristicsSummary; + }; +} + +interface GenerationAssessmentHeuristics { + unresolvedAcceptedSuggestionCount?: number; + unresolvedAcceptedSuggestionRate?: number | null; + noChangesAfterEditFailureCount?: number; + recoverableToolErrorCount?: number; + touchedFilesPerResolvedFinding?: number | null; +} + +interface GenerationSessionHeuristics { + latestAttempt: GenerationAssessmentHeuristics; + repeatedFindingCarryoverCount: number; + rerunsToAcceptableOutcome: number | null; +} + +interface GenerationComparisonHeuristics { + baseline: GenerationSessionHeuristics; + guided: GenerationSessionHeuristics; + delta: { + unresolvedAcceptedSuggestionRate: number | null; + noChangesAfterEditFailureCount: number; + recoverableToolErrorCount: number; + touchedFilesPerResolvedFinding: number | null; + repeatedFindingCarryoverCount: number; + rerunsToAcceptableOutcome: number | null; + }; +} + +interface GenerationBenchmarkHeuristicsSummary { + lowerUnresolvedAcceptedSuggestionRate: number; + lowerNoChangesAfterEditFailureCount: number; + lowerRecoverableToolErrorCount: number; + lowerTouchedFilesPerResolvedFinding: number; + lowerRepeatedFindingCarryoverCount: number; + lowerRerunsToAcceptableOutcome: number; + averageDelta: { + unresolvedAcceptedSuggestionRate: number | null; + noChangesAfterEditFailureCount: number | null; + recoverableToolErrorCount: number | null; + touchedFilesPerResolvedFinding: number | null; + repeatedFindingCarryoverCount: number | null; + 
rerunsToAcceptableOutcome: number | null; + }; +} + +interface RuntimeAcceptedSuggestion { + findingCode: string; + findingMessage: string; + summary: string; + suggestedPath: string; + rationale?: string; +} + +interface GenerationGuidanceHandoff { + schemaVersion: 1; + surfaceId: string; + sessionId: string; + tool: SessionTool; + guidanceStrategy: GuidanceStrategy; + generatedAt: string; + brief: ({ + text: string; + } & GenerationBrief) | null; + session: { + sessionPath: string; + preparedInputPath: string | null; + contractPath: string; + repairMapPath: string; + }; + runtimeGuidance: { + findingCodes: string[]; + matchedRepairCodes: string[]; + acceptedSuggestions: RuntimeAcceptedSuggestion[]; + designerNotes: string[]; }; + promptSummary: { + effectiveContractSummary: string; + preparedGuidanceSummary: string; + } | null; + jsonPrimary: { + surface: Record; + contract: Record; + summary: Record; + generation: Record; + constraints: Record; + sections: Array>; + components: Array>; + repairMap: Array>; + matchedRepairs: Array>; + } | null; } interface LoadedAttempt { @@ -395,7 +509,7 @@ interface LoadedAttempt { const VALID_TOOLS = new Set(["codex", "cursor", "local-llm"]); const VALID_GRADES = new Set(["strong", "partial", "weak"]); -const VALID_GUIDANCE_MODES = new Set(["prepared", "unguided"]); +const VALID_GUIDANCE_STRATEGIES = new Set(["prompt-summary", "json-primary", "unguided"]); const VALID_REVIEW_STATUSES = new Set(["accepted", "rejected"]); const VALID_SUGGESTION_STATUSES = new Set(["proposed", "accepted", "rejected"]); const VALID_SUCCESS_RULES = new Set(["pass", "pass-or-reviewed-warn"]); @@ -476,14 +590,15 @@ function ensureSessionTool(tool?: string): SessionTool { return normalized as SessionTool; } -function ensureGuidanceMode(guidanceMode?: string): GuidanceMode { - const normalized = typeof guidanceMode === "string" ? 
guidanceMode.trim().toLowerCase() : "prepared"; - if (!VALID_GUIDANCE_MODES.has(normalized as GuidanceMode)) { +function ensureGuidanceStrategy(guidanceStrategy?: string): GuidanceStrategy { + const normalized = typeof guidanceStrategy === "string" ? guidanceStrategy.trim().toLowerCase() : "prompt-summary"; + const mapped = normalized === "prepared" ? "prompt-summary" : normalized; + if (!VALID_GUIDANCE_STRATEGIES.has(mapped as GuidanceStrategy)) { throw new SessionInputError( - `Invalid --guidance-mode value "${guidanceMode ?? ""}". Expected prepared|unguided.`, + `Invalid guidance strategy "${guidanceStrategy ?? ""}". Expected prompt-summary|json-primary|unguided.`, ); } - return normalized as GuidanceMode; + return mapped as GuidanceStrategy; } function buildDefaultSessionId(): string { @@ -503,6 +618,7 @@ function getSessionPaths(sessionDir: string) { sessionPath: path.join(sessionDir, "session.json"), bundleRoot: path.join(sessionDir, "bundle"), preparedInputPath: path.join(sessionDir, "prepared-input.json"), + guidanceHandoffPath: path.join(sessionDir, "guidance-handoff.json"), attemptsDir: path.join(sessionDir, "attempts"), summaryJsonPath: path.join(sessionDir, "summary.json"), summaryMarkdownPath: path.join(sessionDir, "summary.md"), @@ -559,6 +675,36 @@ function normalizeAssessment( )].sort((left, right) => left.localeCompare(right)); } + let heuristics: GenerationAssessmentHeuristics | undefined; + if (payload.heuristics !== undefined) { + const candidate = asRecord(payload.heuristics); + heuristics = {}; + const numericField = (key: keyof GenerationAssessmentHeuristics, allowNull = false) => { + const value = candidate[key]; + if (value === undefined) { + return; + } + if (value === null && allowNull) { + (heuristics as Record)[key] = null; + return; + } + if (typeof value !== "number" || !Number.isFinite(value)) { + throw new SessionInputError(`Assessment heuristic "${String(key)}" must be a finite number${allowNull ? 
" or null" : ""}: ${filePath}.`); + } + (heuristics as Record)[key] = value; + }; + + numericField("unresolvedAcceptedSuggestionCount"); + numericField("unresolvedAcceptedSuggestionRate", true); + numericField("noChangesAfterEditFailureCount"); + numericField("recoverableToolErrorCount"); + numericField("touchedFilesPerResolvedFinding", true); + + if (Object.keys(heuristics).length === 0) { + heuristics = undefined; + } + } + return { structure: grade("structure"), components: grade("components"), @@ -567,6 +713,7 @@ function normalizeAssessment( responsiveness: grade("responsiveness"), notes, ...(touchedFiles && touchedFiles.length > 0 ? { touchedFiles } : {}), + ...(heuristics ? { heuristics } : {}), }; } @@ -714,12 +861,14 @@ function loadSession(sessionDirInput: string): { session: GenerationSession; pat const payload = readJsonFile(paths.sessionPath, "generation session"); const schemaVersion = Number(payload.schemaVersion ?? 1); - if (schemaVersion !== 1 && schemaVersion !== 2) { + if (schemaVersion !== 1 && schemaVersion !== 2 && schemaVersion !== 3) { throw new SessionInputError(`Unsupported generation session schemaVersion "${String(payload.schemaVersion ?? "unknown")}".`); } const tool = ensureSessionTool(asString(payload.tool)); - const guidanceMode = ensureGuidanceMode(asString(payload.guidanceMode) ?? "prepared"); + const guidanceStrategy = ensureGuidanceStrategy( + asString(payload.guidanceStrategy) ?? asString(payload.guidanceMode) ?? "prompt-summary", + ); const finalStatus = asString(asRecord(payload.successRule).finalStatus) ?? 
"pass"; if (!VALID_SUCCESS_RULES.has(finalStatus as SessionSuccessRule)) { throw new SessionInputError(`Unsupported session successRule.finalStatus "${finalStatus}".`); @@ -728,12 +877,13 @@ function loadSession(sessionDirInput: string): { session: GenerationSession; pat const briefRecord = asRecord(payload.brief); const briefPath = asString(briefRecord.path); const briefSha256 = asString(briefRecord.sha256); + const guidanceArtifacts = asRecord(payload.guidanceArtifacts); const session: GenerationSession = { - schemaVersion: 2, + schemaVersion: 3, surfaceId: asString(payload.surfaceId) ?? "", sessionId: asString(payload.sessionId) ?? "", tool, - guidanceMode, + guidanceStrategy, workspaceRoot: asString(payload.workspaceRoot) ?? "", sourceBundleRoot: asString(payload.sourceBundleRoot) ?? "", sessionDir: asString(payload.sessionDir) ?? sessionDir, @@ -741,6 +891,13 @@ function loadSession(sessionDirInput: string): { session: GenerationSession; pat preparedInputPath: typeof payload.preparedInputPath === "string" ? payload.preparedInputPath : null, contractPath: asString(payload.contractPath) ?? "", repairMapPath: asString(payload.repairMapPath) ?? "", + guidanceArtifacts: { + baseHandoffPath: typeof guidanceArtifacts.baseHandoffPath === "string" + ? guidanceArtifacts.baseHandoffPath + : fs.existsSync(paths.guidanceHandoffPath) + ? paths.guidanceHandoffPath + : null, + }, startedAt: asString(payload.startedAt) ?? "", ...(briefPath && briefSha256 ? 
{ brief: { path: briefPath, sha256: briefSha256 } } : {}), successRule: { @@ -748,7 +905,15 @@ function loadSession(sessionDirInput: string): { session: GenerationSession; pat }, }; - if (!session.surfaceId || !session.sessionId || !session.workspaceRoot || !session.bundleRoot || !session.contractPath || !session.repairMapPath || !session.startedAt) { + if ( + !session.surfaceId + || !session.sessionId + || !session.workspaceRoot + || !session.bundleRoot + || !session.contractPath + || !session.repairMapPath + || !session.startedAt + ) { throw new SessionInputError(`Generation session is missing required fields: ${paths.sessionPath}.`); } @@ -858,6 +1023,42 @@ function buildRecurringCounts(values: string[]) { .sort((left, right) => right.count - left.count || left.code.localeCompare(right.code)); } +function repeatedFindingCarryoverCount(recurringFindingCodes: Array<{ code: string; count: number }>): number { + return recurringFindingCodes.reduce((total, entry) => total + Math.max(0, entry.count - 1), 0); +} + +function rerunsToAcceptableOutcome(firstAcceptableAttempt: number | null): number | null { + if (firstAcceptableAttempt === null) { + return null; + } + return Math.max(0, firstAcceptableAttempt - 1); +} + +function numericHeuristicDelta( + baseline: number | null | undefined, + guided: number | null | undefined, +): number | null { + if (baseline === null || baseline === undefined || guided === null || guided === undefined) { + return null; + } + return guided - baseline; +} + +function countHeuristicImprovement(values: Array): number { + return values.reduce( + (total, value) => total + (typeof value === "number" && value < 0 ? 
1 : 0), + 0, + ); +} + +function averageNullable(values: Array): number | null { + const filtered = values.filter((value): value is number => typeof value === "number" && Number.isFinite(value)); + if (filtered.length === 0) { + return null; + } + return Math.round((filtered.reduce((sum, value) => sum + value, 0) / filtered.length) * 1000) / 1000; +} + function renderSummaryMarkdown(summary: GenerationSessionSummary): string { const lines = [ "# Generation Session Summary", @@ -865,7 +1066,7 @@ function renderSummaryMarkdown(summary: GenerationSessionSummary): string { `Surface: ${summary.surfaceId}`, `Session: ${summary.sessionId}`, `Tool: ${summary.tool}`, - `Guidance mode: ${summary.guidanceMode}`, + `Guidance strategy: ${summary.guidanceStrategy}`, `Latest status: ${summary.latestStatus}`, `Latest outcome: ${summary.latestOutcome}`, `Attempts: ${summary.attemptCount}`, @@ -903,11 +1104,30 @@ function renderSummaryMarkdown(summary: GenerationSessionSummary): string { if (summary.latestAssessment?.touchedFiles?.length) { lines.push(`- touched files: ${summary.latestAssessment.touchedFiles.join(", ")}`); } + if (summary.latestAssessment?.heuristics) { + if (typeof summary.latestAssessment.heuristics.unresolvedAcceptedSuggestionRate === "number") { + lines.push(`- unresolved accepted suggestion rate: ${summary.latestAssessment.heuristics.unresolvedAcceptedSuggestionRate}`); + } + if (typeof summary.latestAssessment.heuristics.noChangesAfterEditFailureCount === "number") { + lines.push(`- noChanges-after-edit failures: ${summary.latestAssessment.heuristics.noChangesAfterEditFailureCount}`); + } + if (typeof summary.latestAssessment.heuristics.recoverableToolErrorCount === "number") { + lines.push(`- recoverable tool errors: ${summary.latestAssessment.heuristics.recoverableToolErrorCount}`); + } + if (typeof summary.latestAssessment.heuristics.touchedFilesPerResolvedFinding === "number") { + lines.push(`- touched files per resolved finding: 
${summary.latestAssessment.heuristics.touchedFilesPerResolvedFinding}`); + } + } if (summary.latestReview) { lines.push(`- latest review: ${summary.latestReview.status} (${summary.latestReview.findingCodes.join(", ")})`); lines.push(`- review rationale: ${summary.latestReview.rationale}`); } + lines.push("", "## Heuristics"); + lines.push(`- repeated finding carryover count: ${summary.heuristics.repeatedFindingCarryoverCount}`); + lines.push(`- reruns to acceptable outcome: ${summary.heuristics.rerunsToAcceptableOutcome ?? "n/a"}`); + lines.push(`- base guidance handoff: ${summary.paths.guidanceHandoffPath ?? "none"}`); + return `${lines.join("\n")}\n`; } @@ -1023,13 +1243,18 @@ function buildGenerationSessionSummary(sessionDirInput: string) { parseFindingCodes(latestAttempt.validate), session.successRule.finalStatus, ); + const heuristics: GenerationSessionHeuristics = { + latestAttempt: latestAssessment.heuristics ?? {}, + repeatedFindingCarryoverCount: repeatedFindingCarryoverCount(recurringFindingCodes), + rerunsToAcceptableOutcome: rerunsToAcceptableOutcome(firstAcceptableAttempt), + }; const summary: GenerationSessionSummary = { - schemaVersion: 3, + schemaVersion: 4, surfaceId: session.surfaceId, sessionId: session.sessionId, tool: session.tool, - guidanceMode: session.guidanceMode, + guidanceStrategy: session.guidanceStrategy, attemptCount: attempts.length, firstPassAttempt, firstAcceptableAttempt, @@ -1039,12 +1264,14 @@ function buildGenerationSessionSummary(sessionDirInput: string) { recurringRepairCodes, latestAssessment, latestReview: latestAttempt.review, + heuristics, ...(session.brief ? 
{ brief: session.brief } : {}), successRule: session.successRule, paths: { sessionPath: paths.sessionPath, bundleRoot: session.bundleRoot, preparedInputPath: session.preparedInputPath, + guidanceHandoffPath: session.guidanceArtifacts.baseHandoffPath, }, attempts: attempts.map((attempt) => { const status = attempt.validate.status; @@ -1133,19 +1360,19 @@ function renderComparisonMarkdown(comparison: GenerationSessionComparison): stri "", `Surface: ${comparison.surfaceId}`, `Tool: ${comparison.tool}`, - `Baseline session: ${comparison.baseline.sessionId}`, - `Guided session: ${comparison.guided.sessionId}`, + `Baseline session: ${comparison.baseline.sessionId} (${comparison.baseline.guidanceStrategy})`, + `Candidate session: ${comparison.guided.sessionId} (${comparison.guided.guidanceStrategy})`, `Meets goal: ${comparison.checks.meetsGoal ? "yes" : "no"}`, "", "## First attempt", `- baseline outcome: ${comparison.baseline.firstAttempt.outcome}`, - `- guided outcome: ${comparison.guided.firstAttempt.outcome}`, + `- candidate outcome: ${comparison.guided.firstAttempt.outcome}`, `- blocking finding delta: ${comparison.delta.firstAttemptBlockingFindingCountDelta}`, `- warning finding delta: ${comparison.delta.firstAttemptWarningFindingCountDelta}`, "", "## Convergence", `- baseline first acceptable attempt: ${comparison.baseline.firstAcceptableAttempt ?? "not reached"}`, - `- guided first acceptable attempt: ${comparison.guided.firstAcceptableAttempt ?? "not reached"}`, + `- candidate first acceptable attempt: ${comparison.guided.firstAcceptableAttempt ?? "not reached"}`, `- attempts-to-acceptable delta: ${comparison.delta.attemptsToAcceptableOutcome.delta ?? 
"n/a"}`, "", "## Rubric delta", @@ -1159,10 +1386,18 @@ function renderComparisonMarkdown(comparison: GenerationSessionComparison): stri if (comparison.checks.guidedRubricBetterDimensions.length > 0) { lines.push( "", - `Guided improved dimensions: ${comparison.checks.guidedRubricBetterDimensions.join(", ")}`, + `Candidate improved dimensions: ${comparison.checks.guidedRubricBetterDimensions.join(", ")}`, ); } + lines.push("", "## Heuristics"); + lines.push(`- unresolved accepted suggestion rate delta: ${comparison.heuristics.delta.unresolvedAcceptedSuggestionRate ?? "n/a"}`); + lines.push(`- noChanges-after-edit failure delta: ${comparison.heuristics.delta.noChangesAfterEditFailureCount}`); + lines.push(`- recoverable tool error delta: ${comparison.heuristics.delta.recoverableToolErrorCount}`); + lines.push(`- touched files per resolved finding delta: ${comparison.heuristics.delta.touchedFilesPerResolvedFinding ?? "n/a"}`); + lines.push(`- repeated finding carryover delta: ${comparison.heuristics.delta.repeatedFindingCarryoverCount}`); + lines.push(`- reruns to acceptable delta: ${comparison.heuristics.delta.rerunsToAcceptableOutcome ?? 
"n/a"}`); + return `${lines.join("\n")}\n`; } @@ -1173,7 +1408,7 @@ function renderSuggestionsMarkdown(artifact: ContractDeltaSuggestionsArtifact): `Surface: ${artifact.surfaceId}`, `Session: ${artifact.sessionId}`, `Tool: ${artifact.tool}`, - `Guidance mode: ${artifact.guidanceMode}`, + `Guidance strategy: ${artifact.guidanceStrategy}`, "", ]; @@ -1207,15 +1442,15 @@ function renderBenchmarkReportMarkdown(report: GenerationBenchmarkReport): strin `Generated at: ${report.generatedAt}`, `Surfaces: ${report.overall.surfaceCount}`, `Surfaces meeting goal: ${report.overall.surfacesMeetingGoal}`, - `Guided fewer first-attempt blocking findings: ${report.overall.guidedFewerFirstAttemptBlockingFindings}`, - `Guided reached acceptable no later: ${report.overall.guidedReachedAcceptableNoLater}`, + `Candidate fewer first-attempt blocking findings: ${report.overall.guidedFewerFirstAttemptBlockingFindings}`, + `Candidate reached acceptable no later: ${report.overall.guidedReachedAcceptableNoLater}`, "", "## Comparisons", ]; for (const comparison of report.comparisons) { lines.push( - `- ${comparison.surfaceId}: meetsGoal=${comparison.meetsGoal}, improved dimensions=${comparison.guidedRubricBetterDimensions.join(", ") || "none"}`, + `- ${comparison.surfaceId}: baseline=${comparison.baselineGuidanceStrategy}, candidate=${comparison.guidedGuidanceStrategy}, meetsGoal=${comparison.meetsGoal}, improved dimensions=${comparison.guidedRubricBetterDimensions.join(", ") || "none"}`, ); } @@ -1226,6 +1461,14 @@ function renderBenchmarkReportMarkdown(report: GenerationBenchmarkReport): strin ); } + lines.push("", "## Heuristic improvements"); + lines.push(`- lower unresolved accepted suggestion rate: ${report.overall.heuristics.lowerUnresolvedAcceptedSuggestionRate}`); + lines.push(`- lower noChanges-after-edit failures: ${report.overall.heuristics.lowerNoChangesAfterEditFailureCount}`); + lines.push(`- lower recoverable tool errors: 
${report.overall.heuristics.lowerRecoverableToolErrorCount}`); + lines.push(`- lower touched files per resolved finding: ${report.overall.heuristics.lowerTouchedFilesPerResolvedFinding}`); + lines.push(`- lower repeated finding carryover count: ${report.overall.heuristics.lowerRepeatedFindingCarryoverCount}`); + lines.push(`- lower reruns to acceptable outcome: ${report.overall.heuristics.lowerRerunsToAcceptableOutcome}`); + return `${lines.join("\n")}\n`; } @@ -1262,6 +1505,284 @@ function defaultBenchmarkReportDir(comparisonPaths: string[]): string { ); } +function extractRepairEntries(repairMap: unknown): JsonRecord[] { + if (Array.isArray(repairMap)) { + return repairMap.filter((entry): entry is JsonRecord => isRecord(entry)); + } + const record = asRecord(repairMap); + const repairs = Array.isArray(record.repairs) ? record.repairs : []; + return repairs.filter((entry): entry is JsonRecord => isRecord(entry)); +} + +function summarizeContractForSurface(contractPath: string, surfaceId: string): string { + const payload = readJsonFile(contractPath, "generation session contract"); + const surfaces = Array.isArray(payload.surfaces) ? payload.surfaces.filter((entry): entry is JsonRecord => isRecord(entry)) : []; + const surface = surfaces.find((entry) => asString(entry.id) === surfaceId) ?? surfaces[0] ?? {}; + const sections = Array.isArray(payload.sections) ? payload.sections.filter((entry): entry is JsonRecord => isRecord(entry)) : []; + const color = asRecord(payload.color); + const layout = asRecord(surface.layout); + const requiredSections = asStringArray(surface.requiredSections ?? sections.map((entry) => entry.id)); + const allowedFonts = asStringArray(surface.allowedFonts); + const allowedColors = asStringArray(color.allowedValues); + const maxContentWidth = typeof layout.maxContentWidth === "number" ? layout.maxContentWidth : null; + + return [ + `${asString(payload.contractId) ?? surfaceId} v${asString(payload.version) ?? 
"0.0.0"}`, + asString(payload.description) ?? "Working contract for generation guidance.", + `Required sections: ${requiredSections.join(", ") || "none recorded"}`, + `Fonts: ${allowedFonts.join(", ") || "none recorded"}`, + `Max content width: ${maxContentWidth ?? "not specified"}`, + `Color policy: ${asString(color.policy) ?? "off"}`, + `Allowed colors: ${allowedColors.join(", ") || "none recorded"}`, + ].join("\n"); +} + +function buildPreparedPromptSummary(preparedPayload: ReturnType): string { + const generation = asRecord(preparedPayload.generation); + const structure = asRecord(generation.structure); + const layout = asRecord(generation.layout); + const visual = asRecord(generation.visual); + const guidance = asRecord(generation.guidance); + const constraints = asRecord(preparedPayload.constraints); + const color = asRecord(constraints.color); + const motion = asRecord(constraints.motion); + const sections = Array.isArray(preparedPayload.sections) ? preparedPayload.sections : []; + const repairs = extractRepairEntries(preparedPayload.repairMap); + const requiredSections = asStringArray(structure.requiredSectionIds); + const focusOrder = asStringArray(guidance.generationFocusOrder); + const allowedFonts = asStringArray(visual.allowedFonts); + const requiredContainers = asStringArray(layout.requiredContainers); + const topRepairs = repairs + .slice(0, 5) + .map((entry) => { + const code = asString(entry.code) ?? "unknown"; + const summary = asString(entry.summary) ?? ""; + return summary ? `${code}: ${summary}` : code; + }); + + return [ + `Contract: ${asString(preparedPayload.contract.id) ?? "unknown"} v${asString(preparedPayload.contract.version) ?? "0.0.0"}`, + `Focus order: ${focusOrder.join(", ") || "none"}`, + `Required sections: ${requiredSections.join(", ") || "none"}`, + `Section count: ${sections.length}`, + `Allowed fonts: ${allowedFonts.join(", ") || "none"}`, + `Max content width: ${typeof layout.maxContentWidth === "number" ? 
`${layout.maxContentWidth}px` : "unspecified"}`, + `Required containers: ${requiredContainers.join(", ") || "none"}`, + `Color policy: ${asString(color.policy) ?? "off"}`, + `Motion durations: ${ + Array.isArray(motion.allowedDurationsMs) + ? motion.allowedDurationsMs.map((value) => `${String(value)}ms`).join(", ") + : "none" + }`, + `Top repair priorities: ${topRepairs.join(", ") || "none"}`, + ].join("\n"); +} + +function selectRelevantComponents(preparedPayload: ReturnType): Array> { + const sections = Array.isArray(preparedPayload.sections) + ? preparedPayload.sections.filter((entry): entry is JsonRecord => isRecord(entry)) + : []; + const components = Array.isArray(preparedPayload.components) + ? preparedPayload.components.filter((entry): entry is JsonRecord => isRecord(entry)) + : []; + const referencedIds = new Set(); + + for (const section of sections) { + const anatomy = asRecord(section.anatomy); + const defaultComponentId = asString(anatomy.defaultComponentId); + if (defaultComponentId) { + referencedIds.add(defaultComponentId); + } + for (const componentId of asStringArray(anatomy.allowedComponentIds)) { + referencedIds.add(componentId); + } + const slots = Array.isArray(anatomy.slots) ? anatomy.slots : []; + for (const slot of slots) { + const slotRecord = asRecord(slot); + for (const componentId of asStringArray(slotRecord.acceptsComponentIds)) { + referencedIds.add(componentId); + } + } + } + + if (referencedIds.size === 0) { + return components.slice(0, 12); + } + return components.filter((component) => referencedIds.has(asString(component.id) ?? 
"")); +} + +function loadPreparedPayloadForSession(session: GenerationSession): ReturnType { + if (session.preparedInputPath && fs.existsSync(session.preparedInputPath)) { + return readJsonFile>(session.preparedInputPath, "prepared generation payload"); + } + const bundle = loadCompiledSurfaceBundle(session.bundleRoot, session.surfaceId, process.cwd()); + return buildPreparedGenerationPayload(bundle); +} + +function loadRuntimeAcceptedSuggestions(filePath?: string): RuntimeAcceptedSuggestion[] { + if (!filePath) { + return []; + } + const resolvedPath = path.resolve(filePath); + if (!fs.existsSync(resolvedPath)) { + throw new SessionInputError(`Accepted suggestions file not found at ${resolvedPath}.`); + } + let payload: unknown; + try { + payload = JSON.parse(fs.readFileSync(resolvedPath, "utf8")) as unknown; + } catch (error) { + throw new SessionInputError( + `Accepted suggestions file is not valid JSON: ${resolvedPath} (${error instanceof Error ? error.message : String(error)}).`, + ); + } + const payloadRecord = asRecord(payload); + const suggestions: unknown[] = Array.isArray(payload) + ? payload + : Array.isArray(payloadRecord.suggestions) + ? payloadRecord.suggestions + : []; + return suggestions + .filter((entry): entry is JsonRecord => isRecord(entry)) + .map((entry) => { + const findingCode = asString(entry.findingCode); + const findingMessage = asString(entry.findingMessage); + const summary = asString(entry.summary); + const suggestedPath = asString(entry.suggestedPath); + const rationale = asString(entry.rationale); + if (!findingCode || !findingMessage || !summary || !suggestedPath) { + throw new SessionInputError(`Accepted suggestion entries must include findingCode, findingMessage, summary, and suggestedPath: ${resolvedPath}.`); + } + return { + findingCode, + findingMessage, + summary, + suggestedPath, + ...(rationale ? 
{ rationale } : {}), + }; + }); +} + +function loadRuntimeDesignerNotes(filePath?: string): string[] { + if (!filePath) { + return []; + } + const resolvedPath = path.resolve(filePath); + if (!fs.existsSync(resolvedPath)) { + throw new SessionInputError(`Designer notes file not found at ${resolvedPath}.`); + } + let payload: unknown; + try { + payload = JSON.parse(fs.readFileSync(resolvedPath, "utf8")) as unknown; + } catch (error) { + throw new SessionInputError( + `Designer notes file is not valid JSON: ${resolvedPath} (${error instanceof Error ? error.message : String(error)}).`, + ); + } + const payloadRecord = asRecord(payload); + const rawNotes: unknown[] = Array.isArray(payload) + ? payload + : Array.isArray(payloadRecord.designerNotes) + ? payloadRecord.designerNotes + : Array.isArray(payloadRecord.notes) + ? payloadRecord.notes + : []; + return [...new Set( + rawNotes + .map((entry) => { + if (typeof entry === "string") { + return entry.trim(); + } + if (isRecord(entry)) { + return asString(entry.content) ?? ""; + } + return ""; + }) + .filter(Boolean), + )]; +} + +function parseRuntimeFindingCodes(value?: string): string[] { + if (!value) { + return []; + } + return [...new Set( + value + .split(",") + .map((entry) => entry.trim()) + .filter(Boolean), + )].sort((left, right) => left.localeCompare(right)); +} + +function buildGuidanceHandoff( + session: GenerationSession, + paths: ReturnType, + guidanceStrategy: GuidanceStrategy, + options: { + acceptedSuggestions?: RuntimeAcceptedSuggestion[]; + designerNotes?: string[]; + findingCodes?: string[]; + } = {}, +): GenerationGuidanceHandoff { + const acceptedSuggestions = options.acceptedSuggestions ?? []; + const designerNotes = options.designerNotes ?? []; + const findingCodes = [...new Set([ + ...(options.findingCodes ?? []), + ...acceptedSuggestions.map((entry) => entry.findingCode), + ])].sort((left, right) => left.localeCompare(right)); + const preparedPayload = guidanceStrategy === "unguided" ? 
null : loadPreparedPayloadForSession(session); + const repairMap = preparedPayload ? extractRepairEntries(preparedPayload.repairMap) : []; + const matchedRepairs = repairMap.filter((entry) => findingCodes.includes(asString(entry.code) ?? "")); + const brief = session.brief && fs.existsSync(session.brief.path) + ? { + ...session.brief, + text: fs.readFileSync(session.brief.path, "utf8").trim(), + } + : null; + + return { + schemaVersion: 1, + surfaceId: session.surfaceId, + sessionId: session.sessionId, + tool: session.tool, + guidanceStrategy, + generatedAt: session.startedAt, + brief, + session: { + sessionPath: paths.sessionPath, + preparedInputPath: session.preparedInputPath, + contractPath: session.contractPath, + repairMapPath: session.repairMapPath, + }, + runtimeGuidance: { + findingCodes, + matchedRepairCodes: matchedRepairs.map((entry) => asString(entry.code) ?? "").filter(Boolean), + acceptedSuggestions, + designerNotes, + }, + promptSummary: guidanceStrategy === "prompt-summary" + ? { + effectiveContractSummary: summarizeContractForSurface(session.contractPath, session.surfaceId), + preparedGuidanceSummary: buildPreparedPromptSummary(preparedPayload!), + } + : null, + jsonPrimary: guidanceStrategy === "json-primary" + ? { + surface: asRecord(preparedPayload!.surface), + contract: asRecord(preparedPayload!.contract), + summary: asRecord(preparedPayload!.summary), + generation: asRecord(preparedPayload!.generation), + constraints: asRecord(preparedPayload!.constraints), + sections: Array.isArray(preparedPayload!.sections) + ? 
preparedPayload!.sections.filter((entry): entry is JsonRecord => isRecord(entry)) + : [], + components: selectRelevantComponents(preparedPayload!), + repairMap, + matchedRepairs, + } + : null, + }; +} + function parseCsvPaths(value?: string): string[] { if (!value) return []; return [...new Set( @@ -1286,12 +1807,6 @@ function buildComparisonArtifact( if (baselineBuilt.session.tool !== guidedBuilt.session.tool) { throw new SessionInputError("Baseline and guided sessions must use the same tool."); } - if (baselineBuilt.session.guidanceMode !== "unguided") { - throw new SessionInputError("Baseline session must use guidanceMode=unguided."); - } - if (guidedBuilt.session.guidanceMode !== "prepared") { - throw new SessionInputError("Guided session must use guidanceMode=prepared."); - } if (!baselineBuilt.session.brief || !guidedBuilt.session.brief) { throw new SessionInputError("Both sessions must freeze the same implementation brief before comparison."); } @@ -1343,9 +1858,36 @@ function buildComparisonArtifact( guidedBuilt.summary.firstAcceptableAttempt <= baselineBuilt.summary.firstAcceptableAttempt; const guidedFewerFirstAttemptBlockingFindings = guidedFirstAttempt.blockingFindingCount < baselineFirstAttempt.blockingFindingCount; + const heuristics: GenerationSessionComparison["heuristics"] = { + baseline: baselineBuilt.summary.heuristics, + guided: guidedBuilt.summary.heuristics, + delta: { + unresolvedAcceptedSuggestionRate: numericHeuristicDelta( + baselineBuilt.summary.heuristics.latestAttempt.unresolvedAcceptedSuggestionRate, + guidedBuilt.summary.heuristics.latestAttempt.unresolvedAcceptedSuggestionRate, + ), + noChangesAfterEditFailureCount: + (guidedBuilt.summary.heuristics.latestAttempt.noChangesAfterEditFailureCount ?? 0) - + (baselineBuilt.summary.heuristics.latestAttempt.noChangesAfterEditFailureCount ?? 0), + recoverableToolErrorCount: + (guidedBuilt.summary.heuristics.latestAttempt.recoverableToolErrorCount ?? 
0) - + (baselineBuilt.summary.heuristics.latestAttempt.recoverableToolErrorCount ?? 0), + touchedFilesPerResolvedFinding: numericHeuristicDelta( + baselineBuilt.summary.heuristics.latestAttempt.touchedFilesPerResolvedFinding, + guidedBuilt.summary.heuristics.latestAttempt.touchedFilesPerResolvedFinding, + ), + repeatedFindingCarryoverCount: + guidedBuilt.summary.heuristics.repeatedFindingCarryoverCount - + baselineBuilt.summary.heuristics.repeatedFindingCarryoverCount, + rerunsToAcceptableOutcome: numericHeuristicDelta( + baselineBuilt.summary.heuristics.rerunsToAcceptableOutcome, + guidedBuilt.summary.heuristics.rerunsToAcceptableOutcome, + ), + }, + }; return { - schemaVersion: 2, + schemaVersion: 3, surfaceId: baselineBuilt.session.surfaceId, tool: baselineBuilt.session.tool, brief: { @@ -1356,7 +1898,7 @@ function buildComparisonArtifact( baseline: { sessionId: baselineBuilt.session.sessionId, sessionDir: baselineBuilt.session.sessionDir, - guidanceMode: baselineBuilt.session.guidanceMode, + guidanceStrategy: baselineBuilt.session.guidanceStrategy, attemptCount: baselineBuilt.summary.attemptCount, firstAcceptableAttempt: baselineBuilt.summary.firstAcceptableAttempt, latestOutcome: baselineBuilt.summary.latestOutcome, @@ -1364,11 +1906,12 @@ function buildComparisonArtifact( latestAttempt: baselineLatestAttempt, recurringFindingCodes: baselineBuilt.summary.recurringFindingCodes, recurringRepairCodes: baselineBuilt.summary.recurringRepairCodes, + heuristics: baselineBuilt.summary.heuristics, }, guided: { sessionId: guidedBuilt.session.sessionId, sessionDir: guidedBuilt.session.sessionDir, - guidanceMode: guidedBuilt.session.guidanceMode, + guidanceStrategy: guidedBuilt.session.guidanceStrategy, attemptCount: guidedBuilt.summary.attemptCount, firstAcceptableAttempt: guidedBuilt.summary.firstAcceptableAttempt, latestOutcome: guidedBuilt.summary.latestOutcome, @@ -1376,6 +1919,7 @@ function buildComparisonArtifact( latestAttempt: guidedLatestAttempt, 
recurringFindingCodes: guidedBuilt.summary.recurringFindingCodes, recurringRepairCodes: guidedBuilt.summary.recurringRepairCodes, + heuristics: guidedBuilt.summary.heuristics, }, delta: { firstAttemptVerdict: { @@ -1399,6 +1943,7 @@ function buildComparisonArtifact( }, rubric, }, + heuristics, checks: { guidedFewerFirstAttemptBlockingFindings, guidedReachedAcceptableNoLater, @@ -1485,8 +2030,8 @@ function getSuggestionSortKey(left: ContractDeltaSuggestion, right: ContractDelt function buildSuggestionArtifact(sessionDir: string): ContractDeltaSuggestionsArtifact { const built = buildGenerationSessionSummary(sessionDir); - if (built.session.guidanceMode !== "prepared") { - throw new SessionInputError("Contract delta suggestions require a guided prepared session."); + if (built.session.guidanceStrategy === "unguided") { + throw new SessionInputError("Contract delta suggestions require a guided session."); } const repairMapDoc = readJsonFile(built.session.repairMapPath, "repair map"); @@ -1560,11 +2105,11 @@ function buildSuggestionArtifact(sessionDir: string): ContractDeltaSuggestionsAr }).sort(getSuggestionSortKey); return { - schemaVersion: 1, + schemaVersion: 2, surfaceId: built.session.surfaceId, sessionId: built.session.sessionId, tool: built.session.tool, - guidanceMode: built.session.guidanceMode, + guidanceStrategy: built.session.guidanceStrategy, generatedAt: asString(latestAttempt.metadata.createdAt) ?? asString(latestAttempt.validate.provenance && asRecord(latestAttempt.validate.provenance).evaluatedAt) ?? @@ -1631,7 +2176,7 @@ export async function runInitGenerationSessionCommand( } const tool = ensureSessionTool(options.tool); - const guidanceMode = ensureGuidanceMode(options.guidanceMode); + const guidanceStrategy = ensureGuidanceStrategy(options.guidanceStrategy ?? 
options.guidanceMode); const workspaceRoot = path.resolve(options.workspaceRoot); if (!fs.existsSync(workspaceRoot) || !fs.statSync(workspaceRoot).isDirectory()) { throw new SessionInputError(`Workspace root directory not found at ${workspaceRoot}.`); @@ -1652,18 +2197,18 @@ export async function runInitGenerationSessionCommand( const sessionBundle = loadCompiledSurfaceBundle(paths.bundleRoot, options.surfaceId, process.cwd()); let preparedInputPath: string | null = null; - if (guidanceMode === "prepared") { + if (guidanceStrategy !== "unguided") { const preparedPayload = buildPreparedGenerationPayload(sessionBundle); writeDeterministicJsonSync(paths.preparedInputPath, preparedPayload); preparedInputPath = paths.preparedInputPath; } const session: GenerationSession = { - schemaVersion: 2, + schemaVersion: 3, surfaceId: options.surfaceId, sessionId, tool, - guidanceMode, + guidanceStrategy, workspaceRoot, sourceBundleRoot: loadedBundle.root, sessionDir: paths.sessionDir, @@ -1671,16 +2216,86 @@ export async function runInitGenerationSessionCommand( preparedInputPath, contractPath: sessionBundle.contract.path, repairMapPath: sessionBundle.surface.repairMap.path, + guidanceArtifacts: { + baseHandoffPath: paths.guidanceHandoffPath, + }, startedAt: new Date().toISOString(), ...(options.briefFile ? 
{ brief: freezeBriefFile(paths.sessionDir, options.briefFile) } : {}), successRule: { finalStatus: "pass-or-reviewed-warn", }, }; + const handoff = buildGuidanceHandoff(session, paths, guidanceStrategy); + writeDeterministicJsonSync(paths.guidanceHandoffPath, handoff); writeDeterministicJsonSync(paths.sessionPath, session); process.stdout.write( - `${JSON.stringify({ ok: true, session, paths }, null, 2)}\n`, + `${JSON.stringify({ ok: true, session, handoff, paths }, null, 2)}\n`, + ); + return 0; + } catch (error) { + if (error instanceof SessionInputError || error instanceof AdapterInputError) { + writeError(error, error.code); + return 10; + } + writeError(error instanceof Error ? error : new Error(String(error)), "generation-session.internal"); + return 1; + } +} + +export async function runPrepareGenerationHandoffCommand( + options: PrepareGenerationHandoffCommandOptions, +): Promise { + try { + if (!options.sessionDir) { + throw new SessionInputError("--session-dir is required."); + } + + const { session, paths } = loadSession(options.sessionDir); + const guidanceStrategy = ensureGuidanceStrategy(options.guidanceStrategy ?? session.guidanceStrategy); + let preparedInputPath = session.preparedInputPath; + if (guidanceStrategy !== "unguided" && !preparedInputPath) { + const bundle = loadCompiledSurfaceBundle(session.bundleRoot, session.surfaceId, process.cwd()); + const preparedPayload = buildPreparedGenerationPayload(bundle); + writeDeterministicJsonSync(paths.preparedInputPath, preparedPayload); + preparedInputPath = paths.preparedInputPath; + } + + const sessionForHandoff: GenerationSession = { + ...session, + guidanceStrategy, + preparedInputPath, + guidanceArtifacts: { + baseHandoffPath: options.outPath ? 
path.resolve(options.outPath) : paths.guidanceHandoffPath, + }, + }; + const handoff = buildGuidanceHandoff(sessionForHandoff, paths, guidanceStrategy, { + acceptedSuggestions: loadRuntimeAcceptedSuggestions(options.acceptedSuggestionsFile), + designerNotes: loadRuntimeDesignerNotes(options.designerNotesFile), + findingCodes: parseRuntimeFindingCodes(options.findingCodes), + }); + const handoffPath = sessionForHandoff.guidanceArtifacts.baseHandoffPath ?? paths.guidanceHandoffPath; + writeDeterministicJsonSync(handoffPath, handoff); + + const updatedSession: GenerationSession = { + ...sessionForHandoff, + }; + writeDeterministicJsonSync(paths.sessionPath, updatedSession); + + process.stdout.write( + `${JSON.stringify( + { + ok: true, + handoff, + session: updatedSession, + paths: { + handoffPath, + sessionPath: paths.sessionPath, + }, + }, + null, + 2, + )}\n`, ); return 0; } catch (error) { @@ -1745,12 +2360,12 @@ export async function runRecordGenerationAttemptCommand( }); const metadata: GenerationSessionAttemptMetadata = { - schemaVersion: 2, + schemaVersion: 3, surfaceId: session.surfaceId, sessionId: session.sessionId, attemptNumber, tool: session.tool, - guidanceMode: session.guidanceMode, + guidanceStrategy: session.guidanceStrategy, createdAt: new Date().toISOString(), validateStatus: response.status, validateExitCode: response.status === "block" ? 30 : 0, @@ -1758,6 +2373,7 @@ export async function runRecordGenerationAttemptCommand( assessmentPath: attemptPaths.assessmentPath, validatePath: attemptPaths.validatePath, touchedFiles: assessment.touchedFiles ?? 
[],
+    guidanceHandoffPath: session.guidanceArtifacts.baseHandoffPath,
     contractRun,
   };
   writeDeterministicJsonSync(attemptPaths.metadataPath, metadata);
@@ -2191,16 +2807,19 @@ export async function runSummarizeGenerationBenchmarkCommand(
   }));

   const report: GenerationBenchmarkReport = {
-    schemaVersion: 1,
+    schemaVersion: 2,
     generatedAt: new Date().toISOString(),
     comparisons: comparisons.map(({ path: comparisonPath, value }) => ({
       surfaceId: value.surfaceId,
       tool: value.tool,
       comparisonPath,
       meetsGoal: value.checks.meetsGoal,
+      baselineGuidanceStrategy: value.baseline.guidanceStrategy,
+      guidedGuidanceStrategy: value.guided.guidanceStrategy,
       guidedFewerFirstAttemptBlockingFindings: value.checks.guidedFewerFirstAttemptBlockingFindings,
       guidedReachedAcceptableNoLater: value.checks.guidedReachedAcceptableNoLater,
       guidedRubricBetterDimensions: value.checks.guidedRubricBetterDimensions,
+      heuristics: value.heuristics.delta,
     })),
     suggestions: suggestions.map(({ path: suggestionsPath, value }) => ({
       surfaceId: value.surfaceId,
@@ -2231,6 +2850,46 @@ export async function runSummarizeGenerationBenchmarkCommand(
         (total, entry) => total + entry.value.suggestions.filter((suggestion) => suggestion.status === "proposed").length,
         0,
       ),
+      heuristics: {
+        lowerUnresolvedAcceptedSuggestionRate: countHeuristicImprovement(
+          comparisons.map(({ value }) => value.heuristics.delta.unresolvedAcceptedSuggestionRate),
+        ),
+        lowerNoChangesAfterEditFailureCount: comparisons.filter(
+          ({ value }) => value.heuristics.delta.noChangesAfterEditFailureCount < 0,
+        ).length,
+        lowerRecoverableToolErrorCount: comparisons.filter(
+          ({ value }) => value.heuristics.delta.recoverableToolErrorCount < 0,
+        ).length,
+        lowerTouchedFilesPerResolvedFinding: countHeuristicImprovement(
+          comparisons.map(({ value }) => value.heuristics.delta.touchedFilesPerResolvedFinding),
+        ),
+        lowerRepeatedFindingCarryoverCount: comparisons.filter(
+          ({ value }) => value.heuristics.delta.repeatedFindingCarryoverCount < 0,
+        ).length,
+        lowerRerunsToAcceptableOutcome: countHeuristicImprovement(
+          comparisons.map(({ value }) => value.heuristics.delta.rerunsToAcceptableOutcome),
+        ),
+        averageDelta: {
+          unresolvedAcceptedSuggestionRate: averageNullable(
+            comparisons.map(({ value }) => value.heuristics.delta.unresolvedAcceptedSuggestionRate),
+          ),
+          noChangesAfterEditFailureCount: averageNullable(
+            comparisons.map(({ value }) => value.heuristics.delta.noChangesAfterEditFailureCount),
+          ),
+          recoverableToolErrorCount: averageNullable(
+            comparisons.map(({ value }) => value.heuristics.delta.recoverableToolErrorCount),
+          ),
+          touchedFilesPerResolvedFinding: averageNullable(
+            comparisons.map(({ value }) => value.heuristics.delta.touchedFilesPerResolvedFinding),
+          ),
+          repeatedFindingCarryoverCount: averageNullable(
+            comparisons.map(({ value }) => value.heuristics.delta.repeatedFindingCarryoverCount),
+          ),
+          rerunsToAcceptableOutcome: averageNullable(
+            comparisons.map(({ value }) => value.heuristics.delta.rerunsToAcceptableOutcome),
+          ),
+        },
+      },
     },
   };

diff --git a/packages/interfacectl-cli/src/index.ts b/packages/interfacectl-cli/src/index.ts
index e1b5e1f..f974463 100644
--- a/packages/interfacectl-cli/src/index.ts
+++ b/packages/interfacectl-cli/src/index.ts
@@ -18,6 +18,7 @@ import {
   runCaptureGenerationPreviewCommand,
   runCompareGenerationSessionsCommand,
   runInitGenerationSessionCommand,
+  runPrepareGenerationHandoffCommand,
   runRecordGenerationAttemptCommand,
   runReviewContractDeltaSuggestionsCommand,
   runReviewGenerationAttemptCommand,
@@ -321,7 +322,8 @@ program
   .requiredOption("--surface ", "Surface identifier")
   .requiredOption("--workspace-root ", "Workspace root for emitted run artifacts")
   .option("--tool ", "Generation tool identifier (codex|cursor|local-llm)")
-  .option("--guidance-mode ", "Session guidance mode (prepared|unguided)")
+  .option("--guidance-strategy ", "Session guidance strategy (prompt-summary|json-primary|unguided)")
+  .option("--guidance-mode ", "Legacy alias for --guidance-strategy (prepared|unguided)")
   .option("--brief-file ", "Optional implementation brief file to freeze into the session")
   .option("--session ", "Optional session identifier")
   .option("--artifacts-root ", "Optional session artifacts root (defaults under workspaceRoot/artifacts/generation-sessions)")
@@ -331,6 +333,7 @@ program
       surfaceId: options.surface,
       workspaceRoot: options.workspaceRoot,
       tool: options.tool,
+      guidanceStrategy: options.guidanceStrategy,
       guidanceMode: options.guidanceMode,
       briefFile: options.briefFile,
       sessionId: options.session,
@@ -338,6 +341,26 @@ program
     });
   });

+program
+  .command("prepare-generation-handoff")
+  .description("Build one canonical strategy-aware guidance handoff artifact for a tracked generation session")
+  .requiredOption("--session-dir ", "Path to the generation session directory")
+  .option("--guidance-strategy ", "Optional guidance strategy override (prompt-summary|json-primary|unguided)")
+  .option("--accepted-suggestions ", "Optional accepted suggestions JSON file")
+  .option("--designer-notes ", "Optional designer notes JSON file")
+  .option("--finding-codes ", "Optional comma-separated finding codes to match against repair guidance")
+  .option("--out ", "Write the handoff JSON to the provided file")
+  .action(async (options) => {
+    process.exitCode = await runPrepareGenerationHandoffCommand({
+      sessionDir: options.sessionDir,
+      guidanceStrategy: options.guidanceStrategy,
+      acceptedSuggestionsFile: options.acceptedSuggestions,
+      designerNotesFile: options.designerNotes,
+      findingCodes: options.findingCodes,
+      outPath: options.out,
+    });
+  });
+
 program
   .command("record-generation-attempt")
   .description("Validate and record one generation attempt for a tracked session")
@@ -394,9 +417,9 @@ program

 program
   .command("compare-generation-sessions")
-  .description("Compare one unguided session against one prepared guided session")
-  .requiredOption("--baseline-session-dir ", "Path to the unguided baseline session directory")
-  .requiredOption("--guided-session-dir ", "Path to the prepared guided session directory")
+  .description("Compare two generation sessions for the same frozen brief")
+  .requiredOption("--baseline-session-dir ", "Path to the baseline generation session directory")
+  .requiredOption("--guided-session-dir ", "Path to the candidate generation session directory")
   .option("--out-dir ", "Output directory for comparison artifacts")
   .action(async (options) => {
     process.exitCode = await runCompareGenerationSessionsCommand({
@@ -408,7 +431,7 @@ program

 program
   .command("suggest-contract-deltas")
-  .description("Generate evidence-backed contract refinement suggestions from one guided session")
+  .description("Generate evidence-backed contract refinement suggestions from one guided generation session")
   .requiredOption("--session-dir ", "Path to the guided generation session directory")
   .option("--out ", "Write suggestion JSON to the provided file")
   .action(async (options) => {
diff --git a/packages/interfacectl-cli/test/generation-benchmark.test.mjs b/packages/interfacectl-cli/test/generation-benchmark.test.mjs
index 7e408c2..238e3af 100644
--- a/packages/interfacectl-cli/test/generation-benchmark.test.mjs
+++ b/packages/interfacectl-cli/test/generation-benchmark.test.mjs
@@ -150,6 +150,7 @@ function buildAssessment({
   visual,
   responsiveness,
   notes,
+  heuristics,
 }) {
   return {
     structure,
@@ -158,6 +159,7 @@ function buildAssessment({
     visual,
     responsiveness,
     notes,
+    ...(heuristics ? { heuristics } : {}),
   };
 }

@@ -206,14 +208,14 @@ async function withServer(handler, callback) {
   }
 }

-test("guided vs unguided benchmark artifacts compare sessions, emit deterministic suggestions, track reviewed decisions, and carry explicit preview refs", async (t) => {
+test("strategy-aware benchmark artifacts compare sessions, emit deterministic suggestions, track reviewed decisions, and carry explicit preview refs", async (t) => {
   await ensureChromiumAvailable(t);

   const tempRoot = await fsp.mkdtemp(path.join(os.tmpdir(), "interfacectl-generation-benchmark-"));
   const workspaceRoot = path.join(tempRoot, "workspace");
   const bundleRoot = path.join(tempRoot, "bundle");
-  const baselineSessionDir = path.join(workspaceRoot, "artifacts", "generation-sessions", "demo-surface", "baseline-unguided");
-  const guidedSessionDir = path.join(workspaceRoot, "artifacts", "generation-sessions", "demo-surface", "guided-prepared");
+  const baselineSessionDir = path.join(workspaceRoot, "artifacts", "generation-sessions", "demo-surface", "baseline-prompt-summary");
+  const guidedSessionDir = path.join(workspaceRoot, "artifacts", "generation-sessions", "demo-surface", "guided-json-primary");
   const briefPath = path.join(tempRoot, "task-brief.md");

   try {
@@ -235,9 +237,9 @@ test("guided vs unguided benchmark artifacts compare sessions, emit deterministi
     );
     assert.equal(compileResult.exitCode, 0, compileResult.stderr);

-    for (const [sessionId, guidanceMode] of [
-      ["baseline-unguided", "unguided"],
-      ["guided-prepared", "prepared"],
+    for (const [sessionId, guidanceStrategy] of [
+      ["baseline-prompt-summary", "prompt-summary"],
+      ["guided-json-primary", "json-primary"],
     ]) {
       const initResult = await runCli(
         [
@@ -250,8 +252,8 @@ test("guided vs unguided benchmark artifacts compare sessions, emit deterministi
           workspaceRoot,
           "--session",
           sessionId,
-          "--guidance-mode",
-          guidanceMode,
+          "--guidance-strategy",
+          guidanceStrategy,
           "--brief-file",
           briefPath,
         ],
@@ -270,6 +272,12 @@ test("guided vs unguided benchmark artifacts compare sessions, emit deterministi
         visual: "partial",
         responsiveness: "weak",
         notes: "Unguided baseline missed the contract markers on the first attempt.",
+        heuristics: {
+          unresolvedAcceptedSuggestionCount: 2,
+          unresolvedAcceptedSuggestionRate: 1,
+          noChangesAfterEditFailureCount: 1,
+          recoverableToolErrorCount: 2,
+        },
       }),
     );
     const baselineAttemptOne = await runCli(
@@ -315,6 +323,13 @@ test("guided vs unguided benchmark artifacts compare sessions, emit deterministi
         visual: "partial",
         responsiveness: "partial",
         notes: "Baseline corrected the structure but still drifted on color.",
+        heuristics: {
+          unresolvedAcceptedSuggestionCount: 1,
+          unresolvedAcceptedSuggestionRate: 0.5,
+          noChangesAfterEditFailureCount: 1,
+          recoverableToolErrorCount: 1,
+          touchedFilesPerResolvedFinding: 2,
+        },
       }),
     );
     const baselineAttemptTwo = await runCli(
@@ -388,6 +403,13 @@ test("guided vs unguided benchmark artifacts compare sessions, emit deterministi
         visual: "partial",
         responsiveness: "strong",
         notes: "Guided attempt matched the structure but still carried a color warning.",
+        heuristics: {
+          unresolvedAcceptedSuggestionCount: 0,
+          unresolvedAcceptedSuggestionRate: 0,
+          noChangesAfterEditFailureCount: 0,
+          recoverableToolErrorCount: 0,
+          touchedFilesPerResolvedFinding: 1,
+        },
       }),
     );
     const guidedAttemptOne = await runCli(
@@ -466,6 +488,8 @@ test("guided vs unguided benchmark artifacts compare sessions, emit deterministi
     const compareOutput = JSON.parse(compareResult.stdout);
     const comparison = JSON.parse(await fsp.readFile(compareOutput.paths.jsonPath, "utf8"));
     validateWithSchema(comparison, generationSessionComparisonSchema, "generation session comparison");
+    assert.equal(comparison.baseline.guidanceStrategy, "prompt-summary");
+    assert.equal(comparison.guided.guidanceStrategy, "json-primary");
     assert.equal(Boolean(comparison.baseline.firstAttempt.preview), true);
     assert.equal(Boolean(comparison.baseline.latestAttempt.preview), true);
     assert.equal(Boolean(comparison.guided.firstAttempt.preview), true);
@@ -475,6 +499,8 @@ test("guided vs unguided benchmark artifacts compare sessions, emit deterministi
     assert.equal(comparison.delta.firstAttemptBlockingFindingCountDelta < 0, true);
     assert.equal(comparison.delta.attemptsToAcceptableOutcome.baseline, 2);
     assert.equal(comparison.delta.attemptsToAcceptableOutcome.guided, 1);
+    assert.equal(comparison.heuristics.delta.unresolvedAcceptedSuggestionRate < 0, true);
+    assert.equal(comparison.heuristics.delta.recoverableToolErrorCount < 0, true);

     const contractPath = path.join(workspaceRoot, "contracts", "generated", "demo-surface.contract.json");
     const contractBeforeReview = await fsp.readFile(contractPath);
@@ -563,6 +589,9 @@ test("guided vs unguided benchmark artifacts compare sessions, emit deterministi
     assert.equal(benchmarkReport.overall.surfaceCount, 1);
     assert.equal(benchmarkReport.overall.surfacesMeetingGoal, 1);
     assert.equal(benchmarkReport.overall.acceptedSuggestionCount, 1);
+    assert.equal(benchmarkReport.comparisons[0].baselineGuidanceStrategy, "prompt-summary");
+    assert.equal(benchmarkReport.comparisons[0].guidedGuidanceStrategy, "json-primary");
+    assert.equal(benchmarkReport.overall.heuristics.lowerRecoverableToolErrorCount, 1);
   } finally {
     await fsp.rm(tempRoot, { recursive: true, force: true });
   }
diff --git a/packages/interfacectl-cli/test/generation-session.test.mjs b/packages/interfacectl-cli/test/generation-session.test.mjs
index 23f1121..f6ac884 100644
--- a/packages/interfacectl-cli/test/generation-session.test.mjs
+++ b/packages/interfacectl-cli/test/generation-session.test.mjs
@@ -14,6 +14,7 @@ import generationAttemptReviewSchema from "../schemas/generation-attempt-review.
 import generationAttemptPreviewSchema from "../schemas/generation-attempt-preview.schema.json" with { type: "json" };
 import generationSessionSchema from "../schemas/generation-session.schema.json" with { type: "json" };
 import generationSessionSummarySchema from "../schemas/generation-session-summary.schema.json" with { type: "json" };
+import generationGuidanceHandoffSchema from "../schemas/generation-guidance-handoff.schema.json" with { type: "json" };
 import contractRunsSchema from "../schemas/contract-runs.schema.json" with { type: "json" };
 import contractLineageSchema from "../schemas/contract-lineage.schema.json" with { type: "json" };

@@ -155,6 +156,7 @@ function buildAssessment({
   responsiveness,
   notes,
   touchedFiles,
+  heuristics,
 }) {
   return {
     structure,
@@ -164,6 +166,7 @@ function buildAssessment({
     responsiveness,
     notes,
     ...(touchedFiles ? { touchedFiles } : {}),
+    ...(heuristics ? { heuristics } : {}),
   };
 }

@@ -249,9 +252,15 @@ test("generation session commands freeze bundle input, record attempts, and emit

     const session = JSON.parse(await fsp.readFile(path.join(sessionDir, "session.json"), "utf8"));
     validateWithSchema(session, generationSessionSchema, "generation session");
-    assert.equal(session.guidanceMode, "prepared");
+    assert.equal(session.guidanceStrategy, "prompt-summary");
     assert.ok(fs.existsSync(path.join(sessionDir, "bundle", "manifest.json")));
     assert.ok(fs.existsSync(path.join(sessionDir, "prepared-input.json")));
+    assert.ok(fs.existsSync(path.join(sessionDir, "guidance-handoff.json")));
+    const handoff = JSON.parse(await fsp.readFile(path.join(sessionDir, "guidance-handoff.json"), "utf8"));
+    validateWithSchema(handoff, generationGuidanceHandoffSchema, "generation guidance handoff");
+    assert.equal(handoff.guidanceStrategy, "prompt-summary");
+    assert.equal(Boolean(handoff.promptSummary), true);
+    assert.equal(handoff.jsonPrimary, null);

     const assessmentOnePath = path.join(tempRoot, "assessment-1.json");
     await writeJson(
@@ -360,6 +369,127 @@ test("generation session commands freeze bundle input, record attempts, and emit
   }
 });

+test("prepare-generation-handoff emits deterministic strategy artifacts with runtime guidance", async () => {
+  const tempRoot = await fsp.mkdtemp(path.join(os.tmpdir(), "interfacectl-generation-handoff-"));
+  const workspaceRoot = path.join(tempRoot, "workspace");
+  const bundleRoot = path.join(tempRoot, "bundle");
+  const sessionDir = path.join(workspaceRoot, "artifacts", "generation-sessions", "demo-surface", "handoff-session");
+  const acceptedSuggestionsPath = path.join(tempRoot, "accepted-suggestions.json");
+  const designerNotesPath = path.join(tempRoot, "designer-notes.json");
+  const promptSummaryPath = path.join(tempRoot, "prompt-summary-handoff.json");
+  const jsonPrimaryOnePath = path.join(tempRoot, "json-primary-handoff-1.json");
+  const jsonPrimaryTwoPath = path.join(tempRoot, "json-primary-handoff-2.json");
+
+  try {
+    await writeDemoWorkspace(workspaceRoot, { sectionValid: false, colorValid: true });
+
+    const compileResult = await runCli(
+      [
+        "compile",
+        "--contract",
+        path.join(workspaceRoot, "contracts", "surfaces.web.contract.json"),
+        "--out",
+        bundleRoot,
+      ],
+      tempRoot,
+    );
+    assert.equal(compileResult.exitCode, 0, compileResult.stderr);
+
+    const initResult = await runCli(
+      [
+        "init-generation-session",
+        "--bundle-root",
+        bundleRoot,
+        "--surface",
+        "demo-surface",
+        "--workspace-root",
+        workspaceRoot,
+        "--session",
+        "handoff-session",
+      ],
+      tempRoot,
+    );
+    assert.equal(initResult.exitCode, 0, initResult.stderr);
+
+    await writeJson(acceptedSuggestionsPath, {
+      suggestions: [
+        {
+          findingCode: "section.required.missing",
+          findingMessage: "Main hero section is missing.",
+          summary: "Restore the main hero section.",
+          suggestedPath: "surfaces[id=demo-surface].requiredSections",
+          rationale: "The fixture requires the main hero section.",
+        },
+      ],
+    });
+    await writeJson(designerNotesPath, {
+      designerNotes: [
+        "Keep the hero heading flush left.",
+        "Anchor links should use the contract accent.",
+      ],
+    });
+
+    const promptSummaryResult = await runCli(
+      [
+        "prepare-generation-handoff",
+        "--session-dir",
+        sessionDir,
+        "--guidance-strategy",
+        "prompt-summary",
+        "--accepted-suggestions",
+        acceptedSuggestionsPath,
+        "--designer-notes",
+        designerNotesPath,
+        "--finding-codes",
+        "section.required.missing",
+        "--out",
+        promptSummaryPath,
+      ],
+      tempRoot,
+    );
+    assert.equal(promptSummaryResult.exitCode, 0, promptSummaryResult.stderr);
+
+    const jsonPrimaryArgs = [
+      "prepare-generation-handoff",
+      "--session-dir",
+      sessionDir,
+      "--guidance-strategy",
+      "json-primary",
+      "--accepted-suggestions",
+      acceptedSuggestionsPath,
+      "--designer-notes",
+      designerNotesPath,
+      "--finding-codes",
+      "section.required.missing",
+    ];
+    const jsonPrimaryOneResult = await runCli([...jsonPrimaryArgs, "--out", jsonPrimaryOnePath], tempRoot);
+    const jsonPrimaryTwoResult = await runCli([...jsonPrimaryArgs, "--out", jsonPrimaryTwoPath], tempRoot);
+    assert.equal(jsonPrimaryOneResult.exitCode, 0, jsonPrimaryOneResult.stderr);
+    assert.equal(jsonPrimaryTwoResult.exitCode, 0, jsonPrimaryTwoResult.stderr);
+
+    const promptSummary = JSON.parse(await fsp.readFile(promptSummaryPath, "utf8"));
+    const jsonPrimaryOne = JSON.parse(await fsp.readFile(jsonPrimaryOnePath, "utf8"));
+    const jsonPrimaryTwo = JSON.parse(await fsp.readFile(jsonPrimaryTwoPath, "utf8"));
+    validateWithSchema(promptSummary, generationGuidanceHandoffSchema, "prompt-summary guidance handoff");
+    validateWithSchema(jsonPrimaryOne, generationGuidanceHandoffSchema, "json-primary guidance handoff");
+    validateWithSchema(jsonPrimaryTwo, generationGuidanceHandoffSchema, "repeated json-primary guidance handoff");
+    assert.deepEqual(jsonPrimaryOne, jsonPrimaryTwo);
+    assert.equal(promptSummary.guidanceStrategy, "prompt-summary");
+    assert.equal(jsonPrimaryOne.guidanceStrategy, "json-primary");
+    assert.equal(Boolean(promptSummary.promptSummary), true);
+    assert.equal(promptSummary.jsonPrimary, null);
+    assert.equal(promptSummary.runtimeGuidance.acceptedSuggestions.length, 1);
+    assert.equal(promptSummary.runtimeGuidance.designerNotes.length, 2);
+    assert.equal(promptSummary.runtimeGuidance.findingCodes.includes("section.required.missing"), true);
+    assert.equal(jsonPrimaryOne.promptSummary, null);
+    assert.equal(Boolean(jsonPrimaryOne.jsonPrimary), true);
+    assert.equal(jsonPrimaryOne.jsonPrimary.sections.length > 0, true);
+    assert.equal(Array.isArray(jsonPrimaryOne.runtimeGuidance.matchedRepairCodes), true);
+  } finally {
+    await fsp.rm(tempRoot, { recursive: true, force: true });
+  }
+});
+
 test("review-generation-attempt marks reviewed warnings acceptable without changing the validate payload", async () => {
   const tempRoot = await fsp.mkdtemp(path.join(os.tmpdir(), "interfacectl-generation-review-"));
   const workspaceRoot = path.join(tempRoot, "workspace");
@@ -591,7 +721,7 @@ test("capture-generation-preview writes preview artifacts and surfaces explicit
       await fsp.readFile(path.join(sessionDir, "summary.json"), "utf8"),
     );
     validateWithSchema(summary, generationSessionSummarySchema, "generation session summary");
-    assert.equal(summary.schemaVersion, 3);
+    assert.equal(summary.schemaVersion, 4);
     assert.equal(summary.attempts[0].preview.imagePath, path.join(sessionDir, "attempts", "001.preview.png"));
     assert.equal(summary.attempts[0].preview.metadataPath, path.join(sessionDir, "attempts", "001.preview.json"));
     assert.equal(summary.attempts[0].preview.url.endsWith("/preview"), true);
@@ -891,11 +1021,11 @@ test("generation session summary aggregates recurring finding and repair codes",

   try {
     await writeJson(path.join(sessionDir, "session.json"), {
-      schemaVersion: 2,
+      schemaVersion: 3,
       surfaceId: "demo-surface",
       sessionId: "summary-session",
       tool: "codex",
-      guidanceMode: "prepared",
+      guidanceStrategy: "prompt-summary",
       workspaceRoot: tempRoot,
       sourceBundleRoot: path.join(tempRoot, "source-bundle"),
       sessionDir,
@@ -903,6 +1033,9 @@ test("generation session summary aggregates recurring finding and repair codes",
       preparedInputPath: path.join(sessionDir, "prepared-input.json"),
       contractPath: path.join(sessionDir, "bundle", "contract", "normalized.json"),
       repairMapPath: path.join(bundleSurfaceDir, "repair-map.json"),
+      guidanceArtifacts: {
+        baseHandoffPath: path.join(sessionDir, "guidance-handoff.json"),
+      },
       startedAt: "2026-03-12T00:00:00.000Z",
       successRule: { finalStatus: "pass-or-reviewed-warn" },
     });