feat: add strategy-aware benchmark artifacts #36

Merged
mikeylong merged 1 commit into main from codex/interfacectl-generation-benchmark-loop
Mar 14, 2026

Conversation

@mikeylong
Collaborator

Summary

  • add canonical guidance strategies for generation sessions and handoff artifacts
  • add strategy-aware comparison and benchmark reporting semantics
  • update generator-facing docs for prompt-summary vs json-primary vs unguided flows

Testing

  • pnpm --filter @surfaces/interfacectl-cli build
  • pnpm --filter @surfaces/interfacectl-cli test -- test/generation-session.test.mjs test/generation-benchmark.test.mjs

@vercel

vercel Bot commented Mar 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project       Deployment  Actions  Updated (UTC)
interfacectl  Error       Error    Mar 14, 2026 6:49pm

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 339e2e9655

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +2280 to +2283
const updatedSession: GenerationSession = {
  ...sessionForHandoff,
};
writeDeterministicJsonSync(paths.sessionPath, updatedSession);

P1: Keep session strategy immutable when preparing handoff

This command persists sessionForHandoff back to session.json, including the override values for guidanceStrategy and baseHandoffPath. If prepare-generation-handoff is run after attempts exist, it can retroactively relabel the session strategy and handoff path in all later summaries and comparisons, even though earlier attempts were produced under a different strategy. Because buildGenerationSessionSummary reports the session-level strategy, this mutates benchmark evidence instead of just emitting an alternate handoff artifact.
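
One way to address this concern, sketched under assumptions: build the override view only for rendering the handoff, and write the original session back once attempts exist. The type shapes, the attemptCount field, and the planHandoff helper are hypothetical and simplified; only the names GenerationSession, sessionForHandoff, guidanceStrategy, and baseHandoffPath come from the review above. Allowing overrides to persist before any attempt exists is an assumed policy, not something the PR states.

```typescript
// Hypothetical, simplified shapes; the real GenerationSession likely has more fields.
type GuidanceStrategy = "prompt-summary" | "json-primary" | "unguided";

interface GenerationSession {
  readonly sessionId: string;
  readonly guidanceStrategy: GuidanceStrategy;
  readonly baseHandoffPath: string;
  readonly attemptCount: number; // assumed field: number of recorded attempts
}

interface HandoffOverrides {
  guidanceStrategy?: GuidanceStrategy;
  baseHandoffPath?: string;
}

interface HandoffPlan {
  // View used only to render the handoff artifact.
  sessionForHandoff: GenerationSession;
  // Value that would be written back to session.json.
  sessionToPersist: GenerationSession;
}

function planHandoff(
  session: GenerationSession,
  overrides: HandoffOverrides,
): HandoffPlan {
  const sessionForHandoff: GenerationSession = {
    ...session,
    guidanceStrategy: overrides.guidanceStrategy ?? session.guidanceStrategy,
    baseHandoffPath: overrides.baseHandoffPath ?? session.baseHandoffPath,
  };

  // Assumed policy: once attempts exist, the recorded strategy is benchmark
  // evidence, so the original session is persisted unchanged.
  const sessionToPersist =
    session.attemptCount > 0 ? session : sessionForHandoff;

  return { sessionForHandoff, sessionToPersist };
}
```

With this split, the alternate handoff artifact can still be emitted from sessionForHandoff, while summaries that read the persisted session keep reporting the strategy the earlier attempts actually ran under.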

  .requiredOption("--workspace-root <path>", "Workspace root for emitted run artifacts")
- .option("--tool <tool>", "Generation tool identifier (codex|cursor)")
- .option("--guidance-mode <mode>", "Session guidance mode (prepared|unguided)")
+ .option("--tool <tool>", "Generation tool identifier (codex|cursor|local-llm)")

P2: Align advertised init tool values with runtime validation

init-generation-session help now claims --tool accepts local-llm, but ensureSessionTool still only allows codex|cursor, so using the documented local-llm value fails with an input error. This creates a broken CLI contract for users who rely on --help output.
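
A minimal sketch of one way to keep help text and validation from drifting apart: derive both from a single list. The SESSION_TOOLS constant, the toolHelp string, and the body of ensureSessionTool below are illustrative stand-ins, not the repository's actual implementation. Whether local-llm should be accepted at all is a decision for the PR author; it appears in the list only to mirror the advertised help text.

```typescript
// Assumed single source of truth for accepted tool identifiers.
const SESSION_TOOLS = ["codex", "cursor", "local-llm"] as const;
type SessionTool = (typeof SESSION_TOOLS)[number];

// Help text is built from the same list the validator checks.
const toolHelp = `Generation tool identifier (${SESSION_TOOLS.join("|")})`;

function isSessionTool(value: string): value is SessionTool {
  return SESSION_TOOLS.some((tool) => tool === value);
}

// Stand-in for the validator named in the review; the real one may differ.
function ensureSessionTool(value: string): SessionTool {
  if (isSessionTool(value)) {
    return value;
  }
  throw new Error(
    `Invalid --tool "${value}"; expected one of ${SESSION_TOOLS.join("|")}`,
  );
}
```

The toolHelp string could then be passed as the description of the --tool option, so any value shown in --help is also one the validator accepts.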

@mikeylong
Collaborator Author

Holding this PR as superseded-in-place for now. The current branch is stale and the repo is still receiving a standalone failing Vercel status that is not part of the allowed landing path. The intended next step is to remove that stale repo-level Vercel integration, cut a fresh replacement benchmark PR from current main, and then close this PR with a link to the replacement.

@mikeylong force-pushed the codex/interfacectl-generation-benchmark-loop branch from 339e2e9 to fc31496 on March 14, 2026 23:08
@mikeylong
Collaborator Author

Refreshed this PR against current main on March 14, 2026. This PR remains the merge vehicle for the benchmark/session work; it is no longer being treated as superseded.

Local verification on the refreshed branch:

  • pnpm --filter @surfaces/interfacectl-cli build
  • pnpm --filter @surfaces/interfacectl-cli test -- test/generation-session.test.mjs test/generation-benchmark.test.mjs

The current blocker is still external to the branch content: GitHub is attaching a failing plain Vercel status to this repo from the org-wide Vercel app installation. The current token here is org-admin but not org-owner, so I could not remove or narrow that installation from the CLI/API. Once that owner-level settings fix is applied, this PR is ready to continue as the benchmark merge vehicle.

@mikeylong merged commit 7492ac5 into main on Mar 14, 2026
2 checks passed
@mikeylong deleted the codex/interfacectl-generation-benchmark-loop branch on March 19, 2026 00:48