This file is for contributors working on the plugin itself.
If you are trying to use Flow inside OpenCode, start with the top-level README.md instead.
Current maintainer contract lives in docs/maintainer-contract.md. Use docs/contributor-map.md to pick the right source files and checks before touching high-risk areas. Ownership boundaries and cross-domain review/triage policy live in docs/architecture/ownership-operating-model.md.
This repo's maintainer workflow is intentionally Bun-first. In target projects, Flow is script-first: existing package.json scripts are the primary contract, and package-manager detection is supporting evidence.
For monorepos, package-manager detection starts from the current tool directory and walks upward to the mutable Flow workspace root, so subpackage-local evidence can override root-level defaults.
package.json#packageManager is authoritative when present and takes precedence over conflicting lockfiles in the same directory.
If one directory has conflicting lockfile families and no explicit package.json#packageManager, runtime records package-manager evidence as ambiguous instead of guessing. In that case prompts should continue on existing package.json scripts instead of manager-specific guesses.
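The resolution rules above can be sketched as a small pure function. This is only an illustration under stated assumptions: `DirEvidence`, `Detection`, and `detectPackageManager` are hypothetical names, not Flow's runtime types, and stopping the upward walk at the first ambiguous directory is an assumption rather than documented behavior.

```typescript
// Hypothetical sketch; none of these names come from Flow's source.
type Manager = "bun" | "npm" | "pnpm" | "yarn";

interface DirEvidence {
  dir: string;
  packageManagerField?: string; // raw package.json#packageManager, e.g. "pnpm@9.1.0"
  lockfiles: string[]; // lockfile basenames found in this directory
}

type Detection =
  | { kind: "resolved"; manager: Manager; dir: string; source: "packageManager" | "lockfile" }
  | { kind: "ambiguous"; dir: string; candidates: Manager[] }
  | { kind: "unknown" };

const LOCKFILE_FAMILY = new Map<string, Manager>([
  ["bun.lock", "bun"],
  ["bun.lockb", "bun"],
  ["package-lock.json", "npm"],
  ["pnpm-lock.yaml", "pnpm"],
  ["yarn.lock", "yarn"],
]);

function parseManagerField(field: string | undefined): Manager | undefined {
  const name = field?.split("@")[0];
  return name === "bun" || name === "npm" || name === "pnpm" || name === "yarn" ? name : undefined;
}

// `dirs` is ordered from the current tool directory up to the workspace root,
// so nearer (subpackage-local) evidence is considered first.
function detectPackageManager(dirs: DirEvidence[]): Detection {
  for (const entry of dirs) {
    // An explicit packageManager field beats any lockfile in the same directory.
    const explicit = parseManagerField(entry.packageManagerField);
    if (explicit) {
      return { kind: "resolved", manager: explicit, dir: entry.dir, source: "packageManager" };
    }
    const families = Array.from(
      new Set(
        entry.lockfiles
          .map((file) => LOCKFILE_FAMILY.get(file))
          .filter((m): m is Manager => m !== undefined),
      ),
    );
    const [only] = families;
    if (families.length === 1 && only !== undefined) {
      return { kind: "resolved", manager: only, dir: entry.dir, source: "lockfile" };
    }
    if (families.length > 1) {
      // Conflicting lockfile families: record ambiguity instead of guessing.
      // Assumption: the walk stops here rather than continuing upward.
      return { kind: "ambiguous", dir: entry.dir, candidates: families };
    }
  }
  return { kind: "unknown" };
}
```

With an `ambiguous` or `unknown` result, a caller would keep using the existing `package.json` scripts rather than build manager-specific commands.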
Install dependencies and run the full local check:
bun install
bun run check

Useful scripts:

- `bun run build`
- `bun run deadcode`
- `bun run test`
- `bun run test:fast`
- `bun run test:deep`
- `bun run typecheck`
- `bun run check`
- `bun run report:prompt-eval`
- `bun run eval:review-capture`
- `bun run eval:review-capture:check`
- `bun run eval:prompt-capture`
- `bun run eval:prompt-capture:check`
- `bun run install:opencode` to install the global OpenCode plugin and generated global Flow skills
- `bun run uninstall:opencode` to clear the canonical global `flow.js` plugin slot, including stale or outdated Flow plugin files, and remove only intact generated global Flow-owned skills
bun run check is the canonical local/mainline readiness contract. Run focused scripts first when they help isolate a touched area, but do not treat focused CI preflights as replacements for the full local contract unless docs/maintainer-contract.md records a no-weakening proof.
Gate status terms:
- Hard gates fail the command and block merge/release readiness. Examples: `bun run check:dependency-contract`, `bun run check:architecture-seams:enforce`, `bun run check:generated-drift`, `bun run gate:completion-lane`, `bun run test:replay`, and `bun run bench:gate`.
- Advisory gates are supplemental visibility and do not block by exit code. `bun run check:boundary-report` is advisory by design today: it exits `0` and reports prompt/tool boundary findings unless a future reviewed change promotes it with script, docs, and test updates.
- Diagnostic/report commands support investigation or planning. `bun run check:architecture-seams` is seam report mode; `bun run report:runtime-simplification-metrics` prints simplification metrics. Use the corresponding hard gate (`check:architecture-seams:enforce`) for pass/fail readiness.
The full matrix, artifact owners, source-of-truth scripts, repeated-inside-check status, and CI/local no-weakening rules live in docs/maintainer-contract.md#gate-contract-matrix.
Standing dependency/tool checklist: keep zod aligned with @opencode-ai/plugin, preserve raw tool(...) arg shapes at the SDK boundary, and run bun run check:dependency-contract for dependency, schema, or tool-surface changes.
Use the smallest tier that can prove the current change, then broaden before release or cross-surface merges.
| Tier | When to run | Commands |
|---|---|---|
| Focused touched-slice checks | While editing one risk area | Use docs/maintainer-contract.md#if-you-touch-x-run-y |
| Docs/projection checks | Current docs or OpenCode projection metadata changes | bun test tests/docs-tool-parity.test.ts tests/docs-semantic-parity.test.ts tests/docs-stale-reference-policy.test.ts; add bun test tests/config/tool-schemas.test.ts tests/descriptor-family-parity.test.ts when projection behavior is affected |
| Runtime invariant fast lane | Runtime/domain/transition confidence | bun run test:fast; bun run test:replay for snapshot/runtime invariant coverage |
| Generated drift lane | Prompt, review, descriptor, or generated surface changes | bun run check:generated-drift |
| Deep/randomized confidence | Broad regression confidence beyond focused tests | bun run test:deep or bun run test:randomized |
| Mainline readiness | Before release or cross-surface merge | bun run check |
Hard gates block readiness. Advisory and diagnostic commands guide investigation only unless the maintainer contract promotes them with script, docs, and test updates. Focused checks are isolation tools, not replacements for bun run check before release or cross-surface changes.
- `src/index.ts`: plugin entrypoint
- `src/installer.ts`: local OpenCode plugin installer
- `src/config.ts`: command and agent injection
- `src/adapters/opencode/tools.ts`: OpenCode runtime tool surface
- Native OpenCode owns image/file attachments; Flow does not capture or materialize chat/command attachments by default and owns only workflow JSON/state under `.flow/**` plus derived docs
- `src/runtime/schema.ts`: session and contract schemas
- `src/runtime/transitions/`: domain state transition rules split by lifecycle phase
- `src/runtime/domain/completion.ts`: shared completion-policy calculations
- `src/runtime/application/session-engine.ts`: root-scoped session mutation orchestration
- `src/runtime/application/workspace-runtime.ts`: tool-argument parsing and workspace-root adapters
- `src/runtime/session.ts`: persistence and lifecycle exports
- `src/runtime/render.ts`: derived markdown rendering
- `src/prompts/agents.ts`: fallback agent prompt surfaces
- `src/prompts/commands.ts`: fallback slash-command templates
- `src/prompts/mode-contracts.ts`: canonical prompt-mode boundaries used by prompts, tests, and capture tooling
- `src/prompts/skills.ts` and `src/prompts/generated/skill-docs.ts`: generated Flow skill specs and renderers
- `src/adapters/opencode/skill-bundle.ts`: installer/uninstaller support for generated Flow-owned `~/.config/opencode/skills/**`
Flow is built around a few stable responsibilities and authority boundaries:
user / slash command / agent
-> OpenCode adapter tool surface
-> runtime application action
-> domain and transition policy
-> `.flow/**/session.json` snapshot persistence
-> derived markdown rendering
- A plugin `config` hook injects commands and agents.
- Runtime tools are adapter entrypoints and delegate to application/domain runtime helpers; they do not own workflow policy.
- Session state is stored under `.flow/active/<session-id>/session.json`, with inactive resumable sessions under `.flow/stored/<session-id>/` and closed history under `.flow/completed/<session-id>-<timestamp>/`.
- Domain transitions and runtime policy helpers remain authoritative for workflow state changes.
- Prompted agents call runtime tools instead of mutating state directly.
- Coordinators use OpenCode task/subagent handoffs for bounded planning, implementation, and review work when the host supports them, so each role can work in a fresh child context while runtime tools remain the state authority.
- Generated OpenCode skills under `~/.config/opencode/skills/flow-{plan,run,review}/SKILL.md` provide on-demand guidance. Slash commands and agents remain fallback surfaces and must keep working when skills are absent, denied, or hidden by OpenCode permissions.
- Native OpenCode owns image/file attachments. Flow leaves host/model attachment context untouched and does not create Flow-owned workspace files from chat or command attachments by default.
- Readable markdown docs are rendered beside each saved session directory under `.flow/active/<session-id>/docs/`, `.flow/stored/<session-id>/docs/`, or `.flow/completed/<session-id>-<timestamp>/docs/`.
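As a rough illustration of the directory layout described in the list above, the helper below derives snapshot and docs paths from a session's lifecycle state. The names `SessionLocation`, `sessionDir`, `sessionSnapshotPath`, and `sessionDocsDir` are hypothetical, and the timestamp is treated as an opaque string because its format is not specified here.

```typescript
// Hypothetical sketch of the .flow/** layout; not Flow's real path helpers.
type SessionLocation =
  | { state: "active"; sessionId: string }
  | { state: "stored"; sessionId: string }
  | { state: "completed"; sessionId: string; timestamp: string };

function sessionDir(loc: SessionLocation): string {
  switch (loc.state) {
    case "active":
      return `.flow/active/${loc.sessionId}`;
    case "stored":
      return `.flow/stored/${loc.sessionId}`;
    case "completed":
      // Closed history keeps the session id plus a timestamp suffix.
      return `.flow/completed/${loc.sessionId}-${loc.timestamp}`;
  }
}

// The snapshot is the workflow truth; docs beside it are derived artifacts.
function sessionSnapshotPath(loc: SessionLocation): string {
  return `${sessionDir(loc)}/session.json`;
}

function sessionDocsDir(loc: SessionLocation): string {
  return `${sessionDir(loc)}/docs`;
}
```

Keeping the snapshot path and docs path derived from one directory function mirrors the rule that rendered docs always sit beside the session they describe.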
Live runtime persistence is snapshot-primary: runtime application ports load and save session snapshots, then sync derived artifacts. Rendered markdown docs are derived artifacts, not workflow truth. Core action and role-protocol metadata are projection/regression infrastructure; they are not live persistence. The core workflow event/replay stack is active semantic and regression infrastructure, but it is not the live persistence authority unless a future migration explicitly promotes it.
Projection metadata is consolidated through the OpenCode surface descriptor family in src/adapters/opencode/tool-surface/descriptors.ts. That family can describe core-backed mutation tools, workspace/control tools, read tools, and render-only tools without pretending every surface has both a runtime action and a core workflow action. Adapter implementation modules still own the dispatch constants they invoke, and src/adapters/opencode/tool-surface/schemas.ts owns a payload schema registry that co-locates each tool's raw arg shape, parser schema, and owner metadata; descriptor parity tests compare both sources against the descriptor projection contract. Runtime transitions still enforce behavior; descriptors, prompts, docs, and audit surfaces project or verify that behavior.
- `flow-planner`
- `flow-worker`
- `flow-auto`
- `flow-reviewer`
- `flow-control`
- `flow-planner` reads the repo and creates a compact execution-ready plan
- `flow-worker` executes exactly one approved feature and, where OpenCode Task/subagent handoff is supported, asks `flow-reviewer` through Task for an independent fresh-context approval pass before persistence
- `flow-reviewer` reviews either the execution gate (feature) or the completion gate (final); the final gate follows the runtime-owned final review policy (`detailed` cross-feature by default, `broad` when explicitly configured)
- `flow-auto` coordinates planning, execution, review, recovery, and continuation; where OpenCode Task/subagent handoff is supported, it routes planning to `flow-planner`, implementation to `flow-worker`, and approval to `flow-reviewer` in fresh child contexts
- `flow-control` handles status/history/session/reset requests and the review command surface
Task/subagent handoffs are prompt-level orchestration only. Flow runtime tools remain authoritative for state transitions and persisted session data, and prompts must never edit .flow files directly.
Read-only repo review stays separate from feature execution and is exposed through /flow-review on flow-control. User-facing depth tokens map to internal rigor:
- `default` => `broad_audit`
- `detailed` => `deep_audit`
- `exhaustive` => `full_audit`
/flow-review now returns a renderer-backed human report by default; the structured review ledger remains an internal contract behind flow_review_render.
Flow may only claim achieved full_audit when every major discovered repo surface is directly reviewed with no major unreviewed gaps.
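The depth-token mapping and the `full_audit` claim rule can be sketched as follows. This is a minimal illustration, not the runtime's implementation: `RepoSurface`, `rigorFor`, and `canClaimFullAudit` are hypothetical names, and how Flow actually tracks reviewed surfaces is not shown in this document.

```typescript
// Hypothetical sketch; only the token and rigor strings come from the docs.
type DepthToken = "default" | "detailed" | "exhaustive";
type Rigor = "broad_audit" | "deep_audit" | "full_audit";

const DEPTH_TO_RIGOR: Record<DepthToken, Rigor> = {
  default: "broad_audit",
  detailed: "deep_audit",
  exhaustive: "full_audit",
};

interface RepoSurface {
  name: string;
  major: boolean; // whether this counts as a major discovered surface
  directlyReviewed: boolean; // whether the reviewer inspected it directly
}

function rigorFor(token: DepthToken): Rigor {
  return DEPTH_TO_RIGOR[token];
}

// True only when no major surface was left unreviewed.
// Note: this is vacuously true when no major surfaces are listed.
function canClaimFullAudit(surfaces: RepoSurface[]): boolean {
  return surfaces.filter((s) => s.major).every((s) => s.directlyReviewed);
}
```

Requesting `exhaustive` only sets the target rigor; whether `full_audit` was achieved is a separate check against what was actually reviewed.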
Prompt behavior is part of the product contract. Keep prompt-mode boundaries in src/prompts/mode-contracts.ts and use that file as the canonical source for prompt visibility and mode behavior, not as the owner of runtime transition law:
- which prompt surfaces exist
- which source files define each mode
- which runtime and repository mutations are allowed
- which Flow tools are expected or forbidden
- what each mode must do before stopping
Generated skills are now part of the default OpenCode install lifecycle. Keep flow-plan, flow-run, and flow-review generated from Flow-owned specs; they may reference mode contracts, role protocols, and registered runtime tools, but must not define new tools, state transitions, completion gates, persistence paths, review semantics, or .flow/** write behavior. Command templates and role prompts should stay slim fallback surfaces with a fallback contract: mode title/boundary, allowed/forbidden Flow tools, stop condition, never edit .flow/**, one-sentence tool ordering, and recovery guidance when a skill is unavailable or denied.
Providerless evals protect this contract without calling a model API:
- `bun run eval:review-capture:check` validates `/flow-review` capture scenarios.
- `bun run eval:prompt-capture:check` validates prompt-mode capture scenarios for planner, worker, auto, reviewer, run, and control behavior.
- `bun run report:prompt-eval` writes the combined prompt-eval summary artifacts.
To refresh manual prompt captures:
- Run `bun run eval:prompt-capture` to export capture prompts.
- Fill a capture JSON with the observed model/plugin output.
- Run `bun run eval:prompt-capture -- --score <capture-file.json>`.
- Promote calibrated outputs with `bun run eval:prompt-capture -- --promote <capture-file.json>`.
The scorer accepts structured tool intent (toolCalls, actualToolCalls, plannedToolCalls, toolPlan, or willCallTools) when available and falls back to affirmative prose matching otherwise. Keep structured tool-call evidence when possible; it is less brittle than text-only assertions.
Do not add model-provider credentials to this path. These checks are intentionally offline so prompt quality stays testable in CI and local development.
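The scorer's preference for structured tool intent over prose can be illustrated with a small offline sketch. Only the five field names come from this document; the field priority order, the accepted entry shapes (strings or objects with a `name`), the `output` field, and the crude negation check are all assumptions, and `intendsToCall` is a hypothetical name rather than the scorer's API.

```typescript
// Hypothetical sketch of "structured intent first, prose fallback second".
const TOOL_INTENT_FIELDS = [
  "toolCalls",
  "actualToolCalls",
  "plannedToolCalls",
  "toolPlan",
  "willCallTools",
] as const;

type Capture = Record<string, unknown>;

// Assumption: entries are either tool-name strings or objects with a `name`.
function toolNameOf(entry: unknown): string | undefined {
  if (typeof entry === "string") return entry;
  if (typeof entry === "object" && entry !== null) {
    const name = (entry as { name?: unknown }).name;
    if (typeof name === "string") return name;
  }
  return undefined;
}

function structuredToolIntent(capture: Capture): string[] | undefined {
  for (const field of TOOL_INTENT_FIELDS) {
    const value = capture[field];
    if (!Array.isArray(value)) continue;
    const names = value.map(toolNameOf).filter((n): n is string => n !== undefined);
    if (names.length > 0) return names;
  }
  return undefined;
}

// Crude stand-in for affirmative prose matching: a sentence must mention the
// tool and must not contain an obvious negation word.
function proseAffirmsTool(text: string, tool: string): boolean {
  return text
    .split(/[.!?\n]+/)
    .filter((sentence) => sentence.includes(tool))
    .some((sentence) => !/\b(?:not|never|won't|don't|cannot)\b/i.test(sentence));
}

function intendsToCall(capture: Capture, tool: string): boolean {
  const structured = structuredToolIntent(capture);
  if (structured) return structured.includes(tool);
  const output = capture["output"];
  return typeof output === "string" && proseAffirmsTool(output, tool);
}
```

The prose branch shows why text-only assertions are brittle: it depends on sentence splitting and a fixed negation list, while the structured branch is a plain membership check.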
Default OpenCode tool surface, in descriptor docs-row order:
- `flow_status`: Show the active Flow session summary
- `flow_doctor`: Run non-destructive readiness checks for Flow in the current workspace
- `flow_history`: Show active, stored, and completed Flow session history
- `flow_history_show`: Show a specific active, stored, or completed Flow session by id
- `flow_session_activate`: Activate a stored Flow session by id
- `flow_plan_start`: Create or refresh the active Flow planning session
- `flow_auto_prepare`: Classify a flow-auto invocation and choose the next step
- `flow_session_close`: Close the active Flow session as completed, deferred, or abandoned
- `flow_plan_context_record`: Persist repo profile, research, implementation approach, and optional planning decisions into the active Flow session from a JSON payload
- `flow_plan_apply`: Persist a Flow draft plan into the active session from a JSON payload
- `flow_plan_approve`: Approve the active Flow draft plan
- `flow_plan_select_features`: Keep only selected features in the active Flow draft plan
- `flow_run_start`: Start the next runnable Flow feature
- `flow_run_complete_feature`: Persist an already-validated Flow feature execution result from a JSON payload
- `flow_reset_feature`: Reset a Flow feature to pending
- `flow_review_record_feature`: Record an already-validated reviewer decision for the active feature from a JSON payload
- `flow_review_record_final`: Record an already-validated reviewer decision for final cross-feature validation from a JSON payload
- `flow_review_render`: Render a structured Flow review ledger into a human-readable report, structured JSON, or both
Native OpenCode owns image/file attachment handling. Flow tools and prompts must not add a Flow-owned attachment materialization surface unless a new explicit product requirement, schema contract, docs update, and regression tests establish that ownership boundary.
Keep operator-facing messaging simple. Runtime remains the single owner of workflow semantics and internal complexity.
- Runtime owns workflow semantics; prompts and docs describe them.
- Keep live runtime persistence snapshot-primary unless a dedicated event-first migration plan proves and stages a different authority.
- Runtime transitions and snapshot persistence are the supported workflow authority. Do not reintroduce core workflow replay, event-store, checkpoint-store, or projection-store surfaces without an explicit product requirement and replacement tests.
- Generated skills are instruction surfaces only; keep slash commands and agents usable as fallback surfaces when skills are absent or denied.
- Package API is root-only (`opencode-plugin-flow` import). Internal paths are not public API and may change in any release.
- Keep `zod` aligned with `@opencode-ai/plugin` unless a reviewed SDK-boundary change is intentional.
- Preserve direct `tool(...)` arg shapes at the SDK boundary.
- Use permission-only OpenCode agent restrictions; do not reintroduce boolean `tools` config for read-only Flow agents.
- Prefer deletion over new helper layers.
- Keep release-bound source free of debug-only artifacts. Do not leave ad-hoc `console.*` calls or `debugger` statements in `src` or the built release artifact. Inspect existing logging, telemetry, CLI-output, and test patterns before changing `console.*`; remove temporary debug noise, but preserve intentional operator or observability signals with an equivalent replacement that keeps severity, message intent, and key context.
- Pair behavior changes with targeted tests and run the existing validation scripts before release.
Flow treats engineering quality as part of the workflow contract, not just reviewer preference:
- Planning records a runtime-owned stack and standards profile. Local repo guidance and configs outrank official docs, and official docs outrank broader Exa/websearch guidance.
- Flow caches the generated stack and standards profile in `.flow/standards-profile.json`; the cache is ignored when the workspace, start directory, schema version, package-manager hint, or relevant source-file fingerprint changes, and external guidance expires after 30 days.
- Prefer deletion and reuse over new abstraction layers.
- Keep diffs small, reviewable, and reversible.
- Use existing package scripts and repo utilities before adding new commands.
- Validate at the smallest useful scope first, then use broader gates before release.
- Keep production/release-bound code free of debug-only artifacts (`console.*` and `debugger`).
- Preserve intentional observability: deleting a meaningful log, diagnostic event, or operator-facing message is only acceptable when an equivalent logger, telemetry, or stdout/stderr replacement remains and preserves severity, message intent, and key context.
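The standards-profile cache rule in the list above can be sketched as a pure check. This is an illustration under assumptions: the `ProfileCacheKey` and `CachedProfile` shapes, the field names, and the epoch-millisecond timestamp are hypothetical and do not describe the real `.flow/standards-profile.json` schema.

```typescript
// Hypothetical sketch of the cache-invalidation rule; not the real schema.
interface ProfileCacheKey {
  workspaceRoot: string;
  startDir: string;
  schemaVersion: number;
  packageManagerHint: string | null;
  sourceFingerprint: string; // fingerprint of the relevant source files
}

interface CachedProfile {
  key: ProfileCacheKey;
  externalGuidanceFetchedAt: number; // epoch milliseconds (assumed)
}

const DAY_MS = 24 * 60 * 60 * 1000;
const EXTERNAL_GUIDANCE_TTL_MS = 30 * DAY_MS;

function sameKey(a: ProfileCacheKey, b: ProfileCacheKey): boolean {
  return (
    a.workspaceRoot === b.workspaceRoot &&
    a.startDir === b.startDir &&
    a.schemaVersion === b.schemaVersion &&
    a.packageManagerHint === b.packageManagerHint &&
    a.sourceFingerprint === b.sourceFingerprint
  );
}

function evaluateProfileCache(
  cached: CachedProfile,
  current: ProfileCacheKey,
  nowMs: number,
): { profileUsable: boolean; externalGuidanceFresh: boolean } {
  // Any key change means the whole cached profile is ignored.
  const profileUsable = sameKey(cached.key, current);
  // External guidance ages out independently after 30 days.
  const externalGuidanceFresh =
    profileUsable && nowMs - cached.externalGuidanceFetchedAt < EXTERNAL_GUIDANCE_TTL_MS;
  return { profileUsable, externalGuidanceFresh };
}
```

Separating the two results lets a caller reuse the local profile while refreshing only the expired external guidance.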
When changing console.* in release-bound code, use this decision tree:
- Temporary debug trace or local scratch output: remove it.
- CLI/operator output: route it through an injected logger or explicit `process.stdout.write` / `process.stderr.write` adapter.
- Application diagnostic signal: use the repo's existing structured logger with a level and contextual fields.
- Cross-service or performance diagnostic signal: use the repo's existing telemetry API for spans, events, metrics, or logs.
- No existing observability facility: add the smallest local injected adapter needed for the current surface, or report a blocker when an equivalent replacement would require a broader observability decision; do not add a dependency unless the change explicitly approves one.
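For the CLI/operator-output branch of the decision tree above, an injected adapter might look like the sketch below. `OperatorLogger` and `createOperatorLogger` are hypothetical names, and the line format is an assumption; the point is only that severity, message, and key context survive without a raw `console.*` call.

```typescript
// Hypothetical sketch of an injected operator logger; not Flow's logger API.
type Write = (chunk: string) => void;

interface OperatorLogger {
  info(message: string, context?: Record<string, unknown>): void;
  error(message: string, context?: Record<string, unknown>): void;
}

function createOperatorLogger(stdout: Write, stderr: Write): OperatorLogger {
  // Keep severity, message intent, and key context in every line.
  const format = (level: string, message: string, context?: Record<string, unknown>): string =>
    context && Object.keys(context).length > 0
      ? `[${level}] ${message} ${JSON.stringify(context)}\n`
      : `[${level}] ${message}\n`;

  return {
    info: (message, context) => stdout(format("info", message, context)),
    error: (message, context) => stderr(format("error", message, context)),
  };
}
```

In release-bound CLI code the writers could be `(chunk) => process.stdout.write(chunk)` and `(chunk) => process.stderr.write(chunk)`, while tests pass in-memory collectors instead.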
The release hygiene gate is enforced through these mechanisms:
- `biome.json` enables Biome's `lint/suspicious/noConsole` rule for production source. Biome documents this rule as non-recommended by default, configurable as an error, and intended to keep console debugging out of shipped code.
- The release build uses Bun's `--drop=console` setting so bundled dependency code cannot reintroduce console calls into `dist/index.js`.
- `bun run check:release-hygiene` scans `src` and `dist/index.js` after build so release artifacts cannot silently reintroduce `console.*` or `debugger`.
Development-only scripts and tests may still print to stdout/stderr when they are intentionally operator-facing. Release-bound CLI code should make that intent explicit with injectable logger functions or direct process.stdout.write / process.stderr.write adapters. The goal is to avoid shipping raw debug consoles, not to reduce production observability.
Retryable runtime failures can include structured recovery metadata alongside the error summary.
That metadata can include:
- `errorCode`
- `resolutionHint`
- `recoveryStage`
- `prerequisite`
- optional `requiredArtifact`
- `nextCommand`
- optional `nextRuntimeTool`
- optional `nextRuntimeArgs`
The runtime uses this to distinguish between:
- missing prerequisites
- immediately executable recovery actions
Examples:
- missing reviewer approval reports a `reviewer_result_required` prerequisite
- missing validation scope or evidence reports `validation_rerun_required`
- missing final review payload reports `completion_payload_rebuild_required`
- failing review or validation can point directly to `flow_reset_feature`
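The recovery metadata fields listed above can be pictured as a typed record. In this sketch the field names come from the list, but the value types, the `classifyRecovery` helper, and the rule that a present `nextRuntimeTool` marks an immediately executable action are assumptions; the example values other than the prerequisite and tool names quoted in this document are placeholders.

```typescript
// Hypothetical typing of the recovery metadata; value types are assumed.
interface RecoveryMetadata {
  errorCode: string;
  resolutionHint: string;
  recoveryStage: string;
  prerequisite: string;
  requiredArtifact?: string;
  nextCommand: string;
  nextRuntimeTool?: string;
  nextRuntimeArgs?: Record<string, unknown>;
}

type RecoveryKind = "missing_prerequisite" | "executable_action";

// Assumption: a concrete next runtime tool means recovery can run right away;
// otherwise the caller must first satisfy the named prerequisite.
function classifyRecovery(meta: RecoveryMetadata): RecoveryKind {
  return meta.nextRuntimeTool !== undefined ? "executable_action" : "missing_prerequisite";
}

// Placeholder values, except the names quoted in the examples above.
const missingApproval: RecoveryMetadata = {
  errorCode: "example_error_code",
  resolutionHint: "Obtain a reviewer decision before completing the feature.",
  recoveryStage: "example_stage",
  prerequisite: "reviewer_result_required",
  nextCommand: "example-next-command",
};

const failedValidation: RecoveryMetadata = {
  errorCode: "example_error_code",
  resolutionHint: "Reset the feature and rerun it.",
  recoveryStage: "example_stage",
  prerequisite: "example_prerequisite",
  nextCommand: "example-next-command",
  nextRuntimeTool: "flow_reset_feature",
  nextRuntimeArgs: { featureId: "example-feature" },
};
```

A coordinator prompt could branch on the classification: surface the prerequisite to the operator in one case, call the named runtime tool in the other.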
Flow now persists a few higher-level concepts directly in runtime state:
- planning decisions can be classified as `autonomous_choice`, `recommend_confirm`, or `human_required`
- runtime summaries expose the latest blocking planning decision as `decisionGate`
- runtime status/doctor structured payloads and detailed views include `laneReason` so lane selection remains auditable without overloading compact operator summaries
- planning decisions also carry a domain such as `architecture`, `product`, `quality`, `scope`, or `delivery`
- plans can declare a `deliveryPolicy` so completion can be driven by a clean finish, a core-work finish, or a threshold
- `replan_required` outcomes must carry a structured reason, failed assumption, and recommended adjustment
- closed sessions carry an explicit closure kind: `completed`, `deferred`, or `abandoned`
True runtime-level parallel feature execution is intentionally deferred. Current behavior remains single-feature-at-a-time execution with improved lane and recovery visibility.
Keep Flow prompts narrow and stable. Prefer platform-native efficiency controls before adding plugin-specific machinery:
- keep orchestration prompts focused on routing and recovery, not duplicated workflow narration
- enable OpenCode compaction and provider cache keys when sessions get long
- treat `experimental.session.compacting` as optional escalation only if there is real evidence of Flow state loss
- avoid introducing Flow-owned compaction or measurement plumbing unless a concrete failure mode justifies it
OpenCode plugin tools expect args to be provided as a raw Zod shape, not a top-level schema object.
Example:
import { z } from "zod";

// Raw Zod shape: a plain object of field schemas, not wrapped in z.object(...).
const FlowRunStartArgsShape = {
  featureId: z.string().optional(),
};

This plugin uses two validation layers:

- SDK-facing tool `args` stay as raw shapes for OpenCode's plugin contract
- stricter runtime validation happens later through schemas such as `WorkerResultSchema`
For the heaviest payload tools (flow_plan_context_record, flow_plan_apply, flow_run_complete_feature, flow_review_record_feature, and flow_review_record_final), expose the current raw object-shape contract directly at the SDK boundary and let runtime schemas enforce the stricter semantic refinements. Do not reintroduce JSON-string transport fields or alternate direct-caller fallbacks.
The test suite covers:
- command and agent injection
- tool argument shapes
- session creation, save, and load
- markdown doc rendering
- plan application, selection, and approval
- feature execution and reviewer gating
- blocked and replan-required outcomes
- final-review completion rules
- reset behavior
- prerequisite-aware recovery metadata and autonomous recovery behavior
Run tests with:
bun test

Fast lane for runtime invariant safety checks:

bun run test:fast

Deep lane for broad coverage (default CI depth):

bun run test:deep

Replay/event/checkpoint/projection persistence was removed during the 2026-05-07 simplification. The supported persistence contract is active/stored/completed session snapshots plus rendered session docs.