Skip to content

Move verbose checklist from hook reason into SKILL.md system context#3

Open
micahstubbs wants to merge 41 commits intoblader:mainfrom
micahstubbs:main
Open

Move verbose checklist from hook reason into SKILL.md system context#3
micahstubbs wants to merge 41 commits intoblader:mainfrom
micahstubbs:main

Conversation

@micahstubbs
Copy link
Copy Markdown

Summary

  • Moves the full completion checklist (6 items + behavioral instructions) from the hook's reason field into SKILL.md
  • Reduces the hook's block reason to a single line: TASKMASTER (N): Stop blocked. Follow the taskmaster completion checklist. Done signal: TASKMASTER_DONE::<session_id>
  • Bumps version to 2.2.0

Problem

The hook's reason field is displayed as a visible message in the Claude Code session history every time the stop is blocked. The previous ~30 line checklist (re-read user messages, check task list, check plan, check for errors, check for loose ends, check for blockers, honesty check, DO NOT NARRATE, etc.) was polluting the conversation transcript and making it hard to read actual human prompts and model outputs.

Solution

The key insight is that SKILL.md content is loaded as system context — it's available to the model but invisible as messages in the session history. So the architecture is now:

Component Contains Visible in session history?
SKILL.md Full completion checklist + behavioral instructions No (system context)
Hook reason Brief trigger with counter, status, and done signal Yes (one clean line)

The model still has access to all the same instructions via the skill's system context, so behavior is unchanged. The session transcript is now readable.

Changes

  • SKILL.md: Added "Completion Checklist" section with the full behavioral instructions previously in the hook reason
  • hooks/check-completion.sh: Reduced reason to one-liner referencing the skill checklist
  • check-completion.sh: Same change applied to root-level variant
  • docs/SPEC.md: Updated to document the new prompt architecture

- Apply tilde expansion fix from upstream blader/taskmaster 6598e99
  to both check-completion.sh and hooks/check-completion.sh.
  Bash does not expand ~ inside double-quoted strings, so transcript_path
  values like ~/.claude/... would fail [ -f ] checks.

- Add docs/upstream-reviews/2026-02-25-blader-taskmaster-main.md
  with per-commit cherry-pick/rewrite/ignore decisions for all 13
  upstream commits
Survey of all 32 forks. Only mickn/taskmaster has substantial original
engineering (native Codex hooks + semantic completion verifier, v5.0.0).
Documents recommended adoption tiers and open questions before porting.
Detailed design covering all three tiers from
docs/upstream-reviews/blader-taskmaster-forks.md:

- T1 (adopt now): TASKMASTER_VERIFY_COMMAND shell-verifier gate, JSON
  state-file layout with legacy-counter migration, tagged
  hook-internal-prompt detection
- T2 (verify first): Codex native hooks capability probe, then native
  SessionStart/UserPromptSubmit/Stop hooks with wrapper as fallback
- T3 (opt-in): semantic completion verifier with Anthropic-first
  provider auto-detection, secret redaction, caching, fail-open

Each tier has env vars, file lists, JSON schemas, migration paths,
test plans, and risk tables. Phased rollout: T1 unblocks T2.0 unblocks
T2.1+T2.2 unblocks T3.
Bite-sized TDD plan covering all three Tier-1 features from
docs/designs/2026-04-28-072245-fork-pattern-adoption.md:

- Phase A (T1.1): TASKMASTER_VERIFY_COMMAND helper lib + integration
- Phase B (T1.3): tagged hook-injected prompts + detection lib
- Phase C (T1.2): JSON state file with flock + legacy migration
- Phase D: version bump to 4.3.0, CHANGELOG, tag

Each phase: failing-test commit, then implementation commit. Commits
are bisect-friendly. Plan includes complete code (not pseudocode),
exact file paths, and smoke tests at every integration point.
When TASKMASTER_VERIFY_COMMAND is set, the stop hook runs the command
after the done token is detected. Exit 0 allows stop; non-zero blocks
with a truncated output dump. Verifier only fires when the token is
present, so mid-work stop attempts don't pay the cost of a slow verifier.
…T1.1)

Code review on Phase A flagged that mktemp output could leak if the
hook is killed mid-run (between mktemp and rm). RETURN trap inside
the function ensures cleanup on any function exit path, including
signal-induced ones.
…nd=...] (T1.3)

Adds an explicit single-line marker to the top of every prompt the hook
injects. Downstream consumers detect injected prompts via the new
taskmaster-prompt-detect lib (forward path: tag match; back-compat:
legacy substring match against current and mickn/taskmaster wording).
…ed kinds (T1.3)

Spec review on Phase B flagged two gaps: tests asserted 5 of mickn's 6
legacy substrings (missing Completion-check-before-stopping); SPEC §3.5
listed all 5 tag kinds without flagging which are reserved for T2/T3.
Both addressed.
Code review on Phase B flagged that the readonly TASKMASTER_INJECTED_TAG_VERSION
declaration would error on second source under set -e. No call site does this
today, but it's an unforced asymmetry with the Phase A verify-command lib.
Sentinel guard makes the lib safe to source multiple times.
…T1.2)

stop_count now lives in a flock-protected JSON file at
$TASKMASTER_STATE_DIR/<session_id>.json (default: $TMPDIR/taskmaster/state).
Legacy counter files are absorbed on first read and deleted. Schema is
versioned for forward compatibility with T2/T3 fields (latest_user_prompt,
last_verifier_run).
Code review on Phase C flagged two Critical issues:
1. Bash truncates $tmp before jq runs; if $path is unparseable, jq fails
   but mv unconditionally clobbers $path with the empty tmp file. Now
   each write helper (init/update/capture_prompt/record_verifier_run)
   gates mv on jq's exit code and rms the tmp on failure.
2. Migration race could rewind stop_count when two hooks fire concurrently
   on a session with a stale legacy file. Migration now (a) holds the
   flock across the entire cat → update → rm-legacy sequence, (b) re-checks
   the legacy file under lock, (c) absorbs additively
   (.stop_count = .stop_count + N) so a peer that already incremented past
   N is not rewound.

Also adds: 12-digit length cap on legacy counter (prevents int64 overflow
on downstream NEXT=$((COUNT + 1))); validation that record_verifier_run's
$complete is literal true|false (rejects string "true" with EX_USAGE 64).

8 new regression tests.
Re-review caught that [[ ! -f "${PATH}.tmp"* ]] doesn't expand the
glob and effectively never reports a leak. Replaced with
`compgen -G` so the test actually fails when a tmp file is left behind.
Also tightened the bytes-preserved assertion from >0 to byte-exact.
…file)

See CHANGELOG.md for the full entry. Three independent additions
ported from the fork-network review (mickn/taskmaster), each
opt-in or backward-compatible:

- T1.1: TASKMASTER_VERIFY_COMMAND shell-verifier gate
- T1.3: Tagged hook-injected prompts + detection lib
- T1.2: JSON state file with flock + legacy migration
Final cross-cutting review flagged that the canonical hook uses
'set -euo pipefail' while the hooks/ mirror uses only 'set -u'.
Under set -e, a non-zero from a state-lib helper (e.g. on a
corrupt state file) would abort the hook before emitting its
'decision: block' JSON, dropping the user back into the agent
with no continuation prompt. Aligning to set -uo pipefail makes
the two hooks behave identically and ensures the decision is
always emitted.
… Code layout

The installer now copies check-completion.sh to ~/.claude/hooks/taskmaster-check-completion.sh
alongside the existing skill directory install. Existing installs with the old
skills/taskmaster/hooks/ path are automatically migrated. Uninstall cleans up both locations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant