fix: handle reasoning/thinking content from models #2983
Open
TheArchitectit wants to merge 77 commits into ultraworkers:main from
Conversation
When the model API returns a context_window_blocked error (because the request exceeds the model's context window), the CLI now automatically:

1. Compacts the session (removes old messages to free up space)
2. Retries the original request with the compacted session
3. Reports results to the user

This eliminates the need for users to manually run /compact when they hit context limits; the recovery happens automatically.

## Technical Details

- Detection: looks for 'context_window' or 'Context window' in the error message
- Uses runtime::compact_session() to aggressively compact (max_estimated_tokens=0)
- Creates a new runtime with the compacted session and retries the turn
- Reports compaction results and final status to the user

## Testing

Tested successfully with a request that exceeded the model's context:

- Auto-compact triggered: 'Messages removed 19, Messages kept 5'
- Successfully retried and completed after compaction
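The detection step described above can be sketched as a small predicate. This is a hypothetical reconstruction from the commit message; the real CLI's function name and exact marker list may differ.

```rust
// Markers taken from the commit message; the actual source may check more.
const CONTEXT_WINDOW_MARKERS: [&str; 2] = ["context_window", "Context window"];

/// Returns true when an API error message looks like a context-window overflow,
/// in which case the CLI compacts the session and retries.
fn is_context_window_error(message: &str) -> bool {
    CONTEXT_WINDOW_MARKERS.iter().any(|m| message.contains(m))
}
```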
…t-window-error feat: auto-compact and retry on context window errors
…rl+P Adds an interactive setup wizard that lets users configure their provider, API key, base URL, and model without setting environment variables. Configuration is persisted to ~/.claw/settings.json (with 0600 permissions).

New features:

- `claw setup` CLI subcommand runs the wizard from the terminal
- `/setup` slash command runs the wizard inside the REPL (hot-swaps the model)
- Ctrl+P hotkey in the REPL triggers /setup for in-session provider swap
- Stored provider config used as a fallback when env vars are absent
- Three-tier auth resolution: env var > .env file > stored config
- RuntimeProviderConfig struct and validation in the settings schema

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ctrl+P now inserts a sentinel char (\x01) that the highlighter renders as a cyan "[Provider Swap]" prompt. User presses Enter to confirm and launch the setup wizard. Returns ReadOutcome::ProviderSwap so the REPL loop can hot-swap the model and reprint the connection line. Also fixes clippy warnings: merged duplicate match arms in provider_config_value, doc_markdown on ProviderKind, map_unwrap_or idioms in setup_wizard.rs, and pre-existing clippy issues in main.rs and commands/lib.rs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previously /resume latest only searched the current workspace's fingerprinted session directory. If you started claw from a different directory, it found zero sessions even though sessions existed elsewhere on disk.

Changes:

- Add global_sessions_root() pointing to ~/.claw/sessions/
- Add scan_global_sessions() to scan all workspace namespaces
- Modify latest_session() to fall back to a global scan when no workspace-local sessions are found
- Add load_session_loose() that skips workspace validation for alias references (latest/last/recent) so cross-workspace resume works while still enforcing the workspace check for explicit IDs
- Wire load_session_loose() into the CLI's load_session_reference()
- Add provider field to the config validation schema (needed because the user's settings.json already has the provider key)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous implementation only scanned ~/.claw/sessions/ for the global fallback, but sessions are actually stored in the project-local <cwd>/.claw/sessions/<fingerprint>/ by SessionStore::from_cwd(). Now scans both the global root and the project-local parent directory (checking all fingerprint subdirs) so /resume latest finds sessions regardless of where they're stored. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previously /resume latest returned the most recently created session, which was always the empty one just created on startup. Now it skips sessions with 0 messages and excludes the current session ID, so it finds the previous session with actual conversation history. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
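The selection rule described above (newest session that is not the current one and has at least one message) can be sketched as follows; the `SessionMeta` struct and function name are illustrative, not the actual session types.

```rust
// Simplified stand-in for the CLI's session metadata.
struct SessionMeta {
    id: String,
    message_count: usize,
    created_at: u64, // monotonically increasing creation timestamp
}

/// Pick the most recently created session that has real history,
/// skipping empty sessions and the session created on startup.
fn latest_resumable<'a>(sessions: &'a [SessionMeta], current_id: &str) -> Option<&'a SessionMeta> {
    sessions
        .iter()
        .filter(|s| s.id != current_id && s.message_count > 0)
        .max_by_key(|s| s.created_at)
}
```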
Implement complete LSP support for code intelligence tools:

- lsp_transport.rs: JSON-RPC 2.0 transport over stdio with Content-Length framing, async request/response handling, and graceful shutdown
- lsp_process.rs: LSP process manager with initialize handshake, and methods for hover, goto_definition, references, document_symbols, completion, format
- lsp_discovery.rs: auto-discovery of installed LSP servers (rust-analyzer, clangd, gopls, pyright, typescript-language-server, etc.) with PATH lookup
- lsp_client.rs: rewired LspRegistry to use real LSP processes instead of placeholder JSON, with lazy-start on first dispatch call
- config.rs: added LspServerConfig for user-configured LSP servers
- config_validate.rs: validation for the lsp config section
- main.rs: CLI integration with server discovery at startup, /lsp slash command for status/start/stop/restart, and graceful shutdown on exit
- commands/src/lib.rs: added SlashCommand::Lsp variant

The LSP tool is now available to the agent for hover, definition, references, symbols, completion, and diagnostics queries. Servers are auto-discovered at REPL startup and lazily started on first use.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rust-analyzer installed through rustup exits non-zero on --version
("Unknown binary in official toolchain"), which caused discovery
to skip it. Changed command_exists_on_path to treat any successful
spawn as "found", regardless of exit code — only a failure to
spawn (command not found) means the server isn't available.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
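A minimal sketch of the relaxed existence check described above, assuming the discovery probe runs `<server> --version`. A successful spawn counts as "found" even when the process exits non-zero; only a spawn failure (binary not on PATH) counts as absent.

```rust
use std::process::{Command, Stdio};

/// Returns true if the command can be spawned at all.
/// The exit code is deliberately ignored, because rustup's rust-analyzer
/// shim exits non-zero on --version despite being a working server.
fn command_exists_on_path(name: &str) -> bool {
    Command::new(name)
        .arg("--version")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .is_ok() // Ok = spawn succeeded; Err = command not found
}
```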
…chment Wire LSP into the Read/Edit/Write tool flow so the agent automatically gets diagnostics after file operations:

- lsp_transport: add LspServerMessage enum, read_message() for handling both responses and server-initiated notifications, notification queue with drain_notifications(); send_request now handles interleaved publishDiagnostics without breaking
- lsp_process: add did_open(), did_change(), drain_diagnostics(), open-file tracking (HashSet) and version counters for didChange, language_id_for_path() and severity_name() helpers
- lsp_client: add notify_file_open(), notify_file_change(), fetch_diagnostics_for_file() with best-effort graceful fallback, registry-level open-file tracking, diagnostic caching
- tools: enrich run_read_file with didOpen + diagnostics, run_write_file and run_edit_file with didChange + diagnostics, format_diagnostic_appendix() for readable diagnostic output appended to tool results

All enrichment is non-blocking: if no LSP server is available, tools work exactly as before. No errors propagate from the LSP layer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Split the three large LSP files into module directories with sub-files:

lsp_transport/ (was 560 lines):
- mod.rs (425) — types + LspTransport impl
- tests.rs (134) — test module

lsp_process/ (was 929 lines):
- mod.rs (436) — LspProcess struct + public methods + error types
- parse.rs (311) — helper functions and LSP response parsers
- tests.rs (194) — test module

lsp_client/ (was 1338 lines):
- mod.rs (466) — LspRegistry struct + impl, re-exports from types
- types.rs (103) — LspAction, LspDiagnostic, LspServerStatus, etc.
- dispatch.rs (224) — LspRegistry::dispatch() method
- tests.rs (273) — core registry tests
- tests_lifecycle.rs (294) — lifecycle and integration tests

All files under 500 lines. All 501 runtime tests pass. Clippy clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…transport modules

- Add lsp_auto_start field to RuntimeFeatureConfig (default: true)
- Add lspAutoStart bool field validation in config_validate
- Parse lspAutoStart from config JSON
- Auto-start discovered LSP servers on REPL init when enabled
- Add /lsp toggle command to enable/disable auto-start at runtime
- Remove lsp_client.rs, lsp_process.rs, lsp_transport.rs (2831 lines) — functionality consolidated into discovery-based auto-start
- Show auto-start status in /lsp status output
Remove SlashCommand::Setup (provider wizard), PROVIDER_FIELDS (provider config), and stale imports that leaked in from the feat/lsp-integration branch which included other PRs. Also fix pre-existing clippy findings (Duration::from_hours, is_ok_and).
…SP, resume-latest)
Add the 3-stage Trident compaction strategy from R.A.D.1.C.A.L, adapted for the Rust CLI session model:

- Stage 1 - SUPERSEDE: zero-cost factual pruning. If a file was read and then later written/edited, the earlier read is obsolete and removed. Earlier writes superseded by later writes are also dropped.
- Stage 2 - COLLAPSE: buffer short chatty exchanges (under 200 chars, no tool calls) and collapse them into dense summary blocks when the threshold is exceeded.
- Stage 3 - CLUSTER: group semantically similar messages (same tool names, same file paths, similar lengths) using Jaccard-based fingerprinting and collapse clusters into summary blocks.

All three stages run before the existing summary-based compaction, so less data needs to be summarized. Wired into both /compact and the auto-compact retry on context window errors.
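The SUPERSEDE stage can be sketched with a deliberately simplified message model (the real session type carries much more). The idea: walk the history backwards, and once a later write to a path has been seen, drop earlier reads and earlier writes of that same path.

```rust
use std::collections::HashSet;

// Toy stand-in for session messages; the actual type is richer.
#[derive(Clone, Debug, PartialEq)]
enum Op {
    Read(String),  // file read with path
    Write(String), // file write/edit with path
    Chat(String),  // ordinary conversation text
}

/// Stage 1 (SUPERSEDE): prune reads and writes made obsolete by later writes.
fn supersede(messages: &[Op]) -> Vec<Op> {
    let mut keep = Vec::new();
    let mut written_later: HashSet<&String> = HashSet::new();
    for op in messages.iter().rev() {
        match op {
            // An earlier read of a file that gets rewritten later is obsolete.
            Op::Read(p) if written_later.contains(p) => continue,
            Op::Write(p) => {
                // An earlier write superseded by a later write is dropped too.
                if written_later.contains(p) {
                    continue;
                }
                written_later.insert(p);
                keep.push(op.clone());
            }
            _ => keep.push(op.clone()),
        }
    }
    keep.reverse();
    keep
}
```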
…e retry

- Add TimeoutConfig to the HTTP client builder with connect_timeout (30s) and request_timeout (5min) defaults, configurable via CLAW_API_CONNECT_TIMEOUT and CLAW_API_REQUEST_TIMEOUT env vars
- Add with_timeout() builder to both AnthropicClient and OpenAiCompatClient for per-client timeout configuration
- Parse the Retry-After header on 429 responses and use it to override the exponential backoff delay when present
- Add ApiTimeoutConfig to the runtime config with apiTimeout settings in ~/.claw/settings.json (connectTimeout, requestTimeout, maxRetries)
- Add retry_after field to ApiError::Api for propagating rate-limit backoff hints through the retry pipeline
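The env-var override for the timeout defaults could look like the sketch below. The variable names come from the commit message; the helper itself and its fallback behavior (ignore unset or unparseable values) are assumptions.

```rust
use std::time::Duration;

/// Read a timeout in whole seconds from an env var, falling back to a
/// default when the variable is unset or not a valid integer.
fn timeout_from_env(var: &str, default_secs: u64) -> Duration {
    std::env::var(var)
        .ok()
        .and_then(|v| v.parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or_else(|| Duration::from_secs(default_secs))
}

/// Assemble the defaults named in the commit message: 30s connect, 5min request.
fn resolve_timeouts() -> (Duration, Duration) {
    (
        timeout_from_env("CLAW_API_CONNECT_TIMEOUT", 30),
        timeout_from_env("CLAW_API_REQUEST_TIMEOUT", 300),
    )
}
```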
Some providers/proxies return HTTP 400 with bodies like "no parseable body" or "connection reset" during transient network blips. These are not real bad requests — they're gateway errors wearing a 400 mask. Detect known gateway error phrases in 400 response bodies and mark them as retryable so the existing exponential backoff handles them.
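A sketch of that detection, with the phrase list taken from the commit message (the real list may contain more entries):

```rust
// Known gateway-error phrases that can show up in a spurious HTTP 400 body.
const GATEWAY_ERROR_PHRASES: [&str; 2] = ["no parseable body", "connection reset"];

/// True when a 400 response actually looks like a transient gateway error,
/// so the caller should let exponential backoff retry it.
fn is_retryable_bad_request(status: u16, body: &str) -> bool {
    status == 400 && GATEWAY_ERROR_PHRASES.iter().any(|p| body.contains(p))
}
```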
- compact.rs: fix panic when preserve_recent_messages=0
- main.rs: progressive 4-round auto-compact retry with session_mut fix
- main.rs: detect "no parseable body" as context window overflow
- anthropic.rs: remove debug eprintln
- error.rs: add "no parseable body" to CONTEXT_WINDOW_ERROR_MARKERS
- config.rs, lib.rs: conflict resolution fixes from merge

💘 Generated with Crush
Assisted-by: GLM 5.1 FP8 via Crush <crush@charm.land>
Instead of erroring when neither mode nor tasks are specified, default to "2x" (2 Explore + 2 Plan + 2 Verification = 6 agents). Co-authored-by: GLM 5.1 FP8 via Crush <crush@charm.land>
The combined branch had the old setup_wizard without prompt_fast_model() and save_settings_field(), so claw setup never asked for the subagent model. Restore the provider-wizard version that includes the fast model prompt and writes subagentModel to settings.json. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The fast model prompt (prompt_fast_model, subagentModel) was lost during the merge into feat/all-prs-combined. This adds it back so claw setup asks for a smaller/cheaper model for Agent subtasks and writes subagentModel to ~/.claw/settings.json. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- TeamStatus tool with 3 actions:
- status: live snapshot (running/completed/failed counts, agent details)
- summary: final results when agents finish (includes result content)
- events: timeline from append-only event log
- Background team watcher thread spawned by TeamCreate:
- Polls agent .json files every 2s
- Prints [team] progress to stderr on agent completion/failure
- Updates team manifest status when all agents finish
- Writes events to .clawd-agents/teams/{team_id}-events.jsonl
- TeamStatus added to PARALLEL_SAFE_TOOLS and all agent allowed_tools
Co-authored-by: GLM 5.1 FP8 via Crush <crush@charm.land>
…agentModel Co-authored-by: GLM 5.1 FP8 via Crush <crush@charm.land>
On REPL start, check for missing provider.apiKey, provider.baseUrl, and subagentModel. Print a warning with instructions to run `claw setup` or `/setup` if any are absent. Co-authored-by: GLM 5.1 FP8 via Crush <crush@charm.land>
- Agents post completion/failure to team inbox on termination
(.clawd-agents/mailbox/team/{team_id}/{agent_id}-{ts}.json)
- Team watcher reads from inbox instead of polling .json files
- New TeamStatus action=inbox reads team messages from the inbox
- AgentOutput carries team_id, persisted in manifest
- AgentInput accepts team_id from TeamCreate
- TeamCreate passes team_id to each spawned agent
- Inbox cleaned up when all agents finish
Co-authored-by: GLM 5.1 FP8 via Crush <crush@charm.land>
…nitoring

- TeamInboxReporter: per-tool-call progress reporting to team inbox
- TaskClaim tool: atomic claim/release/list with .clawd-agents/claims/ lock files
- Team-scoped task_ids to prevent cross-team claim collisions
- AgentSuggestion tool: propose AGENTS.md additions (human review required)
- ContextRequest tool: iterative retrieval with 3-cycle budget for sub-agents
- Context-window-aware auto-compaction (70% threshold) prevents overflow
- Model token limits for qwen/glm/generic models with 131K fallback
- Reviewer subagent_type: read-only tools, no bash/write
- Team mode presets: 1x-6x (tiny/small/medium/large/xlarge/mega)
- /team slash command + Ctrl+T toggle (off by default, CLAWD_AGENT_TEAMS=1)
- TeamDelete: disk-based deletion with inbox/claims cleanup
- TeamStatus: kill stuck agents, list AGENTS.md suggestions
- AGENTS.md: auto-loaded shared learnings in sub-agent system prompt
- Periodic git commits every 5 tool calls via TeamInboxReporter
- Claims released on failure/panic in spawn_agent_job
- Fixed doubled .clawd-agents/.clawd-agents/ paths (set CLAWD_AGENT_STORE abs)
- Fixed "unknown error" in team watcher (added error field to inbox messages)

💘 Generated with Crush
Assisted-by: GLM 5.1 FP8 via Crush <crush@charm.land>
Some OpenAI-compatible providers (e.g., GLM-5) omit the `id` field in streaming and non-streaming responses. Adding #[serde(default)] allows the parser to accept these responses instead of failing with "missing field `id`". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds scripts/install.sh that builds the release binary and links it to ~/.local/bin/claw. Run after code changes to update the CLI. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When a provider returns HTML (e.g., error page, wrong endpoint) instead of JSON in an SSE stream, provide a clear error message instead of hanging or failing with a cryptic parse error. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When a provider returns a JSON error (e.g., {"error":{"message":"..."}})
without SSE framing (no "data:" prefix), the SSE parser was silently
ignoring it and hanging. Now detects and surfaces these errors.
Also handles HTML responses that lack SSE framing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
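The two commits above amount to classifying stream lines that arrive without SSE framing. A rough sketch, with illustrative variant names (the real parser's types differ):

```rust
#[derive(Debug, PartialEq)]
enum StreamLine<'a> {
    Data(&'a str),      // proper SSE event payload
    JsonError(&'a str), // bare JSON error body: surface it, don't hang
    Html(&'a str),      // HTML error page from a proxy or wrong endpoint
    Other(&'a str),
}

/// Classify a line from the response stream so non-SSE error bodies
/// are surfaced instead of being silently ignored.
fn classify(line: &str) -> StreamLine<'_> {
    let t = line.trim_start();
    if let Some(rest) = t.strip_prefix("data:") {
        StreamLine::Data(rest.trim_start())
    } else if t.starts_with('{') {
        StreamLine::JsonError(t)
    } else if t.starts_with('<') {
        StreamLine::Html(t)
    } else {
        StreamLine::Other(t)
    }
}
```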
Some providers (GLM, DeepSeek) emit reasoning tokens in `reasoning_content` or nested `thinking.content` fields instead of `content`. Added support for these fields so reasoning models work correctly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
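The field fallback can be sketched as a simple preference order: regular `content` first, then `reasoning_content`, then the nested `thinking.content`. The function signature below is illustrative; the real code extracts these from the deserialized delta struct.

```rust
/// Pick the first available text field from a streaming delta, so
/// reasoning-only chunks still produce output.
fn delta_text<'a>(
    content: Option<&'a str>,
    reasoning_content: Option<&'a str>,
    thinking_content: Option<&'a str>,
) -> Option<&'a str> {
    content.or(reasoning_content).or(thinking_content)
}
```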
The final streaming chunk from some providers contains only finish_reason and usage, with no delta field. Made it optional to prevent parse errors. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When preserve_recent_messages == 0, raw_keep_from equals messages.len(), causing index out of bounds when accessing session.messages[k]. Added k >= session.messages.len() check to prevent panic. Reason: Compaction with preserve_recent_messages=0 triggered OOB access when checking for tool-use/tool-result pair preservation at boundary. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
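The fix reduces to a bounds check before the boundary index is used. A minimal sketch, assuming a simplified shape for the boundary computation:

```rust
/// With preserve_recent_messages == 0, raw_keep_from equals messages.len(),
/// so indexing messages[k] would panic. Return None in that case and skip
/// the tool-use/tool-result pair check entirely.
fn boundary_index(message_count: usize, raw_keep_from: usize) -> Option<usize> {
    let k = raw_keep_from;
    if k >= message_count {
        None // nothing preserved; no pair boundary to inspect
    } else {
        Some(k)
    }
}
```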
- Create cli/parse.rs (~1,200 lines) with argument parsing functions
- Create cli/model.rs (~130 lines) with model provenance tracking
- Create cli/mod.rs to export the cli module
- Remove duplicate code from main.rs (~1,300 lines reduced)

This is part of the ongoing modularization effort to reduce main.rs from 13,700 lines to manageable, focused modules under 500 lines each.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Create cli/doctor.rs (~695 lines) with health check functions
- main.rs reduced from 12,377 to 11,751 lines
- Export BUILD_TARGET, render_doctor_report from the cli module
- Remove duplicate constants (OFFICIAL_REPO_*, DEPRECATED_INSTALL_COMMAND)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Remove unused get_agent_result_preview function from tools/lib.rs
- Remove unused is_task_claimed function from tools/lib.rs
- Remove unused setup_agent_worktree/teardown_agent_worktree from tools/lib.rs
- Remove unused RulesImportConfig enum and parse_optional_rules_import from config.rs

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Create cli/format.rs module with all report formatting functions
- Extract StatusContext, StatusUsage, GitWorkspaceSummary structs
- Extract format_model_report, format_model_switch_report
- Extract format_permissions_report, format_permissions_switch_report
- Extract format_cost_report, format_resume_report, render_resume_usage
- Extract format_compact_report, format_auto_compaction_notice
- Extract format_status_report, format_sandbox_report
- Extract format_commit_preflight_report, format_commit_skipped_report
- Extract format_bughunter_report, format_ultraplan_report
- Extract format_pr_report, format_issue_report
- Update main.rs imports to use the cli module
- Remove duplicate definitions from main.rs

Total: 413 lines extracted

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Create cli/permission.rs module with CliPermissionPrompter
- Extract permission_mode_for_mcp_tool and mcp_annotation_flag functions
- Update main.rs imports to use the cli module
- Remove duplicate definitions from main.rs

Total: 78 lines extracted

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Create search.rs module with web fetch and web search functionality
- Extract WebFetchInput, WebSearchInput input structs
- Extract WebFetchOutput, WebSearchOutput, WebSearchResultItem output types
- Extract SearchHit struct and all helper functions
- Extract execute_web_fetch and execute_web_search functions
- Update lib.rs to use the search module

Total: 461 lines extracted from lib.rs

Design considerations for multi-agent workflows:

- The search module is now self-contained and can be used by agents
- Clean separation enables future agent-level search capabilities
- Output types are serializable for inter-agent communication

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Create team.rs module for multi-agent workflow coordination
- Extract task claiming (claim_task, release_claim, list_claims)
- Extract TeamInboxReporter for agent progress reporting
- Extract expand_team_mode for team mode presets
- Extract agent_mailbox_dir, claims_dir for directory management
- Extract append_team_event for event logging
- Update lib.rs to use the team module

Total: 292 lines extracted from lib.rs

Multi-agent architecture considerations:

- Task claiming uses atomic rename to prevent race conditions
- The team inbox enables real-time progress monitoring
- Kill signals allow coordinated agent termination
- Mode presets support scalable team configurations (1x-6x)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
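The commit says claiming uses an atomic rename; the sketch below gets the same exclusivity guarantee from `create_new` (exclusive file creation) for brevity. Only one agent can create the lock file, so concurrent claims of the same task cannot both succeed. Names and file layout are illustrative.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Try to claim a task by exclusively creating its lock file.
/// Returns Ok(true) on a successful claim, Ok(false) if already claimed.
fn claim_task(claims_dir: &Path, task_id: &str, agent_id: &str) -> io::Result<bool> {
    std::fs::create_dir_all(claims_dir)?;
    let lock = claims_dir.join(format!("{task_id}.claim"));
    match OpenOptions::new().write(true).create_new(true).open(&lock) {
        Ok(mut f) => {
            // Record who holds the claim so it can be released on failure.
            writeln!(f, "{agent_id}")?;
            Ok(true)
        }
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}
```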
- Create agent.rs module with agent-related utilities
- Extract AgentInput and AgentOutput structs
- Extract agent_store_dir, make_agent_id, slugify_agent_name
- Extract normalize_subagent_type, canonical_tool_token, iso8601_now
- Update lib.rs to use the agent module
- Remove duplicate structs and functions from lib.rs

Total: 161 lines extracted from lib.rs

Multi-agent architecture considerations:

- Agent IDs are unique nanosecond timestamps
- Subagent types are normalized to canonical forms
- The agent store directory supports a CLAWD_AGENT_STORE env override

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When using reasoning-capable models (Claude extended thinking, Grok 3, OpenAI o3/o4), the application failed with "assistant stream produced no content" because thinking blocks were being completely ignored.

Changes:

- Added AssistantEvent::ThinkingDelta variant in runtime
- Added flush_thinking_block() to convert thinking to displayable text
- Updated build_assistant_message() to accept thinking as valid content
- Updated tools/src/lib.rs to emit thinking events from the stream
- Added tests for thinking content handling
- Also includes: team command ordering fix, GitHub CLI env pass-through

Test Plan:

- Added build_assistant_message_accepts_thinking_content test
- Added build_assistant_message_accepts_thinking_with_signature test
- All 23 conversation tests pass
- All 562 runtime tests pass
The previous fix only handled thinking content in the tools crate's ProviderRuntimeClient, but the CLI uses AnthropicRuntimeClient, which has its own stream-processing logic. This caused the "assistant stream produced no content" error to persist.

Changes:

- ContentBlockDelta::ThinkingDelta now emits AssistantEvent::ThinkingDelta
- ContentBlockDelta::SignatureDelta now emits AssistantEvent::ThinkingDelta
- push_output_block now emits ThinkingDelta for OutputContentBlock::Thinking
- Updated the synthetic MessageStop check to include ThinkingDelta as content

This completes the fix for handling reasoning/thinking content.
…errors Models like claude-sonnet-4-* were requesting 64,000 max_tokens, which combined with ~80k input tokens exceeded the 131k context window limit.

Changed:

- Non-opus models: 64_000 -> 40_000 tokens
- This leaves ~90k for input + 40k for output within the 128k context window

Fixes: context window blocked errors with large input sessions
Instead of fixed 32k/64k max_tokens, calculate dynamically based on estimated input size. This ensures input + output always fits within the 131k context window.

Changes:

- Added max_tokens_for_request() that takes estimated_input_tokens
- Added estimate_request_input_tokens() for rough token estimation
- max_tokens now = min(base_max, available_space - 4k buffer), floored at 8k

With 90k input: max_tokens reduces to ~37k (fits in the 131k window)
With 10k input: max_tokens stays at 64k (full output capacity)

Fixes: context window blocked errors with large inputs
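The formula above can be sketched directly; the function name comes from the commit message, while the exact signature is an assumption.

```rust
/// Dynamic output budget: cap at base_max, shrink to fit the window
/// (leaving a ~4k safety buffer), and never drop below an 8k floor.
fn max_tokens_for_request(base_max: u64, context_window: u64, estimated_input: u64) -> u64 {
    let available = context_window
        .saturating_sub(estimated_input)
        .saturating_sub(4_000); // safety buffer from the commit message
    base_max.min(available).max(8_000)
}
```

With a 131k window this reproduces the numbers quoted in the commit message: 90k of input yields a 37k output budget, while 10k of input leaves the full 64k.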
Add the same dynamic max_tokens calculation to the tools crate that was added to the CLI crate. Prevents context window errors in subagents.

Changes:

- Added max_tokens_for_request() with context window awareness
- Added estimate_input_tokens() for rough token estimation
- ProviderRuntimeClient now calculates max_tokens based on input size
Fix: Handle reasoning/thinking content from models
Problem
When using reasoning-capable models (e.g., Claude with extended thinking, Grok 3, OpenAI o3/o4), the application fails with "assistant stream produced no content".
This occurs when the model returns only thinking/reasoning blocks without regular text content.
Root Cause (Two Places!)
1. tools crate (`tools/src/lib.rs`): the SSE stream parser was explicitly ignoring thinking content blocks.
2. CLI crate (`rusty-claude-cli/src/main.rs`): CRITICAL: the CLI uses its own `AnthropicRuntimeClient` with separate stream processing that was ALSO ignoring thinking content, only rendering it visually without emitting events. When a model returned only thinking content, zero `AssistantEvent` content events were produced, causing `build_assistant_message` to fail.

Solution
1. Runtime (`runtime/src/conversation.rs`)

- Added `AssistantEvent::ThinkingDelta { thinking, signature }` variant
- Added `flush_thinking_block()` helper to convert thinking to `<thinking>` tags

2. Tools crate (`tools/src/lib.rs`)

- `push_output_block()` now emits `ThinkingDelta` for thinking blocks
- `ContentBlockDelta` handler processes `ThinkingDelta` and `SignatureDelta`
- `MessageStop` check includes thinking as valid content

3. CLI crate (`rusty-claude-cli/src/main.rs`)

CRITICAL FIX:

- `ContentBlockDelta::ThinkingDelta` now emits `AssistantEvent::ThinkingDelta`
- `ContentBlockDelta::SignatureDelta` now emits `AssistantEvent::ThinkingDelta`
- `push_output_block()` now emits `ThinkingDelta` for `OutputContentBlock::Thinking`
- `MessageStop` check includes `ThinkingDelta` as valid content

Changes Checklist
- `runtime/src/conversation.rs`: added `ThinkingDelta` variant to `AssistantEvent`
- `runtime/src/conversation.rs`: added `flush_thinking_block()` helper
- `runtime/src/conversation.rs`: updated `build_assistant_message()`
- `runtime/src/conversation.rs`: added tests for thinking content
- `tools/src/lib.rs`: updated `push_output_block()`
- `tools/src/lib.rs`: updated `ContentBlockDelta` handler
- `tools/src/lib.rs`: updated `MessageStop` check
- `rusty-claude-cli/src/main.rs`: emit `ThinkingDelta` from `ContentBlockDelta::ThinkingDelta`
- `rusty-claude-cli/src/main.rs`: emit `ThinkingDelta` from `ContentBlockDelta::SignatureDelta`
- `rusty-claude-cli/src/main.rs`: emit `ThinkingDelta` from `push_output_block()`
- `rusty-claude-cli/src/main.rs`: updated `MessageStop` check
- `commands/src/lib.rs`: fixed `Team` command position
- `runtime/src/sandbox.rs`: allow `gh` CLI usage within sandbox

Testing
- `build_assistant_message_accepts_thinking_content` test
- `build_assistant_message_accepts_thinking_with_signature` test

Impact