feat(anthropic): add prompt caching support to direct Anthropic API#2160
Open
chosw1029 wants to merge 1 commit into strands-agents:main
- accept `system_prompt_content` in `stream()` / `format_request()` and emit Anthropic list-form `system` with `cache_control` on the block preceding a `cachePoint`
- surface `cacheReadInputTokens` and `cacheWriteInputTokens` in metadata usage events, matching `BedrockModel` field names
- add unit tests covering translation precedence and metadata extraction

Relates to strands-agents#1140, strands-agents#1432
Description
`AnthropicModel` currently provides no way to take advantage of Anthropic's prompt caching feature, which `BedrockModel` already supports. Three gaps:

1. `stream()` drops `system_prompt_content`. The event loop (`event_loop/streaming.py`) passes both `system_prompt` and `system_prompt_content` to the model. `BedrockModel` uses the latter, but `AnthropicModel.stream()` only declares `system_prompt: str | None` and silently swallows `system_prompt_content` via `**kwargs`.
2. `format_request()` sends `system` as a plain string. The resulting request body is `{"system": "<string>", ...}`, so there is no place to attach `cache_control`.
3. Cache token counts are dropped. In `format_chunk()` the `metadata` case only extracts `input_tokens` / `output_tokens`. Anthropic already returns `cache_creation_input_tokens` and `cache_read_input_tokens` in the usage object, but they never reach downstream consumers.

This PR closes the three gaps:
- `stream()` and `format_request()` now accept `system_prompt_content: list[SystemContentBlock] | None`. When `system_prompt_content` is supplied, the `system` field is emitted in Anthropic list-form. A `cachePoint` block attaches `cache_control: {"type": "ephemeral"}` to the preceding text block, mirroring the convention already used by `_format_request_messages`.
- `format_chunk()` emits `cacheReadInputTokens` / `cacheWriteInputTokens` in the metadata usage dict. Names match `BedrockModel`'s, so existing observability code (spans that read `cacheReadInputTokens`) works without changes. The existing `Usage` TypedDict already declares both fields as optional.
- No beta headers are required: prompt caching is GA on the direct Anthropic API for Claude Sonnet 3.7+, Sonnet 4.x, Opus 4.x, and Haiku 4.x. The minimum cacheable prefix is enforced by the API, not the SDK.
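The `cachePoint` translation described above can be sketched roughly as follows. Assume each `SystemContentBlock` is a plain dict carrying either a `text` or a `cachePoint` key; the helper name and exact shapes here are illustrative, not the PR's actual code.

```python
def format_system(system_prompt_content):
    """Translate Strands-style system blocks into Anthropic list-form system.

    A cachePoint block does not become a block of its own; instead it
    attaches cache_control to the preceding text block.
    """
    system = []
    for block in system_prompt_content:
        if "text" in block:
            system.append({"type": "text", "text": block["text"]})
        elif "cachePoint" in block and system:
            # Mark the previous text block as a cache boundary.
            system[-1]["cache_control"] = {"type": "ephemeral"}
    return system
```

With this shape, `[{"text": "..."}, {"cachePoint": {...}}]` yields a single text block carrying `cache_control: {"type": "ephemeral"}`, which is the list-form request Anthropic documents for prompt caching.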
Backwards compatibility
- `stream()` / `format_request()` add a keyword argument with a default of `None`; existing callers are unaffected.
- When `system_prompt_content` is absent, behavior is byte-for-byte identical (`system` is still sent as a plain string).
- `format_chunk()` only adds cache keys to the metadata usage dict when the upstream response reports non-zero cache counts; otherwise the shape is unchanged.
- No additions to the `AnthropicConfig` TypedDict: caching is driven by `system_prompt_content`, which is already produced by `Agent` when the caller passes a `list[SystemContentBlock]` system prompt.

Related Issues
Relates to #1140 (Prompt caching support for all models, currently "ready for contribution") and #1432 (`cache_strategy="auto"` across providers; the `AnthropicModel` path was listed but never implemented).
Documentation PR
Will follow up with a docs PR on `strands-agents/agents-docs` once the approach is confirmed. Happy to include it in this PR if preferred.

Type of Change
New feature
Testing
Unit tests added in `tests/strands/models/test_anthropic.py`:

- `system_prompt_content` with a `cachePoint` block → request `system` is list-form and the preceding text block carries `cache_control: {"type": "ephemeral"}`.
- `system_prompt_content` with text blocks only (no `cachePoint`) → list-form system, no `cache_control`.
- `system_prompt_content` takes precedence over `system_prompt` when both are supplied.
- Usage with `cache_read_input_tokens` / `cache_creation_input_tokens` → `format_chunk` exposes `cacheReadInputTokens` / `cacheWriteInputTokens`.

Manually verified with a live Claude Sonnet 4.5 call: a second request with an identical system prefix reports `cache_read_input_tokens > 0` where the pre-patch code reported `0`.

Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli.
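The usage mapping exercised by the last test bullet can be sketched as follows; this is an illustrative helper, assuming the raw usage dict uses the Messages API field names, not the PR's exact code.

```python
def format_usage(usage: dict) -> dict:
    """Map Anthropic usage fields onto Bedrock-style metadata keys."""
    metadata = {
        "inputTokens": usage.get("input_tokens", 0),
        "outputTokens": usage.get("output_tokens", 0),
        # Simple sum of the non-cache counters; the real implementation
        # may account for cache tokens differently.
        "totalTokens": usage.get("input_tokens", 0) + usage.get("output_tokens", 0),
    }
    # Only surface cache counters when the response reports non-zero
    # values, so the metadata shape is unchanged for non-caching calls.
    if usage.get("cache_read_input_tokens"):
        metadata["cacheReadInputTokens"] = usage["cache_read_input_tokens"]
    if usage.get("cache_creation_input_tokens"):
        metadata["cacheWriteInputTokens"] = usage["cache_creation_input_tokens"]
    return metadata
```

Because the cache keys match `BedrockModel`'s names (`cacheReadInputTokens` / `cacheWriteInputTokens`), downstream consumers can read them without provider-specific branching.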
`hatch run prepare`

Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.