
feat(anthropic): add prompt caching support to direct Anthropic API#2160

Open
chosw1029 wants to merge 1 commit into strands-agents:main from chosw1029:feat/anthropic-prompt-caching

Conversation

@chosw1029

Description

AnthropicModel currently provides no way to take advantage of Anthropic's prompt caching feature, which BedrockModel already supports. There are three gaps:

  1. stream() drops system_prompt_content. The event loop (event_loop/streaming.py) passes both system_prompt and system_prompt_content to the model. BedrockModel uses the latter, but AnthropicModel.stream() only declares system_prompt: str | None and silently swallows system_prompt_content via **kwargs.

  2. format_request() sends system as a plain string. The resulting request body is {"system": "<string>", ...}, so there is no place to attach cache_control.

  3. Cache token counts are dropped. In format_chunk() the metadata case only extracts input_tokens / output_tokens. Anthropic already returns cache_creation_input_tokens and cache_read_input_tokens in the usage object, but they never reach downstream consumers.

This PR closes the three gaps:

  • stream() and format_request() now accept system_prompt_content: list[SystemContentBlock] | None.
  • When system_prompt_content is supplied, the system field is emitted in Anthropic list-form. A cachePoint block attaches cache_control: {"type": "ephemeral"} to the preceding text block, mirroring the convention already used by _format_request_messages.
  • format_chunk() emits cacheReadInputTokens / cacheWriteInputTokens in the metadata usage dict. Names match BedrockModel's, so existing observability code (spans that read cacheReadInputTokens) works without changes. The existing Usage TypedDict already declares both fields as optional.
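The system translation described above can be sketched as follows. This is an illustrative, self-contained sketch (the helper name `format_system` is hypothetical; the real logic lives inside `AnthropicModel.format_request()`): a `cachePoint` block attaches `cache_control: {"type": "ephemeral"}` to the preceding text block.

```python
def format_system(system_prompt_content):
    """Translate a list[SystemContentBlock] into Anthropic list-form `system`.

    Hypothetical sketch of the translation this PR describes: text blocks
    become {"type": "text", ...} entries, and a cachePoint block marks the
    preceding text block with cache_control.
    """
    blocks = []
    for block in system_prompt_content:
        if "text" in block:
            blocks.append({"type": "text", "text": block["text"]})
        elif "cachePoint" in block and blocks:
            # Cache breakpoint: everything up to and including the
            # preceding block becomes the cacheable prefix.
            blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks


system = format_system([
    {"text": "You are a helpful assistant."},
    {"cachePoint": {"type": "default"}},
])
print(system)
# [{'type': 'text', 'text': 'You are a helpful assistant.',
#   'cache_control': {'type': 'ephemeral'}}]
```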

No beta headers are required — prompt caching is GA on the direct Anthropic API for Claude Sonnet 3.7+, Sonnet 4.x, Opus 4.x, and Haiku 4.x. The minimum cacheable prefix is enforced by the API, not the SDK.

Backwards compatibility

  • stream() / format_request() add a keyword argument with a default of None; existing callers are unaffected.
  • When system_prompt_content is absent, behavior is byte-for-byte identical (system is still sent as a plain string).
  • format_chunk() only adds cache keys to the metadata usage dict when the upstream response reports non-zero cache counts; otherwise the shape is unchanged.
  • No change to AnthropicConfig TypedDict — caching is driven by system_prompt_content, which is already produced by Agent when the caller passes a list[SystemContentBlock] system prompt.
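The metadata translation behind the third bullet can be sketched like this (assumed field shapes; `translate_usage` is an illustrative name, not the actual method): cache counts from the Anthropic `usage` object are surfaced under the Bedrock-compatible names, and only when non-zero, so the pre-patch dict shape is otherwise unchanged.

```python
def translate_usage(anthropic_usage: dict) -> dict:
    """Map an Anthropic usage object onto the Bedrock-style usage dict.

    Sketch of the behaviour this PR describes: cache keys are added only
    when the upstream response reports non-zero counts.
    """
    usage = {
        "inputTokens": anthropic_usage.get("input_tokens", 0),
        "outputTokens": anthropic_usage.get("output_tokens", 0),
    }
    if anthropic_usage.get("cache_read_input_tokens"):
        usage["cacheReadInputTokens"] = anthropic_usage["cache_read_input_tokens"]
    if anthropic_usage.get("cache_creation_input_tokens"):
        usage["cacheWriteInputTokens"] = anthropic_usage["cache_creation_input_tokens"]
    return usage


# Zero cache counts: shape is unchanged.
print(translate_usage({"input_tokens": 10, "output_tokens": 5}))
# {'inputTokens': 10, 'outputTokens': 5}
```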

Related Issues

Relates to #1140 (Prompt caching support for all models — currently "ready for contribution") and #1432 (cache_strategy="auto" across providers; the AnthropicModel path was listed but never implemented).

Documentation PR

Will follow up with a docs PR on strands-agents/agents-docs once the approach is confirmed. Happy to include it in this PR if preferred.

Type of Change

New feature

Testing

Unit tests added in tests/strands/models/test_anthropic.py:

  • system_prompt_content with a cachePoint block → request system is list-form and the preceding text block carries cache_control: {"type": "ephemeral"}.
  • system_prompt_content with text blocks only (no cachePoint) → list-form system, no cache_control.
  • system_prompt_content takes precedence over system_prompt when both are supplied.
  • Anthropic response metadata with cache_read_input_tokens / cache_creation_input_tokens → format_chunk exposes cacheReadInputTokens / cacheWriteInputTokens.
  • Anthropic response with zero cache counts → metadata usage dict unchanged.
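The precedence rule exercised by the third test above can be sketched as follows (`resolve_system` is an illustrative helper, not the actual method name): when both arguments are supplied, system_prompt_content wins; otherwise the plain-string system_prompt passes through untouched, preserving pre-patch behavior.

```python
def resolve_system(system_prompt=None, system_prompt_content=None):
    """Sketch of the precedence behaviour described in this PR."""
    if system_prompt_content is not None:
        # List-form system: translate text blocks; a cachePoint marks the
        # preceding block with cache_control.
        system = []
        for block in system_prompt_content:
            if "text" in block:
                system.append({"type": "text", "text": block["text"]})
            elif "cachePoint" in block and system:
                system[-1]["cache_control"] = {"type": "ephemeral"}
        return system
    # Fall back to the plain string: byte-for-byte identical to pre-patch.
    return system_prompt


assert resolve_system(system_prompt="plain") == "plain"
assert resolve_system(
    system_prompt="ignored",
    system_prompt_content=[{"text": "hi"}],
) == [{"type": "text", "text": "hi"}]
```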

Manually verified with a live Claude Sonnet 4.5 call: a second request with an identical system prefix reports cache_read_input_tokens > 0 where the pre-patch code reported 0.

Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli.

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

- accept system_prompt_content in stream() / format_request() and emit
  Anthropic list-form system with cache_control on the block preceding
  a cachePoint
- surface cacheReadInputTokens and cacheWriteInputTokens in metadata
  usage events, matching BedrockModel field names
- add unit tests covering translation precedence and metadata extraction

Relates to strands-agents#1140, strands-agents#1432
