Problem Statement
When using ToolResultExternalizer with InMemoryExternalizationStorage, externalized content accumulates without bound for the lifetime of the storage instance. For long-running agents that externalize many tool results, this means memory grows monotonically — the very scenario where externalization is most needed (long conversations with many tool calls) is also where unbounded storage becomes a problem.
FileExternalizationStorage and S3ExternalizationStorage have the same issue with disk/object accumulation, though the consequences are less acute (disk is cheaper than RAM, and S3 has lifecycle policies).
The ExternalizationStorage protocol currently has no mechanism for eviction, cleanup, or lifecycle management of stored artifacts.
Proposed Solution
Explore adding eviction capabilities to the storage layer. Possible approaches:
- TTL-based eviction — stored content expires after a configurable duration. The storage backend automatically drops entries older than the TTL. Well-suited for in-memory and file backends.
- LRU/capacity-based eviction — limit the number of stored entries or total byte size. When the limit is reached, the oldest/least-recently-retrieved entries are evicted. Prevents unbounded growth.
- Turns-to-live (TTnL) — externalized content expires after N agent loop turns. Similar to TTL but measured in conversation turns rather than wall-clock time. More semantically meaningful for agents — content from 50 turns ago is likely irrelevant regardless of how much wall time has passed. Could be implemented by tracking turn count at storage time and evicting on subsequent AfterToolCallEvent or BeforeModelCallEvent fires.
- Tree-based eviction — evict entire tool call chains at once. When an agent calls tool A which triggers tools B and C, all three externalized results form a logical unit. Evicting them together preserves consistency — you don't end up with orphaned child references pointing to evicted parent content, or partially-available chains where some context is missing. This aligns with how SlidingWindowConversationManager already removes tool use/result pairs together to maintain conversation coherence.
- Conversation-scoped lifecycle — tie storage lifecycle to agent invocations. A hook on AfterInvocationEvent could clear entries from completed conversations, or storage could be scoped per-session.
- Explicit delete(reference) on the protocol — let the plugin or user code clean up entries when they're no longer needed. Most flexible but puts the burden on the caller.
Any solution should be opt-in and backwards compatible with the existing ExternalizationStorage protocol (i.e., don't add required methods to the protocol).
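As a rough illustration of the TTL approach, an expiring in-memory backend might look like the sketch below. TTLInMemoryStorage is a hypothetical name, and the store/retrieve signatures are assumptions based on the minimal protocol described above, not the SDK's actual API:

```python
import time
from typing import Optional


class TTLInMemoryStorage:
    """In-memory storage where entries expire after ttl_seconds (illustrative sketch)."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self._ttl = ttl_seconds
        # reference -> (stored_at, content)
        self._entries: dict[str, tuple[float, str]] = {}
        self._counter = 0

    def _evict_expired(self) -> None:
        now = time.monotonic()
        stale = [ref for ref, (stored_at, _) in self._entries.items()
                 if now - stored_at > self._ttl]
        for ref in stale:
            del self._entries[ref]

    def store(self, content: str) -> str:
        self._evict_expired()
        self._counter += 1
        reference = f"memory://{self._counter}"
        self._entries[reference] = (time.monotonic(), content)
        return reference

    def retrieve(self, reference: str) -> Optional[str]:
        self._evict_expired()
        entry = self._entries.get(reference)
        return entry[1] if entry is not None else None
```

Eviction here is piggybacked on store/retrieve calls rather than run on a background timer, which keeps the sketch dependency-free; a real backend might prefer a periodic sweep.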
Use Cases
- Long-running agents — An agent running for hours, making hundreds of tool calls, with InMemoryExternalizationStorage. Without eviction, memory grows until the process is killed.
- Serverless/containerized agents — Fixed memory budget. Need to externalize large results for context savings but can't afford unbounded storage growth.
- Multi-session servers — A server hosting multiple agent sessions. Each session's externalized content should be cleaned up when the session ends, not accumulate across sessions.
- Cost management for S3 — Agents that externalize thousands of results per day to S3. Without lifecycle management, bucket size and costs grow indefinitely.
- Multi-step reasoning chains — An agent that builds up context across 10 tool calls, then moves on to a new topic. The first chain's externalized results are no longer relevant and should be evictable as a unit without manually tracking individual references.
Alternative Solutions
- S3 lifecycle policies — S3 already supports object expiration rules. Users can configure these outside the SDK. This works for S3 but doesn't help in-memory or file backends.
- User creates new storage per session — Users can instantiate a fresh InMemoryExternalizationStorage() for each agent invocation. Simple but requires the user to understand the lifecycle issue and manage it themselves.
- Weak references in memory storage — Use weakref so entries can be GC'd when nothing holds a reference. Doesn't work well since the reference string is the only handle to the content.
- Decorator/wrapper pattern — An EvictingStorage wrapper that adds eviction behavior on top of any ExternalizationStorage implementation, keeping the base protocol minimal. Example: EvictingStorage(InMemoryExternalizationStorage(), max_entries=100). This preserves the intentionally minimal ExternalizationStorage protocol (store/retrieve) by making eviction additive rather than requiring changes to the protocol or existing backends.
- Leverage existing SDK patterns — SlidingWindowConversationManager uses count-based eviction (window_size) and removes tool use/result pairs together to maintain coherence. This is the closest prior art in the SDK for what eviction of externalized results would look like. An eviction strategy could mirror this approach — count-based, FIFO, respecting logical groupings.
- Leverage Python ecosystem patterns — functools.lru_cache (capacity-based), cachetools.TTLCache (time-based), and Redis eviction policies (allkeys-lru, volatile-ttl) are well-established patterns. A turns-to-live or tree-based approach would be novel to the agent domain but composes well with the conversation manager's existing sliding window behavior, which has been explored in related context management discussions.
- Do nothing — Document that users should manage storage lifecycle themselves. Simplest but violates the "obvious path is the happy path" tenet for long-running agents.
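The decorator/wrapper alternative could be sketched roughly as follows. EvictingStorage is hypothetical (it does not exist in the SDK), and it assumes any inner backend exposing store(content) -> reference and retrieve(reference) -> content per the minimal protocol; the optional delete hook is likewise an assumption:

```python
from collections import OrderedDict


class EvictingStorage:
    """Caps any wrapped storage backend at max_entries, evicting LRU references (sketch)."""

    def __init__(self, inner, max_entries: int = 100):
        self._inner = inner
        self._max = max_entries
        # OrderedDict used as an LRU set: keys in recency order, values unused
        self._refs: "OrderedDict[str, None]" = OrderedDict()

    def store(self, content):
        reference = self._inner.store(content)
        self._refs[reference] = None
        while len(self._refs) > self._max:
            victim, _ = self._refs.popitem(last=False)  # least recently used
            delete = getattr(self._inner, "delete", None)  # optional on backends
            if delete is not None:
                delete(victim)
        return reference

    def retrieve(self, reference):
        if reference in self._refs:
            self._refs.move_to_end(reference)  # mark as recently used
        return self._inner.retrieve(reference)


# usage: EvictingStorage(InMemoryExternalizationStorage(), max_entries=100)
```

Because eviction lives entirely in the wrapper, the base protocol stays store/retrieve only, and backends that happen to implement delete get their entries actually reclaimed.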
Additional Context