
Python: Support OpenAI and Gemini allowed_tools tool choice#5322

Open
giles17 wants to merge 6 commits into microsoft:main from giles17:agent/fix-5309-1

Conversation

Contributor

@giles17 giles17 commented Apr 17, 2026

Motivation and Context

OpenAI and Azure OpenAI support an allowed_tools tool choice type that lets callers restrict which tools the model may invoke without removing tools from the prompt, preserving prompt caching benefits. The Agent Framework had no way to express this constraint.

Fixes #5309

Description

The ToolMode TypedDict gains an optional allowed_tools: list[str] field, validated to only be used with mode="auto". The OpenAI chat client's _prepare_options translates this into the wire format ({"type": "allowed_tools", "mode": "auto", "tools": [...]}) expected by the OpenAI API. Additionally, finish_reason is now propagated through AgentResponse and AgentResponseUpdate so callers can inspect why the model stopped generating, and Pydantic-based tool models (used by providers like Gemini) are properly serialized in _tools_to_dict.
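The translation described above can be sketched as follows. This is illustrative only: the real _prepare_options in the OpenAI chat client differs in detail, and the per-tool entry shape ({"type": "function", "name": ...}) is assumed from the OpenAI API rather than taken from this PR.

```python
def tool_mode_to_tool_choice(tool_mode: dict) -> "str | dict":
    """Convert a ToolMode-style dict into an OpenAI tool_choice payload."""
    allowed = tool_mode.get("allowed_tools")
    if allowed is not None:
        return {
            "type": "allowed_tools",
            "mode": tool_mode.get("mode", "auto"),
            # Assumed per-tool entry shape; the tools stay in the prompt,
            # only the choice is restricted.
            "tools": [{"type": "function", "name": name} for name in allowed],
        }
    # No restriction: fall back to the plain mode string.
    return tool_mode.get("mode", "auto")

choice = tool_mode_to_tool_choice({"mode": "auto", "allowed_tools": ["get_weather"]})
```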

Contribution Checklist

  • The code builds cleanly without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by giles17's agent

Copilot and others added 2 commits April 17, 2026 03:14
Add allowed_tools field to ToolMode TypedDict, enabling users to restrict
which tools the model may call via the OpenAI allowed_tools tool_choice
type. This preserves prompt caching by keeping all tools in the tools list
while limiting which ones the model can invoke.

- Add allowed_tools: list[str] to ToolMode TypedDict
- Add validation in validate_tool_mode() (only valid when mode == "auto")
- Convert to OpenAI API format in _prepare_options()
- Add tests for validation and API payload generation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 17, 2026 03:49
@giles17 giles17 self-assigned this Apr 17, 2026
Contributor

moonbox3 commented Apr 17, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| packages/anthropic/agent_framework_anthropic/_chat_client.py | 439 | 35 | 92% | 446, 449, 530, 623, 625, 768, 795–796, 874, 876, 906–907, 952, 968–969, 976–978, 982–984, 988–991, 1105, 1115, 1167, 1315–1316, 1333, 1346, 1359, 1384–1385 |
| packages/bedrock/agent_framework_bedrock/_chat_client.py | 382 | 101 | 73% | 298–299, 315–324, 329, 345–351, 354–355, 363, 380, 389, 400, 402, 404, 409, 424–425, 446, 459, 471, 474, 482–483, 486–487, 489–490, 495–497, 499, 509–510, 532, 539, 548–549, 551–552, 554–556, 558, 560–561, 567–569, 572–573, 579–582, 588–598, 601, 620, 625, 667–668, 681, 707, 719, 724, 733, 737, 745–746, 750, 752–759 |
| packages/core/agent_framework/_types.py | 1108 | 87 | 92% | 58, 67–68, 122, 127, 146, 148, 152, 156, 158, 160, 162, 180, 184, 210, 232, 237, 242, 246, 276, 689–690, 849–850, 1285, 1357, 1392, 1412, 1422, 1474, 1606–1608, 1790, 1893–1898, 1923, 2017, 2025–2027, 2032, 2123, 2135, 2158, 2413, 2437, 2536, 2790, 2999, 3072, 3083, 3085–3089, 3091, 3094–3102, 3112, 3182, 3319, 3324, 3329, 3334, 3338, 3422–3424, 3453, 3541–3545 |
| packages/gemini/agent_framework_gemini/_chat_client.py | 338 | 1 | 99% | 375 |
| packages/openai/agent_framework_openai/_chat_client.py | 918 | 121 | 86% | 522–525, 529–530, 536–537, 572–578, 599, 607, 630, 748, 847, 906, 908, 910, 912, 978, 992, 1072, 1082, 1087, 1130, 1252, 1433, 1438, 1442–1444, 1448–1449, 1515, 1544, 1550, 1560, 1566, 1571, 1577, 1582–1583, 1602, 1692, 1714–1715, 1730–1731, 1749–1750, 1793, 1959, 1997–1998, 2014, 2016, 2096–2104, 2134, 2244, 2279, 2294, 2314–2324, 2337, 2348–2352, 2366, 2380–2391, 2400, 2432–2435, 2445–2446, 2457–2459, 2473–2475, 2485–2486, 2492, 2507 |
| packages/openai/agent_framework_openai/_chat_completion_client.py | 358 | 27 | 92% | 428, 524–525, 529, 755–762, 764–767, 777, 855, 857, 874, 895, 923, 936, 960, 980, 1020, 1295 |
| TOTAL | 29097 | 3468 | 88% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 5811 | 30 💤 | 0 ❌ | 0 🔥 | 1m 35s ⏱️ |

Contributor Author

@giles17 giles17 left a comment


Automated Code Review

Reviewers: 4 | Confidence: 93%

✓ Correctness

This PR contains only cosmetic/formatting changes: a single blank line added after the ToolMode class in _types.py, and several multi-line expressions collapsed into single lines in test_hyperlight_codeact.py. There are no logic changes and no correctness issues. The allowed_tools feature referenced in the issue context is already fully implemented in the codebase (ToolMode TypedDict, validate_tool_mode, and OpenAI client conversion).

✓ Security Reliability

This PR contains only cosmetic changes: an extra blank line added in _types.py (line 3157) and reformatting of multi-line expressions into single lines in test_hyperlight_codeact.py. There are no functional, security, or reliability changes. The allowed_tools field referenced in context lines already existed prior to this diff.

✓ Test Coverage

The PR adds allowed_tools support to ToolMode with good test coverage for the core validation and OpenAI Responses API client conversion. Tests cover valid single/multiple tools, invalid mode combinations, and regression for plain auto mode. Two test coverage gaps are notable: (1) no test for an empty allowed_tools list ([]), which passes validation and produces a likely-invalid API payload {"type": "allowed_tools", "tools": []}, and (2) the Chat Completions client (_chat_completion_client.py lines 665–666) silently drops allowed_tools by falling through to run_options["tool_choice"] = mode (i.e., just "auto"), but there is no test documenting this behavior or warning the user.

✗ Design Approach

The diff itself is trivial — a blank line added to _types.py and cosmetic test reformatting. No logic is changed. However, the allowed_tools field that this PR exposes in ToolMode is not fully wired up: _chat_completion_client.py (lines 655–666) never checks for allowed_tools and silently falls through to emitting plain tool_choice: "auto", making the feature a no-op for users of that client. The _chat_client.py (lines 1218–1224) handles it correctly, creating an inconsistency between the two clients.

Flagged Issues

  • _chat_completion_client.py _prepare_options (lines 655–666): the allowed_tools branch is missing. When mode == "auto" and allowed_tools is set, the code falls through to run_options["tool_choice"] = mode, silently discarding the list and emitting plain "auto". _chat_client.py lines 1218–1224 show the correct pattern to mirror. Without this fix the feature is non-functional for the Chat Completions client.
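The fall-through described above can be reproduced in miniature. The function below is a stand-in, not the real _prepare_options; it only mirrors the shape of the bug (a required_function_name branch exists, an allowed_tools branch does not).

```python
def prepare_tool_choice(tool_choice: dict) -> "str | dict":
    """Stand-in for a _prepare_options-style conversion missing its allowed_tools branch."""
    mode = tool_choice.get("mode", "auto")
    if mode == "required" and (name := tool_choice.get("required_function_name")):
        return {"type": "function", "function": {"name": name}}
    # No allowed_tools branch: any restriction list is silently discarded here.
    return mode

# The caller's restriction is lost with no error or warning:
assert prepare_tool_choice({"mode": "auto", "allowed_tools": ["fn"]}) == "auto"
```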

Suggestions

  • Add a test for validate_tool_mode({"mode": "auto", "allowed_tools": []}) — an empty list passes validation today but would produce {"type": "allowed_tools", "tools": []} at the API level. Consider whether validation should reject it, and add a test either way to document the expected behavior.
  • Add a test in test_openai_chat_completion_client.py covering tool_choice={"mode": "auto", "allowed_tools": ["fn"]} to lock in the expected API payload (or to document that allowed_tools is silently dropped), analogous to test_prepare_options_allowed_tools in test_openai_chat_client.py. If allowed_tools is intentionally unsupported in the Chat Completions client, consider raising a warning so users don't silently lose the restriction.

Automated review by giles17's agents

Contributor

Copilot AI left a comment


Pull request overview

Adds Python SDK support for OpenAI/Azure OpenAI tool_choice.type="allowed_tools" so callers can restrict tool invocation without removing tools from the prompt/tool list.

Changes:

  • Extend core ToolMode to include optional allowed_tools (only valid with mode="auto") and update validation.
  • Update OpenAI chat client option preparation to translate allowed_tools into the OpenAI wire format, with accompanying unit tests.
  • Adjust samples to suppress pyright optional-dependency import errors for orjson.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
python/packages/core/agent_framework/_types.py Adds allowed_tools to ToolMode and extends validate_tool_mode constraints.
python/packages/core/tests/core/test_types.py Adds unit tests for ToolMode.allowed_tools and validation behavior.
python/packages/openai/agent_framework_openai/_chat_client.py Maps ToolMode.allowed_tools into OpenAI tool_choice “allowed_tools” payload.
python/packages/openai/tests/openai/test_openai_chat_client.py Adds tests ensuring _prepare_options emits correct OpenAI tool_choice format.
python/samples/02-agents/conversations/file_history_provider.py Adds pyright ignore for optional orjson import.
python/samples/02-agents/conversations/file_history_provider_conversation_persistence.py Adds pyright ignore for optional orjson import.
python/packages/hyperlight/tests/hyperlight/test_hyperlight_codeact.py Minor test formatting adjustments.

Comment thread python/packages/core/agent_framework/_types.py Outdated
…ions client support

- validate_tool_mode now checks allowed_tools is a non-string sequence of
  strings and normalizes to list[str], raising ContentError for invalid types
- Add missing allowed_tools branch in _chat_completion_client._prepare_options
  so allowed_tools is emitted as the OpenAI allowed_tools wire format instead
  of being silently dropped
- Add tests for invalid allowed_tools types (string, int, mixed), empty list,
  tuple normalization, and Chat Completions client payload generation
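The normalization this commit describes can be sketched roughly as below. The real validate_tool_mode raises ContentError; ValueError stands in for it here, and the exact checks are assumptions based on the commit message.

```python
from collections.abc import Sequence

def validate_tool_mode(tool_choice: dict) -> dict:
    """Sketch: accept a non-string sequence of strings, normalize to list[str]."""
    allowed = tool_choice.get("allowed_tools")
    if allowed is None:
        return tool_choice
    # A bare string is itself a Sequence, so reject it explicitly.
    if isinstance(allowed, str) or not isinstance(allowed, Sequence):
        raise ValueError("allowed_tools must be a non-string sequence of strings")
    normalized = list(allowed)
    if not all(isinstance(name, str) for name in normalized):
        raise ValueError("allowed_tools must be a non-string sequence of strings")
    return {**tool_choice, "allowed_tools": normalized}

# Tuples normalize to list[str]; a bare string is rejected.
assert validate_tool_mode({"mode": "auto", "allowed_tools": ("a", "b")})["allowed_tools"] == ["a", "b"]
```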

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor Author

@giles17 giles17 left a comment


Automated Code Review

Reviewers: 4 | Confidence: 91%

✓ Correctness

The diff adds support for allowed_tools in ToolMode, following the same pattern as the existing required_function_name field. The validation logic in _types.py correctly checks type constraints (non-string sequence of strings), normalizes tuples to lists, and gates the field to mode == 'auto'. Both the Chat Completion client and the Responses API client correctly convert the validated allowed_tools into the OpenAI API format. The walrus operator chain in the client's if/elif branches is correct — mode is assigned even when the first condition short-circuits. Tests cover the key cases including invalid types, empty lists, tuple normalization, and single/multiple tool names. No correctness issues found.

✓ Security Reliability

The implementation is clean and follows the established patterns for ToolMode validation and client conversion. Input validation is thorough (type-checking allowed_tools as a non-string sequence of strings), and the conversion to OpenAI API format is correct. The validation function properly prevents conflicting fields (e.g., both required_function_name and allowed_tools). No security or reliability issues found.

✓ Test Coverage

The new allowed_tools feature has solid test coverage for validation logic (type checks, normalization, invalid mode combinations) and basic client payload generation (single and multiple tools). Two minor gaps: (1) no client-level test for an empty allowed_tools list, which the validation explicitly permits and would produce "tools": [] in the payload; (2) no regression test verifying that {"mode": "auto"} without allowed_tools still falls through to produce tool_choice = "auto" (though this is indirectly covered by the existing parametrized test at line 1627). Overall the coverage is good and assertions are meaningful—each test verifies specific structural properties of the output rather than just asserting no exception.

✓ Design Approach

The PR adds allowed_tools support to ToolMode following the same pattern as required_function_name: extend the TypedDict, validate centrally in validate_tool_mode, and convert to provider-specific API format in the client. The implementation is consistent with the existing framework design at every layer. No fundamental design problems found. One minor observation: when allowed_tools is already a list (the common case), validate_tool_mode returns the original dict object unchanged (final return tool_choice), while a tuple input returns a newly constructed dict; this asymmetry is harmless and matches the existing behavior for required_function_name, but is worth being aware of. There are no missing cases in validation logic, and the is not None guard in the client correctly passes empty lists through to the API.

Suggestions

  • Add a client-level test for empty allowed_tools list (e.g., {"mode": "auto", "allowed_tools": []}) to verify _prepare_options produces {"type": "allowed_tools", "mode": "auto", "tools": []} rather than falling through to run_options["tool_choice"] = mode. Validation tests confirm the empty list is accepted, but no client test exercises the resulting payload shape.
  • Consider adding a regression test verifying that {"mode": "auto"} (without allowed_tools) still produces tool_choice = "auto" through _prepare_options, since the new elif branch could theoretically interfere if the walrus operator condition were wrong. The existing parametrized test at line 1627 covers "auto" as a string but not {"mode": "auto"} as a dict without allowed_tools.
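The walrus-operator safety noted in the review above comes down to evaluation order: the assignment runs before the comparison, so the name is bound even when the branch is not taken. A minimal illustration (variable names are ours, not the client's):

```python
tool_choice = {"mode": "auto"}  # no allowed_tools key

if (mode := tool_choice.get("mode", "auto")) == "auto" and "allowed_tools" in tool_choice:
    result = {"type": "allowed_tools", "mode": mode, "tools": tool_choice["allowed_tools"]}
else:
    # The second operand failed and the branch was skipped, but `mode`
    # was already bound by the walrus assignment in the first operand.
    result = mode

assert result == "auto"
```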

Automated review by giles17's agents

@cecheta
Member

cecheta commented Apr 18, 2026

Thanks for looking into this, just to add that allowed_tools also supports mode: required in addition to auto.

https://github.com/openai/openai-python/blob/main/src/openai/types/responses/tool_choice_allowed.py
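Per the ToolChoiceAllowed type linked above, both modes are valid in the same payload shape. A sketch of the two variants (treat the exact field shapes as an approximation of the OpenAI wire format, not canon):

```python
auto_choice = {
    "type": "allowed_tools",
    "mode": "auto",      # model may call zero or more of the allowed tools
    "tools": [{"type": "function", "name": "get_weather"}],
}
# Same restriction, but the model must call one of the allowed tools:
required_choice = {**auto_choice, "mode": "required"}
```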

@giles17 giles17 changed the title Python: Support OpenAI allowed_tools tool choice in Python SDK Python: Support OpenAI allowed_tools tool choice Apr 20, 2026
OpenAI's allowed_tools tool_choice type supports both mode 'auto' and
'required'. Update validation, client conversion, and tests to allow
both modes instead of restricting to 'auto' only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread python/packages/core/agent_framework/_types.py
Member

@eavanvalkenburg eavanvalkenburg left a comment


Some additional work TBD, but removing the request-changes review.

@eavanvalkenburg eavanvalkenburg dismissed their stale review April 21, 2026 17:07

discussed approach

Contributor Author

@giles17 giles17 left a comment


Automated Code Review

Reviewers: 4 | Confidence: 89%

✗ Correctness

The PR adds allowed_tools support to the core ToolMode TypedDict and implements provider-specific handling across OpenAI, Gemini, Anthropic, Bedrock, and Ollama. The core validation logic is thorough (mutual exclusion with required_function_name, mode gating, type checks, normalization). The OpenAI clients correctly map to the allowed_tools API payload. Provider warnings are properly placed. There is one correctness bug in the Gemini client: the new allowed_tools override block (lines 838-840) can set allowed_names to an empty list [], which the pre-existing truthiness check on line 843 (if allowed_names:) treats as falsy, causing the empty list to be silently dropped while mode is still set to ANY. This means allowed_tools: [] on Gemini produces 'model must call at least one function from the full set' — the semantic opposite of the user's intent ('no tools callable'). The core validation and tests explicitly support empty lists as valid.

✗ Security Reliability

The allowed_tools feature is well-implemented overall with solid input validation in the core layer. One reliability issue: the Gemini client's interaction between new allowed_tools code and the existing truthiness check on allowed_names causes incorrect behavior when allowed_tools is an empty list — the mode is silently changed to ANY (requiring a function call) without restricting to any names, producing the opposite of the caller's intent. The empty list is explicitly allowed by the validation layer (tested in test_types.py line 1178) and handled correctly by both OpenAI clients, making this a Gemini-specific regression. All other provider changes (warnings in Anthropic/Bedrock/Ollama, OpenAI conversion logic, core validation) look correct and well-tested.

✓ Test Coverage

Test coverage for the allowed_tools feature is generally solid: core validation has thorough tests (happy paths, edge cases, type coercion, mutual exclusion), both OpenAI clients have good coverage, and all unsupported providers have warning tests. There is one gap worth noting: the Gemini client silently degrades allowed_tools: [] (empty list) into ANY mode with no function name filtering (equivalent to 'required' — call any tool), because the if allowed_names: guard on line 843 is falsy for []. This is a behavioral inconsistency with the OpenAI clients which faithfully pass through empty lists. A test for this edge case would help document the intended behavior. The Responses API client tests (test_openai_chat_client.py) also lack an empty allowed_tools test, unlike the Chat Completions client tests which do have one.
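The guard pitfall all three reviews point at reduces to a truthiness-vs-identity check: `if allowed_names:` treats an empty list like None and silently drops the restriction, while `is not None` preserves it. A minimal illustration (function names are ours, not Gemini client code):

```python
def restrict_truthy(allowed_names):
    # [] is falsy, so it falls through the same as None.
    return allowed_names if allowed_names else None

def restrict_identity(allowed_names):
    # [] is not None, so "no tools callable" survives.
    return allowed_names if allowed_names is not None else None

assert restrict_truthy([]) is None      # restriction silently lost
assert restrict_identity([]) == []      # restriction preserved
```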

✗ Design Approach

The implementation is largely well-structured and consistent with the framework's provider-extension pattern. One genuine design issue: the Gemini client silently promotes mode='auto' to ANY (i.e., required/forced tool use) when allowed_tools is present, because Gemini's API only supports allowedFunctionNames with ANY mode. This is a silent semantic change — the user requested optional tool use but gets mandatory tool use — with no warning to the caller. All other providers either honour the mode value as-is (OpenAI, Azure) or log a warning that the feature is unsupported (Anthropic, Bedrock, Ollama). The Gemini case is worse than unsupported: it partially honours the feature but changes the contract without telling the user.


Automated review by giles17's agents

Comment thread python/packages/gemini/agent_framework_gemini/_chat_client.py Outdated
```python
# allowed_tools overrides: Gemini requires ANY mode for allowedFunctionNames
if "allowed_tools" in tool_mode:
    allowed_names = list(tool_mode["allowed_tools"])
    function_calling_mode = types.FunctionCallingConfigMode.ANY
```
Contributor Author


mode='auto' is silently promoted to ANY (forced tool call) here because Gemini's API only accepts allowedFunctionNames with ANY mode. This contradicts the user's intent: auto means the model may optionally call tools, while ANY means it must. Every other provider either honours the requested mode or emits a warning. This should log a warning so callers are not surprised by the changed behaviour.

Suggested change

```diff
-    function_calling_mode = types.FunctionCallingConfigMode.ANY
+    allowed_names = list(tool_mode["allowed_tools"])
+    if tool_mode.get("mode") == "auto":
+        logger.warning(
+            "Gemini does not support allowedFunctionNames with AUTO mode; "
+            "promoting to ANY (required) mode to honour allowed_tools"
+        )
+    function_calling_mode = types.FunctionCallingConfigMode.ANY
```

Contributor Author


Fixed in f7ca2fdd.

Member


Changing to ANY is still an issue, because ANY basically maps to required, see https://ai.google.dev/gemini-api/docs/function-calling?example=meeting#function_calling_modes. I think that for Gemini, allowed_tools is not the way; it is required_function_name that should be combined with ANY, and there is no real equivalent, except maybe VALIDATED. We could look at that.

Contributor Author


I did notice this and I agree. However, VALIDATED was added to google-genai in v1.32.0, and we currently pin >=1.0.0. So this would require bumping the minimum to >=1.32.0 in pyproject.toml. Is that okay?

Member


yeah, let's go for that, thanks

@giles17 giles17 changed the title Python: Support OpenAI allowed_tools tool choice Python: Support OpenAI and Gemini allowed_tools tool choice Apr 21, 2026
… providers

- Use FunctionCallingConfigMode.VALIDATED instead of ANY when allowed_tools
  is set with auto mode in Gemini, preserving optional tool-call semantics.
- Handle allowed_tools in required mode with required_function_name precedence.
- Fix allowed_names guard to use identity check (is not None) so empty lists
  are preserved.
- Bump google-genai minimum to >=1.32.0 (VALIDATED added in that version).
- Add warnings in Anthropic and Bedrock when allowed_tools is set but not
  supported.
- Add Gemini unit tests for allowed_tools with auto, required, empty list,
  and required_function_name precedence scenarios.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


Development

Successfully merging this pull request may close these issues.

Python: [Feature]: Support OpenAI allowed_tools

5 participants