Python: Support OpenAI and Gemini allowed_tools tool choice #5322

giles17 wants to merge 6 commits into microsoft:main from

Conversation
Add `allowed_tools` field to the `ToolMode` TypedDict, enabling users to restrict which tools the model may call via the OpenAI `allowed_tools` tool_choice type. This preserves prompt caching by keeping all tools in the tools list while limiting which ones the model can invoke.

- Add `allowed_tools: list[str]` to the `ToolMode` TypedDict
- Add validation in `validate_tool_mode()` (only valid when `mode == "auto"`)
- Convert to OpenAI API format in `_prepare_options()`
- Add tests for validation and API payload generation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
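The shape this commit describes can be sketched as a minimal stand-alone version. This is illustrative only: the framework's actual `ToolMode` and `validate_tool_mode` carry more fields and checks than shown here.

```python
from typing import TypedDict


class ToolMode(TypedDict, total=False):
    mode: str                   # "auto", "required", or "none"
    required_function_name: str
    allowed_tools: list[str]    # only valid together with mode == "auto"


def validate_tool_mode(tool_choice: ToolMode) -> ToolMode:
    # Mirrors the validation rule in the bullet list above:
    # allowed_tools is rejected unless mode == "auto".
    if "allowed_tools" in tool_choice and tool_choice.get("mode") != "auto":
        raise ValueError("allowed_tools is only valid when mode == 'auto'")
    return tool_choice
```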
giles17
left a comment
Automated Code Review
Reviewers: 4 | Confidence: 93%
✓ Correctness
This PR contains only cosmetic/formatting changes: a single blank line added after the ToolMode class in _types.py, and several multi-line expressions collapsed into single lines in test_hyperlight_codeact.py. There are no logic changes and no correctness issues. The allowed_tools feature referenced in the issue context is already fully implemented in the codebase (ToolMode TypedDict, validate_tool_mode, and OpenAI client conversion).
✓ Security Reliability
This PR contains only cosmetic changes: an extra blank line added in _types.py (line 3157) and reformatting of multi-line expressions into single lines in test_hyperlight_codeact.py. There are no functional, security, or reliability changes. The allowed_tools field referenced in context lines already existed prior to this diff.
✓ Test Coverage
The PR adds `allowed_tools` support to `ToolMode` with good test coverage for the core validation and OpenAI Responses API client conversion. Tests cover valid single/multiple tools, invalid mode combinations, and regression for plain auto mode. Two test coverage gaps are notable: (1) no test for an empty `allowed_tools` list (`[]`), which passes validation and produces a likely-invalid API payload `{"type": "allowed_tools", "tools": []}`, and (2) the Chat Completions client (`_chat_completion_client.py` lines 665-666) silently drops `allowed_tools` by falling through to `run_options["tool_choice"] = mode` (i.e., just `"auto"`), but there is no test documenting this behavior or warning the user.
✗ Design Approach
The diff itself is trivial: a blank line added to `_types.py` and cosmetic test reformatting. No logic is changed. However, the `allowed_tools` field that this PR exposes in `ToolMode` is not fully wired up: `_chat_completion_client.py` (lines 655–666) never checks for `allowed_tools` and silently falls through to emitting plain `tool_choice: "auto"`, making the feature a no-op for users of that client. `_chat_client.py` (lines 1218–1224) handles it correctly, creating an inconsistency between the two clients.
Flagged Issues
- `_chat_completion_client.py` `_prepare_options` (lines 655–666): the `allowed_tools` branch is missing. When `mode == "auto"` and `allowed_tools` is set, the code falls through to `run_options["tool_choice"] = mode`, silently discarding the list and emitting plain `"auto"`. `_chat_client.py` lines 1218–1224 show the correct pattern to mirror. Without this fix the feature is non-functional for the Chat Completions client.
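A minimal sketch of what the missing branch could look like. The function name and the bare-name `tools` entries here are simplifications for illustration: the real client mutates `run_options` rather than returning a value, and the exact per-tool wire shape may differ.

```python
def prepare_tool_choice(tool_mode: dict) -> "str | dict":
    mode = tool_mode.get("mode", "auto")
    allowed = tool_mode.get("allowed_tools")
    if mode == "auto" and allowed is not None:
        # The branch the review says is missing: emit the allowed_tools
        # wire shape instead of discarding the list.
        return {"type": "allowed_tools", "mode": "auto", "tools": list(allowed)}
    # Previous behavior: plain "auto" (or whatever mode was set).
    return mode
```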
Suggestions
- Add a test for `validate_tool_mode({"mode": "auto", "allowed_tools": []})`: an empty list passes validation today but would produce `{"type": "allowed_tools", "tools": []}` at the API level. Consider whether validation should reject it, and add a test either way to document the expected behavior.
- Add a test in `test_openai_chat_completion_client.py` covering `tool_choice={"mode": "auto", "allowed_tools": ["fn"]}` to lock in the expected API payload (or to document that `allowed_tools` is silently dropped), analogous to `test_prepare_options_allowed_tools` in `test_openai_chat_client.py`. If `allowed_tools` is intentionally unsupported in the Chat Completions client, consider raising a warning so users don't silently lose the restriction.
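The suggested empty-list test could look like this sketch. `validate_tool_mode` is stubbed locally so the example is self-contained; the real test would import it from the framework instead.

```python
def validate_tool_mode(tool_choice: dict) -> dict:
    # Local stub mirroring the behavior the review describes: an empty
    # allowed_tools list currently passes validation.
    if "allowed_tools" in tool_choice and tool_choice.get("mode") != "auto":
        raise ValueError("allowed_tools requires mode == 'auto'")
    return tool_choice


def test_empty_allowed_tools_list_passes_validation():
    # Documents today's behavior; change this assertion if validation is
    # later tightened to reject empty lists.
    result = validate_tool_mode({"mode": "auto", "allowed_tools": []})
    assert result["allowed_tools"] == []
```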
Automated review by giles17's agents
Pull request overview
Adds Python SDK support for OpenAI/Azure OpenAI tool_choice.type="allowed_tools" so callers can restrict tool invocation without removing tools from the prompt/tool list.
Changes:
- Extend core `ToolMode` to include optional `allowed_tools` (only valid with `mode="auto"`) and update validation.
- Update OpenAI chat client option preparation to translate `allowed_tools` into the OpenAI wire format, with accompanying unit tests.
- Adjust samples to suppress pyright optional-dependency import errors for `orjson`.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| python/packages/core/agent_framework/_types.py | Adds allowed_tools to ToolMode and extends validate_tool_mode constraints. |
| python/packages/core/tests/core/test_types.py | Adds unit tests for ToolMode.allowed_tools and validation behavior. |
| python/packages/openai/agent_framework_openai/_chat_client.py | Maps ToolMode.allowed_tools into OpenAI tool_choice “allowed_tools” payload. |
| python/packages/openai/tests/openai/test_openai_chat_client.py | Adds tests ensuring _prepare_options emits correct OpenAI tool_choice format. |
| python/samples/02-agents/conversations/file_history_provider.py | Adds pyright ignore for optional orjson import. |
| python/samples/02-agents/conversations/file_history_provider_conversation_persistence.py | Adds pyright ignore for optional orjson import. |
| python/packages/hyperlight/tests/hyperlight/test_hyperlight_codeact.py | Minor test formatting adjustments. |
…ions client support

- `validate_tool_mode` now checks `allowed_tools` is a non-string sequence of strings and normalizes to `list[str]`, raising `ContentError` for invalid types
- Add missing `allowed_tools` branch in `_chat_completion_client._prepare_options` so `allowed_tools` is emitted as the OpenAI allowed_tools wire format instead of being silently dropped
- Add tests for invalid `allowed_tools` types (string, int, mixed), empty list, tuple normalization, and Chat Completions client payload generation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
giles17
left a comment
Automated Code Review
Reviewers: 4 | Confidence: 91%
✓ Correctness
The diff adds support for `allowed_tools` in `ToolMode`, following the same pattern as the existing `required_function_name` field. The validation logic in `_types.py` correctly checks type constraints (non-string sequence of strings), normalizes tuples to lists, and gates the field to `mode == 'auto'`. Both the Chat Completion client and the Responses API client correctly convert the validated `allowed_tools` into the OpenAI API format. The walrus operator chain in the client's if/elif branches is correct: `mode` is assigned even when the first condition short-circuits. Tests cover the key cases including invalid types, empty lists, tuple normalization, and single/multiple tool names. No correctness issues found.
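The walrus-chain property noted above can be seen in a minimal stand-alone sketch (names are illustrative, not the client's actual code): `mode` is bound inside the first condition even when that branch is not taken, so the `elif` can safely reuse it.

```python
def prepare(tool_mode: dict) -> "str | dict":
    # The walrus assignment binds `mode` regardless of whether the
    # "required" branch is taken.
    if (mode := tool_mode.get("mode", "auto")) == "required":
        return {"type": "function", "name": tool_mode.get("required_function_name")}
    elif mode == "auto" and (allowed := tool_mode.get("allowed_tools")) is not None:
        # `is not None` (rather than truthiness) lets an empty list through.
        return {"type": "allowed_tools", "mode": mode, "tools": list(allowed)}
    return mode
```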
✓ Security Reliability
The implementation is clean and follows the established patterns for ToolMode validation and client conversion. Input validation is thorough (type-checking allowed_tools as a non-string sequence of strings), and the conversion to OpenAI API format is correct. The validation function properly prevents conflicting fields (e.g., both required_function_name and allowed_tools). No security or reliability issues found.
✓ Test Coverage
The new `allowed_tools` feature has solid test coverage for validation logic (type checks, normalization, invalid mode combinations) and basic client payload generation (single and multiple tools). Two minor gaps: (1) no client-level test for an empty `allowed_tools` list, which the validation explicitly permits and would produce `"tools": []` in the payload; (2) no regression test verifying that `{"mode": "auto"}` without `allowed_tools` still falls through to produce `tool_choice = "auto"` (though this is indirectly covered by the existing parametrized test at line 1627). Overall the coverage is good and assertions are meaningful: each test verifies specific structural properties of the output rather than just asserting no exception.
✓ Design Approach
The PR adds `allowed_tools` support to `ToolMode` following the same pattern as `required_function_name`: extend the TypedDict, validate centrally in `validate_tool_mode`, and convert to the provider-specific API format in the client. The implementation is consistent with the existing framework design at every layer. No fundamental design problems found. One minor observation: when `allowed_tools` is already a `list` (the common case), `validate_tool_mode` returns the original dict object unchanged (final `return tool_choice`), while a tuple input returns a newly constructed dict. This asymmetry is harmless and matches the existing behavior for `required_function_name`, but is worth being aware of. There are no missing cases in the validation logic, and the `is not None` guard in the client correctly passes empty lists through to the API.
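The list/tuple asymmetry can be illustrated with a stand-alone sketch (hypothetical helper, not the framework's actual function):

```python
def normalize_allowed_tools(tool_choice: dict) -> dict:
    allowed = tool_choice.get("allowed_tools")
    if allowed is None or isinstance(allowed, list):
        # Common case: the original dict object comes back unchanged.
        return tool_choice
    # Tuple (or other sequence) input: a new dict is constructed
    # with the sequence normalized to a list.
    return {**tool_choice, "allowed_tools": list(allowed)}
```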
Suggestions
- Add a client-level test for an empty `allowed_tools` list (e.g., `{"mode": "auto", "allowed_tools": []}`) to verify `_prepare_options` produces `{"type": "allowed_tools", "mode": "auto", "tools": []}` rather than falling through to `run_options["tool_choice"] = mode`. Validation tests confirm the empty list is accepted, but no client test exercises the resulting payload shape.
- Consider adding a regression test verifying that `{"mode": "auto"}` (without `allowed_tools`) still produces `tool_choice = "auto"` through `_prepare_options`, since the new `elif` branch could theoretically interfere if the walrus operator condition were wrong. The existing parametrized test at line 1627 covers `"auto"` as a string but not `{"mode": "auto"}` as a dict without `allowed_tools`.
Automated review by giles17's agents
Thanks for looking into this, just to add that https://github.com/openai/openai-python/blob/main/src/openai/types/responses/tool_choice_allowed.py
OpenAI's allowed_tools tool_choice type supports both mode 'auto' and 'required'. Update validation, client conversion, and tests to allow both modes instead of restricting to 'auto' only. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
eavanvalkenburg
left a comment
some additional work tbd, but removing the request changes
giles17
left a comment
Automated Code Review
Reviewers: 4 | Confidence: 89%
✗ Correctness
The PR adds `allowed_tools` support to the core `ToolMode` TypedDict and implements provider-specific handling across OpenAI, Gemini, Anthropic, Bedrock, and Ollama. The core validation logic is thorough (mutual exclusion with `required_function_name`, mode gating, type checks, normalization). The OpenAI clients correctly map to the `allowed_tools` API payload. Provider warnings are properly placed. There is one correctness bug in the Gemini client: the new `allowed_tools` override block (lines 838-840) can set `allowed_names` to an empty list `[]`, which the pre-existing truthiness check on line 843 (`if allowed_names:`) treats as falsy, causing the empty list to be silently dropped while the mode is still set to `ANY`. This means `allowed_tools: []` on Gemini produces "model must call at least one function from the full set", the semantic opposite of the user's intent ("no tools callable"). The core validation and tests explicitly support empty lists as valid.
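A minimal illustration of the truthiness bug described above, with hypothetical helper names: guarding with `if allowed_names:` drops an empty list, while an identity check preserves it.

```python
def build_config(allowed_names: "list[str] | None") -> dict:
    config: dict = {"mode": "ANY"}
    if allowed_names:  # buggy guard: [] is falsy, so the restriction is dropped
        config["allowed_function_names"] = allowed_names
    return config


def build_config_fixed(allowed_names: "list[str] | None") -> dict:
    config: dict = {"mode": "ANY"}
    if allowed_names is not None:  # identity check keeps the empty list
        config["allowed_function_names"] = allowed_names
    return config
```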
✗ Security Reliability
The `allowed_tools` feature is well-implemented overall with solid input validation in the core layer. One reliability issue: the Gemini client's interaction between the new `allowed_tools` code and the existing truthiness check on `allowed_names` causes incorrect behavior when `allowed_tools` is an empty list. The mode is silently changed to `ANY` (requiring a function call) without restricting to any names, producing the opposite of the caller's intent. The empty list is explicitly allowed by the validation layer (tested in `test_types.py` line 1178) and handled correctly by both OpenAI clients, making this a Gemini-specific regression. All other provider changes (warnings in Anthropic/Bedrock/Ollama, OpenAI conversion logic, core validation) look correct and well-tested.
✓ Test Coverage
Test coverage for the `allowed_tools` feature is generally solid: core validation has thorough tests (happy paths, edge cases, type coercion, mutual exclusion), both OpenAI clients have good coverage, and all unsupported providers have warning tests. There is one gap worth noting: the Gemini client silently degrades `allowed_tools: []` (empty list) into `ANY` mode with no function name filtering (equivalent to "required": call any tool), because the `if allowed_names:` guard on line 843 is falsy for `[]`. This is a behavioral inconsistency with the OpenAI clients, which faithfully pass through empty lists. A test for this edge case would help document the intended behavior. The Responses API client tests (`test_openai_chat_client.py`) also lack an empty `allowed_tools` test, unlike the Chat Completions client tests, which do have one.
✗ Design Approach
The implementation is largely well-structured and consistent with the framework's provider-extension pattern. One genuine design issue: the Gemini client silently promotes `mode='auto'` to `ANY` (i.e., required/forced tool use) when `allowed_tools` is present, because Gemini's API only supports `allowedFunctionNames` with `ANY` mode. This is a silent semantic change: the user requested optional tool use but gets mandatory tool use, with no warning to the caller. All other providers either honour the mode value as-is (OpenAI, Azure) or log a warning that the feature is unsupported (Anthropic, Bedrock, Ollama). The Gemini case is worse than unsupported: it partially honours the feature but changes the contract without telling the user.
Automated review by giles17's agents
    # allowed_tools overrides: Gemini requires ANY mode for allowedFunctionNames
    if "allowed_tools" in tool_mode:
        allowed_names = list(tool_mode["allowed_tools"])
        function_calling_mode = types.FunctionCallingConfigMode.ANY
mode='auto' is silently promoted to ANY (forced tool call) here because Gemini's API only accepts allowedFunctionNames with ANY mode. This contradicts the user's intent: auto means the model may optionally call tools, while ANY means it must. Every other provider either honours the requested mode or emits a warning. This should log a warning so callers are not surprised by the changed behaviour.
Suggested change:

    allowed_names = list(tool_mode["allowed_tools"])
    if tool_mode.get("mode") == "auto":
        logger.warning(
            "Gemini does not support allowedFunctionNames with AUTO mode; "
            "promoting to ANY (required) mode to honour allowed_tools"
        )
    function_calling_mode = types.FunctionCallingConfigMode.ANY
Changing to `ANY` is still an issue, because `ANY` basically maps to required (see https://ai.google.dev/gemini-api/docs/function-calling?example=meeting#function_calling_modes). I think that for Gemini, `allowed_tools` is not the way; it is `required_function_name` that should be combined with `ANY`. There is no real equivalence, except maybe `VALIDATED`; we could look at that.
I did notice this and I agree. However, VALIDATED was added to google-genai in v1.32.0, and we currently pin >=1.0.0. So this would require bumping the minimum to >=1.32.0 in pyproject.toml. Is that okay?
yeah, let's go for that, thanks
Force-pushed from f7ca2fd to 0a28079
… providers

- Use `FunctionCallingConfigMode.VALIDATED` instead of `ANY` when `allowed_tools` is set with auto mode in Gemini, preserving optional tool-call semantics.
- Handle `allowed_tools` in required mode with `required_function_name` precedence.
- Fix the `allowed_names` guard to use an identity check (`is not None`) so empty lists are preserved.
- Bump the google-genai minimum to >=1.32.0 (`VALIDATED` added in that version).
- Add warnings in Anthropic and Bedrock when `allowed_tools` is set but not supported.
- Add Gemini unit tests for `allowed_tools` with auto, required, empty list, and `required_function_name` precedence scenarios.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
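The behavior this commit describes can be sketched stand-alone. The enum values follow google-genai's `FunctionCallingConfigMode` names, but the function below and its exact precedence rules are an assumption reconstructed from the commit message, not the client's actual code.

```python
from enum import Enum


class Mode(Enum):
    AUTO = "AUTO"
    ANY = "ANY"
    VALIDATED = "VALIDATED"  # added to google-genai in v1.32.0


def gemini_tool_config(tool_mode: dict) -> dict:
    allowed = tool_mode.get("allowed_tools")
    if tool_mode.get("mode") == "required":
        # required_function_name takes precedence over allowed_tools.
        if "required_function_name" in tool_mode:
            names = [tool_mode["required_function_name"]]
        else:
            names = allowed
        cfg: dict = {"mode": Mode.ANY}
        if names is not None:  # identity check so [] is preserved
            cfg["allowed_function_names"] = list(names)
        return cfg
    if allowed is not None:
        # VALIDATED restricts the tool set while keeping the call optional.
        return {"mode": Mode.VALIDATED, "allowed_function_names": list(allowed)}
    return {"mode": Mode.AUTO}
```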
Motivation and Context
OpenAI and Azure OpenAI support an `allowed_tools` tool choice type that lets callers restrict which tools the model may invoke without removing tools from the prompt, preserving prompt caching benefits. The Agent Framework had no way to express this constraint.

Fixes #5309
Description
The `ToolMode` TypedDict gains an optional `allowed_tools: list[str]` field, validated to only be used with `mode="auto"`. The OpenAI chat client's `_prepare_options` translates this into the wire format (`{"type": "allowed_tools", "mode": "auto", "tools": [...]}`) expected by the OpenAI API. Additionally, `finish_reason` is now propagated through `AgentResponse` and `AgentResponseUpdate` so callers can inspect why the model stopped generating, and Pydantic-based tool models (used by providers like Gemini) are properly serialized in `_tools_to_dict`.

Contribution Checklist