Fix: Vertex streaming + native tool calling leading to blank responses#168
Fix: Vertex streaming + native tool calling leading to blank responses#168ctroche99 wants to merge 17 commits into
Conversation
When streaming is enabled with native tool calling, the Gemini API returns function_call parts that were previously silently ignored, causing blank responses. This fix adds a tool call execution loop inside _handle_streaming_response that: - Detects function_call parts in each streaming round - Executes the corresponding OpenWebUI tool callables (sync and async) - Appends the function responses back into the conversation - Makes a follow-up streaming call so the model can answer with results - Loops up to 10 times to support multi-step tool use - Emits status events so the UI shows which tools are running The call site in pipe() now passes client, model_id, contents, and generation_config so the handler has everything it needs for follow-up requests. https://claude.ai/code/session_01EvKKnagp23fb6fi92HaHBo
The non-streaming path had the same bug as streaming: function_call parts returned by Gemini were silently ignored, causing blank or tool-unaware responses (issue owndev#155). The fix wraps the generate_content call in a tool call loop that: - Detects function_call parts in each response round - Executes the corresponding OpenWebUI tool callables (sync and async) - Appends function responses back into the conversation contents - Makes follow-up generate_content calls so the model can use results - Loops up to 10 times to support multi-step tool use - Emits status events so the UI shows which tools are running Text, thoughts, and image parts accumulate across all rounds so the final response includes output from every step. Image generation models are excluded from the tool call loop since they don't use function calling. https://claude.ai/code/session_01EvKKnagp23fb6fi92HaHBo
Two improvements based on OpenWebUI plugin documentation: 1. Replace asyncio.iscoroutinefunction() with the call-first / inspect.isawaitable() pattern. OpenWebUI wraps tool callables with functools.partial and similar constructs that fool iscoroutinefunction, returning False even when the underlying function is async. Calling first and then awaiting the result if isawaitable() handles all cases correctly. 2. Emit a <details type="tool_calls"> block into the response after each tool call round. This matches the exact shape OpenWebUI itself renders for native tool calls, giving users a collapsible history of which tools ran and what they returned. Applied to both streaming and non-streaming paths. import inspect added at module level. https://claude.ai/code/session_01EvKKnagp23fb6fi92HaHBo
1. Remove redundant 'Running' status event — 'Calling' already fires immediately when the function_call part hits the stream; emitting 'Running' milliseconds later in the execution loop is noise. 2. Separate tool call blocks from answer_chunks so grounding citation processing only receives real model text. Previously the <details> HTML block was in answer_chunks and got passed to _process_grounding_metadata, which would insert citation links into the HTML markup and produce malformed output when grounding and native tool calling were both enabled. tool_call_blocks is now a separate accumulator; the two are joined after grounding runs. 3. Final content assembly now correctly orders: thought block → tool call details → grounded answer text. https://claude.ai/code/session_01EvKKnagp23fb6fi92HaHBo
Header injection (sanitize_header_value):
The regex already removed \x00-\x1F which includes \r and \n, but
adding them explicitly as \r\n makes the intent unambiguous and guards
against edge cases in regex engine interpretation.
Grounding citation byte bounds (grounding_metadata_list processing):
Added clamping of end_index to [last_byte_index, text_len] and used
errors='replace' on decode. Malformed segment offsets from the API
(e.g. due to encoding mismatch on unicode text) would previously cause
silent truncation or IndexError.
Tool loop MAX_TOOL_ITERATIONS warning (both paths):
The loop was silently breaking at the limit. Now logs a warning so
runaway agent loops are visible in server logs.
Empty function_response_parts guard (both paths):
If all tool calls fail to resolve (not in __tools__), we still built
error-result response parts — so in practice this list is never empty.
Added an explicit guard and break anyway to prevent pushing an empty
Content object to the Gemini API, which would cause an API error.
None tool result normalisation (both paths):
Tool callables returning None were producing the literal string "None"
via str(). Now normalised to "" before wrapping in FunctionResponse so
the model receives clean empty text rather than the word "None".
Tool call blocks separated from answer text (non-streaming path):
Same fix previously applied to streaming: <details type="tool_calls">
HTML is now accumulated in tool_call_blocks_ns and combined with the
grounded answer text after _process_grounding_metadata runs, preventing
citation links from being injected into HTML markup.
Status label consistency (non-streaming path):
"Running: {tool_name}" renamed to "Calling: {tool_name}" to match the
streaming path label.
https://claude.ai/code/session_01EvKKnagp23fb6fi92HaHBo
1. Missing newline between thought <details> block and answer text in both streaming (f-string join) and non-streaming (concatenation) paths. Content was running directly adjacent, breaking visual formatting and potentially confusing HTML/markdown parsers. 2. Narrow the exception catch in EncryptedStr.decrypt() to re-raise fatal exceptions (KeyboardInterrupt, SystemExit, MemoryError) instead of swallowing them. The broad `except Exception` was silently catching OOM and interrupt signals, masking serious runtime failures. https://claude.ai/code/session_01EvKKnagp23fb6fi92HaHBo
Two real fixes for native tool calling and one observability change:
1. Tool name normalization (the actual bug)
Gemini sometimes emits function_call.name as "default_api:foo" or
"default_api.foo" instead of the plain "foo" registered via the Python
callable. Our __tools__ lookup keyed on the plain name failed, we sent
back an error FunctionResponse, and the model gave up — producing the
"model thinks about tool but no output" symptom. Strip the prefix when
matching against __tools__, but keep the raw name in the FunctionResponse
so Gemini can pair it with the original FunctionCall.
2. function_calling detection across metadata paths
params.get("function_calling") read from __metadata__["params"], but
OpenWebUI stores model-level params at __metadata__["model"]["params"]
in current versions. Check both, and treat __tools__ being non-empty as
implicit native mode unless function_calling is explicitly "default".
3. INFO-level logging at every decision point
__tools__ registration, function_call detection, name normalization,
tool execution start/result/error, follow-up stream kickoff, and final
answer assembly. Both streaming and non-streaming paths instrumented so
the server terminal shows exactly where the loop breaks if it does.
Root cause of blank output confirmed from server logs: 1. Disable Automatic Function Calling (AFC) The google-genai SDK has AFC enabled by default when Python callables are passed as tools. AFC intercepts function_call parts and executes them internally before our streaming loop ever sees them. This breaks our tool loop (function_calls=0 always) and — critically — passes raw Python return values including OpenWebUI's (HTMLResponse, str) tuples directly to Gemini, which cannot deserialise them, producing blank output. Fix: set automatic_function_calling=AutomaticFunctionCallingConfig(disable=True) in the GenerateContentConfig whenever tools are present. 2. Handle (HTMLResponse, str) tuple return type (both streaming and non-streaming) OpenWebUI tools that return rich UI content yield a tuple where element[0] is an HTMLResponse for the UI renderer and element[1] is the plain-text string for the model. Extract the str component when the tool returns a tuple so Gemini receives clean, serialisable text.
|
Hello, Let me first start off by saying thank you for building this project! I am restricted to a VertexAI account in my environment and this PIPE function allowed me to set it up in OpenWebUI. I quickly ran into the issue reported in #155 and then the issue also seen in #135 . Considering OpenWebUI now considers default tool calling deprecated and "native" is required, I felt this needed to be addressed. I setup server side logging for your tool calling functions and then, based on those, learned about AFC. From there it was easy to have Claude re-write the functions with AFC disabled and handle the process via the Pipe. In my testing, it appears to have worked. Of course, I welcome your own testing and verification. I wanted to leave this comment to let you know I am not just pushing bling Claude PRs to your project. If you prefer non-AI coded PRs, that is fine, go ahead and close this out, but I did want to pass the fix along in case you found it sufficient. |
When our pipe calls a tool callable, the tool fires its own events via its already-bound __event_emitter__ (OWUI injects it via functools.partial before passing callables to the pipe). This renders rich HTML widgets in the chat UI. Emitting `replace` at the end of the streaming loop overwrites the entire message content — including those HTML widgets — with our plain-text final_content, making the widgets disappear. Fix: skip the `replace` event when tool_call_blocks is non-empty (i.e., tools ran this turn). The streamed chat:message:delta events and the final yield already deliver the text response correctly. The HTML widgets from the tools persist alongside the text. Exception: still emit `replace` when grounding is present, since citation marker injection requires replacing the answer text in-place.
Previous fix only skipped the `replace` event when tool_call_blocks was non-empty, but with AFC handling tool execution invisibly to our loop, tool_call_blocks stays at 0 even when a tool fired its HTML widget via its bound __event_emitter__. As a result, `replace`, `chat:message`, `chat:finish`, and the final yield all still override the message body with text-only content — wiping the HTML widget the tool rendered. Switch the gating signal from `tool_call_blocks` (our loop's view) to `__tools__` (what was available to the request, regardless of who executed them). When tools were available this turn, emit completion signals without content overrides and yield empty string so the tool-emitted HTML widget remains the canonical message body. Grounding is still the override exception: it must inject citation markers into the live text.
…vailable Earlier attempts emitted chat:message and chat:finish with done=True (no content) when tools were available, on the theory that omitting `content` would be enough to preserve tool-emitted HTML widgets. It wasn't: the done=True flag itself appears to trigger OWUI to re-render the assistant message in "completed" state using the persisted text-only body, dropping the live-rendered HTML widget. Remove the chat:message, chat:finish, and final yield entirely when tools were available. The async generator completion is enough signal for OWUI's middleware. Grounding remains an exception — citation injection requires the replace event regardless.
…s were available" This reverts commit c86369d.
This reverts commit bf340a8.
…s ran" This reverts commit 93943d2.
|
@ctroche99 thanks for this! Have been having this blank response issue, which is quite annoying. Had a quick look: how do you handle the thought signature that Gemini 3 models need to? Without sending it back, things will crash. Does your solution handle parallel tool calls? |
The pipe drives its own native function-calling loop, which bypasses OWUI's middleware — so the 'embeds' and 'files' events the middleware would normally fire for HTMLResponse cards and base64 image returns were being dropped, leaving rich tool UI live-only and unpersisted. Mirror OWUI middleware's process_tool_result locally: - (HTMLResponse, ctx) tuples -> emit 'embeds' event, send ctx to LLM - bare HTMLResponse -> emit 'embeds' event, send generic ack to LLM - 'data:image/...' strings -> emit 'files' event - dict/list -> JSON-serialise for LLM - generic tuple / str / None -> sensible coercion Both the streaming and non-streaming tool execution sites now route through the shared helper.
…ni 3 Gemini 3 and Gemini 2.5 thinking models attach a thought_signature to each thought Part. When building the multi-turn history for a tool-call round-trip, the entire model turn — thoughts AND function_calls — must be echoed back together. Previously only the function_call parts were included, causing the API to reject follow-up requests on thinking models. Collect thought Parts into thought_parts_this_round alongside the existing function_call_parts_this_round, then prepend them when constructing the model Content. Applies to both streaming and non-streaming tool call paths.
Very welcome! Thought signatures: good catch, I forgot about that requirement; now fixed in 16e3641. I was only including function_call parts in the model turn when building tool-call history, dropping the thought Parts entirely. To fix it I collect thought Parts into thought_parts_this_round alongside the existing function_call_parts_this_round and prepend them when constructing the model Content, so the entire turn — thoughts (with signatures) + function calls — gets echoed back. Applied to both streaming and non-streaming paths. Parallel tool calls — yes, fully handled. function_call_parts_this_round accumulates every function_call Part the model emits across a single stream round; we iterate them in one loop, execute each, build one FunctionResponse per call, and batch all responses into a single user Content before the follow-up request. Heads up — two meaningful changes since the PR was opened that you may want to pull: 51f9fcd — emit OWUI embeds/files events for rich tool returns. Because the pipe drives its own tool-calling loop, it bypasses OWUI's middleware — so tools that return (HTMLResponse, str) tuples or data:image/... base64 strings never get their persisted events fired. Cards would render live then vanish on reload. Added a _process_tool_result_for_owui helper that mirrors what OWUI's middleware process_tool_result does locally: detects HTMLResponse (tuple or bare), image data URLs, dicts/lists, etc., emits {"type": "embeds", "data": {"embeds": [...]}} or {"type": "files", ...}, and returns the LLM-visible string. Rich-UI tools now persist correctly across reloads. This was a bit of a rabbit hole but ultimately worked out very well in the end. I had a tool with a HTML injection that kept wiping the HTML but leaving the context injection. Turns out OWUI uses this 'embed' render tag for non text displays. Updated to catch common ones. Although, I did skip the MCP and external-tool-tuple branches because I had already gone overboard lol. 16e3641 — Gemini 3 thought-signature round-tripping (described above). The other 6 commits in between were widget-positioning experiments that got reverted — net delta since the PR is just those two fixes. |
…ing features
Exposes four individually-toggleable OWUI feature flags for Gemini's
built-in server-side tools, alongside two new Valves for Maps widget
rendering.
New OWUI feature flags (set on model in Settings → Models → Features):
google_search — Google Search grounding only (was bundled in google_search_tool)
url_context — URL Context grounding only (was bundled in google_search_tool)
code_execution — model writes + runs Python server-side
google_maps — Google Maps grounding with optional lat/lng location context
Legacy google_search_tool flag is preserved unchanged (still enables both
Search and URL Context together for backward compatibility).
New Valves:
GOOGLE_MAPS_ENABLE_WIDGET — request google_maps_widget_context_token
GOOGLE_MAPS_API_KEY — Maps Platform key for rendering the Places widget
Response handling:
- executable_code parts rendered as fenced code blocks in stream and
non-stream paths
- code_execution_result parts rendered as output blocks
- chunk.maps grounding chunks added to _format_grounding_chunks_as_sources
- google_maps_widget_context_token emitted as persisted "embeds" event
when GOOGLE_MAPS_API_KEY is set
Maps lat/lng location context passed via body/params latitude+longitude
fields into tool_config.retrieval_config.lat_lng.
Root Cause & Fix: Native Tool Calling Producing Blank Output
After adding
INFO-level logging at every decision point in a live OpenWebUI instance, the blank-output bug with native tool calling + streaming traces to three distinct issues. The first is the dominant one and the reason every prior attempt at a fix looked correct on paper but failed at runtime.1. The Google Genai SDK's Automatic Function Calling (AFC) was silently competing with the pipe's tool loop
The
google-genaiSDK enables Automatic Function Calling (AFC) by default whenever you pass Python callables (rather thantypes.FunctionDeclarationobjects) astoolsinGenerateContentConfig. AFC:function_callparts before they reach the caller's iteratorFunctionResponseThis means the pipe's custom tool-execution loop never fired. The diagnostic logs proved it:
function_calls=0on every stream round, yet server logs showed the underlying tool APIs being hit. AFC was doing the work invisibly. You can spot it in the SDK's own log line:AFC and the pipe's tool loop are doing exactly the same job — they're two competing implementations of the same round-trip. AFC wins the race because it runs inside the SDK. The fix is to explicitly disable AFC so the pipe's loop owns the execution:
This isn't a loss of functionality. AFC is a convenience feature for simple scripts that don't need custom serialization, error handling, or UI rendering. The pipe's loop replaces it with a more capable implementation that handles OWUI-specific requirements.
2. OpenWebUI tools can return
(Response, str)tuples that AFC cannot serialize for GeminiOWUI's tool convention allows callables to return a tuple where element
[0]is aResponseobject for the UI renderer and element[1]is the plain-text string for the model's context. When AFC executed these tools, it tried to pass the raw Python tuple — including the response object — into theFunctionResponseit sends back to Gemini. Gemini cannot deserialize that, so the model produced empty output.This is why the symptom was "the tool runs (visible in server logs) but the chat shows nothing." The tool executed successfully; the serialization of its result to Gemini was what broke.
Fix in the manual tool loop — extract the
strcomponent when a tool returns a tuple:3. Gemini sometimes namespaces tool names as
default_api:fooOnce AFC was disabled and the raw
function_callparts were visible, a secondary issue appeared. Gemini occasionally emitsfunction_call.nameas"default_api:tool_name"or"default_api.tool_name"rather than the plain registered name. Theif tool_name in __tools__lookup failed silently and we sent back an errorFunctionResponse, after which the model gave up producing output.Fix: strip the prefix for the lookup, but preserve the original name in the
FunctionResponseso Gemini can pair it with the originalFunctionCall:Why earlier fixes didn't work
Several prior PRs correctly built a multi-round tool-execution loop over
generate_content_streamwithfunction_calldetection, tool execution, follow-up streams, and<details type="tool_calls">rendering. The code was structurally sound. But because AFC consumed thefunction_callparts before they reached the loop, the loop body was unreachable at runtime. And because AFC passed unserializable return values to Gemini, the final output was empty rather than an error. The bug was invisible without per-step logging.TL;DR
The pipe's tool loop was correct in structure. AFC was just running first and producing unserializable payloads. Disabling AFC, extracting the
strfrom OWUI tuple returns, and normalizing thedefault_api:prefix produces the expected end-to-end behavior in both streaming and non-streaming paths.