Fix responses #4152
Conversation
Pull request overview
This PR improves OpenAI Responses API handling by expanding request parsing support and hardening error handling in the LLM serving path.
Changes:
- Add parsing of Responses `function_call`/`function_call_output` input items into GenAI `ChatHistory`, and normalize the `developer` role to `system`.
- Add additional exception handling around request parsing / generation-config parsing and response generation.
- Update the continuous batching demo's Responses snippet to use a different example model name.
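For context, the item shapes involved are shown below in the style of the repository's test fixtures. This is an illustrative payload (model name and field values are invented, not taken from the PR's tests):

```cpp
// Illustrative only: a Responses request mixing the input item types the new
// parsing covers. The function_call pair is stored in GenAI ChatHistory as
// tool-call entries, and the developer message is normalized to role "system".
std::string json = R"({
    "model": "example-model",
    "input": [
        {"role": "developer", "content": "Answer concisely."},
        {"type": "function_call", "call_id": "call_1",
         "name": "get_weather", "arguments": "{\"city\": \"Boston\"}"},
        {"type": "function_call_output", "call_id": "call_1",
         "output": "{\"temperature\": \"21 C\"}"}
    ]
})";
```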
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/llm/apis/openai_responses.cpp | Extends Responses input parsing (tool-call items, developer-role mapping) and enriches the Python-template messages serialization with tool-call fields. |
| src/llm/servable.cpp | Wraps parsing/config steps in try/catch and tweaks Responses chat-template application behavior/logging. |
| src/llm/http_llm_calculator.cc | Adds a std::exception catch to route unexpected errors through the existing error handling. |
| src/test/http_openai_handler_test.cpp | Adds unit tests for Responses tool-call items in chat history and developer-role normalization behavior. |
| demos/continuous_batching/README.md | Updates the Responses streaming example model identifier. |
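As a rough sketch of the hardening described for src/llm/http_llm_calculator.cc (the identifiers below are placeholders, not the file's actual names), the new catch-all routes unexpected exceptions through the same error path as already-handled failures:

```cpp
#include <stdexcept>
#include "absl/status/status.h"

// Hypothetical sketch; the real calculator's method and helpers differ.
absl::Status processLLMRequest(/* calculator context */) {
    try {
        // ... existing parsing and generation steps ...
        return absl::OkStatus();
    } catch (const std::invalid_argument& e) {
        return absl::InvalidArgumentError(e.what());  // previously handled case
    } catch (const std::exception& e) {
        // New: unexpected errors reuse the existing error-reporting path
        // instead of escaping the calculator.
        return absl::InternalError(e.what());
    }
}
```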
```cpp
    EXPECT_EQ(apiHandler->getToolChoice(), "none");
}

    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
```
This TEST_F is indented relative to surrounding tests (e.g., TEST_F at line 2722 starts at column 1). Please align it to the file’s existing style to avoid clang-format/cpplint noise.
Suggested change:

```diff
-    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
+TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
```
```cpp
#endif
}

    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesDeveloperRoleMappedToSystem) {
```
This TEST_F is indented compared to the surrounding tests in this file. Please align it with the existing style (macros starting at column 1) to keep formatting consistent.
Suggested change:

```diff
-    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesDeveloperRoleMappedToSystem) {
+TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesDeveloperRoleMappedToSystem) {
```
```cpp
TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesConsecutiveDeveloperMessagesAreMerged) {
    std::string json = R"({
```
The test name ParsingResponsesConsecutiveDeveloperMessagesAreMerged contradicts the assertions/comment below (it verifies messages are not merged). Please rename the test (and/or adjust the comment) so the intent matches the behavior being asserted.
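For example, a rename along these lines (the replacement name is a suggestion, not from the PR) would make the intent explicit:

```diff
-TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesConsecutiveDeveloperMessagesAreMerged) {
+TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesConsecutiveDeveloperMessagesAreNotMerged) {
```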
```cpp
        messagesArray.PushBack(msgObj, alloc);
    }
    processedDoc.AddMember("messages", messagesArray, alloc);
```
The Responses processedJson is constructed here to provide a {"messages": [...]} payload for the Python chat-template path, but parseTools() is invoked later in parseResponsesPart() and can overwrite request.processedJson with the original doc (Responses-shaped JSON) whenever it mutates tools/tool_choice. That would make GenAiServable::prepareInputs pass a non-empty but incompatible JSON to applyChatTemplate, breaking Responses requests that include tools/tool_choice. Consider rebuilding processedJson after parseTools() using the filtered/updated tools/tool_choice (or preventing parseTools() from overwriting processedJson for Responses) so processedJson always preserves the messages array shape.
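A sketch of the first option (serializeMessagesDoc is a placeholder for whatever helper the file would use, not existing code):

```cpp
// Hypothetical ordering inside parseResponsesPart(): run parseTools() first,
// then rebuild processedJson so it always keeps the {"messages": [...]} shape
// that applyChatTemplate expects, with the filtered tools/tool_choice applied.
auto status = parseTools();  // may mutate tools/tool_choice and overwrite processedJson
if (!status.ok()) {
    return status;
}
request.processedJson = serializeMessagesDoc(processedDoc);  // placeholder helper
```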
```cpp
    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
} else {
    SPDLOG_LOGGER_WARN(llm_calculator_logger, "Responses processedJson is empty! Falling back to original body");
    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, executionContext->payload.body, inputText);
```
processedJson.substr(0, 500) is evaluated unconditionally, even when DEBUG logging is disabled, and it may log user-provided content/tool arguments into logs. Consider removing the content dump (or guarding it behind a should_log(debug) check and logging only size/metadata), and treat an empty Responses processedJson as an internal error rather than silently falling back to the raw request body (which is a different schema and can hide upstream issues).
Suggested change:

```diff
-    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
-    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
-} else {
-    SPDLOG_LOGGER_WARN(llm_calculator_logger, "Responses processedJson is empty! Falling back to original body");
-    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, executionContext->payload.body, inputText);
+    if (llm_calculator_logger->should_log(spdlog::level::debug)) {
+        SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson of size: {}", processedJson.size());
+    }
+    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
+} else {
+    SPDLOG_LOGGER_ERROR(llm_calculator_logger, "Responses processedJson is empty");
+    return absl::InternalError("Responses processedJson is empty");
```
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/llm/apis/openai_responses.cpp:26
- The file uses `std::stringstream` (joinServerSideEvents) but does not include `<sstream>`. Please add the missing standard header to avoid relying on transitive includes (include-what-you-use).
```cpp
#include <cmath>
#include <limits>
#include <map>
#include <memory>
#include <stdexcept>
#include <set>
#include <string>
#include <string.h>
```
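A minimal fix (the insertion point within the existing include order is a judgment call):

```diff
 #include <stdexcept>
 #include <set>
+#include <sstream>
 #include <string>
```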
```cpp
    EXPECT_EQ(apiHandler->getToolChoice(), "none");
}

    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
```
Indentation is inconsistent with surrounding tests (extra leading spaces before TEST_F here and in the following new tests). This can trip cpplint/clang-format and makes the file harder to scan; please align to the existing column-0 TEST_F(...) style used throughout the file.
Suggested change:

```diff
-    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
+TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
```
```cpp
    }
    imageHistory.push_back({messageIndex, tensorResult.value()});
} else {
    return absl::InvalidArgumentError("Unsupported content type. Supported types are input_text, output_text, text and input_image.");
```
The error message for unsupported content types omits "refusal", even though this function explicitly supports type == "refusal" above. Please update the message to list all supported types (or make it generic) so clients aren’t misled.
Suggested change:

```diff
-return absl::InvalidArgumentError("Unsupported content type. Supported types are input_text, output_text, text and input_image.");
+return absl::InvalidArgumentError("Unsupported content type. Supported types are input_text, output_text, text, refusal and input_image.");
```
```cpp
if (processedJson.size() > 0) {
    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
} else {
    SPDLOG_LOGGER_WARN(llm_calculator_logger, "Responses processedJson is empty! Falling back to original body");
    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, executionContext->payload.body, inputText);
}
```
processedJson.substr(0, 500) is evaluated unconditionally, which allocates/copies on every request even when debug logging is disabled. Also, for Responses with Python enabled, processedJson is always built (non-empty) in OpenAIResponsesHandler, so the empty-check/fallback branch is dead code. Please guard the substring/log with should_log(debug) (or equivalent) and simplify/remove the unreachable fallback.
Suggested change:

```diff
-if (processedJson.size() > 0) {
-    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
-    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
-} else {
-    SPDLOG_LOGGER_WARN(llm_calculator_logger, "Responses processedJson is empty! Falling back to original body");
-    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, executionContext->payload.body, inputText);
-}
+if (llm_calculator_logger->should_log(spdlog::level::debug)) {
+    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
+}
+success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
```
🛠 Summary
JIRA/Issue if applicable.
Describe the changes.
🧪 Checklist