
Fix responses#4152

Open
michalkulakowski wants to merge 3 commits into main from mkulakow/responses_fix

Conversation

@michalkulakowski
Collaborator

🛠 Summary

JIRA/Issue if applicable.
Describe the changes.

🧪 Checklist

  • Unit tests added.
  • Documentation updated.
  • Change follows security best practices.

Copilot AI review requested due to automatic review settings April 22, 2026 09:28
Contributor

Copilot AI left a comment

Pull request overview

This PR improves OpenAI Responses API handling by expanding request parsing support and hardening error handling in the LLM serving path.

Changes:

  • Add parsing of Responses function_call / function_call_output input items into GenAI ChatHistory, and normalize developer role to system.
  • Add additional exception handling around request parsing / generation-config parsing and response generation.
  • Update the continuous batching demo’s Responses snippet to use a different example model name.
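The first bullet's role normalization and tool-call handling can be sketched as follows. This is an illustrative standalone sketch, not the actual OVMS implementation: the types and field names are hypothetical stand-ins for the GenAI ChatHistory structures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for one chat-history entry.
struct ChatMessage {
    std::string role;     // "system", "user", "assistant", "tool"
    std::string content;  // text or serialized tool-call payload
};

// Stand-in for a Responses API input item (field names follow the
// Responses API informally).
struct InputItem {
    std::string type;  // "message", "function_call", "function_call_output"
    std::string role;  // only meaningful for type == "message"
    std::string content;
};

std::vector<ChatMessage> toChatHistory(const std::vector<InputItem>& items) {
    std::vector<ChatMessage> history;
    for (const auto& item : items) {
        if (item.type == "function_call") {
            // Tool call issued by the model: kept on the assistant side.
            history.push_back({"assistant", item.content});
        } else if (item.type == "function_call_output") {
            // Tool result fed back to the model.
            history.push_back({"tool", item.content});
        } else {
            // Plain message; "developer" is normalized to "system".
            std::string role = (item.role == "developer") ? "system" : item.role;
            history.push_back({role, item.content});
        }
    }
    return history;
}
```

The real parser additionally carries structured tool-call fields (name, call id, arguments) into the serialized messages rather than a single content string.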

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Summary per file:

  • src/llm/apis/openai_responses.cpp: Extends Responses input parsing (tool call items, developer role mapping) and enriches the Python-template messages serialization with tool-call fields.
  • src/llm/servable.cpp: Wraps parsing/config steps in try/catch and tweaks Responses chat-template application behavior/logging.
  • src/llm/http_llm_calculator.cc: Adds std::exception catch to route unexpected errors through existing error handling.
  • src/test/http_openai_handler_test.cpp: Adds unit tests for Responses tool-call items in chat history and developer-role normalization behavior.
  • demos/continuous_batching/README.md: Updates Responses streaming example model identifier.

EXPECT_EQ(apiHandler->getToolChoice(), "none");
}

TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
Copilot AI Apr 22, 2026

This TEST_F is indented relative to surrounding tests (e.g., TEST_F at line 2722 starts at column 1). Please align it to the file’s existing style to avoid clang-format/cpplint noise.

Suggested change
-    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
+TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {

#endif
}

TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesDeveloperRoleMappedToSystem) {
Copilot AI Apr 22, 2026

This TEST_F is indented compared to the surrounding tests in this file. Please align it with the existing style (macros starting at column 1) to keep formatting consistent.

Suggested change
-    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesDeveloperRoleMappedToSystem) {
+TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesDeveloperRoleMappedToSystem) {

Comment on lines +2833 to +2834
TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesConsecutiveDeveloperMessagesAreMerged) {
std::string json = R"({
Copilot AI Apr 22, 2026

The test name ParsingResponsesConsecutiveDeveloperMessagesAreMerged contradicts the assertions/comment below (it verifies messages are not merged). Please rename the test (and/or adjust the comment) so the intent matches the behavior being asserted.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines 421 to 423
messagesArray.PushBack(msgObj, alloc);
}
processedDoc.AddMember("messages", messagesArray, alloc);
Copilot AI Apr 22, 2026

The Responses processedJson is constructed here to provide a {"messages": [...]} payload for the Python chat-template path, but parseTools() is invoked later in parseResponsesPart() and can overwrite request.processedJson with the original doc (Responses-shaped JSON) whenever it mutates tools/tool_choice. That would make GenAiServable::prepareInputs pass a non-empty but incompatible JSON to applyChatTemplate, breaking Responses requests that include tools/tool_choice. Consider rebuilding processedJson after parseTools() using the filtered/updated tools/tool_choice (or preventing parseTools() from overwriting processedJson for Responses) so processedJson always preserves the messages array shape.
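The ordering concern can be illustrated with a minimal standalone sketch (all names here are hypothetical stand-ins, not the actual OVMS types): when a later parsing step may overwrite a derived payload, rebuild that payload after the step so downstream consumers always see the `{"messages": [...]}` shape.

```cpp
#include <cassert>
#include <string>

// Illustrative stand-in for the request object discussed in the review.
struct Request {
    std::string body;           // original Responses-shaped JSON
    std::string processedJson;  // derived {"messages": [...]} payload
};

// Stand-in for parseTools(): when it mutates tools/tool_choice it
// clobbers processedJson with the original doc (the hazard above).
void parseTools(Request& req) {
    req.processedJson = req.body;
}

// Stand-in for the chat-history serialization step.
std::string buildMessagesPayload(const Request& req) {
    (void)req;  // real code would serialize the parsed chat history
    return "{\"messages\":[]}";
}

void parseResponses(Request& req) {
    req.processedJson = buildMessagesPayload(req);
    parseTools(req);  // may overwrite processedJson...
    // ...so rebuild it afterwards, preserving the messages-array shape.
    req.processedJson = buildMessagesPayload(req);
}
```

The alternative fix the review mentions, preventing parseTools() from overwriting processedJson for Responses requests, avoids the rebuild entirely at the cost of a Responses-specific branch inside parseTools().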

Comment thread src/llm/servable.cpp
Comment on lines +236 to +240
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
} else {
SPDLOG_LOGGER_WARN(llm_calculator_logger, "Responses processedJson is empty! Falling back to original body");
success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, executionContext->payload.body, inputText);
Copilot AI Apr 22, 2026

processedJson.substr(0, 500) is evaluated unconditionally, even when DEBUG logging is disabled, and it may log user-provided content/tool arguments into logs. Consider removing the content dump (or guarding it behind a should_log(debug) check and logging only size/metadata), and treat an empty Responses processedJson as an internal error rather than silently falling back to the raw request body (which is a different schema and can hide upstream issues).

Suggested change
-SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
-success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
-} else {
-SPDLOG_LOGGER_WARN(llm_calculator_logger, "Responses processedJson is empty! Falling back to original body");
-success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, executionContext->payload.body, inputText);
+if (llm_calculator_logger->should_log(spdlog::level::debug)) {
+    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson of size: {}", processedJson.size());
+}
+success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
+} else {
+SPDLOG_LOGGER_ERROR(llm_calculator_logger, "Responses processedJson is empty");
+return absl::InternalError("Responses processedJson is empty");
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/llm/apis/openai_responses.cpp:26

  • The file uses std::stringstream (joinServerSideEvents) but does not include <sstream>. Please add the missing standard header to avoid relying on transitive includes (include-what-you-use).
#include <cmath>
#include <limits>
#include <map>
#include <memory>
#include <stdexcept>
#include <set>
#include <string>
#include <string.h>

EXPECT_EQ(apiHandler->getToolChoice(), "none");
}

TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
Copilot AI Apr 22, 2026

Indentation is inconsistent with surrounding tests (extra leading spaces before TEST_F here and in the following new tests). This can trip cpplint/clang-format and makes the file harder to scan; please align to the existing column-0 TEST_F(...) style used throughout the file.

Suggested change
-    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
+TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {

}
imageHistory.push_back({messageIndex, tensorResult.value()});
} else {
return absl::InvalidArgumentError("Unsupported content type. Supported types are input_text, output_text, text and input_image.");
Copilot AI Apr 22, 2026

The error message for unsupported content types omits "refusal", even though this function explicitly supports type == "refusal" above. Please update the message to list all supported types (or make it generic) so clients aren’t misled.

Suggested change
-return absl::InvalidArgumentError("Unsupported content type. Supported types are input_text, output_text, text and input_image.");
+return absl::InvalidArgumentError("Unsupported content type. Supported types are input_text, output_text, text, refusal and input_image.");

Comment thread src/llm/servable.cpp
Comment on lines +235 to +241
if (processedJson.size() > 0) {
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
} else {
SPDLOG_LOGGER_WARN(llm_calculator_logger, "Responses processedJson is empty! Falling back to original body");
success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, executionContext->payload.body, inputText);
}
Copilot AI Apr 22, 2026

processedJson.substr(0, 500) is evaluated unconditionally, which allocates/copies on every request even when debug logging is disabled. Also, for Responses with Python enabled, processedJson is always built (non-empty) in OpenAIResponsesHandler, so the empty-check/fallback branch is dead code. Please guard the substring/log with should_log(debug) (or equivalent) and simplify/remove the unreachable fallback.

Suggested change
-if (processedJson.size() > 0) {
-SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
-success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
-} else {
-SPDLOG_LOGGER_WARN(llm_calculator_logger, "Responses processedJson is empty! Falling back to original body");
-success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, executionContext->payload.body, inputText);
-}
+if (llm_calculator_logger->should_log(spdlog::level::debug)) {
+    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
+}
+success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);