Fix responses #4152
Conversation
Pull request overview
This PR improves OpenAI Responses API handling by expanding request parsing support and hardening error handling in the LLM serving path.
Changes:
- Add parsing of Responses `function_call`/`function_call_output` input items into GenAI `ChatHistory`, and normalize the `developer` role to `system`.
- Add additional exception handling around request parsing / generation-config parsing and response generation.
- Update the continuous batching demo's Responses snippet to use a different example model name.
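For context, the item shapes involved are shown below in the style of the repository's test fixtures. This is an illustrative payload (model name and field values are invented, not taken from the PR's tests):

```cpp
// Illustrative only: a Responses request mixing the input item types the new
// parsing covers. The function_call pair is stored in GenAI ChatHistory as
// tool-call entries, and the developer message is normalized to role "system".
std::string json = R"({
    "model": "example-model",
    "input": [
        {"role": "developer", "content": "Answer concisely."},
        {"type": "function_call", "call_id": "call_1",
         "name": "get_weather", "arguments": "{\"city\": \"Boston\"}"},
        {"type": "function_call_output", "call_id": "call_1",
         "output": "{\"temperature\": \"21 C\"}"}
    ]
})";
```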
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/llm/apis/openai_responses.cpp | Extends Responses input parsing (tool-call items, developer-role mapping) and enriches the Python-template messages serialization with tool-call fields. |
| src/llm/servable.cpp | Wraps parsing/config steps in try/catch and tweaks Responses chat-template application behavior/logging. |
| src/llm/http_llm_calculator.cc | Adds a std::exception catch to route unexpected errors through the existing error handling. |
| src/test/http_openai_handler_test.cpp | Adds unit tests for Responses tool-call items in chat history and developer-role normalization behavior. |
| demos/continuous_batching/README.md | Updates the Responses streaming example model identifier. |
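As a rough sketch of the hardening described for src/llm/http_llm_calculator.cc (the identifiers below are placeholders, not the file's actual names), the new catch-all routes unexpected exceptions through the same error path as already-handled failures:

```cpp
#include <stdexcept>
#include "absl/status/status.h"

// Hypothetical sketch; the real calculator's method and helpers differ.
absl::Status processLLMRequest(/* calculator context */) {
    try {
        // ... existing parsing and generation steps ...
        return absl::OkStatus();
    } catch (const std::invalid_argument& e) {
        return absl::InvalidArgumentError(e.what());  // previously handled case
    } catch (const std::exception& e) {
        // New: unexpected errors reuse the existing error-reporting path
        // instead of escaping the calculator.
        return absl::InternalError(e.what());
    }
}
```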
```cpp
    EXPECT_EQ(apiHandler->getToolChoice(), "none");
}

    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
```
This TEST_F is indented relative to surrounding tests (e.g., TEST_F at line 2722 starts at column 1). Please align it to the file’s existing style to avoid clang-format/cpplint noise.
Suggested change:

```diff
-    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
+TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
```
```cpp
#endif
}

    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesDeveloperRoleMappedToSystem) {
```
This TEST_F is indented compared to the surrounding tests in this file. Please align it with the existing style (macros starting at column 1) to keep formatting consistent.
Suggested change:

```diff
-    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesDeveloperRoleMappedToSystem) {
+TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesDeveloperRoleMappedToSystem) {
```
```cpp
TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesConsecutiveDeveloperMessagesAreMerged) {
    std::string json = R"({
```
The test name ParsingResponsesConsecutiveDeveloperMessagesAreMerged contradicts the assertions/comment below (it verifies messages are not merged). Please rename the test (and/or adjust the comment) so the intent matches the behavior being asserted.
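For example, a rename along these lines (the replacement name is a suggestion, not from the PR) would make the intent explicit:

```diff
-TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesConsecutiveDeveloperMessagesAreMerged) {
+TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesConsecutiveDeveloperMessagesAreNotMerged) {
```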
```cpp
        messagesArray.PushBack(msgObj, alloc);
    }
    processedDoc.AddMember("messages", messagesArray, alloc);
```
The Responses processedJson is constructed here to provide a {"messages": [...]} payload for the Python chat-template path, but parseTools() is invoked later in parseResponsesPart() and can overwrite request.processedJson with the original doc (Responses-shaped JSON) whenever it mutates tools/tool_choice. That would make GenAiServable::prepareInputs pass a non-empty but incompatible JSON to applyChatTemplate, breaking Responses requests that include tools/tool_choice. Consider rebuilding processedJson after parseTools() using the filtered/updated tools/tool_choice (or preventing parseTools() from overwriting processedJson for Responses) so processedJson always preserves the messages array shape.
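A sketch of the first option (serializeMessagesDoc is a placeholder for whatever helper the file would use, not existing code):

```cpp
// Hypothetical ordering inside parseResponsesPart(): run parseTools() first,
// then rebuild processedJson so it always keeps the {"messages": [...]} shape
// that applyChatTemplate expects, with the filtered tools/tool_choice applied.
auto status = parseTools();  // may mutate tools/tool_choice and overwrite processedJson
if (!status.ok()) {
    return status;
}
request.processedJson = serializeMessagesDoc(processedDoc);  // placeholder helper
```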
```cpp
    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
} else {
    SPDLOG_LOGGER_WARN(llm_calculator_logger, "Responses processedJson is empty! Falling back to original body");
    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, executionContext->payload.body, inputText);
```
processedJson.substr(0, 500) is evaluated unconditionally, even when DEBUG logging is disabled, and it may log user-provided content/tool arguments into logs. Consider removing the content dump (or guarding it behind a should_log(debug) check and logging only size/metadata), and treat an empty Responses processedJson as an internal error rather than silently falling back to the raw request body (which is a different schema and can hide upstream issues).
Suggested change:

```diff
-    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
-    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
-} else {
-    SPDLOG_LOGGER_WARN(llm_calculator_logger, "Responses processedJson is empty! Falling back to original body");
-    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, executionContext->payload.body, inputText);
+    if (llm_calculator_logger->should_log(spdlog::level::debug)) {
+        SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson of size: {}", processedJson.size());
+    }
+    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
+} else {
+    SPDLOG_LOGGER_ERROR(llm_calculator_logger, "Responses processedJson is empty");
+    return absl::InternalError("Responses processedJson is empty");
```
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/llm/apis/openai_responses.cpp:26
- The file uses `std::stringstream` (joinServerSideEvents) but does not include `<sstream>`. Please add the missing standard header to avoid relying on transitive includes (include-what-you-use).
```cpp
#include <cmath>
#include <limits>
#include <map>
#include <memory>
#include <stdexcept>
#include <set>
#include <string>
#include <string.h>
```
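A minimal fix (the insertion point within the existing include order is a judgment call):

```diff
 #include <stdexcept>
 #include <set>
+#include <sstream>
 #include <string>
```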
```cpp
    EXPECT_EQ(apiHandler->getToolChoice(), "none");
}

    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
```
Indentation is inconsistent with surrounding tests (extra leading spaces before TEST_F here and in the following new tests). This can trip cpplint/clang-format and makes the file harder to scan; please align to the existing column-0 TEST_F(...) style used throughout the file.
Suggested change:

```diff
-    TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
+TEST_F(HttpOpenAIHandlerParsingTest, ParsingResponsesFunctionCallItemsStoredInChatHistory) {
```
```cpp
    }
    imageHistory.push_back({messageIndex, tensorResult.value()});
} else {
    return absl::InvalidArgumentError("Unsupported content type. Supported types are input_text, output_text, text and input_image.");
```
The error message for unsupported content types omits "refusal", even though this function explicitly supports type == "refusal" above. Please update the message to list all supported types (or make it generic) so clients aren’t misled.
Suggested change:

```diff
-return absl::InvalidArgumentError("Unsupported content type. Supported types are input_text, output_text, text and input_image.");
+return absl::InvalidArgumentError("Unsupported content type. Supported types are input_text, output_text, text, refusal and input_image.");
```
```cpp
if (processedJson.size() > 0) {
    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
} else {
    SPDLOG_LOGGER_WARN(llm_calculator_logger, "Responses processedJson is empty! Falling back to original body");
    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, executionContext->payload.body, inputText);
}
```
processedJson.substr(0, 500) is evaluated unconditionally, which allocates/copies on every request even when debug logging is disabled. Also, for Responses with Python enabled, processedJson is always built (non-empty) in OpenAIResponsesHandler, so the empty-check/fallback branch is dead code. Please guard the substring/log with should_log(debug) (or equivalent) and simplify/remove the unreachable fallback.
Suggested change:

```diff
-if (processedJson.size() > 0) {
-    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
-    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
-} else {
-    SPDLOG_LOGGER_WARN(llm_calculator_logger, "Responses processedJson is empty! Falling back to original body");
-    success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, executionContext->payload.body, inputText);
-}
+if (llm_calculator_logger->should_log(spdlog::level::debug)) {
+    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Applying chat template with Responses processedJson (first 500 chars): {}", processedJson.substr(0, 500));
+}
+success = PyJinjaTemplateProcessor::applyChatTemplate(getProperties()->templateProcessor, getProperties()->modelsPath, processedJson, inputText);
```
🛠 Summary
JIRA/Issue if applicable.
Describe the changes.
🧪 Checklist