Skip to content

Python: Fix duplicate tool_calls messages in HandoffBuilder with tool approval#4412

Closed
moonbox3 wants to merge 2 commits intomicrosoft:mainfrom
moonbox3:agent/fix-4411-2
Closed

Python: Fix duplicate tool_calls messages in HandoffBuilder with tool approval#4412
moonbox3 wants to merge 2 commits intomicrosoft:mainfrom
moonbox3:agent/fix-4411-2

Conversation

@moonbox3
Copy link
Contributor

@moonbox3 moonbox3 commented Mar 3, 2026

Motivation and Context

When using HandoffBuilder with OpenAIChatClient and tool approval (approval_mode="always_require"), resuming after approval caused the history provider to inject duplicate assistant messages containing tool_calls. This led to unmatched tool_calls without corresponding tool responses, triggering a 400 error from the OpenAI Chat Completions API.

Fixes #4411

Description

The root cause was that HandoffAgentExecutor already manages the full conversation state via _full_conversation, but the agent's history providers would also load and re-inject prior messages—including the assistant's tool_calls message—on the approval resume call. The fix strips existing history providers from the cloned agent and replaces them with a no-op InMemoryHistoryProvider (with load_messages=False, store_inputs=False, store_outputs=False), preventing duplicate message injection. A regression test validates that every tool_calls message has a matching tool response, which is the invariant the OpenAI API enforces.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by moonbox3's agent

moonbox3 and others added 2 commits March 3, 2026 17:34
microsoft#4411)

HandoffAgentExecutor._clone_chat_agent() created cloned agents with empty
context_providers, causing Agent._prepare_run_context() to auto-inject an
InMemoryHistoryProvider. This provider independently loaded stored messages
alongside the full conversation replay from _full_conversation, producing
duplicate assistant tool_calls messages without matching tool responses.
The OpenAI Chat Completions API rejects this with a 400 error.

Fix: Suppress history providers in cloned handoff agents by replacing them
with a disabled InMemoryHistoryProvider (load_messages=False, store_inputs=False,
store_outputs=False). This prevents auto-injection while preserving non-history
context providers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 3, 2026 08:36
Copy link
Contributor Author

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 3 | Confidence: 96%

✓ Correctness

The diff is a purely cosmetic change that collapses a multi-line function call into a single line. There is no behavioral, logical, or correctness change whatsoever.

✓ Security Reliability

The diff is a purely cosmetic formatting change — collapsing a multi-line function call into a single line. There are no security, reliability, or correctness implications.

✓ Test Coverage

The diff is a purely cosmetic formatting change — collapsing a multi-line function call into a single line with no behavioral modification. There are no new or changed behaviors that require additional test coverage.


Automated review by moonbox3's agents

@markwallace-microsoft
Copy link
Member

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/orchestrations/agent_framework_orchestrations
   _handoff.py3875885%105–106, 108, 163–173, 175, 177, 179, 184, 279, 285, 317, 340, 362, 425, 450, 508, 540, 598–599, 631, 639, 643–644, 682–684, 689–691, 809, 812, 825, 887, 892, 899, 909, 911, 930, 932, 1014–1015, 1047–1048, 1130, 1137, 1209–1210, 1212
TOTAL22255275687% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
4725 247 💤 0 ❌ 0 🔥 1m 20s ⏱️

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an OpenAI Chat Completions approval-resume failure in HandoffBuilder where prior history providers could re-inject assistant tool_calls messages, creating orphaned tool calls (triggering a 400 from the API).

Changes:

  • Suppress cloned-agent history providers in handoff workflows and replace them with a no-op InMemoryHistoryProvider to prevent duplicate message injection on approval resume.
  • Add a regression test that asserts every function_call (tool_calls) has a matching function_result on resume.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
python/packages/orchestrations/agent_framework_orchestrations/_handoff.py Updates agent cloning to remove BaseHistoryProvider instances and add a disabled InMemoryHistoryProvider to avoid duplicate history replay on resume.
python/packages/orchestrations/tests/test_handoff.py Adds a regression test for “no duplicated tool_calls on approval resume” (but currently contains an inlined/stray docstring issue that will fail lint).
Comments suppressed due to low confidence (1)

python/packages/orchestrations/tests/test_handoff.py:571

  • Line 568 is an indented standalone triple-quoted string after executable statements, so it is not a docstring and will be flagged by Ruff/bugbear as a useless expression (B018). It also indicates the following block (ReplaySafeHandoffClient + assertions) was accidentally inlined into this test when the prior test function header was removed. Please split the subsequent block back into its own test_* function (or move the string to the start of the intended function) so each test has a proper docstring and no stray expression statements.
    assert client.resume_validated is True
    """Returning to the same agent must not replay dict tool outputs."""

    class ReplaySafeHandoffClient(ChatMiddlewareLayer[Any], FunctionInvocationLayer[Any], BaseChatClient[Any]):
        def __init__(self, name: str, handoff_sequence: list[str | None]) -> None:

@moonbox3 moonbox3 enabled auto-merge March 3, 2026 09:54
@moonbox3 moonbox3 self-assigned this Mar 3, 2026
metadata = options.get("metadata")

# Disable parallel tool calls to prevent the agent from invoking multiple handoff tools at once.
cloned_options: dict[str, Any] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit tricky, because it might set options that are not supported by some clients, such as allow_multiple_tool_calls, I think a custom instruction would be a better approach even though it might be a bit less accurate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an option is not supported by a client, isn't it the responsibility of the client to strip out the option and replace it with a parameter that it supports? allow_multiple_tool_calls is a property in the _ChatOptionBase class.

context_providers: list[BaseContextProvider] = [
p for p in agent.context_providers if not isinstance(p, BaseHistoryProvider)
]
context_providers.append(InMemoryHistoryProvider(load_messages=False, store_inputs=False, store_outputs=False))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like this, because it means you cannot have a Agent that stores conversations that were passed to it, to disk, could you not filter and check if there are any History Providers that have load_messages=True and then raise?

@TaoChenOSU
Copy link
Contributor

I think the issue is that we are replacing the cache with the full conversation history: https://github.com/microsoft/agent-framework/blob/main/python/packages/orchestrations/agent_framework_orchestrations/_handoff.py#L479

The cache and the full_conversation are two entities for different purposes. The cache is for staging messages until the next agent invocation. The full_conversation stores a list of messages generated by all agents with tool messages removed and is used as the output of the workflow.

@moonbox3
Copy link
Contributor Author

moonbox3 commented Mar 4, 2026

Per our offline chat, we will fix this by updating handoff logic.

@moonbox3 moonbox3 closed this Mar 4, 2026
auto-merge was automatically disabled March 4, 2026 05:45

Pull request was closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: [Bug]: HandoffBuilder + OpenAIChatClient tool approval causes duplicate tool_calls messages (400 error)

5 participants