Python: Fix duplicate tool_calls messages in HandoffBuilder with tool approval by moonbox3 · Pull Request #4412 · microsoft/agent-framework

moonbox3 · 2026-03-03T08:36:21Z

Motivation and Context

When using HandoffBuilder with OpenAIChatClient and tool approval (approval_mode="always_require"), resuming after approval caused the history provider to inject duplicate assistant messages containing tool_calls. This led to unmatched tool_calls without corresponding tool responses, triggering a 400 error from the OpenAI Chat Completions API.

Fixes #4411

Description

The root cause was that HandoffAgentExecutor already manages the full conversation state via _full_conversation, but the agent's history providers would also load and re-inject prior messages—including the assistant's tool_calls message—on the approval resume call. The fix strips existing history providers from the cloned agent and replaces them with a no-op InMemoryHistoryProvider (with load_messages=False, store_inputs=False, store_outputs=False), preventing duplicate message injection. A regression test validates that every tool_calls message has a matching tool response, which is the invariant the OpenAI API enforces.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by moonbox3's agent

microsoft#4411) HandoffAgentExecutor._clone_chat_agent() created cloned agents with empty context_providers, causing Agent._prepare_run_context() to auto-inject an InMemoryHistoryProvider. This provider independently loaded stored messages alongside the full conversation replay from _full_conversation, producing duplicate assistant tool_calls messages without matching tool responses. The OpenAI Chat Completions API rejects this with a 400 error. Fix: Suppress history providers in cloned handoff agents by replacing them with a disabled InMemoryHistoryProvider (load_messages=False, store_inputs=False, store_outputs=False). This prevents auto-injection while preserving non-history context providers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3

Automated Code Review

Reviewers: 3 | Confidence: 96%

✓ Correctness

The diff is a purely cosmetic change that collapses a multi-line function call into a single line. There is no behavioral, logical, or correctness change whatsoever.

✓ Security Reliability

The diff is a purely cosmetic formatting change — collapsing a multi-line function call into a single line. There are no security, reliability, or correctness implications.

✓ Test Coverage

The diff is a purely cosmetic formatting change — collapsing a multi-line function call into a single line with no behavioral modification. There are no new or changed behaviors that require additional test coverage.

Automated review by moonbox3's agents

markwallace-microsoft · 2026-03-03T08:39:05Z

Python Test Coverage Report •

File	Stmts	Miss	Cover	Missing
packages/orchestrations/agent_framework_orchestrations
_handoff.py	387	58	85%	105–106, 108, 163–173, 175, 177, 179, 184, 279, 285, 317, 340, 362, 425, 450, 508, 540, 598–599, 631, 639, 643–644, 682–684, 689–691, 809, 812, 825, 887, 892, 899, 909, 911, 930, 932, 1014–1015, 1047–1048, 1130, 1137, 1209–1210, 1212
TOTAL	22255	2756	87%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
4725	247 💤	0 ❌	0 🔥	1m 20s ⏱️

Copilot

Pull request overview

Fixes an OpenAI Chat Completions approval-resume failure in HandoffBuilder where prior history providers could re-inject assistant tool_calls messages, creating orphaned tool calls (triggering a 400 from the API).

Changes:

Suppress cloned-agent history providers in handoff workflows and replace them with a no-op InMemoryHistoryProvider to prevent duplicate message injection on approval resume.
Add a regression test that asserts every function_call (tool_calls) has a matching function_result on resume.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
`python/packages/orchestrations/agent_framework_orchestrations/_handoff.py`	Updates agent cloning to remove `BaseHistoryProvider` instances and add a disabled `InMemoryHistoryProvider` to avoid duplicate history replay on resume.
`python/packages/orchestrations/tests/test_handoff.py`	Adds a regression test for “no duplicated tool_calls on approval resume” (but currently contains an inlined/stray docstring issue that will fail lint).

Comments suppressed due to low confidence (1)

python/packages/orchestrations/tests/test_handoff.py:571

Line 568 is an indented standalone triple-quoted string after executable statements, so it is not a docstring and will be flagged by Ruff/bugbear as a useless expression (B018). It also indicates the following block (ReplaySafeHandoffClient + assertions) was accidentally inlined into this test when the prior test function header was removed. Please split the subsequent block back into its own test_* function (or move the string to the start of the intended function) so each test has a proper docstring and no stray expression statements.

    assert client.resume_validated is True
    """Returning to the same agent must not replay dict tool outputs."""

    class ReplaySafeHandoffClient(ChatMiddlewareLayer[Any], FunctionInvocationLayer[Any], BaseChatClient[Any]):
        def __init__(self, name: str, handoff_sequence: list[str | None]) -> None:

eavanvalkenburg · 2026-03-03T10:04:02Z

python/packages/orchestrations/agent_framework_orchestrations/_handoff.py

        metadata = options.get("metadata")

        # Disable parallel tool calls to prevent the agent from invoking multiple handoff tools at once.
        cloned_options: dict[str, Any] = {


this is a bit tricky, because it might set options that are not supported by some clients, such as allow_multiple_tool_calls, I think a custom instruction would be a better approach even though it might be a bit less accurate.

If an option is not supported by a client, isn't it the responsibility of the client to strip out the option and replace it with a parameter that it supports? allow_multiple_tool_calls is a property in the _ChatOptionBase class.

eavanvalkenburg · 2026-03-03T10:04:58Z

python/packages/orchestrations/agent_framework_orchestrations/_handoff.py

+        context_providers: list[BaseContextProvider] = [
+            p for p in agent.context_providers if not isinstance(p, BaseHistoryProvider)
+        ]
+        context_providers.append(InMemoryHistoryProvider(load_messages=False, store_inputs=False, store_outputs=False))


I'm not sure I like this, because it means you cannot have a Agent that stores conversations that were passed to it, to disk, could you not filter and check if there are any History Providers that have load_messages=True and then raise?

TaoChenOSU · 2026-03-03T18:41:52Z

I think the issue is that we are replacing the cache with the full conversation history: https://github.com/microsoft/agent-framework/blob/main/python/packages/orchestrations/agent_framework_orchestrations/_handoff.py#L479

The cache and the full_conversation are two entities for different purposes. The cache is for staging messages until the next agent invocation. The full_conversation stores a list of messages generated by all agents with tool messages removed and is used as the output of the workflow.

moonbox3 · 2026-03-04T05:45:55Z

Per our offline chat, we will fix this by updating handoff logic.

moonbox3 and others added 2 commits March 3, 2026 17:34

Apply pre-commit auto-fixes

eeee665

Copilot AI review requested due to automatic review settings March 3, 2026 08:36

moonbox3 commented Mar 3, 2026

View reviewed changes

markwallace-microsoft added the python label Mar 3, 2026

Copilot started reviewing on behalf of moonbox3 March 3, 2026 08:36 View session

Copilot AI reviewed Mar 3, 2026

View reviewed changes

moonbox3 enabled auto-merge March 3, 2026 09:54

moonbox3 self-assigned this Mar 3, 2026

eavanvalkenburg reviewed Mar 3, 2026

View reviewed changes

moonbox3 closed this Mar 4, 2026

auto-merge was automatically disabled March 4, 2026 05:45
Pull request was closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: Fix duplicate tool_calls messages in HandoffBuilder with tool approval#4412

Python: Fix duplicate tool_calls messages in HandoffBuilder with tool approval#4412
moonbox3 wants to merge 2 commits intomicrosoft:mainfrom
moonbox3:agent/fix-4411-2

moonbox3 commented Mar 3, 2026

Uh oh!

moonbox3 left a comment

Uh oh!

markwallace-microsoft commented Mar 3, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

eavanvalkenburg Mar 3, 2026

Uh oh!

TaoChenOSU Mar 3, 2026

Uh oh!

eavanvalkenburg Mar 3, 2026

Uh oh!

TaoChenOSU commented Mar 3, 2026

Uh oh!

moonbox3 commented Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

moonbox3 commented Mar 3, 2026

Motivation and Context

Description

Contribution Checklist

Uh oh!

moonbox3 left a comment

Choose a reason for hiding this comment

Automated Code Review

✓ Correctness

✓ Security Reliability

✓ Test Coverage

Uh oh!

markwallace-microsoft commented Mar 3, 2026

Python Unit Test Overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

eavanvalkenburg Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

TaoChenOSU Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

eavanvalkenburg Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

TaoChenOSU commented Mar 3, 2026

Uh oh!

moonbox3 commented Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants