Python: Fix duplicate tool_calls messages in HandoffBuilder with tool approval#4412
Python: Fix duplicate tool_calls messages in HandoffBuilder with tool approval#4412moonbox3 wants to merge 2 commits intomicrosoft:mainfrom
Conversation
microsoft#4411) HandoffAgentExecutor._clone_chat_agent() created cloned agents with empty context_providers, causing Agent._prepare_run_context() to auto-inject an InMemoryHistoryProvider. This provider independently loaded stored messages alongside the full conversation replay from _full_conversation, producing duplicate assistant tool_calls messages without matching tool responses. The OpenAI Chat Completions API rejects this with a 400 error. Fix: Suppress history providers in cloned handoff agents by replacing them with a disabled InMemoryHistoryProvider (load_messages=False, store_inputs=False, store_outputs=False). This prevents auto-injection while preserving non-history context providers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 3 | Confidence: 96%
✓ Correctness
The diff is a purely cosmetic change that collapses a multi-line function call into a single line. There is no behavioral, logical, or correctness change whatsoever.
✓ Security Reliability
The diff is a purely cosmetic formatting change — collapsing a multi-line function call into a single line. There are no security, reliability, or correctness implications.
✓ Test Coverage
The diff is a purely cosmetic formatting change — collapsing a multi-line function call into a single line with no behavioral modification. There are no new or changed behaviors that require additional test coverage.
Automated review by moonbox3's agents
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Pull request overview
Fixes an OpenAI Chat Completions approval-resume failure in HandoffBuilder where prior history providers could re-inject assistant tool_calls messages, creating orphaned tool calls (triggering a 400 from the API).
Changes:
- Suppress cloned-agent history providers in handoff workflows and replace them with a no-op
InMemoryHistoryProviderto prevent duplicate message injection on approval resume. - Add a regression test that asserts every
function_call(tool_calls) has a matchingfunction_resulton resume.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
python/packages/orchestrations/agent_framework_orchestrations/_handoff.py |
Updates agent cloning to remove BaseHistoryProvider instances and add a disabled InMemoryHistoryProvider to avoid duplicate history replay on resume. |
python/packages/orchestrations/tests/test_handoff.py |
Adds a regression test for “no duplicated tool_calls on approval resume” (but currently contains an inlined/stray docstring issue that will fail lint). |
Comments suppressed due to low confidence (1)
python/packages/orchestrations/tests/test_handoff.py:571
- Line 568 is an indented standalone triple-quoted string after executable statements, so it is not a docstring and will be flagged by Ruff/bugbear as a useless expression (B018). It also indicates the following block (ReplaySafeHandoffClient + assertions) was accidentally inlined into this test when the prior test function header was removed. Please split the subsequent block back into its own
test_*function (or move the string to the start of the intended function) so each test has a proper docstring and no stray expression statements.
assert client.resume_validated is True
"""Returning to the same agent must not replay dict tool outputs."""
class ReplaySafeHandoffClient(ChatMiddlewareLayer[Any], FunctionInvocationLayer[Any], BaseChatClient[Any]):
def __init__(self, name: str, handoff_sequence: list[str | None]) -> None:
| metadata = options.get("metadata") | ||
|
|
||
| # Disable parallel tool calls to prevent the agent from invoking multiple handoff tools at once. | ||
| cloned_options: dict[str, Any] = { |
There was a problem hiding this comment.
this is a bit tricky, because it might set options that are not supported by some clients, such as allow_multiple_tool_calls, I think a custom instruction would be a better approach even though it might be a bit less accurate.
There was a problem hiding this comment.
If an option is not supported by a client, isn't it the responsibility of the client to strip out the option and replace it with a parameter that it supports? allow_multiple_tool_calls is a property in the _ChatOptionBase class.
| context_providers: list[BaseContextProvider] = [ | ||
| p for p in agent.context_providers if not isinstance(p, BaseHistoryProvider) | ||
| ] | ||
| context_providers.append(InMemoryHistoryProvider(load_messages=False, store_inputs=False, store_outputs=False)) |
There was a problem hiding this comment.
I'm not sure I like this, because it means you cannot have a Agent that stores conversations that were passed to it, to disk, could you not filter and check if there are any History Providers that have load_messages=True and then raise?
|
I think the issue is that we are replacing the cache with the full conversation history: https://github.com/microsoft/agent-framework/blob/main/python/packages/orchestrations/agent_framework_orchestrations/_handoff.py#L479 The cache and the full_conversation are two entities for different purposes. The cache is for staging messages until the next agent invocation. The full_conversation stores a list of messages generated by all agents with tool messages removed and is used as the output of the workflow. |
|
Per our offline chat, we will fix this by updating handoff logic. |
Pull request was closed
Motivation and Context
When using
HandoffBuilderwithOpenAIChatClientand tool approval (approval_mode="always_require"), resuming after approval caused the history provider to inject duplicate assistant messages containingtool_calls. This led to unmatched tool_calls without corresponding tool responses, triggering a 400 error from the OpenAI Chat Completions API.Fixes #4411
Description
The root cause was that
HandoffAgentExecutoralready manages the full conversation state via_full_conversation, but the agent's history providers would also load and re-inject prior messages—including the assistant'stool_callsmessage—on the approval resume call. The fix strips existing history providers from the cloned agent and replaces them with a no-opInMemoryHistoryProvider(withload_messages=False,store_inputs=False,store_outputs=False), preventing duplicate message injection. A regression test validates that everytool_callsmessage has a matching tool response, which is the invariant the OpenAI API enforces.Contribution Checklist