Python: Allow @tool functions to return rich content (images, audio) #4331

Open

giles17 wants to merge 9 commits into microsoft:main from giles17:giles/tool-rich-content-results

Conversation

giles17 (Contributor) commented Feb 26, 2026

Description

Closes #4272 and #2513

When a @tool function returns a Content object (e.g. Content.from_data(image_bytes, "image/png")), the framework now preserves it as rich content that the model can perceive natively, instead of serializing it to a JSON string.

Problem

Previously, FunctionTool.parse_result() serialized any Content return to JSON text via _make_dumpable(). The model received a text blob, not the actual image. The same issue existed in MCP tool results where ImageContent was JSON-serialized.

Solution

Added an items field to function_result Content that carries rich Content objects (images, audio, files) alongside the text result. Providers format these items using their existing multi-modal content handling.

User API — no decorator changes needed:

@tool
async def capture_screenshot(url: str) -> Content:
    image_bytes = await take_screenshot(url)
    return Content.from_data(data=image_bytes, media_type="image/png")

@tool
async def render_chart(data: str) -> list[Content]:
    image_bytes = render(data)
    return [
        Content.from_text("Chart rendered."),
        Content.from_data(data=image_bytes, media_type="image/png"),
    ]

Changes

Core framework:

  • _types.py: Added items field to Content. Updated from_function_result() to accept str | list[Content] and split text from rich items internally.
  • _tools.py: Updated parse_result() to preserve Content returns instead of JSON-serializing. Updated invoke() return type.
  • _mcp.py: Updated _parse_tool_result_from_mcp() to return list[Content] for image/audio instead of JSON strings. Preserves original content ordering.
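The dispatch described above can be sketched roughly as follows. This is a simplified illustration, not the framework's actual code: the `Content` stand-in here is a minimal dataclass, and the real `parse_result()` has more branches.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


# Minimal stand-in for the framework's Content type; field names are
# illustrative, not the real agent_framework API.
@dataclass
class Content:
    type: str
    text: str | None = None
    data: bytes | None = None
    media_type: str | None = None


def parse_result(result):
    """Sketch of the new dispatch: preserve rich Content, unwrap plain
    text, and fall back to JSON serialization for everything else."""
    if isinstance(result, Content):
        if result.type in ("data", "uri"):
            return [result]              # rich content survives as-is
        if result.type == "text" and result.text:
            return result.text           # plain text unwrapped to str
    if isinstance(result, list) and any(isinstance(i, Content) for i in result):
        # mixed lists: coerce non-Content items to text Content
        return [i if isinstance(i, Content) else Content("text", text=str(i))
                for i in result]
    return json.dumps(result)            # legacy path: JSON-serialize
```

The key behavioral change is the first branch: before this PR, a data/uri `Content` would have fallen through to the JSON path.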

All 6 providers updated:

  • OpenAI Responses: Injects rich items as user message with input_image after function_call_output
  • OpenAI Chat Completions: Injects rich items as follow-up user message (Chat Completions API only supports string content in tool messages)
  • Anthropic: Formats rich items as native image blocks in tool_result content array
  • Bedrock/Ollama/Azure-AI: Logs warning when rich items present (unsupported by these APIs)
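For reference, the Anthropic path above maps onto the API's documented `tool_result` shape, where image blocks carry a base64 `source`. The helper below is an illustrative sketch, not the PR's code; only the block layout reflects Anthropic's actual wire format.

```python
import base64


def anthropic_tool_result(call_id: str, text: str, image_bytes: bytes,
                          media_type: str = "image/png") -> dict:
    """Build a tool_result dict with a native base64 image block, the
    shape Anthropic accepts inside a tool_result content array."""
    blocks: list[dict] = []
    if text:
        blocks.append({"type": "text", "text": text})
    blocks.append({
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    })
    return {"type": "tool_result", "tool_use_id": call_id, "content": blocks}
```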

Tests: 8 new tests + 2 updated existing tests, all passing.

…udio)

Add support for tool functions to return Content objects that the model can perceive natively. Closes microsoft#4272

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 26, 2026 19:48
markwallace-microsoft (Member) commented Feb 26, 2026

Python Test Coverage

Python Test Coverage Report

File | Stmts | Miss | Cover | Missing
packages/anthropic/agent_framework_anthropic
   _chat_client.py | 423 | 47 | 88% | 427, 430, 511, 598, 600, 718–723, 731–732, 737, 741, 775–776, 850, 880–881, 924–926, 928, 941–942, 949–951, 955–957, 961–964, 1078, 1088, 1140, 1261, 1288–1289, 1306, 1319, 1332, 1357–1358
packages/azure-ai/agent_framework_azure_ai
   _chat_client.py | 480 | 76 | 84% | 391–392, 394, 578, 583–584, 586–587, 590, 593, 595, 600, 861–862, 864, 867, 870, 873–878, 881, 883, 891, 903–905, 909, 912–913, 921–924, 934, 942–945, 947–948, 950–951, 958, 966–967, 975–976, 981–982, 986–993, 998, 1001, 1009, 1015, 1023–1025, 1028, 1050–1051, 1184, 1212, 1227, 1343, 1395, 1470
packages/core/agent_framework
   _mcp.py | 433 | 64 | 85% | 97–98, 108–113, 124, 129, 181–182, 192–197, 207–208, 230, 277, 286, 349, 357, 508, 575, 610, 612, 616–617, 619–620, 674, 689, 707, 748, 853, 866–871, 893, 942–943, 949–951, 970, 995–996, 1000–1004, 1021–1025, 1169
   _tools.py | 887 | 92 | 89% | 167–168, 327, 329, 347–349, 356, 374, 388, 395, 402, 418, 420, 427, 464, 489, 493, 510–512, 559–561, 584, 608, 651, 673, 736–742, 778, 789–800, 822–824, 829, 833, 847–849, 888, 957, 967, 977, 1033, 1064, 1083, 1366, 1424, 1444, 1520–1524, 1647, 1651, 1675, 1701, 1703, 1719, 1721, 1806, 1836, 1856, 1858, 1911, 1974, 2165–2166, 2214, 2282–2283, 2341, 2346, 2353
   _types.py | 1046 | 84 | 91% | 59, 68–69, 123, 128, 147, 149, 153, 157, 159, 161, 163, 181, 185, 211, 233, 238, 243, 247, 277, 655–656, 1163, 1234, 1251, 1269, 1292, 1302, 1319–1320, 1322, 1340–1341, 1343, 1350–1351, 1353, 1388, 1399–1400, 1402, 1440, 1667, 1719, 1810–1815, 1837, 1842, 2008, 2020, 2272, 2293, 2388, 2617, 2824, 2894, 2906, 2924, 3122–3124, 3127–3129, 3133, 3138, 3142, 3226–3228, 3257, 3311, 3330–3331, 3334–3338, 3344
packages/core/agent_framework/openai
   _chat_client.py | 297 | 27 | 90% | 210, 240–241, 245, 363, 370, 446–453, 455–458, 468, 546, 548, 565, 579, 612, 638, 654, 694
   _responses_client.py | 811 | 139 | 82% | 312–315, 319–320, 325–326, 336–337, 344, 359–365, 386, 394, 417, 515, 517, 614, 669, 673, 675, 677, 679, 747, 761, 841, 851, 856, 899, 978, 995, 1008, 1069, 1160, 1165, 1169–1171, 1175–1176, 1216–1224, 1231, 1260, 1266, 1276, 1282, 1287, 1293, 1298–1299, 1360, 1382–1383, 1398–1399, 1417–1418, 1459–1462, 1624, 1662–1663, 1679, 1681, 1761–1769, 1891, 1946, 1961, 1981–1991, 2004, 2015–2019, 2033, 2047–2058, 2067, 2099–2102, 2110–2111, 2113–2115, 2129–2131, 2141–2142, 2148, 2163
TOTAL | 22535 | 2819 | 87%

Python Unit Test Overview

Tests: 4567 | Skipped: 25 💤 | Failures: 0 ❌ | Errors: 0 🔥 | Time: 1m 20s ⏱️

Copilot AI (Contributor) left a comment


Pull request overview

This PR enables @tool-decorated functions to return rich content (images, audio, files) that models can perceive natively, rather than having them serialized to JSON strings. This addresses issue #4272 by allowing vision-in-the-loop workflows where tools like capture_screenshot() or render_chart() can feed image content back into the model for analysis.

Changes:

  • Core framework now preserves Content objects with rich media instead of JSON-serializing them
  • Added items field to function_result Content to carry rich media alongside text results
  • Updated all 6 provider implementations to handle rich content (OpenAI Responses, OpenAI Chat, Anthropic support it natively; Bedrock, Ollama, Azure-AI log warnings)

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
python/packages/core/agent_framework/_types.py Added items parameter to Content.__init__ and from_function_result() to store rich media items; updated to_dict() to serialize items
python/packages/core/agent_framework/_tools.py Updated parse_result() to return str or list[Content] instead of always serializing; added _build_function_result() helper to separate text and rich items; updated invoke() return type
python/packages/core/agent_framework/_mcp.py Updated _parse_tool_result_from_mcp() to return list[Content] for results containing images/audio instead of JSON strings
python/packages/core/agent_framework/openai/_responses_client.py Injects rich items as separate user message with input_image content after function_call_output
python/packages/core/agent_framework/openai/_chat_client.py Formats tool message content as multi-part array with text and image_url/input_audio/file parts when items present
python/packages/anthropic/agent_framework_anthropic/_chat_client.py Formats rich items as native image blocks in tool_result content array; handles both data and uri image types
python/packages/bedrock/agent_framework_bedrock/_chat_client.py Logs warning when rich items present (Bedrock doesn't support them); omits items from tool result
python/packages/ollama/agent_framework_ollama/_chat_client.py Logs warning when rich items present (Ollama doesn't support them); omits items from tool result
python/packages/azure-ai/agent_framework_azure_ai/_chat_client.py Logs warning when rich items present (Azure AI Agents doesn't support them); omits items from tool output
python/packages/core/tests/core/test_types.py Added 8 new tests for parse_result(), _build_function_result(), and Content.from_function_result() with items; updated 2 existing tests to expect list[Content] instead of JSON
python/packages/core/tests/core/test_mcp.py Updated test_parse_tool_result_from_mcp to expect list[Content] for results with images; added test_parse_tool_result_from_mcp_audio_content

eavanvalkenburg (Member) left a comment


We recently made the switch to restrict return types, and one of the reasons was performance: the constant parsing of these results, both for OTel and for the client, is a bit wasteful. Could you look at whether a cache could be used in the parsing function in the different places? We also need integration testing for this, because OpenAI Chat shouldn't support it, so let's verify with OpenAI, Azure OpenAI, Ollama, Foundry Local, and maybe other clients that derive from OpenAI Chat.

eavanvalkenburg (Member) commented

This is also #2513

giles17 and others added 3 commits February 27, 2026 11:58
…esult, fix Chat client

- Preserve original content order in MCP tool results instead of text-first
- Move _build_function_result logic into Content.from_function_result()
- Chat Completions: inject user message for rich items (API only supports string tool content)
- Update tests for ordering and new from_function_result behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@giles17 giles17 changed the title Python: Allow @tool functions to return rich content (images, audio) Python: Allow @tool functions to return rich content (images, audio) Mar 2, 2026
giles17 and others added 3 commits March 2, 2026 20:02
- Responses client: put rich items directly in function_call_output's
  output field as list (native API support) instead of user message injection
- Chat client: warn and omit rich items (API doesn't support multi-part
  tool results), matching Ollama/Bedrock pattern
- Unify test image: use sample_image.jpg across all integration tests
- Add Azure OpenAI Responses integration test
- Assert model describes house image to verify perception

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3 (Contributor) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 3 | Confidence: 86%

✗ Correctness

This PR adds rich content (images, audio) support in tool results across multiple LLM provider clients. The implementation is well-structured with proper tests. The main correctness issue is a missing test asset file: the Anthropic integration test references a sample_image.jpg in its own tests/assets/ directory, but the diff only adds this file under python/packages/core/tests/assets/. The Azure and OpenAI tests correctly use parent.parent to reach the core assets directory, but the Anthropic test uses parent which resolves to a non-existent path. The remaining changes are logically sound with appropriate fallback/warning behavior for providers that don't support rich tool results.

✓ Security Reliability

This PR adds rich content (images, audio) support to tool results across multiple LLM provider clients. The implementation is generally sound with appropriate fallback warnings for unsupported providers. There are no critical security issues, but there are a few reliability edge cases: Content.from_function_result lacks validation when result is a list, which can cause AttributeError on non-Content items; the Anthropic client can send an empty content array to the API if all rich items are unsupported; and the OpenAI Chat Completions client introduces a continue that may alter the original message-building control flow.

✗ Test Coverage

This diff adds rich content (images, audio) support in tool results across all providers. Core types and parse_result logic have solid unit tests (test_types.py), and MCP parsing is well-covered (test_mcp.py). However, the provider-specific formatting logic for rich content — the most complex new code — lacks unit tests entirely. The Anthropic client's new branching logic in _prepare_message_for_anthropic (data images, URI images, unsupported types) has zero unit tests. The OpenAI Responses client's new output_parts building in _prepare_content_for_openai also has no unit tests. The OpenAI Chat Completions client changed control flow (added continue statement) with no test verifying the warning/behavior with items. All three only have integration tests marked @pytest.mark.flaky, which won't catch regressions in normal CI runs.

Blocking Issues

  • The Anthropic integration test will fail with FileNotFoundError: Path(__file__).parent / "assets" / "sample_image.jpg" resolves to python/packages/anthropic/tests/assets/sample_image.jpg, but the image file is only added at python/packages/core/tests/assets/sample_image.jpg. Either copy the asset to the Anthropic tests directory or fix the path.
  • No unit tests for Anthropic _prepare_message_for_anthropic rich content handling. The new branching logic (lines 716-753 of _chat_client.py) covers three distinct paths — data images, URI images, and unsupported types — none of which are tested. The existing test_prepare_message_for_anthropic_function_result only covers the plain-text fallback path.
  • No unit tests for OpenAI Responses _prepare_content_for_openai rich content in function results. The new output_parts construction (lines 1214-1224 of _responses_client.py) recursively calls _prepare_content_for_openai for each item with no test coverage. Only a flaky integration test covers this path.
  • The OpenAI Chat Completions client (lines 578-583 of openai/_chat_client.py) changed the control flow for ALL function_result messages by adding an explicit append+continue, and added a warning path for items. There is no unit test verifying that function results with items produce a warning and that the result is still correctly appended.
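The last blocking issue asks for a test that verifies both the warning and the append. A generic sketch of that test pattern is shown below; `prepare_function_result` and its dict shapes are hypothetical stand-ins for the client method, not the real API.

```python
import logging

logger = logging.getLogger("demo.provider")


def prepare_function_result(content: dict, messages: list) -> None:
    """Hypothetical stand-in for the provider method under test: warn
    when rich items are present, but still append the text tool message."""
    if content.get("items"):
        logger.warning(
            "rich content in tool results is not supported; items omitted"
        )
    messages.append({"role": "tool", "content": content["result"]})


def run_warning_check():
    """Capture log records and messages, mirroring a caplog-style test."""
    records: list = []
    handler = logging.Handler()
    handler.emit = records.append  # collect records instead of emitting
    logger.addHandler(handler)
    logger.setLevel(logging.WARNING)
    messages: list = []
    prepare_function_result({"result": "ok", "items": [object()]}, messages)
    logger.removeHandler(handler)
    return records, messages
```

The point of the pattern: assert the warning fired AND the message was still appended, so the `continue` refactor can't silently drop tool results.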

Suggestions

  • In _tools.py parse_result, a Content with type="text" and empty/None text will fall through to JSON serialization via _make_dumpable. Consider returning "" for this edge case.
  • In _mcp.py, consider using Content.from_data (with base64-decoded bytes) instead of Content.from_uri with a synthetic data: URI for ImageContent/AudioContent. This avoids downstream consumers needing to parse the data: URI back out.
  • In _types.py from_function_result, the isinstance(result, list) branch assumes all items are Content objects (accesses .type, .text). If the list contains non-Content items (e.g., strings), this will raise AttributeError. Consider adding a guard like all(isinstance(c, Content) for c in result) or handling non-Content items gracefully, consistent with how parse_result does it.
  • In the Anthropic _chat_client.py, if content.items is truthy but all items have unsupported media types and content.result is falsy, tool_content will be an empty list sent to the API. Consider falling back to the non-rich-content path or adding a text placeholder when tool_content is empty.
  • In _tools.py parse_result, a Content object with type='text' and empty/None text falls through to generic JSON serialization via _make_dumpable, which may produce unexpected results. Consider returning '' for that case.
  • Add a unit test for Content.from_function_result with a list containing only rich items (no text) to verify result is empty string and items are populated.
  • Add unit tests for the warning log paths in Bedrock, Azure AI, and Ollama when content.items is non-empty, to ensure warnings are emitted and results are still correctly formatted.
  • Consider adding a unit test for FunctionTool.parse_result with a list mixing Content and non-Content items to verify the Content.from_text(str(item)) fallback path.
  • The integration test assertions like assert 'house' in response.text.lower() are inherently fragile even with @pytest.mark.flaky. Consider asserting on structural properties (e.g., response contains text, tool was called) rather than model-generated content.
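The data: URI suggestion above amounts to decoding the MCP payload's base64 field once at the boundary rather than packing it into a synthetic URI that consumers must re-parse. A minimal sketch, with hypothetical helper names:

```python
import base64


def decode_mcp_image(data_b64: str) -> bytes:
    """Decode once at the boundary, so consumers receive raw bytes
    (the shape Content.from_data would take)."""
    return base64.b64decode(data_b64)


def synthetic_data_uri(mime_type: str, data_b64: str) -> str:
    """The shape the suggestion asks to avoid: downstream code would
    have to split the scheme, media type, and payload back apart."""
    return f"data:{mime_type};base64,{data_b64}"
```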

Automated review by moonbox3's agents

image_path = Path(__file__).parent / "assets" / "sample_image.jpg"
image_bytes = image_path.read_bytes()

@tool(approval_mode="never_require")

Bug: This resolves to python/packages/anthropic/tests/assets/sample_image.jpg, but the diff only adds the image at python/packages/core/tests/assets/sample_image.jpg. The Azure and OpenAI tests correctly use Path(__file__).parent.parent / "assets" to reach the core assets directory. Either copy the image to this location or fix the path.

Suggested change
@tool(approval_mode="never_require")
image_path = Path(__file__).parent / "assets" / "sample_image.jpg"

Comment on lines +806 to +808
rich_items = [c for c in result if c.type in ("data", "uri")]
return cls(
"function_result",

This list comprehension accesses .type and .text on each item assuming they are Content objects. If result is a list of non-Content objects (e.g., plain strings), this raises AttributeError. Consider validating that items are Content instances, similar to how FunctionTool.parse_result handles mixed lists.

Suggested change
- rich_items = [c for c in result if c.type in ("data", "uri")]
- return cls(
-     "function_result",
+ if isinstance(result, list):
+     if not all(isinstance(c, Content) for c in result):
+         return cls(
+             "function_result",
+             call_id=call_id,
+             result=str(result),
+             items=list(items) if items else None,
+             exception=exception,
+             annotations=annotations,
+             additional_properties=additional_properties,
+             raw_representation=raw_representation,
+         )
+     text_parts = [c.text for c in result if c.type == "text" and c.text]
+     rich_items = [c for c in result if c.type in ("data", "uri")]

a_content.append({
    "type": "tool_result",
    "tool_use_id": content.call_id,
    "content": tool_content,

If all items in content.items have unsupported media types and content.result is falsy, tool_content will be []. The Anthropic API may reject an empty content array in a tool_result. Consider falling through to the else branch (plain string result) when tool_content is empty.

Suggested change
- "content": tool_content,
+ "content": tool_content if tool_content else (content.result if content.result is not None else ""),

Comment on lines +643 to +647
if result.type in ("data", "uri"):
    return [result]
if result.type == "text" and result.text:
    return result.text
if isinstance(result, list) and any(isinstance(item, Content) for item in result):

A Content with type='text' but empty/None text falls through to generic serialization via _make_dumpable, which may produce unexpected output for a Content object.

Suggested change
- if result.type in ("data", "uri"):
-     return [result]
- if result.type == "text" and result.text:
-     return result.text
- if isinstance(result, list) and any(isinstance(item, Content) for item in result):
+ if isinstance(result, Content):
+     if result.type in ("data", "uri"):
+         return [result]
+     if result.type == "text":
+         return result.text or ""

assert response is not None
assert response.text is not None
assert len(response.text) > 0
assert "house" in response.text.lower(), f"Model did not describe the house image. Response: {response.text}"

This integration test is the only coverage for the new rich content handling in _prepare_message_for_anthropic. Please add unit tests for the three new branches: (1) content.items with a data-type image item producing a base64 image block, (2) a uri-type image item producing a URL image block, and (3) an unsupported media type item being skipped. These should be added near the existing test_prepare_message_for_anthropic_function_result test.

Comment on lines +578 to +583
if content.items:
    logger.warning(
        "OpenAI Chat Completions API does not support rich content (images, audio) "
        "in tool results. Rich content items will be omitted. "
        "Use the Responses API client for rich tool results."
    )

The addition of the if args: all_messages.append(args) + continue changes control flow for ALL function_result messages, not just those with items. This refactoring has no dedicated unit test. Add a test that creates a function_result Content with non-empty items, calls _prepare_message_for_openai, and asserts: (1) the warning is logged, (2) the tool message is still emitted with the text result, and (3) rich items are excluded.

Suggested change
- if content.items:
-     logger.warning(
-         "OpenAI Chat Completions API does not support rich content (images, audio) "
-         "in tool results. Rich content items will be omitted. "
-         "Use the Responses API client for rich tool results."
-     )
+ if content.items:
+     logger.warning(
+         "OpenAI Chat Completions API does not support rich content (images, audio) "
+         "in tool results. Rich content items will be omitted. "
+         "Use the Responses API client for rich tool results."
+     )
+ if args:
+     all_messages.append(args)
+     continue

assert response is not None
assert isinstance(response, ChatResponse)
assert response.text is not None
assert len(response.text) > 0

This integration test is the only coverage for the new output_parts building logic in _prepare_content_for_openai (lines 1214-1224 of _responses_client.py). Add a unit test that creates a Content.from_function_result with items=[Content.from_data(...)], calls _prepare_content_for_openai, and asserts the output is a list containing both input_text and image parts.

content_list = [
    Content.from_text("Chart rendered."),
    Content.from_data(data=b"image_bytes", media_type="image/png"),
]

Good test, but also add a case for a list with only rich items (no text). E.g., Content.from_function_result(call_id='x', result=[Content.from_data(data=b'img', media_type='image/png')]) should produce result='' and items with one data item.

Suggested change
  ]
+ def test_from_function_result_with_only_rich_content_list():
+     """Test Content.from_function_result with only image items and no text."""
+     content_list = [
+         Content.from_data(data=b"image_bytes", media_type="image/png"),
+     ]
+     result = Content.from_function_result(call_id="test-456", result=content_list)
+     assert result.type == "function_result"
+     assert result.result == ""
+     assert result.items is not None
+     assert len(result.items) == 1
+     assert result.items[0].type == "data"

Comment on lines 207 to +222
case _:
parts.append(str(item))
if not parts:
text_parts.append(str(item))

if rich_items:
    # Return rich content list preserving original order
    result: list[Content] = []
    text_idx = 0
    rich_idx = 0
    for item in mcp_type.content:
        match item:
            case types.ImageContent() | types.AudioContent():
                result.append(rich_items[rich_idx])
                rich_idx += 1
            case _:
                if text_idx < len(text_parts):
                    result.append(Content.from_text(text_parts[text_idx]))

The ordering reconstruction re-iterates mcp_type.content and uses index counters for text_parts and rich_items. This is fragile — if a new content type is added to the first loop but not matched the same way in the second loop, ordering could silently drop items. Consider building the result list in a single pass instead of two passes with index tracking.
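The single-pass version suggested here might look like the sketch below. `McpText` and `McpImage` are stand-ins for the real `mcp.types` classes, and the `(kind, payload)` tuples stand in for constructed `Content` objects; each item is converted where it sits, so no index counters can drift out of sync.

```python
from dataclasses import dataclass


# Stand-ins for mcp.types content classes; illustrative only.
@dataclass
class McpText:
    text: str


@dataclass
class McpImage:
    data: bytes
    media_type: str


def parse_mcp_content(items: list) -> list:
    """Single-pass ordering: convert each MCP item in place, preserving
    the original interleaving of text and rich content."""
    result = []
    for item in items:
        if isinstance(item, McpImage):
            result.append(("image", item.data))      # rich item, in place
        else:
            result.append(("text", getattr(item, "text", str(item))))
    return result
```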

Successfully merging this pull request may close these issues:

Python: [Feature]: Allow @tool functions to return image content that the model can analyze

5 participants