[Issue] Chapter 12.3.2: benchmark-GAIA dataset format differs between the old and new versions

### 1. Affected Chapter

Chapter 12.3

### 2. Issue Type

Code Error

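### 3. Problem Description

The latest gaia-benchmark dataset no longer contains a `metadata.jsonl` file; it ships `metadata.parquet` instead, so the GAIA loader used in Chapter 12.3 finds no metadata and loads 0 samples. The two formats can be converted into each other. I have already fixed this locally and can submit a PR if needed.
The latest dataset documentation also describes this change: https://huggingface.co/datasets/gaia-benchmark/GAIA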

<img width="498" height="281" alt="Image" src="https://github.com/user-attachments/assets/cb9c970c-4dd5-41ea-9c82-db9e35145edc" />
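
Since the two formats are interchangeable, one possible local workaround is to regenerate `metadata.jsonl` from `metadata.parquet` before the loader runs. The following is only a minimal sketch, not the fix from my local branch or the library's own code: the helper names `parquet_to_jsonl` and `ensure_metadata_jsonl` are hypothetical, it assumes `metadata.parquet` sits in the same split directory where the loader looks for `metadata.jsonl`, and it assumes `pandas` plus a Parquet engine such as `pyarrow` are installed.

```python
from pathlib import Path


def parquet_to_jsonl(parquet_path: Path, jsonl_path: Path) -> Path:
    """Rewrite a Parquet metadata file as JSON Lines, one record per line."""
    import pandas as pd  # lazy import; also needs pyarrow or fastparquet

    df = pd.read_parquet(parquet_path)
    # orient="records" with lines=True produces JSON Lines;
    # force_ascii=False keeps non-ASCII text readable in the output.
    df.to_json(jsonl_path, orient="records", lines=True, force_ascii=False)
    return jsonl_path


def ensure_metadata_jsonl(split_dir: str) -> Path:
    """Return the path to metadata.jsonl, creating it from metadata.parquet if needed.

    Assumption: both files live directly inside the split directory.
    """
    split = Path(split_dir)
    jsonl_path = split / "metadata.jsonl"
    parquet_path = split / "metadata.parquet"
    if jsonl_path.exists():
        return jsonl_path  # old dataset layout, nothing to convert
    if parquet_path.exists():
        return parquet_to_jsonl(parquet_path, jsonl_path)  # new dataset layout
    raise FileNotFoundError(f"No metadata.jsonl or metadata.parquet in {split}")
```

Calling `ensure_metadata_jsonl(...)` on the downloaded split directory (in my run, the `2023\validation` folder under the dataset download path) before starting the evaluation should leave a `metadata.jsonl` in place for the existing loader. Nested columns may serialize differently from the original JSONL, so the converted file is worth spot-checking.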

### 4. Reproduction Materials

Code:
```python
from hello_agents import SimpleAgent, HelloAgentsLLM
from hello_agents.tools import GAIAEvaluationTool

# Official GAIA system prompt (from the paper)
GAIA_SYSTEM_PROMPT = """You are a general AI assistant. I will ask you a question. Report your thoughts, and finish your answer with the following template: FINAL ANSWER: [YOUR FINAL ANSWER].

YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma separated list of numbers and/or strings.

If you are asked for a number, don't use comma to write your number neither use units such as $ or percent sign unless specified otherwise.

If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities), and write the digits in plain text unless specified otherwise.

If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string."""

# 1. Create the agent (using the official GAIA system prompt)
llm = HelloAgentsLLM()
agent = SimpleAgent(
    name="TestAgent",
    llm=llm,
    system_prompt=GAIA_SYSTEM_PROMPT  # Key point: use the official GAIA prompt
)

# 2. Create the GAIA evaluation tool
gaia_tool = GAIAEvaluationTool()

# 3. Run the evaluation in one call
results = gaia_tool.run(
    agent=agent,
    level=1,  # Level 1: simple tasks
    max_samples=5,  # Evaluate 5 samples
    export_results=True,  # Export results in GAIA format
    generate_report=True  # Generate an evaluation report
)

# 4. Inspect the results (labels: exact match rate, partial match rate, number correct)
print(f"精确匹配率: {results['exact_match_rate']:.2%}")
print(f"部分匹配率: {results['partial_match_rate']:.2%}")
print(f"正确数: {results['exact_matches']}/{results['total_samples']}")
```



Log:
```
============================================================
GAIA一键评估
============================================================

配置:
   智能体: TestAgent
   难度级别: 1
   样本数量: 5

============================================================
步骤1: 运行HelloAgents评估
============================================================
   正在从HuggingFace下载: gaia-benchmark/GAIA
   📥 下载GAIA数据集...
Fetching 119 files: 100%|██████████| 119/119 [00:00<00:00, 1954.12it/s]
Traceback (most recent call last):
  File "D:\pyApp\evaluation\.venv\Lib\site-packages\hello_agents\tools\builtin\gaia_evaluation_tool.py", line 115, in run
    results = self._run_evaluation(agent, level, max_samples, local_data_dir)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\pyApp\evaluation\.venv\Lib\site-packages\hello_agents\tools\builtin\gaia_evaluation_tool.py", line 169, in _run_evaluation
    raise ValueError("数据集加载失败或为空")
ValueError: 数据集加载失败或为空
Traceback (most recent call last):
  File "D:\pyApp\evaluation\gaia_evaluate.py", line 36, in <module>
    print(f"精确匹配率: {results['exact_match_rate']:.2%}")
                         ~~~~~~~^^^^^^^^^^^^^^^^^^^^
KeyError: 'exact_match_rate'
   ✓ 数据集下载完成: D:\pyApp\evaluation\data\gaia
   ⚠️ 未找到metadata文件: D:\pyApp\evaluation\data\gaia\2023\validation\metadata.jsonl
✅ GAIA数据集加载完成
   数据源: gaia-benchmark/GAIA
   分割: validation
   级别: 1
   样本数: 0

❌ 评估失败: 数据集加载失败或为空
```
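
In English, the log shows the loader warning that `metadata.jsonl` was not found, loading 0 samples, raising `ValueError` ("dataset failed to load or is empty"), and then the reproduction script failing with `KeyError: 'exact_match_rate'` because the returned `results` carries no metrics. As a side note, the script could check for the metric keys before printing them. This is a rough sketch under assumptions: `summarize_results` is a hypothetical helper, and the exact shape of what `gaia_tool.run` returns on failure is not shown in this issue, so the code only tests whether the four keys used above are present.

```python
# Metric keys read by the reproduction script above.
REQUIRED_KEYS = ("exact_match_rate", "partial_match_rate", "exact_matches", "total_samples")


def summarize_results(results):
    """Return printable summary lines, or a failure notice if metrics are missing."""
    if not isinstance(results, dict):
        return [f"Evaluation returned no metrics: {results!r}"]
    missing = [key for key in REQUIRED_KEYS if key not in results]
    if missing:
        # Failed run: report it instead of raising KeyError on the first print.
        return [f"Evaluation did not produce metrics (missing keys: {', '.join(missing)})"]
    return [
        f"Exact match rate: {results['exact_match_rate']:.2%}",
        f"Partial match rate: {results['partial_match_rate']:.2%}",
        f"Correct: {results['exact_matches']}/{results['total_samples']}",
    ]
```

With this in place, step 4 of the script could become `print("\n".join(summarize_results(results)))`, which keeps the underlying dataset error visible instead of masking it with a second traceback.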

### 5. Additional Information

_No response_

### Verification

- [x] I have read the relevant chapter documentation
- [x] I have searched existing Issues and confirmed this hasn't been reported
- [x] I have tried basic troubleshooting (restart, reinstall dependencies, etc.)
