Description
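1. Affected Chapter
Chapter 12.3
2. Issue Type
Code Error
3. Problem Description
The latest gaia-benchmark dataset no longer contains a metadata.jsonl file; it ships a metadata.parquet file instead. With the chapter's code, the dataset then loads zero samples and the evaluation fails (see the log below). The two formats can be converted into each other. I have already fixed this locally and can submit a PR if needed.
The latest dataset documentation also describes this change: https://huggingface.co/datasets/gaia-benchmark/GAIA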
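The conversion mentioned above could be sketched roughly as follows. This is a minimal sketch and not the fix from the local patch: `find_metadata`, `ensure_jsonl`, and the `split_dir` argument are hypothetical names, and it assumes the metadata file sits directly in the directory passed in. Reading Parquet with pandas also needs a Parquet engine such as pyarrow.

```python
from pathlib import Path
from typing import Optional


def find_metadata(split_dir: Path) -> Optional[Path]:
    """Return metadata.jsonl if present, else metadata.parquet, else None."""
    for name in ("metadata.jsonl", "metadata.parquet"):
        candidate = split_dir / name
        if candidate.is_file():
            return candidate
    return None


def ensure_jsonl(split_dir: Path) -> Optional[Path]:
    """Make sure a metadata.jsonl exists next to metadata.parquet.

    Returns the path to the JSONL file, or None if no metadata file is found.
    (Hypothetical helper; the directory layout is an assumption.)
    """
    found = find_metadata(split_dir)
    if found is None or found.suffix == ".jsonl":
        return found

    # Imported lazily so the lookup above works without pandas installed.
    import pandas as pd

    out_path = found.with_suffix(".jsonl")
    # One JSON object per line; keep non-ASCII text readable.
    pd.read_parquet(found).to_json(
        out_path, orient="records", lines=True, force_ascii=False
    )
    return out_path
```

Calling `ensure_jsonl` on the downloaded split directory before running the evaluation would leave both files in place, so code that still expects metadata.jsonl can keep working.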
4. Reproduction Materials
Code:
from hello_agents import SimpleAgent, HelloAgentsLLM
from hello_agents.tools import GAIAEvaluationTool

# Official GAIA system prompt (from the paper)
GAIA_SYSTEM_PROMPT = """You are a general AI assistant. I will ask you a question. Report your thoughts, and finish your answer with the following template: FINAL ANSWER: [YOUR FINAL ANSWER].
YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma separated list of numbers and/or strings.
If you are asked for a number, don't use comma to write your number neither use units such as $ or percent sign unless specified otherwise.
If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities), and write the digits in plain text unless specified otherwise.
If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string."""

# 1. Create the agent (using the official GAIA system prompt)
llm = HelloAgentsLLM()
agent = SimpleAgent(
    name="TestAgent",
    llm=llm,
    system_prompt=GAIA_SYSTEM_PROMPT  # Key: use the official GAIA prompt
)

# 2. Create the GAIA evaluation tool
gaia_tool = GAIAEvaluationTool()

# 3. Run the evaluation in one call
results = gaia_tool.run(
    agent=agent,
    level=1,               # Level 1: simple tasks
    max_samples=5,         # Evaluate 5 samples
    export_results=True,   # Export results in GAIA format
    generate_report=True   # Generate an evaluation report
)

# 4. Inspect the results
print(f"Exact match rate: {results['exact_match_rate']:.2%}")
print(f"Partial match rate: {results['partial_match_rate']:.2%}")
print(f"Correct: {results['exact_matches']}/{results['total_samples']}")
Log:
GAIA one-click evaluation
Configuration:
Agent: TestAgent
Difficulty level: 1
Number of samples: 5
============================================================
Step 1: Run the HelloAgents evaluation
Downloading from HuggingFace: gaia-benchmark/GAIA
📥 Downloading the GAIA dataset...
Fetching 119 files: 100%|██████████| 119/119 [00:00<00:00, 1954.12it/s]
Traceback (most recent call last):
  File "D:\pyApp\evaluation.venv\Lib\site-packages\hello_agents\tools\builtin\gaia_evaluation_tool.py", line 115, in run
    results = self._run_evaluation(agent, level, max_samples, local_data_dir)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\pyApp\evaluation.venv\Lib\site-packages\hello_agents\tools\builtin\gaia_evaluation_tool.py", line 169, in _run_evaluation
    raise ValueError("Dataset failed to load or is empty")
ValueError: Dataset failed to load or is empty (数据集加载失败或为空)
Traceback (most recent call last):
  File "D:\pyApp\evaluation\gaia_evaluate.py", line 36, in
    print(f"Exact match rate: {results['exact_match_rate']:.2%}")
                               ~~~~~~~^^^^^^^^^^^^^^^^^^^^
KeyError: 'exact_match_rate'
✓ Dataset download complete: D:\pyApp\evaluation\data\gaia
✅ GAIA dataset loaded
Data source: gaia-benchmark/GAIA
Split: validation
Level: 1
Samples: 0
❌ Evaluation failed: Dataset failed to load or is empty
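The second traceback in the log is a follow-on failure: when the evaluation fails, the returned dictionary has no 'exact_match_rate' key, so the print statement raises KeyError. A defensive way to report results might look like the sketch below. `summarize` and `EXPECTED_KEYS` are hypothetical names, and the sketch assumes only that `results` is a plain dictionary; what the tool actually returns on failure is not shown in the log.

```python
from typing import Any, Mapping

# Keys the reproduction script reads from the results dictionary.
EXPECTED_KEYS = (
    "exact_match_rate",
    "partial_match_rate",
    "exact_matches",
    "total_samples",
)


def summarize(results: Mapping[str, Any]) -> str:
    """Format the metrics, or describe what is missing instead of raising."""
    missing = [key for key in EXPECTED_KEYS if key not in results]
    if missing:
        return (
            "Evaluation did not produce metrics "
            f"(missing keys: {', '.join(missing)}); raw result: {dict(results)!r}"
        )
    return (
        f"Exact match rate: {results['exact_match_rate']:.2%}\n"
        f"Partial match rate: {results['partial_match_rate']:.2%}\n"
        f"Correct: {results['exact_matches']}/{results['total_samples']}"
    )
```

With this in place, `print(summarize(results))` would surface the dataset-loading failure directly rather than hiding it behind a KeyError.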
5. Additional Information
No response
Verification
- I have read the relevant chapter documentation
- I have searched existing Issues and confirmed this hasn't been reported
- I have tried basic troubleshooting (restart, reinstall dependencies, etc.)