Failure attribution in LLM multi-agent systems (identifying which agent is responsible for a task failure, and at which step it occurred) provides crucial clues for system debugging but remains underexplored and labor-intensive. In this paper, we propose and formulate a new research area: automated failure attribution for LLM multi-agent systems. To support this initiative, we introduce the Who&When dataset, comprising extensive failure logs from 127 LLM multi-agent systems with fine-grained annotations linking failures to specific agents and decisive error steps. Using Who&When, we develop and evaluate three automated failure attribution methods, summarizing their respective pros and cons. The best method achieves 53.5% accuracy in identifying failure-responsible agents but only 14.2% in pinpointing failure steps, with some methods performing below random. Even SOTA reasoning models, such as OpenAI o1 and DeepSeek R1, fail to achieve practical usability. These results highlight the task's complexity and the need for further research in this area.
Modern LLM multi-agent systems often fail in subtle ways: the final answer is wrong, but it is unclear which agent caused the failure or which step first went wrong. In practice, developers must manually inspect long, multi-agent execution logs to identify the root cause, a process that is slow, error-prone, and requires significant domain expertise.
This manual failure attribution step has become a major bottleneck in agent system development. As agent teams grow larger and interactions become longer, simply knowing that a system failed is no longer actionable without knowing who failed and when.
We argue that failure attribution should be treated as a first-class research problem. This work introduces automated failure attribution: using LLMs to automatically identify the failure-responsible agent and the decisive error step directly from execution logs.
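To make the problem statement concrete, the following is a minimal sketch of the attribution interface in Python. The type and function names (`Step`, `Attribution`, `attribute_failure`) are illustrative assumptions for exposition, not code from the paper.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One entry in a multi-agent execution log."""
    index: int    # position of this step in the log
    agent: str    # name of the agent that produced this step
    content: str  # the agent's message or action at this step

@dataclass
class Attribution:
    """The target output of failure attribution."""
    agent: str  # failure-responsible agent
    step: int   # decisive error step (index into the log)

def attribute_failure(query: str, log: list[Step]) -> Attribution:
    """Given the original task query and the full failure log, ask an LLM
    judge to name the responsible agent and the decisive error step."""
    raise NotImplementedError  # placeholder: prompt an LLM judge over the log
```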
We introduce Who&When, the first benchmark for automated failure attribution in LLM multi-agent systems. The dataset contains failure logs from 127 multi-agent systems, each annotated with (i) the failure-responsible agent and (ii) the decisive error step, i.e., the earliest mistake whose correction would turn failure into success. On top of this benchmark, we study three representative failure attribution strategies and analyze the trade-offs they expose.
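For illustration, a single annotated failure record might look roughly like the sketch below. The field names (`query`, `history`, `who`, `when`), the agent names, and the example content are assumptions made for exposition, not necessarily the dataset's published schema.

```python
# Hypothetical shape of one Who&When record; field names and values are illustrative.
example_record = {
    "query": "What was country X's GDP in 2020?",  # the task given to the multi-agent system
    "history": [
        {"step": 0, "agent": "Orchestrator", "content": "Plan: search the web, then report."},
        {"step": 1, "agent": "WebSurfer", "content": "Retrieved a 2015 figure by mistake."},
        {"step": 2, "agent": "Orchestrator", "content": "Final answer based on the 2015 figure."},
    ],
    "who": "WebSurfer",  # annotated failure-responsible agent
    "when": 1,           # annotated decisive error step: fixing it would flip failure to success
}
```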
Our experiments reveal that automated failure attribution is substantially harder than standard evaluation tasks. Even the best-performing approach achieves only 53.5% accuracy in identifying the failure-responsible agent and 14.2% accuracy in pinpointing the exact failure step.
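As a reference point for how these two numbers are measured, the sketch below computes agent-level and step-level accuracy as exact-match rates over per-log predictions. This is an assumed scoring scheme and may differ in detail from the paper's evaluation code.

```python
def attribution_accuracy(preds, golds):
    """Fraction of failure logs whose predicted agent / step exactly matches
    the annotation. preds and golds are lists of (agent_name, step_index)
    tuples, one pair per failure log."""
    n = len(golds)
    agent_acc = sum(p[0] == g[0] for p, g in zip(preds, golds)) / n
    step_acc = sum(p[1] == g[1] for p, g in zip(preds, golds)) / n
    return agent_acc, step_acc

# Toy example: 2 of 3 agent predictions correct, 1 of 3 step predictions correct.
print(attribution_accuracy(
    [("WebSurfer", 3), ("Coder", 5), ("Orchestrator", 2)],
    [("WebSurfer", 4), ("Coder", 5), ("Verifier", 1)],
))  # -> (0.666..., 0.333...)
```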
Notably, stronger reasoning models do not solve the problem. State-of-the-art reasoning models such as OpenAI o1 and DeepSeek R1 fail to achieve practical step-level accuracy, and in some settings perform worse than simpler baselines. These results indicate that failure attribution is not merely a matter of stronger reasoning or larger models. Instead, it exposes fundamental limitations in how LLMs retrieve, localize, and reason about errors within long, multi-agent interaction histories.
@inproceedings{zhang2025which,
  title={Which Agent Causes Task Failures and When? On Automated Failure Attribution of {LLM} Multi-Agent Systems},
  author={Shaokun Zhang and Ming Yin and Jieyu Zhang and Jiale Liu and Zhiguang Han and Jingyang Zhang and Beibin Li and Chi Wang and Huazheng Wang and Yiran Chen and Qingyun Wu},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=GazlTYxZss}
}