Kunlun Zhu†, Hongyi Du†, Zhaochen Hong†, Xiaocheng Yang†, Shuyi Guo†, Zhe Wang†, Zhenhailong Wang, Cheng Qian, Xiangru Tang, Heng Ji, Jiaxuan You († core contributors)
ACL 2025
In this paper, we introduce MultiAgentBench, a comprehensive benchmark designed to evaluate LLM-based multi-agent systems across diverse, interactive scenarios. Our framework measures not only task completion but also the quality of collaboration and competition using novel, milestone-based key performance indicators.
Yuxiang Wei, Zhe Wang, Jiawei Liu, Yifeng Ding, Lingming Zhang
ICML 2024
In this paper, we introduce Magicoder, a series of fully open-source (code, weights, and data) Large Language Models (LLMs) for code that significantly closes the gap with top code models while having no more than 7B parameters.