Key Points
1. Large Language Model (LLM)-based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving, yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods.
2. The paper presents OPTIMA, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness in LLM-based MAS through LLM training.
3. OPTIMA employs an iterative generate, rank, select, and train paradigm with a reward function that balances task performance, token efficiency, and communication readability (see the reward sketch after this list).
4. The paper explores various RL algorithms, including Supervised Fine-Tuning, Direct Preference Optimization, and their hybrid approaches, providing insights into their effectiveness-efficiency trade-offs.
5. OPTIMA integrates Monte Carlo Tree Search-inspired techniques for DPO data generation, treating conversation turns as tree nodes to explore diverse interaction paths.
6. Evaluated on common multi-agent tasks, including information-asymmetric question answering and complex reasoning, OPTIMA shows consistent and substantial improvements over single-agent baselines and vanilla MAS based on Llama 3 8B.
7. OPTIMA achieves up to a 2.8x performance gain with less than 10% of the tokens used by baselines on tasks requiring heavy information exchange.
8. OPTIMA's efficiency gains open new possibilities for leveraging inference compute more effectively, leading to improved inference-time scaling laws.
9. By addressing fundamental challenges in LLM-based MAS, OPTIMA points toward scalable, efficient, and effective MAS.
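The following is a minimal sketch of such a balanced reward, assuming a simple linear combination; the weights, normalization, and parameter names below are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch of a reward balancing task performance, token efficiency, and
# communication readability, in the spirit of OPTIMA. The linear
# combination, weights, and normalization here are illustrative
# assumptions, not the paper's exact formulation.

def reward(task_score: float,
           num_tokens: int,
           readability_loss: float,
           max_tokens: int = 2048,
           lambda_token: float = 0.5,
           lambda_read: float = 0.1) -> float:
    """Higher is better.

    task_score       : task metric for the dialogue outcome (e.g., F1), in [0, 1].
    num_tokens       : total tokens exchanged between the agents.
    readability_loss : language-model loss on the conversation, a proxy for
                       whether messages remain human-readable.
    """
    token_penalty = num_tokens / max_tokens  # normalized token usage
    return task_score - lambda_token * token_penalty - lambda_read * readability_loss
```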
Summary
Introduction of OPTIMA
The paper introduces a novel framework called OPTIMA for improving the communication efficiency and task effectiveness of Large Language Model (LLM)-based multi-agent systems (MAS). LLM-based MAS have shown remarkable potential in collaborative problem-solving, but they still face critical challenges such as low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods.
Addressing Issues with OPTIMA
OPTIMA addresses these issues through an iterative generate, rank, select, and train paradigm with a reward function that balances task performance, token efficiency, and communication readability. The framework explores various reinforcement learning algorithms, including Supervised Fine-Tuning, Direct Preference Optimization (DPO), and hybrid approaches, providing insights into their effectiveness-efficiency trade-offs. It also integrates Monte Carlo Tree Search-inspired techniques for generating high-quality DPO data, treating conversation turns as tree nodes to explore diverse interaction paths. Both the iterative loop and the tree-based data generation are sketched below. Evaluated on common multi-agent tasks like information-asymmetric question answering and complex reasoning, OPTIMA demonstrates consistent and substantial improvements over single-agent baselines and vanilla MAS. It achieves up to a 2.8x performance gain with less than 10% of the tokens used by baselines on tasks requiring heavy information exchange. OPTIMA's efficiency gains also open new possibilities for leveraging inference compute more effectively, potentially leading to improved inference-time scaling laws.
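As a rough illustration of the iterative paradigm, here is a minimal sketch of the generate, rank, select, and train loop; the rollout, score, and finetune callables and all hyperparameters are hypothetical stand-ins for OPTIMA's actual rollout, reward, and training machinery:

```python
from typing import Any, Callable, List

def optimize(model: Any,
             tasks: List[Any],
             rollout: Callable[[Any, Any], Any],        # (model, task) -> dialogue
             score: Callable[[Any], float],             # dialogue -> balanced reward
             finetune: Callable[[Any, List[Any]], Any], # (model, data) -> model
             iterations: int = 3,
             samples: int = 8,
             top_k: int = 1) -> Any:
    """Iterative generate -> rank -> select -> train loop (sketch)."""
    for _ in range(iterations):
        selected = []
        for task in tasks:
            # Generate: sample several multi-agent dialogues per task.
            dialogues = [rollout(model, task) for _ in range(samples)]
            # Rank: order dialogues by the balanced reward, best first.
            dialogues.sort(key=score, reverse=True)
            # Select: keep only the highest-reward trajectories.
            selected.extend(dialogues[:top_k])
        # Train: update the model on the selected data (e.g., an SFT step;
        # a DPO variant would instead build preference pairs from the ranking).
        model = finetune(model, selected)
    return model
```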
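And a sketch of treating conversation turns as tree nodes for DPO data generation; expanding every node uniformly and pairing the best sibling branch against the worst is an assumption made for illustration, not necessarily the paper's exact MCTS-inspired selection strategy:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class TurnNode:
    """One node per conversation turn; `history` is the dialogue so far."""
    history: List[str]
    children: List["TurnNode"] = field(default_factory=list)

def expand(node: TurnNode,
           generate: Callable[[List[str]], str],  # history -> next turn
           branching: int = 2,
           depth: int = 4) -> None:
    """Grow the turn tree: each node branches into alternative next turns."""
    if depth == 0:
        return
    for _ in range(branching):
        turn = generate(node.history)
        child = TurnNode(history=node.history + [turn])
        node.children.append(child)
        expand(child, generate, branching, depth - 1)

def dpo_pairs(node: TurnNode,
              score: Callable[[List[str]], float]) -> List[Tuple[List[str], List[str]]]:
    """Build (chosen, rejected) pairs from sibling branches sharing a prefix."""
    pairs = []
    if len(node.children) >= 2:
        ranked = sorted(node.children, key=lambda c: score(c.history), reverse=True)
        pairs.append((ranked[0].history, ranked[-1].history))
    for child in node.children:
        pairs.extend(dpo_pairs(child, score))
    return pairs
```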
Advancements in OPTIMA
By addressing the fundamental challenges in LLM-based MAS, OPTIMA points toward scalable, efficient, and effective multi-agent systems. The framework's ability to simultaneously optimize communication efficiency and task effectiveness represents an important advance in leveraging the collective intelligence of LLM-based agents for collaborative problem-solving.
Reference: https://arxiv.org/abs/2410.08115