Key Points

1. Large Language Model (LLM)-based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving, yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods.

2. The paper presents OPTIMA, a novel framework that addresses these issues through LLM training, significantly enhancing both communication efficiency and task effectiveness in LLM-based MAS.

3. OPTIMA employs an iterative generate, rank, select, and train paradigm with a reward function that balances task performance, token efficiency, and communication readability (a sketch of one such iteration appears after this list).

4. The paper explores various training algorithms, including Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and their hybrid approaches, providing insights into their effectiveness-efficiency trade-offs.

5. OPTIMA integrates Monte Carlo Tree Search (MCTS)-inspired techniques for DPO data generation, treating conversation turns as tree nodes to explore diverse interaction paths.

6. Evaluated on common multi-agent tasks, including information-asymmetric question answering and complex reasoning, OPTIMA shows consistent and substantial improvements over single-agent baselines and vanilla MAS based on Llama 3 8B.

7. OPTIMA achieves up to a 2.8x performance gain while using less than 10% of the tokens on tasks requiring heavy information exchange.

8. OPTIMA's efficiency gains open new possibilities for leveraging inference-compute more effectively, leading to improved inference-time scaling laws.

9. By addressing fundamental challenges in LLM-based MAS, OPTIMA points the way toward scalable, efficient, and effective multi-agent systems.
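
To make the paradigm from point 3 concrete, below is a minimal Python sketch of one OPTIMA-style iteration. The reward combines the three terms named above as a weighted sum; the weights, the Trajectory fields, and the roll_out/finetune callbacks are illustrative assumptions, not the paper's exact implementation.

```python
from dataclasses import dataclass

# Illustrative weights; the real coefficients would be tuned per task.
LAMBDA_TOKEN = 0.5   # penalty on token usage
LAMBDA_READ = 0.3    # bonus for readable, natural messages

@dataclass
class Trajectory:
    """One sampled multi-agent conversation and its measurements."""
    messages: list
    task_score: float    # e.g. accuracy or exact match, in [0, 1]
    num_tokens: int
    readability: float   # e.g. derived from a base LM's loss, in [0, 1]

def reward(t: Trajectory, token_budget: int = 512) -> float:
    """Composite reward: task performance, minus a normalized token
    penalty, plus a readability term."""
    token_penalty = min(t.num_tokens / token_budget, 1.0)
    return t.task_score - LAMBDA_TOKEN * token_penalty + LAMBDA_READ * t.readability

def optima_iteration(roll_out, finetune, tasks, samples_per_task=8, top_k=2):
    """One generate -> rank -> select -> train round (sketch).
    `roll_out(task) -> Trajectory` and `finetune(trajectories)` stand in
    for the actual multi-agent sampling and training code."""
    selected = []
    for task in tasks:
        samples = [roll_out(task) for _ in range(samples_per_task)]  # generate
        samples.sort(key=reward, reverse=True)                       # rank
        selected.extend(samples[:top_k])                             # select
    finetune(selected)                                               # train
    return selected
```

With SFT, the selected high-reward conversations serve directly as fine-tuning targets; the DPO-based variants instead require preference pairs, whose construction is sketched in the Summary below.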

Summary

Introduction of OPTIMA
The paper introduces a novel framework called OPTIMA for improving the communication efficiency and task effectiveness of Large Language Model (LLM)-based multi-agent systems (MAS). LLM-based MAS have shown remarkable potential in collaborative problem-solving, but they still face critical challenges such as low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods.

Addressing Issues with OPTIMA
OPTIMA addresses these issues through an iterative generate, rank, select, and train paradigm with a reward function that balances task performance, token efficiency, and communication readability. The framework explores various training algorithms, including Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and hybrid approaches, providing insights into their effectiveness-efficiency trade-offs. It also integrates Monte Carlo Tree Search (MCTS)-inspired techniques for generating high-quality DPO training data, treating conversation turns as tree nodes (sketched below).

Evaluated on common multi-agent tasks such as information-asymmetric question answering and complex reasoning, OPTIMA demonstrates consistent and substantial improvements over single-agent baselines and vanilla MAS built on Llama 3 8B. It achieves up to a 2.8x performance gain while using less than 10% of the tokens consumed by baselines on tasks requiring heavy information exchange. These efficiency gains also open new possibilities for leveraging inference compute more effectively, potentially leading to improved inference-time scaling laws.
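
To illustrate the MCTS-inspired data generation, the sketch below treats each conversation turn as a tree node, expands a few candidate replies per node, and pairs high- and low-value sibling branches as chosen/rejected examples. The node structure, the exhaustive expansion, and the margin-based pairing rule are simplified assumptions; the paper's actual search applies its own selection and filtering rules.

```python
from dataclasses import dataclass, field

@dataclass
class TurnNode:
    """A node in the conversation tree: one agent turn plus its history."""
    history: list                              # messages so far
    children: list = field(default_factory=list)
    value: float = 0.0                         # best reward reachable below

def expand_tree(root, generate_turn, score, branching=3, depth=4):
    """Expand conversation turns as tree nodes to explore diverse paths.
    `generate_turn(history) -> message` samples one candidate reply;
    `score(history) -> float` rewards a finished conversation.
    (A real MCTS-style search would prune rather than expand exhaustively.)"""
    frontier = [root]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for _ in range(branching):
                msg = generate_turn(node.history)
                child = TurnNode(history=node.history + [msg])
                node.children.append(child)
                next_frontier.append(child)
        frontier = next_frontier

    def backup(node):
        """Propagate leaf rewards up so each node carries its best value."""
        if not node.children:
            node.value = score(node.history)
        else:
            node.value = max(backup(c) for c in node.children)
        return node.value

    backup(root)
    return root

def dpo_pairs(node, margin=0.1):
    """Collect (chosen, rejected) sibling turns whose values differ enough."""
    pairs = []
    kids = sorted(node.children, key=lambda c: c.value, reverse=True)
    if len(kids) >= 2 and kids[0].value - kids[-1].value >= margin:
        pairs.append((kids[0].history[-1], kids[-1].history[-1]))
    for child in node.children:
        pairs.extend(dpo_pairs(child, margin))
    return pairs
```

The resulting (chosen, rejected) pairs then feed the standard DPO objective, which raises the likelihood of the chosen turn relative to the rejected one under a reference model.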

Advancements in OPTIMA
By addressing the fundamental challenges in LLM-based MAS, OPTIMA points toward scalable, efficient, and effective multi-agent systems. The framework's ability to optimize communication efficiency and task effectiveness simultaneously represents an important advance in leveraging the collective intelligence of LLM-based agents for collaborative problem-solving.

Reference: https://arxiv.org/abs/2410.08115