Key Points

1. The paper proposes a novel method called LONGAGENT, which leverages multi-agent collaboration to scale large language models (LLMs) to handle contexts of 128K tokens. The method demonstrates potential superiority in long-text processing compared to advanced models like GPT-4 and Claude 2, while avoiding the expensive training costs and high inference latency associated with LLMs that have long context windows.

2. Existing large language models (LLMs) have limitations in handling long texts: approaches such as positional encoding adaptation and more intricate attention mechanisms offer advantages in computational efficiency but remain limited in capturing long-term dependencies.

3. LONGAGENT comprises a leader, responsible for understanding user intent and directing team members to acquire information from documents, together with an inter-member communication mechanism that resolves response conflicts caused by hallucinations through information sharing.

4. The contributions of the work include proposing LONGAGENT to effectively handle long texts exceeding 100k tokens, constructing a larger benchmark called Needle-in-a-Haystack PLUS for more comprehensive evaluation of LLMs' long-text capabilities, and demonstrating through experimental results that LONGAGENT can surpass GPT-4 in long-text processing.

5. The working mechanism of LONGAGENT involves selecting members based on task requirements, coordinating them to search for clues within their respective text chunks, resolving conflicts caused by member hallucinations, and deducing the final answer.

6. Member selection in LONGAGENT utilizes expert models to construct task-specific agent teams; the paper discusses how these expert models are built and how members are selected from a natural language description of the task to be processed (see the sketch after this list).

7. The research compares and evaluates the performance of LONGAGENT against commercial models like GPT-4 Turbo and Claude 2.1, as well as state-of-the-art academic methods such as PI, YaRN, and ReRoPE, showcasing significant improvements in accuracy across different document lengths and settings.

8. LONGAGENT's inter-member communication mechanism and chunking of long texts are shown to effectively alleviate model hallucination problems and improve accuracy across different input text lengths.

9. The proposed LONGAGENT method offers a promising alternative for long-text processing, demonstrating potential superiority over existing models in handling long texts and offering efficiency benefits in time and memory usage.
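
As a rough illustration of the member-selection step in point 6, the sketch below routes a natural-language task description to an expert model and instantiates the agent team from it. The `EXPERT_MODELS` registry, the model names, and the `call_llm` helper are hypothetical placeholders, not the paper's actual implementation.

```python
# Hypothetical sketch of LONGAGENT-style member selection: the task
# description is matched against a registry of expert models, and the
# chosen expert is used to instantiate every member of the agent team.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a chat-completion request)."""
    raise NotImplementedError

# Assumed registry mapping task types to fine-tuned expert models.
EXPERT_MODELS = {
    "retrieval": "expert-retrieval-7b",
    "question_answering": "expert-qa-7b",
    "summarization": "expert-summarization-7b",
}

def select_members(task_description: str, num_members: int) -> list[str]:
    """Pick an expert model from the task description and build the team."""
    choices = ", ".join(EXPERT_MODELS)
    task_type = call_llm(
        f"Task: {task_description}\n"
        f"Answer with exactly one of: {choices}."
    ).strip()
    expert = EXPERT_MODELS.get(task_type, EXPERT_MODELS["question_answering"])
    # Every member is an instance of the same expert model; each one will
    # later be assigned its own chunk of the long input text.
    return [expert] * num_members
```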

Summary

LONGAGENT: A Novel Method for Long-Text Processing
The research paper proposes a novel method called LONGAGENT, based on multi-agent collaboration, to scale large language models (LLMs) with a 4k context size to effectively handle long texts exceeding 100k tokens. The method demonstrates potential superiority in long-text processing compared to GPT-4. In LONGAGENT, a leader is responsible for understanding user intent and directing team members to acquire information from documents. To address response conflicts caused by hallucinations, an inter-member communication mechanism is introduced. Experimental results show significant improvements in tasks such as 128k-long text retrieval and multi-hop question answering compared to GPT-4.
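
A minimal sketch of this collaboration loop, reusing the hypothetical `call_llm` helper from the routing sketch above and using simple word-count chunking; the paper's actual prompts, chunk sizes, and aggregation procedure differ.

```python
# Minimal sketch of the LONGAGENT collaboration loop: the leader splits
# the long document into chunks a 4k-context member can hold, each member
# searches its chunk for clues, and the leader aggregates the clues into
# a final answer.

CHUNK_WORDS = 3000  # keep each chunk comfortably inside a 4k-token window

def chunk_document(text: str) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + CHUNK_WORDS])
            for i in range(0, len(words), CHUNK_WORDS)]

def answer_long_text(question: str, document: str) -> str:
    # Each member reads only its own chunk and reports any clue it finds.
    clues = []
    for chunk in chunk_document(document):
        reply = call_llm(
            f"Context:\n{chunk}\n\nQuestion: {question}\n"
            "If the context contains relevant information, report it; "
            "otherwise answer 'no clue'."
        )
        if "no clue" not in reply.lower():
            clues.append(reply)
    # The leader reasons over the members' clues to deduce the answer.
    return call_llm(
        f"Question: {question}\nClues from team members:\n"
        + "\n".join(clues) + "\nDeduce the final answer."
    )
```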

Challenges of Extending Pre-trained LLMs' Context Window
The paper discusses the challenges of extending the context window of pre-trained LLMs and reviews existing methods. It introduces the LONGAGENT collaboration scheme, in which the leader coordinates members to process the text and acquire the relevant information needed to reason out the final response. It then selects task-specific expert models to instantiate members and demonstrates improved accuracy through inter-member communication. The paper also introduces a new benchmark, Needle-in-a-Haystack PLUS, for comprehensive evaluation of LLMs' long-text capabilities, and compares LONGAGENT with other state-of-the-art models and methods.
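
One plausible way to realize this inter-member communication: when two members return conflicting answers, the leader lets them exchange their source chunks and answer again with both excerpts visible, so an answer hallucinated without textual support tends to be retracted. The function below is an illustrative sketch (again using the hypothetical `call_llm`), not the paper's exact protocol.

```python
# Illustrative sketch of inter-member conflict resolution: two members
# whose answers disagree share their source chunks with each other and
# answer again with both contexts visible. A member that hallucinated an
# answer with no textual support tends to revise it once it sees the
# chunk that actually contains the evidence.

def resolve_conflict(question: str,
                     answer_a: str, chunk_a: str,
                     answer_b: str, chunk_b: str) -> str:
    merged = (
        f"Question: {question}\n"
        f"Member A answered '{answer_a}' based on:\n{chunk_a}\n\n"
        f"Member B answered '{answer_b}' based on:\n{chunk_b}\n\n"
        "Considering both excerpts, which answer is supported by the "
        "text? Reply with the supported answer only."
    )
    return call_llm(merged)
```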

Experimental Results and Comparative Analysis
The experiments demonstrate that LONGAGENT outperforms GPT-4 in both single-document and multi-document settings, as well as on various synthetic tasks. The method also exhibits superior time and memory efficiency compared to directly performing full attention on long texts. However, LONGAGENT still has some limitations, such as the higher construction cost of interaction trajectories and the high demand placed on the leader's reasoning and generalization abilities for complex problems.
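
To see where the time and memory benefit comes from, compare the quadratic cost of full attention over the whole input with the cost of attending only within fixed-size chunks; the figures below are back-of-the-envelope arithmetic, not measurements from the paper.

```python
# Back-of-the-envelope comparison (not figures from the paper): full
# self-attention over n tokens costs about n^2 score computations, while
# splitting into chunks of size c costs (n / c) * c^2 = n * c.

n = 128_000   # total input tokens
c = 4_000     # per-member chunk size

full_attention = n * n            # ~1.64e10 pairwise scores
chunked = (n // c) * c * c        # 32 chunks * 4k^2 = 5.12e8 scores

print(full_attention / chunked)   # 32x fewer attention computations
```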

Conclusion and Future Considerations
In conclusion, the research paper introduces LONGAGENT as a promising alternative for long-text processing, demonstrating its potential to effectively handle long texts exceeding 100k tokens through multi-agent collaboration and inter-member communication.

Reference: https://arxiv.org/abs/2402.115...