AgentGym: Evolving Large Language Model-based Agents across Diverse Environments (AI summary)

Key Points

1. The paper presents AgentGym, a new framework for developing generally capable LLM-based agents and exploring their self-evolution.

2. AgentGym includes an interactive platform with 14 diverse environments and 89 task types spanning web, embodied, text-game, and other settings. The platform supports real-time feedback and concurrent interactions.

3. The framework includes expanded instructions, a benchmark suite (AgentEval), and two high-quality trajectory sets (AgentTraj and AgentTraj-L) collected through crowdsourcing and state-of-the-art models.

4. The paper proposes a novel method called AgentEvol to explore self-evolution in generally capable LLM-based agents, allowing them to adapt to previously unseen tasks and instructions.

5. The experiments show that AgentEvol can achieve performance comparable to or better than state-of-the-art closed-source and open-source models across the diverse tasks.

6. Ablation studies demonstrate the effectiveness of the data merging strategy, the exploration scope, and the use of both successful and failed trajectories in the evolution process.

7. The method generalizes to backbone LLMs other than the primary one used, which suggests broad applicability.

8. The paper highlights the need for developing generally-capable agents that can handle a wide range of tasks, going beyond current specialized or imitation-based approaches.

9. The AgentGym framework, including the platform, dataset, benchmark, and algorithm implementations, will be publicly released to support further research in this direction.

Summary

This research paper presents AgentGym, a new framework for developing and evaluating generally capable agents based on large language models (LLMs) that can evolve across diverse environments. The key goals are to build agents that first learn to handle a wide range of tasks through imitation learning and then self-evolve by interacting with and learning from different environments.

The paper identifies three key ingredients for this research: 1) diverse environments and tasks to allow broad exploration and learning, 2) a dataset of high-quality trajectories to bootstrap the agents with basic skills and knowledge, and 3) an effective self-evolution method that can adapt to various environments.

To address these needs, the paper introduces the AgentGym framework, which includes an interactive platform with 14 environments and 89 task types, a set of expanded instructions, the AgentEval benchmark suite, and the AgentTraj and AgentTraj-L trajectory sets.

Building on this framework, the paper proposes a novel self-evolution method called AgentEvol. This approach alternates between two steps: exploration, where the agent interacts with the environments to collect new trajectories, and learning, where the agent updates its policy to maximize the expected reward on the collected data.
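The alternation between exploration and learning can be illustrated with a toy sketch. This is not the paper's implementation: the two-action environment, the reward-weighted update, and the names `explore`, `learn`, and `evolve` are all assumptions made for illustration, standing in for an LLM policy that interacts with real environments.

```python
import random

# Hypothetical toy setup: a single-step task with two possible actions.
ACTIONS = ["left", "right"]


def explore(policy, n_episodes, rng):
    """Exploration step: sample actions from the current policy and record rewards."""
    trajectories = []
    for _ in range(n_episodes):
        action = rng.choices(ACTIONS, weights=[policy[a] for a in ACTIONS])[0]
        # Assumed toy environment: only "right" completes the task.
        reward = 1.0 if action == "right" else 0.0
        trajectories.append((action, reward))
    return trajectories


def learn(trajectories, smoothing=1.0):
    """Learning step: re-estimate the policy, weighting each action by its reward."""
    weights = {a: smoothing for a in ACTIONS}  # smoothing keeps every action possible
    for action, reward in trajectories:
        weights[action] += reward
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def evolve(policy, iterations=3, n_episodes=200, seed=0):
    """Alternate exploration and learning for a fixed number of iterations."""
    rng = random.Random(seed)
    for _ in range(iterations):
        data = explore(policy, n_episodes, rng)
        policy = learn(data)
    return policy
```

Calling `evolve({"left": 0.5, "right": 0.5})` starts from a uniform policy and shifts probability mass toward the rewarded action with each iteration. In a real agent the learning step would be a fine-tuning update on collected trajectories rather than a count-based re-estimate.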

Experimental results show that agents trained with AgentEvol can achieve performance comparable to or better than state-of-the-art closed-source and open-source models, as well as agents trained only through behavioral cloning. This demonstrates the promise of self-evolution for developing generally capable LLM-based agents.

The paper also conducts extensive ablation studies to analyze the impact of factors like data merging strategies, number of exploration iterations, and the use of both successful and failed trajectories. These analyses provide insights into the workings of the AgentEvol method.

In summary, this work takes an important step towards the long-standing goal of building generalist AI agents with the ability to evolve across diverse environments. The AgentGym framework and the AgentEvol method offer a new research direction and tools for the community to advance this frontier.

Reference: https://arxiv.org/abs/2406.04151
