Key Points
1. The article introduces Language Feedback Models (LFMs) for policy improvement in instruction-following tasks, using Large Language Models (LLMs) to provide feedback on visual trajectories verbalized into language descriptions.
2. LFMs improve task-completion rate over strong behavioral cloning baselines on three distinct language grounding environments: Touchdown, ScienceWorld, and ALFWorld.
3. LFMs outperform using LLMs as experts to directly predict actions when controlling for the number of LLM output tokens, and they generalize to unseen environments, improving task-completion rate by 3.5-12.0% through a single round of adaptation.
4. LFMs are shown to be cost-effective and sample-efficient, requiring only a small number of LLM interactions to collect an offline dataset during training, rather than many online LLM interactions during policy improvement.
5. The detailed feedback provided by LFMs, which explains and interprets desirable behavior, promotes user trust in the quality of the imitation learning data and the resulting policy behavior.
6. LFMs are more cost-effective than using LLMs for action prediction and generalize to new environments, allowing policy adaptation without additional LLM usage or demonstrations.
7. The article compares LFMs with directly using LLMs as experts for imitation learning: LFMs consistently outperform both behavioral cloning and the LLM-expert baseline, without requiring any LLM calls during policy improvement.
8. LFMs can provide detailed, human-interpretable feedback that humans can inspect and verify, yielding more trustworthy policies.
9. The article outlines the potential beneficial societal impacts of LFMs, such as the development of cost-effective computer agents that quickly learn to accurately follow human commands, while also acknowledging potential negative consequences, including hallucinations by LLMs that mislead feedback model training.
Summary
Introduction
The research paper introduces Language Feedback Models (LFMs) for improving instruction following through imitation learning. LFMs are trained to identify desirable behavior for imitation using feedback from Large Language Models (LLMs) on visual trajectories verbalized into language descriptions.
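To make this concrete, the following is a minimal sketch of how feedback data might be collected, assuming a hypothetical query_llm() helper and a simple verbalizer; the paper's actual prompts and data format may differ.

```python
# Hypothetical sketch of LFM training-data collection: verbalize a trajectory,
# ask an LLM which steps were productive, and keep labelled examples for
# training the (smaller) feedback model. query_llm() is an assumed helper.

def verbalize(trajectory):
    """Turn (observation, action) pairs into a numbered language description."""
    return "\n".join(
        f"Step {i}: observed '{obs}', took action '{act}'"
        for i, (obs, act) in enumerate(trajectory)
    )

def collect_feedback_examples(instruction, trajectory, query_llm):
    """Return (description, step, label) tuples for feedback-model training."""
    description = verbalize(trajectory)
    prompt = (
        f"Instruction: {instruction}\n"
        f"Trajectory:\n{description}\n"
        "List the step numbers where the agent made progress toward the instruction."
    )
    productive_steps = query_llm(prompt)  # assumed to return a set of step indices
    return [
        (description, step, step in productive_steps)
        for step in range(len(trajectory))
    ]
```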
The study demonstrates that using LFMs improves task-completion rates over strong behavioral cloning baselines on three distinct language grounding environments: Touchdown, ScienceWorld, and ALFWorld. The approach also outperforms using LLMs as experts to directly predict actions and generalizes to unseen environments, improving task-completion rate by 3.5-12.0% through a single round of adaptation.
Moreover, the study shows that LFMs can be modified to provide human-interpretable feedback without performance loss, allowing humans to verify the behavior selected for imitation learning without additional LLM usage. The paper emphasizes the cost-effectiveness and generalizability of LFMs for learning instruction-following agents in grounded environments, and it compares LFMs with prompting LLMs directly for action prediction, showing that LFMs achieve greater policy improvement under a fixed budget of LLM output tokens.
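Once the feedback model is trained, the policy improvement loop itself can run without any LLM calls. Below is a minimal sketch under assumed interfaces: a trained feedback model lfm(description, step) returning a boolean, a policy with act() and fit() methods, and environments with reset()/step(); it illustrates the idea rather than reproducing the authors' implementation.

```python
# Minimal sketch of one round of LFM-driven policy improvement (assumed
# interfaces; not the paper's exact implementation). The trained feedback
# model stands in for the LLM, so no LLM calls are needed in this loop.

def rollout(policy, env, max_steps=50):
    """Collect one episode of (observation, action) pairs with the current policy."""
    trajectory, obs = [], env.reset()
    for _ in range(max_steps):
        action = policy.act(obs)
        trajectory.append((obs, action))
        obs, done = env.step(action)
        if done:
            break
    return trajectory

def improve_policy(policy, envs, lfm, episodes_per_env=10):
    """Keep only the steps the feedback model marks desirable, then imitate them."""
    imitation_data = []
    for env in envs:
        for _ in range(episodes_per_env):
            trajectory = rollout(policy, env)
            description = verbalize(trajectory)  # same verbalizer as above
            for step, (obs, action) in enumerate(trajectory):
                if lfm(description, step):       # desirable behavior only
                    imitation_data.append((obs, action))
    policy.fit(imitation_data)                   # supervised imitation update
    return policy
```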
The study also discusses potential societal consequences, including issues such as LLM hallucinations that may mislead feedback-model training.
Research Findings
The research found that LFMs not only improve policy performance but also generalize to new environments, yielding policy adaptation gains without additional LLM usage or demonstrations.
Furthermore, the paper explores the detailed, human-interpretable feedback provided by LFMs, which is shown to be cost-effective and accurate, allowing humans to verify imitation data and create more trustworthy policies. The study's conclusion highlights potential societal consequences, including LLM hallucinations that could mislead feedback-model training, and suggests future research directions to enhance the robustness and trustworthiness of language feedback models.
Reference: https://arxiv.org/abs/2402.078...