Key Points
1. The paper proposes RISE (Recursive IntroSpEction), an approach for fine-tuning large language models (LLMs) to enable them to introspect on their behavior and sequentially improve their responses.
2. Even the strongest proprietary LLMs do not exhibit the ability to continuously improve their responses in scenarios where they are explicitly told they are making a mistake.
3. RISE poses fine-tuning for a single-turn prompt as solving a multi-turn Markov decision process (MDP), where the initial state is the prompt (see the sketch after this list).
4. RISE utilizes an iterative fine-tuning procedure that attempts to teach the model how to alter its response after having executed previously unsuccessful attempts to solve a hard test-time problem.
5. RISE enables LLMs like Llama2, Llama3, and Mistral to improve themselves with more turns on math reasoning tasks, outperforming several single-turn strategies given an equal amount of inference-time computation.
6. RISE scales well, often attaining larger benefits with more capable models.
7. RISE makes meaningful improvements to responses to arrive at the correct solution for challenging prompts, without disrupting one-turn abilities.
8. Recursive introspection and self-improvement are crucial for enabling intelligent agentic behavior in foundation models.
9. Prior work has hypothesized that the capability to continuously improve responses may not be possible to attain, but RISE demonstrates that it is indeed possible to train models with this capability.
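To make the multi-turn MDP view in point 3 concrete, below is a minimal sketch of how the state, action, and reward could be represented. All names (State, step, extract_answer) and the feedback strings are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    prompt: str                                   # the initial state is just the prompt
    history: list = field(default_factory=list)   # prior (response, feedback) turns

def extract_answer(response: str) -> str:
    # Hypothetical parser: assume the final line of the response holds the answer.
    return response.strip().splitlines()[-1]

def step(state: State, response: str, oracle_answer: str):
    """One turn of the MDP: the action is a full response; the next state appends
    that response plus feedback, and the reward is the correctness of the answer."""
    correct = extract_answer(response) == oracle_answer
    feedback = ("Your answer is correct."
                if correct
                else "Your answer is incorrect; please revise your previous response.")
    next_state = State(state.prompt, state.history + [(response, feedback)])
    reward = 1.0 if correct else 0.0
    return next_state, reward
```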
Summary
This paper introduces RISE: Recursive IntroSpEction, an approach for fine-tuning large language models (LLMs) to enable them to introspect on their behavior, reason, and correct their mistakes through iterative responses.
The paper discusses the challenges existing LLMs face in exhibiting this capability and how RISE addresses them.
1. Existing strong LLMs do not exhibit the ability to sequentially improve their responses, even when explicitly told they are making a mistake. RISE aims to address this limitation.
2. RISE prescribes an iterative fine-tuning procedure that teaches the model how to alter its response after previously unsuccessful attempts to solve a hard test-time problem, with optional additional environment feedback.
3. RISE poses the fine-tuning as solving a multi-turn Markov decision process (MDP), where the initial state is the prompt. Inspired by principles in online imitation learning and reinforcement learning, RISE proposes strategies for multi-turn data collection and training (a rough data-collection sketch follows this list).
4. Experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves with more turns on math reasoning tasks, outperforming several single-turn strategies given an equal amount of inference-time computation. RISE scales well, often attaining larger benefits with more capable models.
5. Analysis shows that RISE makes meaningful improvements to responses to arrive at the correct solution for challenging prompts, without disrupting one-turn abilities as a result of expressing more complex distributions.
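As a rough illustration of the data-collection strategy in point 3 above, the sketch below rolls out the current model for several turns and, after each failed attempt, queries an improvement source (for example a stronger teacher model, or a best-of-N sample from the learner itself) for a better response to use as a supervised target. The helpers `generate`, `is_correct`, and `improve` are hypothetical callables, not part of the paper's code.

```python
def collect_rollout(problem, answer, generate, is_correct, improve, num_turns=3):
    """Collect (context, target, reward) triples for one problem.

    generate(convo)        -> a response sampled from the current model
    is_correct(resp, ans)  -> whether the response's final answer matches `ans`
    improve(convo, ans)    -> a better response (e.g. from a teacher model or
                              best-of-N self-sampling), used as the training target
    """
    convo = [{"role": "user", "content": problem}]
    examples = []
    for _ in range(num_turns):
        attempt = generate(convo)
        if is_correct(attempt, answer):
            examples.append((list(convo), attempt, 1.0))
            break
        # Keep the failed attempt and a generic retry instruction in the context.
        convo = convo + [
            {"role": "assistant", "content": attempt},
            {"role": "user", "content": "That answer is incorrect; please try again."},
        ]
        improved = improve(convo, answer)
        examples.append((list(convo), improved, float(is_correct(improved, answer))))
    return examples
```

Such triples can then be used for (optionally reward-weighted) supervised fine-tuning on the improved responses, so the model learns both to answer on the first turn and to revise after seeing its own mistakes.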
The paper concludes that RISE is an effective approach for training LLMs to exhibit introspective and self-improving capabilities, a crucial aspect for enabling intelligent agentic behavior in foundation models. The results demonstrate the potential of RISE to significantly enhance the mathematical reasoning capabilities of language models.
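For context on how "more turns" are used at evaluation time, here is a minimal sketch of a multi-turn self-improvement loop without oracle feedback: the model is asked to retry for a fixed number of turns, and the final prediction is a majority vote over the answers from all turns. The retry prompt and the helpers `generate` and `extract_answer` are assumptions for illustration.

```python
from collections import Counter

def multi_turn_inference(problem, generate, extract_answer, num_turns=5):
    """Sequentially sample one response per turn, asking the model to reconsider
    each time, then majority-vote over the per-turn answers."""
    convo = [{"role": "user", "content": problem}]
    answers = []
    for _ in range(num_turns):
        response = generate(convo)
        answers.append(extract_answer(response))
        convo = convo + [
            {"role": "assistant", "content": response},
            {"role": "user",
             "content": "Your previous answer may be incorrect; please reconsider and answer again."},
        ]
    return Counter(answers).most_common(1)[0][0]
```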
Reference: https://arxiv.org/abs/2407.18219