Key Points

1. The paper proposes the Iteration of Thought (IoT) framework for enhancing LLM responses by dynamically generating thought-provoking prompts based on the input query and the LLM's current response.

2. IoT is composed of three main components: an Inner Dialogue Agent (IDA) that generates context-sensitive prompts, an LLM Agent (LLMA) that processes the prompts to refine its responses, and an iterative prompting loop between the IDA and LLMA.

3. The paper introduces two variants of IoT: Autonomous Iteration of Thought (AIoT), where the LLM decides when to stop iterating, and Guided Iteration of Thought (GIoT), which enforces a fixed number of iterations.

4. Experiments on the GPQA Diamond dataset show that AIoT outperforms Chain-of-Thought (CoT) and the baseline Input-Output approach, with a 14.11% improvement in average accuracy and lower variance.

5. On explorative problem-solving tasks like Game of 24 and Mini Crosswords, GIoT outperforms AIoT, CoT, and the Input-Output approach on average, demonstrating the benefits of forced exploration.

6. On the HotpotQA-Hard multi-hop question answering dataset, AIoT achieves higher Exact Match, F1, and ROUGE-L scores compared to CoT, showcasing the advantages of dynamic, adaptive reasoning.

7. The paper compares AIoT's performance on HotpotQA-Hard with that of the AgentLite framework, finding that AIoT outperforms even the most capable models used in AgentLite.

8. The paper discusses the inherent transparency and explainability of the IoT framework, as well as opportunities for further extensions, such as incorporating ensemble-based IDA agents.

9. The paper highlights IoT's potential benefits in autonomous reasoning scenarios where human intervention is impractical, as well as its value in fine-tuning existing models by leveraging the generated thought sequences.

Summary

Introduction: Iteration of Thought (IoT) Framework
This paper introduces the Iteration of Thought (IoT) framework, which aims to enhance the reasoning capabilities of large language models (LLMs) through an iterative process of generating thought-provoking prompts and refining the model's output. The IoT framework consists of three key components:

1. Inner Dialogue Agent (IDA): generates context-sensitive prompts based on the original query and the LLM's previous response, in order to guide the LLM towards more refined and accurate answers.

2. LLM Agent (LLMA): embodies the core reasoning capabilities of the LLM and processes the prompts generated by the IDA, using the model's internal knowledge base to refine its responses.

3. Iterative prompting loop: a back-and-forth between the IDA and LLMA, in which the IDA generates a new prompt based on the previous response and the LLMA provides a refined output. This iterative process continues until a satisfactory answer is found or the maximum iteration count is reached.
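The loop between the three components can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `ida` and `llma` functions below are placeholders standing in for actual LLM calls, and the `is_satisfactory` check is a hypothetical stand-in for whatever stopping signal the framework uses.

```python
# Minimal sketch of the IoT prompting loop. ida() and llma() are
# placeholders for real LLM calls; names and signatures are
# illustrative assumptions, not taken from the paper's code.

def ida(query, response):
    """Inner Dialogue Agent: build a context-sensitive prompt from
    the original query and the LLM's previous response."""
    return f"Given the query '{query}' and your answer '{response}', refine your reasoning."

def llma(query, prompt):
    """LLM Agent: produce a (refined) response to the query,
    guided by the IDA's prompt."""
    return f"refined answer to '{query}'"  # stand-in for a model call

def iot(query, max_iterations=5, is_satisfactory=lambda r: False):
    """Iterate between IDA and LLMA until the answer is judged
    satisfactory or the iteration budget runs out."""
    response = llma(query, prompt=query)      # initial answer
    for _ in range(max_iterations):
        if is_satisfactory(response):
            break
        prompt = ida(query, response)         # new thought-provoking prompt
        response = llma(query, prompt)        # refined answer
    return response
```

In a real system, `llma` would call the underlying model and `is_satisfactory` would be derived from the model's own judgement (as in the autonomous variant) or from an external check.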

Variants of the IoT Framework
The paper presents two variants of the IoT framework: Autonomous Iteration of Thought (AIoT) and Guided Iteration of Thought (GIoT). In AIoT, the LLMA decides autonomously when to stop iterating, while in GIoT, the number of iterations is fixed.
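The difference between the two variants comes down to the loop's stopping criterion, which can be illustrated as follows. The `ida`, `llma`, and `is_final` callables here are hypothetical placeholders for the framework's agents, not code from the paper.

```python
# Illustrative contrast between the two IoT variants; ida/llma/is_final
# are assumed placeholder callables, not the paper's actual interfaces.

def giot(query, ida, llma, num_iterations):
    """Guided IoT: always run a fixed number of refinement rounds."""
    response = llma(query, query)                 # initial answer
    for _ in range(num_iterations):
        response = llma(query, ida(query, response))
    return response

def aiot(query, ida, llma, is_final, max_iterations):
    """Autonomous IoT: the LLMA itself signals when its answer is final,
    so the loop may terminate before the budget is exhausted."""
    response = llma(query, query)                 # initial answer
    for _ in range(max_iterations):
        if is_final(response):                    # LLMA's own stopping decision
            break
        response = llma(query, ida(query, response))
    return response
```

The trade-off reported in the paper follows from this structure: AIoT can stop as soon as the model deems its answer complete (efficient, but risks premature termination), while GIoT forces continued exploration (helpful on explorative tasks like Game of 24, at the cost of extra iterations).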

Evaluation on Various Datasets and Tasks
The researchers evaluate the IoT framework on various datasets and tasks, including complex reasoning tasks from the GPQA dataset, explorative problem-solving in the Game of 24 and Mini Crosswords, and multi-hop question answering from the HotpotQA dataset. The results demonstrate significant improvements over static approaches like Chain of Thought (CoT), with the IoT framework showcasing more adaptive and efficient reasoning capabilities.

Specifically, the paper finds that the AIoT variant outperforms CoT on the GPQA dataset, achieving a 14.11% improvement in average accuracy. On the Game of 24 and Mini Crosswords tasks, the GIoT variant shows notable gains over CoT, with a 266.4% and 92.6% improvement, respectively. Finally, on the HotpotQA-Hard dataset, the AIoT approach outperforms the AgentLite framework, achieving higher F1 and Exact Match scores.

The authors conclude that the IoT framework represents a viable paradigm for autonomous response refinement in LLMs, enabling more adaptive and efficient reasoning systems that minimize the need for human intervention. The paper also discusses potential future directions, such as expanding the knowledge base of the IDA and addressing challenges like hallucination and premature termination of iterations.

Reference: https://arxiv.org/abs/2409.12618