Summary
Introduction and Proposed Approach
The study introduces an approach to enhancing the mathematical reasoning abilities of large language models (LLMs) by leveraging the Monte Carlo Tree Search (MCTS) framework to generate process supervision and evaluation signals automatically, eliminating the need for manual annotation. The proposed approach trains a step-level value model that guides the LLM's inference process in mathematical domains, ultimately improving its handling of intricate mathematical reasoning tasks.
Utilization of LLMs with MCTS Framework
The approach integrates LLMs with the MCTS framework to strike an effective balance between exploration and exploitation, allowing high-quality mathematics reasoning data to be generated without professional human annotation. The study extends the Tree of Thoughts line of research beyond a purely inference-time LLM framework, demonstrating the potential for autonomous evolution of the model without human knowledge.
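The exploration–exploitation balance in MCTS is typically governed by an Upper Confidence bound for Trees (UCT) rule. The sketch below is illustrative only (the function and field names are not from the paper): it shows how a search over candidate reasoning steps would trade off a child node's mean value against a visit-count bonus.

```python
import math

def uct_score(child_value, child_visits, parent_visits, c=1.41):
    """UCT: exploitation (mean value) plus an exploration bonus
    that shrinks as a child is visited more often."""
    if child_visits == 0:
        return float("inf")  # always try unvisited children first
    exploit = child_value / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

def select_child(children, parent_visits):
    """Pick the child (candidate reasoning step) with the highest UCT score."""
    return max(
        children,
        key=lambda ch: uct_score(ch["value"], ch["visits"], parent_visits),
    )
```

With this rule, a rarely visited step can outrank a well-explored one even when its mean value is lower, which is exactly the balance the approach relies on to discover diverse solution paths.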
Detailed Methodology
The study presents a detailed, iterative training methodology for improving LLMs' mathematical reasoning capabilities. Each solution is segmented into multiple reasoning steps and cast as a reinforcement-learning problem in which the policy model is the LLM itself, and the transition from one state to the next is a simple concatenation of the partial solution with the newly generated step. A step-level value model is trained to assess the confidence that a partial solution is correct and to guide the LLM in generating subsequent reasoning steps.
The experimental results demonstrate that integrating LLMs with the value model and the MCTS framework can progressively generate high-quality math reasoning data autonomously. The study evaluates the proposed approach on in-domain and out-of-domain test sets, showing enhanced performance on intricate mathematical reasoning tasks. The results indicate an improvement of over 20 points on challenging problems in the MATH and Gaokao2023 datasets, and an improvement of more than 10 points on grade-school math problems. Additionally, the study analyzes computational efficiency and compares different inference strategies, emphasizing the effectiveness of the proposed approach even in the absence of GPT-4-generated or human-annotated solution processes.
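One inference strategy such a comparison plausibly includes is step-level beam search: at each reasoning step, expand every surviving partial solution, score the extensions with the value model, and keep only the top few. The sketch below is a generic illustration, not the paper's implementation; `generate_steps` and `value_model` are hypothetical stand-ins for the LLM and the trained value network.

```python
def step_beam_search(question, generate_steps, value_model, beam=2, depth=3):
    """Keep the `beam` highest-valued partial solutions at each step.

    generate_steps: state -> list of candidate next steps (LLM stand-in).
    value_model: state -> score (value-network stand-in).
    """
    beams = [question]
    for _ in range(depth):
        candidates = []
        for state in beams:
            for step in generate_steps(state):
                candidates.append(state + "\n" + step)  # concatenation transition
        # rank all extended partial solutions and prune to the beam width
        candidates.sort(key=value_model, reverse=True)
        beams = candidates[:beam]
    return beams[0]
```

Compared with a full MCTS rollout, this kind of value-guided beam search trades some search quality for a fixed, predictable compute budget, which is the axis along which inference strategies are typically compared.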
Overall, the study provides a comprehensive methodology for enhancing the mathematical reasoning capabilities of LLMs and demonstrates its effectiveness through experimental evaluations on various datasets. The findings indicate that the proposed approach remains competitive with, or surpasses, state-of-the-art (SOTA) 7B LLMs, offering an innovative solution to the limitations of LLMs on intricate mathematical reasoning tasks.
Reference: https://arxiv.org/abs/2405.035...