Key Points
- The paper introduces block expansion, a novel post-pretraining method for Large Language Models (LLMs) that enhances domain-specific abilities while preserving the original general capabilities.
- The proposed method expands the pre-trained LLM with copied Transformer blocks and tunes only the added blocks on a domain-specific corpus while freezing the original blocks (a minimal sketch follows this list).
- Through experiments and evaluations, the paper demonstrates the effectiveness of the proposed method with LLAMA PRO, an 8.3B-parameter model that excels in general tasks, programming, and mathematics.
- The paper provides experimental findings and comparisons, showing that LLAMA PRO maintains strong general performance while achieving competitive code-specific performance.
- The study emphasizes the importance of balancing general and domain-specific abilities in LLMs and demonstrates a promising approach to achieving this balance with the block expansion method.
- The paper also explores the scalability of the block expansion method and analyzes the impact of adding blocks at different positions within the model.
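To make the expansion step concrete, the snippet below is a minimal PyTorch sketch of how a copied decoder block could be zero-initialized so that, through its residual connections, it contributes nothing at initialization and the expanded model reproduces the original model's outputs before any domain-specific tuning. The layer-name suffixes (`o_proj`, `down_proj`) follow LLaMA-style naming and are assumptions for illustration, not the paper's released code.

```python
import copy
import torch.nn as nn

def make_identity_copy(block: nn.Module) -> nn.Module:
    """Deep-copy a Transformer decoder block and zero-initialize its output
    projections (attention `o_proj` and MLP `down_proj` in LLaMA-style
    naming, assumed here) so the copy adds nothing to the residual stream
    at initialization, i.e. it starts out as an identity mapping."""
    new_block = copy.deepcopy(block)
    for name, module in new_block.named_modules():
        if isinstance(module, nn.Linear) and name.endswith(("o_proj", "down_proj")):
            nn.init.zeros_(module.weight)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
    return new_block
```

Because each copy initially contributes zero through its residual connections, post-pretraining starts from a model that behaves exactly like the original, and only the copies need to learn the new domain.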
Summary
The research paper proposes a novel post-pretraining method for Large Language Models (LLMs) called block expansion. This method aims to inject domain-specific knowledge into pre-trained LLMs while preserving their general capabilities. The authors showcase the effectiveness of the approach by extending the pre-trained LLM, LLAMA2-7B, to create LLAMA PRO, a foundation model with 8.3B parameters and enhanced performance in programming, mathematics, and reasoning. Additionally, they introduce LLAMA PRO-INSTRUCT and demonstrate the model's state-of-the-art performance across general, code, and math tasks, as well as its proficiency as a language agent in various scenarios. The paper emphasizes three main contributions: the proposal of the block expansion method for LLMs, the introduction of LLAMA PRO and LLAMA PRO-INSTRUCT, and the benchmarking of LLAMA PRO across extensive datasets, showcasing its potential for broader complex applications.
Implementation and Performance of the Block Expansion Method
The paper starts by highlighting the shortcomings of traditional LLMs in specialized domains and the challenges of injecting domain-specific knowledge without compromising general capabilities. The proposed block expansion method expands the pre-trained LLM with copied Transformer blocks and then fine-tunes the added blocks on a domain-specific corpus while freezing the original blocks. The resulting LLAMA PRO model with 8.3B parameters excels in both general and domain-specific tasks, showcasing the efficacy of the block expansion method.
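As a rough illustration of how the copied blocks could be interleaved into the stack and how training could be restricted to them, the sketch below splits the original blocks into equal groups, appends one identity copy (using the `make_identity_copy` helper from the earlier snippet) after each group, and freezes everything else. The group count of 8 loosely mirrors the growth from LLaMA2-7B to the 8.3B LLAMA PRO, but the exact placement, helper names, and learning rate are illustrative assumptions rather than the authors' implementation.

```python
import torch.nn as nn

def expand_and_freeze(blocks: nn.ModuleList, groups: int = 8) -> nn.ModuleList:
    """Interleave zero-initialized copies into the original block stack:
    the stack is split into `groups` equal groups and a copy of each
    group's last block is appended after that group. Original blocks are
    frozen; only the added copies remain trainable for domain-specific
    post-pretraining. (Illustrative sketch; relies on `make_identity_copy`
    from the previous snippet.)"""
    assert len(blocks) % groups == 0, "stack must divide evenly into groups"
    group_size = len(blocks) // groups

    expanded = nn.ModuleList()
    for g in range(groups):
        group = blocks[g * group_size:(g + 1) * group_size]
        for block in group:
            block.requires_grad_(False)            # freeze original blocks
            expanded.append(block)
        expanded.append(make_identity_copy(group[-1]))  # trainable new block
    return expanded

# Only the added blocks contribute trainable parameters, e.g.:
# trainable = [p for p in expanded.parameters() if p.requires_grad]
# optimizer = torch.optim.AdamW(trainable, lr=2e-4)  # lr is illustrative
```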
Extensive experiments and evaluations demonstrate the superiority of LLAMA PRO and LLAMA PRO-INSTRUCT over existing LLMs, both in general and domain-specific tasks. The authors compare the performance of their method to other prominent LLMs and demonstrate its scalability and efficacy in preserving general capabilities while acquiring domain-specific knowledge. The paper also provides insights into the impact of different training strategies and the positioning of added blocks in the model. Additionally, the paper showcases the potential of LLAMA PRO as a language agent through its performance in conversation and reasoning tasks.
Overall, the paper introduces a promising method for enhancing LLMs with domain-specific knowledge while maintaining their general capabilities, paving the way for broader and more effective applications of large language models.
Reference: https://arxiv.org/abs/2401.02415