Key Points

1. Large Language Models (LLMs) have demonstrated non-trivial skills in various domains, but their monolithic structure makes it challenging and expensive to augment them or impart new skills. As a result, separate new instances of these models are often trained for new domains and tasks.

2. The proposed CALM framework aims to compose an existing foundation model with more specific models efficiently and practically. It introduces cross-attention between the models to compose their intermediate representations and thereby enable new capabilities (see the sketch following this list).

3. CALM scales up LLMs to new tasks by reusing existing LLMs together with a small number of additional parameters and a modest amount of data. It preserves existing capabilities and applies to diverse domains and settings.

4. The paper demonstrates that augmenting PaLM2-S with a smaller model trained on low-resource languages yields an absolute improvement of up to 13% on tasks such as translation into English and arithmetic reasoning for low-resource languages.

5. When PaLM2-S is augmented with a code-specific model, there is a relative improvement of 40% over the base model for code generation and explanation tasks, on par with fully fine-tuned counterparts.

6. The proposed CALM framework addresses the challenge of efficiently and practically composing existing foundation models with more specific models, enabling an LLM to be adapted to completely new domains through an augmenting model.

7. The research provides insights into the efficiency and practicality of CALM, particularly in the context of language inclusivity and code generation, demonstrating improved performance for translation, arithmetic reasoning, and code explanation tasks.

8. The paper also discusses the challenges and limitations of existing approaches, such as fine-tuning and model merging, and presents CALM as a more effective way to address this composition setting.

9. The research demonstrates the effectiveness of CALM in avoiding catastrophic forgetting and achieving significant performance gains across various tasks and domains, highlighting its potential for enabling new capabilities in large language models.
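
To make the composition mechanism in point 2 concrete, the following is a minimal sketch of cross-attention between the two models' layer representations. It uses NumPy with hypothetical dimensions, and random matrices stand in for the learned projections; it illustrates the idea rather than reproducing the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_anchor, d_aug = 8, 64, 32              # hypothetical sizes

H_anchor = rng.normal(size=(seq_len, d_anchor))   # frozen anchor-layer states
H_aug = rng.normal(size=(seq_len, d_aug))         # frozen augmenting-layer states

# New learned parameters (random here for illustration): queries come from the
# anchor stream; keys/values project the augmenting states to the anchor width.
W_q = rng.normal(size=(d_anchor, d_anchor)) * 0.02
W_k = rng.normal(size=(d_aug, d_anchor)) * 0.02
W_v = rng.normal(size=(d_aug, d_anchor)) * 0.02

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

Q = H_anchor @ W_q                                # (seq_len, d_anchor)
K, V = H_aug @ W_k, H_aug @ W_v                   # (seq_len, d_anchor) each
attn = softmax(Q @ K.T / np.sqrt(d_anchor))       # anchor tokens attend over augmenting tokens
H_composed = H_anchor + attn @ V                  # residual add into the anchor stream

print(H_composed.shape)                           # (8, 64)
```

In this sketch the cross-attended values are added residually, so the anchor's own representation is left intact and the new projections only inject information from the augmenting model.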

Summary

The research paper proposes the Composition to Augment Language Models (CALM) framework to enhance the capabilities of large language models (LLMs). It introduces cross-attention between a large anchor LLM and an augmenting model so that the composition can perform new tasks more accurately than either model alone, while preserving the individual models' capabilities. The paper presents practical applications of CALM, including language inclusivity and code generation. For example, when the anchor model is composed with a smaller model trained on low-resource languages, CALM significantly outperforms both base models on translation and arithmetic reasoning tasks. Similarly, composing the anchor with a code-specific model improves performance on code explanation and code completion tasks. The paper also evaluates CALM on low-resource translation, arithmetic reasoning, and code completion, demonstrating its effectiveness in diverse scenarios. The proposed composition approach outperforms other methods and enables the integration of distinct knowledge from multiple augmenting models.

Efficiency and Comparison of CALM
The paper also discusses the efficiency of the CALM framework, which introduces only a small number of trainable parameters over both models' intermediate-layer representations. Additionally, it examines the influence of the choice of augmenting model on the composition's performance, the impact of iterative decoding, and the comparison with other fine-tuning methods. Lastly, the paper includes qualitative examples demonstrating effective composition with CALM and provides a detailed computation of the expected parametric and training overhead of the framework.
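
As a rough illustration of what "a small number of trainable parameters" can mean, the snippet below counts the parameters added per composed layer, following the shapes used in the cross-attention sketch above. All sizes are hypothetical placeholders, not figures reported in the paper.

```python
d_anchor, d_aug = 4096, 1024        # hypothetical hidden widths
n_composed_layers = 4               # hypothetical number of composed layer pairs

# Per composed layer: a query map on the anchor width plus key/value maps
# from the augmenting width, matching the cross-attention sketch above.
params_per_layer = d_anchor * d_anchor + 2 * d_aug * d_anchor
added_params = n_composed_layers * params_per_layer

anchor_params = 10e9                # hypothetical anchor model size (10B)
print(f"added: {added_params / 1e6:.1f}M parameters "
      f"({100 * added_params / anchor_params:.3f}% of the anchor)")
```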

Composition to Augment Language Models
The research paper introduces the Composition to Augment Language Models (CALM) framework as a solution to the general model composition setting. The proposed framework combines a large language model (LLM) with domain-specific augmenting models to enable new capabilities, such as code generation and language inclusivity. By introducing a small number of trainable parameters over both the augmenting and anchor models' intermediate-layer representations, CALM can perform new, challenging tasks more accurately than either model alone while preserving the capabilities of the individual models.
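
The sketch below illustrates, with assumed names and toy shapes, the parameter split this implies at training time: the anchor and augmenting weights stay frozen, and only the new composition parameters are updated, which is how the individual models' capabilities are preserved.

```python
import numpy as np

rng = np.random.default_rng(0)
d_anchor, d_aug = 64, 32                            # toy widths; real models are far larger

# name -> [weight, trainable?]: base-model weights are frozen; only the new
# composition (cross-attention) parameters receive updates.
params = {
    "anchor/layer0/W":    [rng.normal(size=(d_anchor, d_anchor)), False],
    "augment/layer0/W":   [rng.normal(size=(d_aug, d_aug)), False],
    "compose/layer0/W_q": [rng.normal(size=(d_anchor, d_anchor)), True],
    "compose/layer0/W_k": [rng.normal(size=(d_aug, d_anchor)), True],
    "compose/layer0/W_v": [rng.normal(size=(d_aug, d_anchor)), True],
}

frozen_before = {k: w.copy() for k, (w, trainable) in params.items() if not trainable}

lr = 1e-3
for name, (w, trainable) in params.items():
    if trainable:
        grad = rng.normal(size=w.shape)             # stand-in for a real gradient
        w -= lr * grad                              # in-place update of composition params only

# The base models are untouched, so their original behaviour is preserved.
assert all(np.array_equal(params[k][0], frozen_before[k]) for k in frozen_before)

trainable_count = sum(w.size for w, t in params.values() if t)
total_count = sum(w.size for w, _ in params.values())
# With toy widths the ratio looks large; with billion-parameter base models the
# composition parameters are a tiny fraction of the total.
print(f"trainable: {trainable_count} / {total_count} parameters")
```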

Practical Applications of CALM
The paper also demonstrates practical applications of CALM in language inclusivity and code generation, showcasing improved performance on translation and arithmetic reasoning tasks for low-resource languages, as well as on code explanation and code completion tasks. The authors highlight the potential of CALM for addressing real-world challenges related to language inclusivity and code generation.

Training Costs of CALM
Furthermore, the paper discusses the training costs associated with CALM. The parameters added during composition are negligible relative to the size of the anchor model, and the experiments consider a scenario in which the net cost of training the augmenting model and the CALM parameters is significantly less than the cost of training the entire anchor model, providing insight into the cost-effectiveness of the proposed framework.
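
As a back-of-the-envelope version of this cost argument, the snippet below compares the two routes using the common ~6 × parameters × tokens approximation for training compute (and ~2 × parameters × tokens for forward passes through the frozen models). Every number is a hypothetical placeholder, not a figure from the paper.

```python
# All quantities below are hypothetical placeholders.
anchor_params  = 100e9    # anchor model size
augment_params = 2e9      # augmenting model size
calm_params    = 0.1e9    # added composition parameters

anchor_tokens  = 1e12     # tokens to train the anchor from scratch
augment_tokens = 50e9     # tokens to train the augmenting model
calm_tokens    = 5e9      # tokens used to train the composition layers

cost_anchor = 6 * anchor_params * anchor_tokens           # full anchor training

cost_calm_route = (
    6 * augment_params * augment_tokens                   # train the augmenting model
    + 6 * calm_params * calm_tokens                       # update the composition parameters
    + 2 * (anchor_params + augment_params) * calm_tokens  # forward passes through frozen models
)

print(f"CALM route vs. training the anchor from scratch: {cost_calm_route / cost_anchor:.4%}")
```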

Overall, the paper introduces the CALM framework as a promising approach to model composition, demonstrating its potential for enabling new capabilities, improving performance across various tasks, and offering cost-effective training compared with training the entire anchor model.

Reference: https://arxiv.org/abs/2401.02412