Key points
1. Gemma is a family of lightweight, state-of-the-art open models built from the research and technology behind Google's Gemini models, released in 2 billion and 7 billion parameter sizes and demonstrating strong performance across academic benchmarks for language understanding, reasoning, and safety.
2. Gemma models are trained on up to 6T tokens of text, achieving strong generalist capabilities in text domains and state-of-the-art understanding and reasoning skills at scale. Both pretrained and fine-tuned checkpoints are released, along with an open-source codebase for inference and serving.
3. Gemma outperforms similarly sized open models across a wide range of domains, such as question answering, commonsense reasoning, mathematics, and coding. It builds on prior work on sequence models, transformers, deep learning, and large-scale distributed training.
4. The Gemma model architecture is based on the transformer decoder, with improvements such as multi-query attention (in the 2B model), rotary position embeddings (RoPE), GeGLU activations, and RMSNorm; a minimal sketch of these components follows this list.
5. The Gemma 2B and 7B models are trained on 2T and 6T tokens, respectively, of primarily English data from web documents, mathematics, and code. Filtering is applied to reduce the risk of unwanted or unsafe utterances, and the resulting models are evaluated rigorously.
6. Instruction-tuned Gemma models are produced via supervised fine-tuning on a mix of text-only, English-only synthetic and human-generated prompt-response pairs, followed by reinforcement learning from human feedback (RLHF), improving performance on downstream automatic evaluations and on human preference evaluations of model outputs.
7. Gemma models demonstrate strong performance on mathematics and coding benchmarks, outperforming comparable open models on several metrics. They are also tested for memorization of training data, including personal and sensitive data, with measures to mitigate these risks (a toy illustration of an exact-memorization check also follows this list).
8. Safety and responsible deployment are key considerations: the release is accompanied by extensive evaluations and ethics and safety assessments, and follows a structured approach to responsible development and deployment.
9. Continued research and development of robust mitigation strategies is emphasized, along with the release of a Responsible Generative AI Toolkit to support developers in implementing responsible AI best practices and keeping users safe.
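To make point 4 concrete, here is a minimal NumPy sketch of RMSNorm, GeGLU, and RoPE. It is an illustrative sketch under stated assumptions (tensor shapes, the tanh GELU approximation, and the "rotate-half" RoPE formulation are ours), not the paper's implementation; attention is omitted for brevity.

```python
import numpy as np

def rms_norm(x, gamma, eps=1e-6):
    # RMSNorm: rescale by the reciprocal root-mean-square; unlike
    # LayerNorm, no mean is subtracted and no bias is added.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * gamma

def geglu(x, w_gate, w_up):
    # GeGLU: a GELU-gated linear unit, gelu(x @ w_gate) * (x @ w_up),
    # using the common tanh approximation of GELU.
    g = x @ w_gate
    gelu = 0.5 * g * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (g + 0.044715 * g**3)))
    return gelu * (x @ w_up)

def rope(x, base=10000.0):
    # Rotary position embeddings ("rotate-half" variant): channel pairs
    # (i, i + d/2) are rotated by an angle that grows with position.
    seq_len, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)            # (half,)
    ang = np.arange(seq_len)[:, None] * freqs[None, :]   # (seq_len, half)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * np.cos(ang) - x2 * np.sin(ang),
                           x1 * np.sin(ang) + x2 * np.cos(ang)], axis=-1)

# Example: one pre-norm feed-forward step on random activations.
x = np.random.randn(4, 8)   # (seq_len=4, hidden=8)
h = geglu(rms_norm(x, np.ones(8)), np.random.randn(8, 16), np.random.randn(8, 16))
q = rope(np.random.randn(4, 8))   # position-rotated query vectors
```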
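Point 7's memorization testing can be illustrated with a toy exact-match check: prompt the model with a prefix of a training document and ask whether its greedy continuation reproduces the following tokens verbatim. The `generate` callable, the token-level comparison, and the 50-token prompt/continuation split are illustrative assumptions, not the paper's exact protocol.

```python
def is_exactly_memorized(generate, doc_tokens, prompt_len=50, cont_len=50):
    # Toy exact-memorization check: does the model's greedy continuation of
    # the first `prompt_len` training tokens match the next `cont_len`
    # tokens verbatim? `generate(prompt, max_tokens)` is a hypothetical
    # greedy-decoding hook for whatever model is under test.
    prompt = doc_tokens[:prompt_len]
    reference = doc_tokens[prompt_len:prompt_len + cont_len]
    return generate(prompt, max_tokens=cont_len) == reference
```

Averaging this check over a sample of training documents gives a crude memorization rate for the sampled corpus.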
Summary
The paper introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create the Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. The paper covers the release of two model sizes (2 billion and 7 billion parameters) with both pretrained and fine-tuned checkpoints, arguing that the responsible release of LLMs is critical for improving the safety of frontier models.
Generalist capabilities and safety evaluations
The models build on Google's Gemini research and achieve strong generalist capabilities in text domains, alongside state-of-the-art understanding and reasoning skills at scale. Gemma comes in two sizes to accommodate different computational constraints, applications, and developer requirements. The models are evaluated thoroughly for safety and responsibility, including safety benchmark evaluations and data filtering to reduce unwanted or unsafe utterances.
Advancements and responsible deployment of Gemma models
Gemma advances state-of-the-art performance relative to comparable-scale open models across a wide range of domains, including question answering, commonsense reasoning, mathematics, and coding. Detailed evaluations are provided, including human side-by-side evaluations and comparisons with other open-source language models. The paper also discusses responsible deployment, safety evaluations, and mitigations.
Model architecture, training, and mitigation strategies
The paper outlines the model architecture, training infrastructure, tokenization, data-filtering methodology, supervised fine-tuning, reinforcement learning from human feedback, energy usage and carbon emissions, and mitigation strategies for potential risks. It also details the release, limitations, and broader implications of the Gemma models.
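As a rough illustration of the supervised fine-tuning step listed above, the sketch below computes the standard next-token cross-entropy over response tokens only, masking out the prompt. The masking scheme and shapes are our assumptions rather than details reported in the paper.

```python
import numpy as np

def sft_loss(logits, targets, response_mask):
    # Next-token cross-entropy averaged over response tokens only; prompt
    # tokens are masked out so training focuses on producing responses.
    #   logits:        (seq_len, vocab) raw model outputs
    #   targets:       (seq_len,) next-token ids
    #   response_mask: (seq_len,) 1.0 on response tokens, 0.0 on the prompt
    z = logits - logits.max(axis=-1, keepdims=True)   # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    token_ll = log_probs[np.arange(len(targets)), targets]
    return -(token_ll * response_mask).sum() / response_mask.sum()
```

In the pipeline the paper describes, RLHF then further optimizes the SFT checkpoint against a reward model trained on human preference data.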
The release of Gemma models is presented as a significant benefit to the AI community, enabling downstream developers to create a wide array of applications while following responsible AI best practices. The paper also emphasizes the need for continued research into robust mitigation strategies for open models and for a nuanced, collaborative approach to weighing risks and benefits across the AI ecosystem.
Reference: https://arxiv.org/abs/2403.08295