Key Points

1. The paper introduces the Granite series of decoder-only code models for generative code tasks, trained on code written in 116 programming languages and ranging in size from 3 to 34 billion parameters. These models are designed to support enterprise software development across a wide range of coding tasks, including code generation, bug fixing, code explanation and documentation, repository maintenance, and more.

2. Evaluation on a comprehensive set of tasks demonstrates that the Granite Code models consistently reach state-of-the-art performance among open-source code LLMs. Optimized for enterprise software development workflows, the family performs well across a broad range of coding tasks, making it a versatile "all-around" code model.

3. On various benchmarks, the Granite Code model family was found to outperform other open-source code LLMs of similar size, demonstrating strong performance across diverse code-related tasks, including code generation, explanation, fixing, editing, and translation.

4. The models are released under an Apache 2.0 license for both research and commercial use, and the training data collection, filtering, and preprocessing pipeline is described in detail.

5. The models were trained on 3.5T to 4.5T tokens of code data and code-related natural language datasets, using a two-phase training strategy. Training combined the causal language modeling objective with a Fill-in-the-Middle (FIM) objective, which teaches the models to infill code (e.g., completing the body of a function given its surrounding context); see the sketch after this list.

6. The research also details the process of crawling and filtering the pretraining code data, including aggressive deduplication (sketched after this list) and efforts to filter out hateful, abusive, or profane language from the training set. Moreover, high-quality natural language datasets were included to improve the models' language understanding and mathematical reasoning skills.

7. The study further evaluated the models on a wide variety of tasks, including code generation, code explanation, code fixing, code editing, and mathematical reasoning, and showed competitive or state-of-the-art performance across different kinds of code-related tasks and programming languages.

8. The paper presented evaluations of the Granite Code models on challenging scenarios beyond code synthesis, including instruction following, function calling, code translation accuracy, and robustness to prompt perturbations.

9. The paper acknowledges the numerous teams and individuals who contributed to the development, evaluation, and release of the Granite Code models, and outlines plans for future releases and updates to improve the models' performance.
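
As a concrete illustration of the FIM objective mentioned in point 5, the sketch below reorders a training document into prefix/suffix/middle form. The sentinel token strings follow the common StarCoder-style convention and are an assumption here, not details taken from the paper.

```python
# Sketch of Fill-in-the-Middle (FIM) sample construction; the sentinel
# token strings below are an assumption (StarCoder-style convention),
# not confirmed details of the Granite tokenizer.
import random

FIM_PREFIX, FIM_MIDDLE, FIM_SUFFIX = "<fim_prefix>", "<fim_middle>", "<fim_suffix>"

def to_fim_sample(code: str, fim_rate: float = 0.5) -> str:
    """With probability `fim_rate`, reorder a document into
    prefix/suffix/middle (PSM) format; otherwise keep it as a
    plain causal language modeling sample."""
    if random.random() >= fim_rate:
        return code  # standard left-to-right sample
    # Split the document at two random cut points.
    i, j = sorted(random.sample(range(len(code)), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    # The model sees prefix and suffix first and learns to emit the middle.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

print(to_fim_sample("def add(a, b):\n    return a + b\n"))
```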

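For the deduplication step in point 6, a minimal sketch of fuzzy (near-duplicate) detection with MinHash signatures is shown below; the shingle size, hash count, and similarity threshold are illustrative assumptions rather than the paper's actual settings.

```python
# Minimal sketch of fuzzy deduplication via MinHash; parameters are
# illustrative assumptions, not the pipeline settings from the paper.
import hashlib

NUM_HASHES = 128

def shingles(text: str, n: int = 5):
    """Break a document into overlapping n-token shingles."""
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(max(1, len(tokens) - n + 1))}

def minhash(text: str):
    """One minimum per seeded hash function forms the signature."""
    sig = []
    for seed in range(NUM_HASHES):
        h = min(
            int.from_bytes(hashlib.sha1(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingles(text)
        )
        sig.append(h)
    return sig

def jaccard_estimate(sig_a, sig_b):
    # The fraction of matching minima estimates Jaccard similarity;
    # near-duplicate files score close to 1.0 and one copy is dropped.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / NUM_HASHES
```
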
Summary

The research paper introduces the Granite series of decoder-only code models for generative code tasks. The models are trained on code written in 116 programming languages and range in size from 3 to 34 billion parameters. The paper evaluates these models on a variety of tasks and emphasizes their versatility for enterprise software development workflows: their capabilities include code generation, bug fixing, code explanation and documentation, repository maintenance, and more, making the family a versatile "all-around" code model. Furthermore, the paper highlights the release of the Granite Code models under an Apache 2.0 license for both research and commercial use.

The paper also discusses the availability and potential use of these models in enterprise software development workflows, including code generation, code explanation, code fixing, unit test and documentation generation, application modernization, and vulnerability detection. It emphasizes the need for versatile models that can improve the productivity of human programmers and carry out complex tasks autonomously.
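
As a brief illustration of how such a model might be used in practice, the sketch below loads a checkpoint with Hugging Face Transformers; the model id shown is an assumption based on IBM's public releases and may not match the exact published name.

```python
# Minimal generation sketch with Hugging Face Transformers; the model id
# "ibm-granite/granite-3b-code-base" is an assumption and may differ
# from the exact published checkpoint name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3b-code-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```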

The research evaluates the performance of the Granite Code models on various benchmarks, including HumanEvalPack, MBPP(+), RepoBench, ReCode, MultiPL-E, DS-1000, CodeLingua, CRUXEval-I, and others, to demonstrate their competitiveness against existing open-source code LLMs. The paper also details the training strategy for the models, including the two-phase training recipe, training objectives, optimization techniques, and the infrastructure used for pretraining. The evaluations cover code generation, code explanation, code fixing, code translation, and mathematical reasoning, among other tasks, and confirm that the Granite Code models consistently perform strongly across a wide range of tasks and programming languages.
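
Code-generation benchmarks such as HumanEvalPack and MBPP(+) typically report pass@k, the probability that at least one of k sampled completions passes the unit tests. For reference, a minimal implementation of the standard unbiased estimator (Chen et al., 2021) is shown below; this is illustrative, not code from the paper.

```python
# Unbiased pass@k estimator (Chen et al., 2021): given n samples per
# problem, c of which pass the tests, estimate the probability that at
# least one of k draws (without replacement) succeeds.
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

print(pass_at_k(n=20, c=3, k=1))  # 1 - 17/20 = 0.15
```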

Acknowledgments and Future Efforts
The research paper acknowledges the contributions of numerous teams and individuals at IBM Research AI and the Hybrid Cloud Platform, as well as the leaders and teams involved in the development and release of the Granite Code models. The authors express gratitude for the support and effort of everyone who drove the development, evaluation, and release of the models. The paper concludes by noting ongoing efforts to improve and update the Granite Code models, with a focus on their potential use in various applications and their availability under the Apache 2.0 license.

Reference: https://arxiv.org/abs/2405.043...