Key Points
1. The research paper introduces TestGen-LLM, a tool developed by Meta that uses Large Language Models (LLMs) to automatically improve existing human-written tests by generating additional test cases.
2. TestGen-LLM was deployed at Meta's test-a-thons for the Instagram and Facebook platforms, where it improved 11.5% of all classes to which it was applied, with 73% of its test improvements being accepted for production deployment by Meta software engineers.
3. The tool uses Assured Offline LLM-Based Software Engineering (Assured Offline LLMSE) to embed the language models as a service in a larger software engineering workflow that ultimately recommends fully formed software improvements, backed by verifiable guarantees of improvement and non-regression of existing behavior.
4. TestGen-LLM verifies that its generated test cases successfully clear a set of filters, which assures measurable improvement over the original test suite and eliminates problems due to LLM hallucination.
5. The filtration process can be used to evaluate the performance of a specific LLM, prompt strategy, or choice of hyper-parameters.
6. During the evaluation on Reels and Stories products for Instagram, 75% of TestGen-LLM's test cases built correctly, 57% passed reliably, and 25% increased coverage.
7. Across the Instagram and Facebook test-a-thons, TestGen-LLM improved 10% of the test classes to which it was applied, and 73% of its test improvements were accepted by developers and landed in production.
8. TestGen-LLM applies a series of progressively demanding semantic filters to candidate solutions generated by the language models (see the sketch after this list).
9. TestGen-LLM successfully mimicked existing test writing styles and enabled the engineers to accept or reject the recommendations per test case.
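To make point 8 concrete, the filtration stage can be pictured as a chain of progressively stricter checks that a generated test must clear before it is recommended. The sketch below is an illustrative assumption only: the helper names, the filter order, and the five-run flakiness check are stand-ins, not Meta's actual implementation.

```python
# Illustrative sketch of TestGen-LLM's progressively demanding filters.
# The helper functions below are hypothetical stand-ins for real build,
# test-execution, and coverage tooling; they are not Meta's implementation.

def builds_correctly(existing_suite, candidate_test):
    """Filter 1: the test class extended with the candidate must build."""
    ...  # invoke the build system here

def passes_reliably(existing_suite, candidate_test, repetitions=5):
    """Filter 2: the candidate must pass on every one of several runs,
    screening out flaky tests (repetition count is an assumption)."""
    ...  # run the extended test class `repetitions` times here

def increases_coverage(existing_suite, candidate_test):
    """Filter 3: the candidate must measurably increase coverage
    over the original test suite."""
    ...  # compare coverage reports here

def passes_filters(existing_suite, candidate_test):
    """Return True only if the LLM-generated test clears every filter,
    applied in order from cheapest to most demanding."""
    return (builds_correctly(existing_suite, candidate_test)
            and passes_reliably(existing_suite, candidate_test)
            and increases_coverage(existing_suite, candidate_test))
```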
Summary
The research paper describes Meta's TestGen-LLM tool, which uses Large Language Models (LLMs) to automatically improve existing human-written tests. The tool verifies that the generated test classes successfully clear a set of filters to assure measurable improvement over the original test suite. The paper highlights the deployment and evaluation of TestGen-LLM at Meta's test-a-thons for the Instagram and Facebook platforms.
During the evaluation on Reels and Stories products for Instagram, 75% of TestGen-LLM's test cases built correctly, 57% passed reliably, and 25% increased coverage. The tool improved 11.5% of all classes to which it was applied during the Instagram and Facebook test-a-thons, with 73% of its recommendations being accepted for production deployment by Meta software engineers. This represents the first report on industrial-scale deployment of LLM-generated code backed by assurances of code improvement.
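Because every candidate must clear the same filters, per-filter survival rates such as these can also serve as a scorecard for a particular LLM, prompt strategy, or choice of hyper-parameters (point 5 above). The following sketch shows one way such a scorecard might be computed; the filter callables it takes are the hypothetical helpers sketched after the key points and are assumptions, not code from the paper.

```python
# Hypothetical scorecard for one (LLM, prompt strategy, hyper-parameter)
# configuration, reusing the filters as evaluation metrics.
from dataclasses import dataclass

@dataclass
class FilterStats:
    total: int = 0
    built: int = 0
    passed_reliably: int = 0
    increased_coverage: int = 0

def evaluate_configuration(candidate_tests, existing_suite,
                           builds_correctly, passes_reliably,
                           increases_coverage):
    """Count how many generated test cases survive each successive filter."""
    stats = FilterStats(total=len(candidate_tests))
    for test in candidate_tests:
        if not builds_correctly(existing_suite, test):
            continue
        stats.built += 1
        if not passes_reliably(existing_suite, test):
            continue
        stats.passed_reliably += 1
        if increases_coverage(existing_suite, test):
            stats.increased_coverage += 1
    return stats
```

Comparing the resulting counts across configurations corresponds to the build, pass, and coverage percentages reported above for the Reels and Stories evaluation.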
TestGen-LLM uses Assured Offline LLM-Based Software Engineering (Assured Offline LLMSE) to embed the language models, as a service, in a larger software engineering workflow that ultimately recommends fully formed software improvements rather than smaller code snippets. Key areas for future work and open problems include assessing improvement, resolving the probability distribution for application-specific use, and further understanding LLMs' tendency to mimic coding styles.
The paper also describes TestGen-LLM's primary characteristics and its use in automatically generating additional test cases that improve upon the existing test code base.
Reference: https://arxiv.org/abs/2402.09171