Key Points
1. Recent advances in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation.
2. Researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks; these agents can use tools, run commands, observe feedback from the environment, and plan future actions.
3. The complexity of these agent-based approaches, together with the limited abilities of current LLMs, raises the question of whether we really have to employ complex autonomous software agents.
4. AGENTLESS is an agentless approach that employs a simplistic two-phase process of localization followed by repair, without letting the LLM decide future actions or operate with complex tools.
5. AGENTLESS achieves the highest performance (27.33%) and the lowest cost ($0.34) among all existing open-source software agents on the SWE-bench Lite benchmark.
6. The SWE-bench Lite benchmark contains problems whose issue descriptions include the exact ground-truth patch, omit critical information, or suggest misleading solutions; these problems are identified and removed to construct SWE-bench Lite-S for more rigorous evaluation.
7. AGENTLESS highlights the overlooked potential of a simple, interpretable technique in autonomous software development, and aims to reset the baseline, starting point, and horizon for future autonomous software agents.
8. AGENTLESS resolves unique issues that no other existing open-source agent can, and even offers unique fixes compared with top commercial solutions.
9. The detailed classification and analysis of problems in SWE-bench Lite provide insights into the types of problems that can be solved by existing and future approaches.
Summary
This research paper presents a new approach called AGENTLESS that tackles software development tasks using large language models (LLMs), in contrast to complex autonomous software agents. While recent advancements in LLMs have enabled the automation of various software development tasks like code synthesis, program repair, and test generation, researchers and industry practitioners have developed complex agent-based approaches that use tools, run commands, observe feedback, and plan future actions.
However, the authors argue that the complexity of these agent-based approaches, along with the limited abilities of current LLMs, raises the question of whether we really need to employ such complex autonomous software agents. To address this, the authors introduce AGENTLESS, an agentless approach that follows a simple two-phase process of localization and repair, without allowing the LLM to autonomously decide future actions or operate with complex tools.
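To make this two-phase flow concrete, below is a minimal Python sketch of a localization-then-repair pipeline in the spirit of AGENTLESS. The function names, prompt wording, and the injected `call_llm`/`read_file` helpers are illustrative assumptions, not the paper's actual implementation (which, for instance, localizes at several levels of granularity and samples multiple candidate patches).

```python
from typing import Callable


def localize(issue: str, repo_structure: str,
             call_llm: Callable[[str], str]) -> list[str]:
    """Phase 1 (localization): ask the model which files most likely need changes."""
    prompt = (
        "Given the issue and the repository structure, list the files most "
        f"likely to need changes, one per line.\n\nIssue:\n{issue}\n\n"
        f"Repository structure:\n{repo_structure}"
    )
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]


def repair(issue: str, snippets: dict[str, str],
           call_llm: Callable[[str], str]) -> str:
    """Phase 2 (repair): ask the model for a candidate patch as a unified diff."""
    code_context = "\n\n".join(f"### {path}\n{code}" for path, code in snippets.items())
    prompt = (
        "Propose a unified diff that resolves the issue in the code below.\n\n"
        f"Issue:\n{issue}\n\nCode:\n{code_context}"
    )
    return call_llm(prompt)


def agentless_pipeline(issue: str, repo_structure: str,
                       read_file: Callable[[str], str],
                       call_llm: Callable[[str], str]) -> str:
    """Fixed two-phase flow: the LLM never plans future actions or invokes tools."""
    suspicious_files = localize(issue, repo_structure, call_llm)
    snippets = {path: read_file(path) for path in suspicious_files}
    return repair(issue, snippets, call_llm)
```

Note how the control flow is fixed in advance: the model only answers two prompts, which is what makes the approach cheap and easy to inspect compared with an agent loop that chooses its own next action.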
The evaluation on the popular SWE-bench Lite benchmark shows that AGENTLESS is able to achieve the highest performance (27.33%) among all open-source approaches, while incurring the lowest cost ($0.34) compared to prior agent-based methods. The authors attribute this success to the simplistic and straightforward design of AGENTLESS, which avoids the limitations of agent-based approaches, such as complex tool usage/design, lack of control in decision planning, and limited ability to self-reflect.
Furthermore, the authors conduct a detailed manual analysis of the SWE-bench Lite dataset and find that a non-trivial percentage of problems contain the exact ground-truth patch in the issue description or have insufficient or misleading issue descriptions. To address this, they construct SWE-bench Lite-S, a more rigorous benchmark that excludes such problematic problems, and use it to further evaluate and compare approaches.
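As a rough illustration of the filtering idea behind SWE-bench Lite-S (the paper's actual screening was a manual analysis), the sketch below drops problems whose issue text leaks the ground-truth patch or looks too sparse to be solvable. The field names and heuristics are assumptions for illustration only, not the paper's criteria.

```python
def is_problematic(problem: dict) -> bool:
    """Heuristic stand-ins for the paper's manual criteria (not the actual rules)."""
    issue = problem["problem_statement"]   # issue description (field name assumed)
    patch = problem["patch"]               # ground-truth patch (field name assumed)
    leaks_ground_truth = patch.strip() in issue
    too_sparse = len(issue.split()) < 20   # crude proxy for "missing critical information"
    return leaks_ground_truth or too_sparse


def build_lite_s(lite_problems: list[dict]) -> list[dict]:
    """Keep only problems that pass the quality screen."""
    return [p for p in lite_problems if not is_problematic(p)]
```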
Overall, this work highlights the overlooked potential of a simple, interpretable technique in autonomous software development, and hopes to reset the baseline and inspire future work in this direction, moving away from the trend of increasingly complex agent-based approaches.
Reference: https://arxiv.org/abs/2407.01489