AI-Powered Unit Test Generation via Multi-LLM Chaining: A Case Study With GPT-4o, Gemini, and Claude-3.5
Document Type
Article
Publication Title
IEEE Access
Abstract
Software testing is a crucial activity in the software development life cycle, as it verifies code correctness, reliability, and maintainability. Unit testing verifies the correctness of the individual components of a program, but writing these tests manually is resource-intensive and time-consuming. Researchers have proposed various automated test generation methods to address this problem, and Large Language Models (LLMs) have recently shown great success in generating unit tests automatically. Although a single LLM configuration can produce satisfactory responses, relying on one model carries risks to the effectiveness of the generated tests. We present a novel method, LLM Chaining, in which several LLMs, namely Gemini, GPT-4o, and Claude-3.5 Sonnet, collaborate to iteratively enhance tests. We began by having a single LLM write unit tests, but after observing suboptimal performance, we explored LLM chaining as a way to improve the accuracy and coverage of the produced tests. Of all the configurations we tested, the Gemini and GPT-4o chain yielded the most reliable results. In this configuration, Gemini generates the initial JUnit tests, and the generated results are then sent to GPT-4o to improve their accuracy. The HumanEval dataset was used as the basis for creating JUnit test cases in this study. JaCoCo was used to collect coverage statistics, and PIT was used for mutation testing to measure how well the LLM-generated tests detect faults. Our approach achieved 99.05% branch coverage, 90.48% line coverage, and 94.32% mutation coverage. In contrast, Randoop, a well-known automated test generation tool, obtained 68.64% branch coverage and 78.84% line coverage. These results demonstrate that multi-step LLM refinement is an effective way to advance automated test generation.
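The abstract describes a two-stage chain in which the first model's tests become the second model's input. The following is a minimal sketch of that data flow only, under stated assumptions: the `Stage` signature, `ChainStep`, `run_chain`, `draft_tests`, and `refine_tests` are hypothetical names introduced for illustration. The stubs return fixed strings rather than calling Gemini or GPT-4o, because this record does not describe the paper's prompts or API usage.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A stage maps (source_under_test, previous_tests) -> new test code.
# This signature is a hypothetical simplification, not the paper's interface.
Stage = Callable[[str, str], str]


@dataclass
class ChainStep:
    name: str  # descriptive label only
    run: Stage


def run_chain(source: str, steps: List[ChainStep]) -> List[Tuple[str, str]]:
    """Run each step in order, passing the previous step's tests forward.

    Returns the (step name, test code) history so intermediate drafts can be
    compared, e.g. by measuring coverage after each stage.
    """
    history: List[Tuple[str, str]] = []
    tests = ""  # the first stage starts from no existing tests
    for step in steps:
        tests = step.run(source, tests)
        history.append((step.name, tests))
    return history


# Hypothetical stubs standing in for real model calls (no API is contacted).
def draft_tests(source: str, previous: str) -> str:
    # Role of the first model (Gemini in the paper): write initial JUnit tests.
    return "@Test void addsTwoNumbers() {}"


def refine_tests(source: str, previous: str) -> str:
    # Role of the second model (GPT-4o in the paper): revise the earlier draft.
    return previous + "\n@Test void handlesNegativeNumbers() {}"


source = "class Adder { int add(int a, int b) { return a + b; } }"
history = run_chain(source, [
    ChainStep("generate", draft_tests),
    ChainStep("refine", refine_tests),
])
final_tests = history[-1][1]
print(final_tests)
```

Keeping the full history, rather than only the final output, makes it possible to run a coverage tool such as JaCoCo or a mutation tool such as PIT on each stage's tests, which is how a single-model configuration could be compared against a chained one.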
First Page
204058
Last Page
204071
DOI
10.1109/ACCESS.2025.3637221
Publication Date
1-1-2025
Recommended Citation
Kumar, Chandan; Sri Ponaka, Usha; Naidu, Pula V. Lakshmi Narasimha; and Bhuvaneswari, Pyatlo, "AI-Powered Unit Test Generation via Multi-LLM Chaining: A Case Study With GPT-4o, Gemini, and Claude-3.5" (2025). Open Access archive. 14604.
https://impressions.manipal.edu/open-access-archive/14604