Open Access archive

AI-Powered Unit Test Generation via Multi-LLM Chaining: A Case Study With GPT-4o, Gemini, and Claude-3.5

Chandan Kumar, Amrita Vishwa Vidyapeetham, Amaravati Campus
Usha Sri Ponaka, Amrita Vishwa Vidyapeetham, Amaravati Campus
Pula V.Lakshmi Narasimha Naidu, Amrita Vishwa Vidyapeetham, Amaravati Campus
Pyatlo Bhuvaneswari, Amrita Vishwa Vidyapeetham, Amaravati Campus

Document Type

Article

Publication Title

IEEE Access

Abstract

Software testing is a crucial activity in the software development cycle, as it verifies code correctness, reliability, and maintainability. Unit testing verifies the correctness of the individual components of a program. Writing these tests manually is resource-intensive and time-consuming, and researchers have proposed various automated test generation methods to address this problem. Large Language Models (LLMs) have recently shown great success in automatically generating unit tests. Although a single LLM configuration can produce satisfactory responses, the effectiveness of the generated tests may be at risk. We present a novel method, LLM Chaining, in which several LLMs, namely Gemini, GPT-4o, and Claude-3.5 Sonnet, collaborate to iteratively enhance tests. We began by having a single LLM write unit tests, but after noting its suboptimal performance, we explored LLM chaining as a way to improve the accuracy and coverage of the produced tests. Of all the configurations we tested, the combination of Gemini and GPT-4o yielded the most reliable results. Under this configuration, the system uses Gemini to generate the initial JUnit tests, then sends them to GPT-4o to improve accuracy. The HumanEval dataset was used to create JUnit test cases for this study. JaCoCo is used to collect coverage statistics, and PIT is used for mutation testing to measure the ability of the LLM-generated tests to detect faults. Our approach achieved 99.05% branch coverage, 90.48% line coverage, and 94.32% mutation coverage. In contrast, Randoop, a well-known automated test generation tool, obtained 68.64% branch coverage and 78.84% line coverage. These results indicate that multi-step LLM refinement is effective for automated test generation.
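
The generate-then-refine flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `chain_generate_tests`, the prompt wording, and the stub models `fake_gemini` and `fake_gpt4o` are all hypothetical stand-ins, and a real setup would wrap the actual Gemini and GPT-4o clients behind the same prompt-in, text-out shape.

```python
from typing import Callable, List

# A "model" here is any function that maps a prompt string to a response string.
# Assumption: real Gemini / GPT-4o clients would be wrapped to fit this shape.
Model = Callable[[str], str]


def chain_generate_tests(source_code: str, generator: Model, refiners: List[Model]) -> str:
    """Generate tests with one model, then pass them through each refiner in order."""
    tests = generator(
        "Write JUnit tests for the following Java method:\n" + source_code
    )
    for refine in refiners:
        # Each refiner sees both the method under test and the current tests.
        tests = refine(
            "Improve the accuracy and coverage of these JUnit tests.\n"
            "Method under test:\n" + source_code + "\nCurrent tests:\n" + tests
        )
    return tests


# Hypothetical stub models so the sketch runs without network access or API keys.
def fake_gemini(prompt: str) -> str:
    return "@Test void testAdd() { assertEquals(3, Calc.add(1, 2)); }"


def fake_gpt4o(prompt: str) -> str:
    # Stand-in refinement: keep the tests found in the prompt and append one more case.
    current = prompt.split("Current tests:\n", 1)[1]
    return current + "\n@Test void testAddNegative() { assertEquals(-1, Calc.add(1, -2)); }"


java_method = "static int add(int a, int b) { return a + b; }"
result = chain_generate_tests(java_method, fake_gemini, [fake_gpt4o])
print(result)
```

Passing the refiners as a list keeps the chain order explicit, so a different configuration (for example, adding a third model) only changes the arguments rather than the function.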

First Page

204058

Last Page

204071

DOI

10.1109/ACCESS.2025.3637221

Publication Date

1-1-2025

Recommended Citation

Kumar, Chandan; Sri Ponaka, Usha; Naidu, Pula V.Lakshmi Narasimha; and Bhuvaneswari, Pyatlo, "AI-Powered Unit Test Generation via Multi-LLM Chaining: A Case Study With GPT-4o, Gemini, and Claude-3.5" (2025). Open Access archive. 14604.
https://impressions.manipal.edu/open-access-archive/14604

This document is currently not available here.
