Adaptive Rank Pruning: Dynamic Low-Rank Model Merging and Compression for Efficient AI Deployment
Document Type
Article
Publication Title
IEEE Access
Abstract
Deploying large pretrained models on resource-limited devices remains a fundamental challenge in machine learning. While model merging and low-rank compression are two common options, they typically rely on static schemes such as fixed-rank factorization (e.g., truncated singular value decomposition) or weight averaging, which degrade performance. This work introduces Adaptive Rank Pruning (ARP), which dynamically optimizes the rank of each layer during merging using a variance-thresholding criterion, yielding a unified, high-quality approach to compression and merging. ARP requires no retraining and is evaluated through rigorous comparisons against both classical methods (SVD) and modern state-of-the-art baselines (LoRA and QLoRA). Extensive experiments on vision (ResNet) and language (BERT) tasks show that ARP achieves a better accuracy–compression trade-off, yielding up to a 2.5× reduction in model size with less than 4% accuracy loss. We further demonstrate ARP on edge hardware (Raspberry Pi 4, Google Pixel 6), validating its ability to reduce inference latency and energy consumption relative to alternative methods. These results establish ARP as a robust and effective approach for deploying adaptable AI in real-world constrained environments.
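The layer-wise variance-thresholded rank selection described in the abstract can be sketched as follows. This is a minimal illustration assuming an energy-based criterion (keep the smallest rank whose singular values capture a target fraction of the total squared singular-value mass); the paper's exact rule is not given in this record, and the function names (`select_rank`, `compress_layer`) and the default threshold are hypothetical.

```python
import numpy as np

def select_rank(singular_values, energy_threshold=0.95):
    """Smallest rank whose singular values capture `energy_threshold`
    of the total variance (squared singular-value mass)."""
    energy = np.cumsum(singular_values**2) / np.sum(singular_values**2)
    return int(np.searchsorted(energy, energy_threshold) + 1)

def compress_layer(W, energy_threshold=0.95):
    """Factor a weight matrix W (m x n) into low-rank factors A (m x r)
    and B (r x n) via truncated SVD at the adaptively chosen rank."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    r = select_rank(s, energy_threshold)
    return U[:, :r] * s[:r], Vt[:r]

# Example: a matrix with two dominant singular values is
# compressed to rank 2 under a 0.99 energy threshold.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((8, 8)))
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))
W = U[:, :6] @ np.diag([3.0, 2.0, 1e-4, 1e-4, 1e-4, 1e-4]) @ V.T
A, B = compress_layer(W, energy_threshold=0.99)
```

Applied per layer, this lets well-conditioned layers shed more parameters than layers whose variance is spread across many directions, which is one plausible reading of "dynamic, layer-wise optimization of the rank".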
First Page
177036
Last Page
177056
DOI
10.1109/ACCESS.2025.3619975
Publication Date
1-1-2025
Recommended Citation
Vedhanth, M.; Mahadevi, S.; and Kumar, Anil, "Adaptive Rank Pruning: Dynamic Low-Rank Model Merging and Compression for Efficient AI Deployment" (2025). Open Access archive. 14004.
https://impressions.manipal.edu/open-access-archive/14004