Adaptive Rank Pruning: Dynamic Low-Rank Model Merging and Compression for Efficient AI Deployment
Document Type
Article
Publication Title
IEEE Access
Abstract
Deploying large pretrained models on resource-limited devices remains a fundamental challenge in machine learning. While model merging and low-rank compression are two common options, they typically rely on static schemes such as fixed-rank factorization (e.g., truncated singular value decomposition) or weight averaging, which degrade performance. This work introduces Adaptive Rank Pruning (ARP), which dynamically optimizes the rank of each layer during merging using a variance-thresholding criterion, yielding a unified, high-quality approach to compression and merging. ARP requires no retraining and is evaluated through rigorous comparisons against both classical methods (SVD) and modern state-of-the-art baselines (LoRA and QLoRA). Extensive experiments on vision (ResNet) and language (BERT) tasks show that ARP achieves a better accuracy–compression trade-off, yielding up to a 2.5× reduction in model size with less than 4% accuracy loss. We further demonstrate ARP on edge hardware (Raspberry Pi 4, Google Pixel 6), validating its ability to reduce inference latency and energy consumption relative to alternative methods. These results establish ARP as a robust and effective approach for deploying adaptable AI in real-world constrained environments.
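The layer-wise variance-thresholded rank selection described in the abstract can be sketched as follows. This is a minimal illustration assuming an energy-based criterion (keep the smallest rank whose singular values capture a target fraction of the total squared singular-value mass); the paper's exact rule is not given in this record, and the function names (`select_rank`, `compress_layer`) and the default threshold are hypothetical.

```python
import numpy as np

def select_rank(singular_values, energy_threshold=0.95):
    """Smallest rank whose singular values capture `energy_threshold`
    of the total variance (squared singular-value mass)."""
    energy = np.cumsum(singular_values**2) / np.sum(singular_values**2)
    return int(np.searchsorted(energy, energy_threshold) + 1)

def compress_layer(W, energy_threshold=0.95):
    """Factor a weight matrix W (m x n) into low-rank factors A (m x r)
    and B (r x n) via truncated SVD at the adaptively chosen rank."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    r = select_rank(s, energy_threshold)
    return U[:, :r] * s[:r], Vt[:r]

# Example: a matrix with two dominant singular values is
# compressed to rank 2 under a 0.99 energy threshold.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((8, 8)))
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))
W = U[:, :6] @ np.diag([3.0, 2.0, 1e-4, 1e-4, 1e-4, 1e-4]) @ V.T
A, B = compress_layer(W, energy_threshold=0.99)
```

Applied per layer, this lets well-conditioned layers shed more parameters than layers whose variance is spread across many directions, which is one plausible reading of "dynamic, layer-wise optimization of the rank".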
First Page
177036
Last Page
177056
DOI
10.1109/ACCESS.2025.3619975
Publication Date
1-1-2025
Recommended Citation
Vedhanth, M.; Mahadevi, S.; and Kumar, Anil, "Adaptive Rank Pruning: Dynamic Low-Rank Model Merging and Compression for Efficient AI Deployment" (2025). Open Access archive. 14004.
https://impressions.manipal.edu/open-access-archive/14004