Computer Science > Computer Vision and Pattern Recognition

arXiv:2507.23362 (cs)

[Submitted on 31 Jul 2025]

Title: Short-LVLM: Compressing and Accelerating Large Vision-Language Models by Pruning Redundant Layers

Authors: Ji Ma, Wei Suo, Peng Wang, Yanning Zhang

Abstract: Although large vision-language models (LVLMs) have demonstrated impressive capabilities in multi-modal understanding and reasoning, their practical applications are still limited by massive parameter counts and high computational costs. Recent work in natural language processing (NLP) has shown the effectiveness of layer pruning, offering a plausible training-free compression solution. However, due to the modality divergence between vision and language, it is unclear whether these NLP techniques remain effective in LVLMs. In this paper, we show empirically that directly applying these layer pruning methods to LVLMs is ineffective. Through extensive experiments, we find that non-essential vision-language (VL) tokens and inter-layer feature gaps pose critical challenges to pruning layers in LVLMs. Based on these insights, we propose a novel framework, Short-LVLM (SVL), that utilizes important VL tokens and mitigates the layer-wise feature gaps. Notably, Short-LVLM not only achieves a superior trade-off between performance and efficiency but also exhibits several potential advantages: it is training-free, model-agnostic, and highly compatible. The code for this work is publicly available at this https URL.
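
One common training-free criterion in the NLP layer-pruning literature scores each layer by one minus the mean cosine similarity between its input and output hidden states, then removes the lowest-scoring layers. The sketch below illustrates that generic idea in pure Python, assuming the per-layer hidden states have already been collected on some calibration inputs. The helper names (`block_influence`, `select_layers_to_prune`, `prune`) and the optional `token_ids` argument are hypothetical; `token_ids` only illustrates restricting the score to a subset of tokens and is not the Short-LVLM token-selection or feature-gap method, whose details this abstract does not give.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def block_influence(
    h_in: List[Vector],
    h_out: List[Vector],
    token_ids: Optional[Sequence[int]] = None,
) -> float:
    """1 - mean cosine similarity between a layer's input and output states.

    A score near 0 means the layer barely changes the representation, so it
    is a pruning candidate. `token_ids` (hypothetical) restricts the average
    to a subset of token positions; None means all tokens.
    """
    positions = range(len(h_in)) if token_ids is None else token_ids
    sims = [cosine(h_in[i], h_out[i]) for i in positions]
    return 1.0 - sum(sims) / len(sims)


def select_layers_to_prune(
    hidden_states: List[List[Vector]],
    n_prune: int,
    token_ids: Optional[Sequence[int]] = None,
) -> List[int]:
    """Return the indices of the n_prune layers with the lowest scores.

    hidden_states[k] is the input to layer k and hidden_states[k + 1] is
    its output, so a model with L layers needs L + 1 entries.
    """
    n_layers = len(hidden_states) - 1
    scored = sorted(
        (block_influence(hidden_states[k], hidden_states[k + 1], token_ids), k)
        for k in range(n_layers)
    )
    return sorted(idx for _, idx in scored[:n_prune])


def prune(layers: list, drop: Sequence[int]) -> list:
    """Training-free removal: keep every layer whose index is not in drop."""
    dropped = set(drop)
    return [layer for i, layer in enumerate(layers) if i not in dropped]


# Toy example: 3 layers, 2 tokens, 2-dimensional hidden states.
hidden_states = [
    [[1.0, 0.0], [0.0, 1.0]],  # input to layer 0
    [[0.0, 1.0], [1.0, 0.0]],  # output of layer 0 (large change)
    [[0.0, 1.0], [1.0, 0.0]],  # output of layer 1 (no change)
    [[0.0, 2.0], [1.0, 1.0]],  # output of layer 2 (small change)
]
print(select_layers_to_prune(hidden_states, n_prune=1))  # [1]
print(prune(["L0", "L1", "L2"], [1]))  # ['L0', 'L2']
```

Restricting the score to different tokens can change which layers look redundant: with `token_ids=[0]`, layer 2 also scores 0.0 in this toy example, which is one simple way to see why the choice of tokens matters when ranking layers for removal.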

Comments:	Accepted by ACM MM 25
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2507.23362 [cs.CV]
	(or arXiv:2507.23362v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.23362

Submission history

From: Ji Ma
[v1] Thu, 31 Jul 2025 09:17:53 UTC (1,583 KB)
