DarwinLM: Evolutionary Structured Pruning of Large Language Models
Abstract
Large Language Models (LLMs) have achieved significant success across various NLP tasks. However, their massive computational costs limit their widespread use, particularly in real-time applications. Structured pruning offers an effective solution by compressing models and directly providing end-to-end speed improvements, regardless of the hardware environment. Meanwhile, different components of the model exhibit varying sensitivities towards pruning, calling for non-uniform model compression. However, a pruning method should not only identify a capable substructure, but also account for post-compression training. To this end, we propose DarwinLM, a method for training-aware structured pruning. DarwinLM builds upon an evolutionary search process, generating multiple offspring models in each generation through mutation, and selecting the fittest for survival. To assess the effect of post-training, we incorporate a lightweight, multistep training process within the offspring population, progressively increasing the number of tokens and eliminating poorly performing models in each selection stage. We validate our method through extensive experiments on Llama-2-7B, Llama-3.1-8B and Qwen-2.5-14B-Instruct, achieving state-of-the-art performance for structured pruning. For instance, DarwinLM surpasses ShearedLlama while requiring 5× less training data during post-compression training.
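As a rough illustration of the search loop described in the abstract, the sketch below shows an evolutionary search with mutation and multistep, training-aware selection in which the token budget grows at each stage and weak candidates are dropped. All names (`Candidate`, `mutate`, `finetune`, `evaluate_loss`, the layer count, and the token schedule) are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch of training-aware evolutionary structured pruning.
# Assumptions: candidates are per-layer sparsity levels, fitness is a toy proxy.
import random
from dataclasses import dataclass
from typing import List

NUM_LAYERS = 32          # assumed depth of the dense model
SPARSITY_LEVELS = 4      # assumed number of discrete per-layer pruning levels


@dataclass
class Candidate:
    levels: List[int]    # per-layer sparsity level (0 = dense, higher = more pruned)


def mutate(parent: Candidate) -> Candidate:
    """Shift sparsity between two random layers so the global budget is preserved."""
    levels = parent.levels.copy()
    i, j = random.sample(range(NUM_LAYERS), 2)
    if levels[i] < SPARSITY_LEVELS - 1 and levels[j] > 0:
        levels[i] += 1   # prune layer i a bit more ...
        levels[j] -= 1   # ... and layer j a bit less
    return Candidate(levels)


def finetune(cand: Candidate, num_tokens: int) -> None:
    """Placeholder for a lightweight post-pruning training step on `num_tokens`."""
    pass


def evaluate_loss(cand: Candidate, num_tokens: int) -> float:
    """Placeholder fitness: lower is better (e.g. held-out loss of the pruned model)."""
    finetune(cand, num_tokens)
    # Toy proxy score; a real run would evaluate the pruned, briefly trained LLM.
    mean = sum(cand.levels) / NUM_LAYERS
    return sum((l - mean) ** 2 for l in cand.levels) + random.random()


def search(generations: int = 10, offspring: int = 16,
           token_schedule=(10_000, 100_000, 1_000_000)) -> Candidate:
    parent = Candidate([SPARSITY_LEVELS // 2] * NUM_LAYERS)  # uniform starting point
    for _ in range(generations):
        population = [mutate(parent) for _ in range(offspring)]
        # Multistep selection: increase the token budget and eliminate weak candidates.
        for tokens in token_schedule:
            population.sort(key=lambda c: evaluate_loss(c, tokens))
            population = population[: max(1, len(population) // 2)]
        parent = population[0]   # fittest survivor becomes the next parent
    return parent


if __name__ == "__main__":
    best = search()
    print("per-layer sparsity levels:", best.levels)
```

The key design choice the abstract highlights is the inner loop: instead of scoring offspring once, each selection stage spends more training tokens on the survivors, so the search rewards substructures that respond well to post-compression training rather than those that merely start from a good loss.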
Community
First time submission.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training (2025)
- FASP: Fast and Accurate Structured Pruning of Large Language Models (2025)
- ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning (2025)
- Lightweight and Post-Training Structured Pruning for On-Device Large Language Models (2025)
- Instruction-Following Pruning for Large Language Models (2025)
- FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing (2025)
- MultiPruner: Balanced Structure Removal in Foundation Models (2025)