---
license: apache-2.0
---

# ReplaceMe: Training-Free Transformer Pruning via Layer Removal & Linear Transformations
[![arXiv](https://img.shields.io/badge/arXiv-2310.12345-b31b1b.svg)]()
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

![ReplaceMe Logo](./figs/logo2.jpg)

## Model Description
ReplaceMe is a novel method for transformer model compression that enables **training-free** block/layer pruning while maintaining model performance through linear transformations. The approach:

- Identifies and removes a contiguous block of transformer layers
- Applies a mathematically derived linear transformation (LT) to preserve the information flow across the removed block (see the sketch after this list)
- Requires no fine-tuning or retraining
- Works with standard transformer architectures, since the LTs are merged into the original model weights

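A minimal sketch of the core idea, under illustrative assumptions (the tensor names, toy shapes, and merging step here are not the official ReplaceMe API): given calibration activations recorded at the boundaries of the block to be pruned, the LT can be estimated in closed form with least squares.

```python
# Minimal sketch (illustrative shapes and names, not the official ReplaceMe API):
# estimate a linear transform T that maps activations entering the pruned
# block to the activations leaving it, in closed form via least squares.
import torch

num_tokens, hidden = 1024, 64  # toy sizes; a real run uses calibration data
H_in = torch.randn(num_tokens, hidden)   # hidden states entering the pruned block
H_out = torch.randn(num_tokens, hidden)  # hidden states leaving it

# Closed-form solution of  T* = argmin_T || H_in @ T - H_out ||_F
T = torch.linalg.lstsq(H_in, H_out).solution  # shape: (hidden, hidden)

# T can then be folded into a weight matrix of the layer preceding the
# pruned block, so the final model stays a standard transformer.
print(T.shape)
```
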
## Key Features
- πŸš€ **Zero-Training Pruning**: Remove layers without any fine-tuning
- 🧠 **Performance Preservation**: <8% accuracy drop in most cases
- ⚑ **Instant Speedup**: fewer blocks mean faster inference and lower memory use (see the check below)
- πŸ”Œ **Plug-and-Play**: Works with existing HuggingFace models

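The speedup is visible directly in the checkpoint's config; a quick, hypothetical sanity check (the expected value assumes the 8-layer pruning reported in the table below):

```python
# Hypothetical sanity check: the pruned checkpoint should expose fewer
# transformer layers than the 32-layer Llama 3.1 8B it was derived from.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("MTSAIR/Llama3.1-6B-ReplaceMe")
print(config.num_hidden_layers)  # expected: 24 (32 minus the 8 pruned blocks)
```
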
## πŸ”₯ Performance Comparison of Pruning Methods (Llama 3.1 8B, 25% Compression)

| Method | Transform | num_pruned_layers | Dataset | State | race 🏁 (acc) | winogrande 🎲 (acc) | piqa 🧠 (acc_norm) | boolq ❓ (acc) | openbookqa πŸ“– (acc_norm) | sciq πŸ”¬ (acc_norm) | lambada_openai πŸ¦™ (acc) | lambada_openai πŸ¦™ (ppl ↓) | Avg acc πŸ“Š |
|--------------|-----------|-------------------|------------|---------------|--------|--------------|--------|---------|--------------|--------|------------------|------------|------------|
| **Llama 3.1** (baseline) | – | – | – | – | 0.449761 | 0.779006 | 0.809576 | 0.84159 | 0.43 | 0.961 | 0.732195 | 3.403683 | **0.711822** |
| **UIDL*** | – | 8 | slim_orca | no training | 0.34067 | 0.719021 | 0.68988 | 0.773394 | 0.31 | 0.719 | 0.087328 | 932.0 | 0.591994 |
| **ReplaceMe** (Ours) βœ… | Cosine | 8 | slim_orca | no training | **0.405742** πŸ† | **0.74191** πŸ† | **0.705658** πŸ† | **0.830275** πŸ† | **0.338** πŸ† | **0.901** πŸ† | **0.470794** πŸ† | **16.759605** πŸ† | **0.653764** πŸ† |

**Key:**
- πŸ† Best performance among pruning methods in that column
- βœ… Training-free (our method)

**Metrics Explained:**
- **Bold**: Best training-free results
- All numbers are accuracy scores, except the lambada_openai perplexity column (lower is better)

> πŸ”₯ **Our training-free method retains roughly 92% of the baseline's average accuracy (0.654 vs. 0.712), while other approaches require expensive retraining!**

## Installation
```bash
pip install replaceme
# or
git clone https://github.com/mts-ai/ReplaceMe
cd ReplaceMe
pip install -e .
```
## Basic Usage
```bash
# LSTSQ method (recommended)
run_replaceme --config ./reproduce/Replace_Me_pipeline_lstsq.yaml

# Cosine similarity method
run_replaceme --config ./reproduce/Replace_Me_pipeline_cosine.yaml
```
The LSTSQ pipeline estimates the linear transform with a closed-form least-squares solve, while the cosine pipeline optimizes it against a cosine-similarity objective. There are many more parameters to experiment with; visit our repo to discover them πŸ”₯πŸ”₯
## Load Model
As noted above, the linear transformations are merged into the original model weights, so you can load the model as usual:
```python
## EXAMPLE
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MTSAIR/Llama3.1-6B-ReplaceMe"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What is the ReplaceMe pruning method?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

output = model.generate(
    **model_inputs,
    max_new_tokens=512
)
response = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
print(response)
```
## Citation
If you use ReplaceMe in your research, please cite our paper:

```bibtex
@article{replaceme2025,
  title={Replace Me: Network Simplification via Block Pruning and Linear Transformations},
  author={Shopkhoev, D. and Ali, A. and Zhussip, M. and Malykh, V. and Lefkimmiatis, S. and Komodakis, N. and Zagoruyko, S.},
  year={2025}
}
```