---
license: apache-2.0
---

# ReplaceMe: Training-Free Transformer Pruning via Layer Removal & Linear Transformations

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)



## Model Description

ReplaceMe is a novel method for transformer model compression that enables **training-free** block/layer pruning while maintaining model performance through linear transformations. The approach:

- Identifies and removes contiguous blocks of layers
- Applies mathematically derived linear transformations to preserve information flow
- Requires no fine-tuning or retraining
- Works with standard transformer architectures (the linear transformations, LTs, are merged into the original model weights)

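The linear transformation can be estimated in closed form from calibration activations. As a toy illustration of the idea only (not the ReplaceMe implementation; shapes and data below are made up), one can record the activations entering and leaving the block slated for removal and solve a least-squares problem:

```python
import numpy as np

# Toy sketch: estimate a single linear map T that mimics a removed block,
# given calibration activations. Shapes and data here are illustrative.
rng = np.random.default_rng(0)
d_model = 64                                   # hidden size (made-up value)
X = rng.normal(size=(2048, d_model))           # activations entering the removed block
W = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
Y = X @ W                                      # activations leaving the removed block

# Closed-form least squares: find T minimizing ||X @ T - Y||_F -- no training loop.
T, residuals, rank, _ = np.linalg.lstsq(X, Y, rcond=None)

rel_err = np.linalg.norm(X @ T - Y) / np.linalg.norm(Y)
print(rel_err < 1e-8)  # the toy block is exactly linear, so the fit is near-perfect
```

Because the estimated map is linear, it can then be folded into the weights of an adjacent layer, which is why the pruned model loads through the standard `transformers` API with no custom code.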
## Key Features

- 🚀 **Zero-Training Pruning**: Remove layers without any fine-tuning
- 🧠 **Performance Preservation**: <8% accuracy drop in most cases
- ⚡ **Instant Speedup**: fewer blocks -> faster inference and lower memory use
- 🔌 **Plug-and-Play**: Works with existing HuggingFace models

## 🔥 Performance Comparison of Pruning Methods (Llama 3.1 8B, 25% Compression)

| Method | Transform | num_pruned_layers | Dataset | State | race 📚 (acc) | winogrande 🎲 (acc) | piqa 🧩 (acc_norm) | boolq ❓ (acc) | openbookqa 📖 (acc_norm) | sciq 🔬 (acc_norm) | lambada_openai 🦙 (acc) | lambada_openai (ppl, ↓) | Avg-acc 📊 |
|--------------|-----------|-------------------|------------|---------------|--------|--------------|--------|---------|--------------|--------|------------------|--------|------------|
| **Llama 3.1** (baseline) | - | - | - | - | 0.449761 | 0.779006 | 0.809576 | 0.84159 | 0.43 | 0.961 | 0.732195 | 3.403683 | **0.711822** |
| **UIDL*** | - | 8 | slim_orca | no training | 0.34067 | 0.719021 | 0.68988 | 0.773394 | 0.31 | 0.719 | 0.087328 | 932.0 | 0.591994 |
| **ReplaceMe** (Ours) ✅ | Cosine | 8 | slim_orca | no training | **0.405742** 🏆 | **0.74191** 🏆 | **0.705658** 🏆 | **0.830275** 🏆 | **0.338** 🏆 | **0.901** 🏆 | **0.470794** 🏆 | 16.759605 🏆 | **0.653764** 🏆 |

**Key:**
- 🏆 Best performance among the pruning methods in each column
- ✅ Training-free (our method)


**Metrics Explained:**
- **Bold**: Best training-free results
- All numbers are accuracy scores, except the perplexity (ppl) column, where lower is better

> 🔥 **Our training-free methods achieve 92.5% of baseline performance while other approaches require expensive retraining!**

## Installation

```bash
pip install replaceme
# or, install from source
git clone https://github.com/mts-ai/ReplaceMe
cd ReplaceMe
pip install -e .
```

## Basic Usage

```bash
# LSTSQ method (recommended)
run_replaceme --config ./reproduce/Replace_Me_pipeline_lstsq.yaml

# Cosine similarity method
run_replaceme --config ./reproduce/Replace_Me_pipeline_cosine.yaml
```

There are many parameters you can experiment with; visit our repo to discover them 🔥🔥

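The pipeline is driven by the YAML files under `./reproduce/`. Purely as a hypothetical sketch (the field names below are illustrative, not the real schema; see the configs shipped in the repo for the actual keys), such a file might contain:

```yaml
# Hypothetical config sketch -- field names are illustrative only;
# consult the files in ./reproduce/ for the real schema.
model: meta-llama/Llama-3.1-8B-Instruct   # model to prune (assumed value)
num_pruned_layers: 8                      # how many consecutive layers to drop
dataset: slim_orca                        # calibration dataset
method: lstsq                             # lstsq or cosine
save_path: ./Llama3.1-6B-ReplaceMe        # where to write the merged model
```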
## Load Model

Since the linear transformations are merged into the original model weights, you can load the pruned model like any other HuggingFace model:

```python
# Example: load the pruned model and generate a response
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MTSAIR/Llama3.1-6B-ReplaceMe"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What is the ReplaceMe pruning method?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

output = model.generate(
    **model_inputs,
    max_new_tokens=512
)
response = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
```

## Citation

If you use ReplaceMe in your research, please cite our paper:

```bibtex
@article{replaceme2024,
  title={Replace Me: Network Simplification via Block Pruning and Linear Transformations},
  author={Shopkhoev, D. and Ali, A. and Zhussip, M. and Malykh, V. and Lefkimmiatis, S. and Komodakis, N. and Zagoruyko, S.},
  journal={},
  year={2025}
}
```