---
base_model:
- meta-llama/Llama-3.2-3B
library_name: transformers
license: mit
pipeline_tag: text-generation
---

# TokenButler
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->



<div align="center">
  <img src="https://github.com/abdelfattah-lab/TokenButler/blob/main/figs/tokenbutlerlogo.png?raw=true" width="50%" alt="TokenButler" />
</div>
<hr>
<div align="center" style="line-height: 1;">
  <!-- Paper Badge -->
  <a href="https://arxiv.org/abs/2503.07518" target="_blank" style="margin: 2px;">
    <img alt="Paper" 
         src="https://img.shields.io/badge/Paper-View-orange?logo=readthedocs&logoColor=white" 
         style="display: inline-block; vertical-align: middle;"/>
  </a>
  <!-- GitHub Badge -->
  <a href="https://github.com/abdelfattah-lab/TokenButler" target="_blank" style="margin: 2px;">
    <img alt="GitHub" 
         src="https://img.shields.io/badge/GitHub-Repo-black?logo=github&logoColor=white" 
         style="display: inline-block; vertical-align: middle;"/>
  </a>
    <!-- Huggingface Badge -->
  <a href="https://huggingface.co/collections/akhauriyash/tokenbutler-67cf181b5762d0d60e5f312b" target="_blank" style="margin: 2px;">
    <img alt="Huggingface" 
         src="https://img.shields.io/badge/Hugging%20Face-FFD21E?logo=huggingface&logoColor=000" 
         style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

<br>


The collection of TokenButler models can be found [here](https://huggingface.co/collections/akhauriyash/tokenbutler-67cf181b5762d0d60e5f312b). To run the `meta-llama/Llama-3.2-3B` model, use the following:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

question = "If millionaires have butlers, why don't million dollar language models have a butler too? I think it's because "

model_name = "akhauriyash/Llama-3.2-3B-Butler"
# trust_remote_code is required to load the custom TokenButler attention modules.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
response = generator(question, max_new_tokens=200, do_sample=True, top_p=0.95, temperature=0.7)

# Strip the prompt and print only the newly generated continuation.
print(response[0]['generated_text'][len(question):])
```
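
Because the model is loaded with `trust_remote_code=True`, a quick sanity check (a minimal sketch, reusing the `AttentionExperimental` class name from the sparsity helper below) is to confirm that the custom attention modules were actually registered:

```python
# Count the custom TokenButler attention modules; zero would mean the stock
# Llama attention was loaded instead of the remote-code implementation.
n_custom = sum(1 for m in model.modules()
               if "AttentionExperimental" in m.__class__.__name__)
print(f"Loaded {n_custom} TokenButler attention modules")
```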

The default configured sparsity is 50%, with a sliding window of 128 tokens and 8 anchor tokens. To change the sparsity, use the following function after loading the model. Note that `fixed` is the only supported strategy at the moment; it fixes the sparsity of every layer (except the first) at the given percentage (`pc`). The same helper can be found in `test_hf.py`. The sliding window and anchor tokens can be changed in a similar manner (see the sketch after the code block below).

```python
def set_sparsity(model, sparsity):
    # Apply the new sparsity setting to every TokenButler attention module.
    for module in model.modules():
        if "AttentionExperimental" in module.__class__.__name__:
            module.token_sparse_method = sparsity
            module.set_token_sparsity()
    return model

# "fixed_60pc" fixes each layer (except the first) at 60% token sparsity.
model = set_sparsity(model, "fixed_60pc")
```
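
The sliding window and anchor-token counts can be adjusted with the same module loop. The attribute names `sliding_window` and `num_anchor_tokens` below are illustrative assumptions, not confirmed API; check the attention module definition in the model repository or `test_hf.py` for the exact names.

```python
def set_window_and_anchors(model, sliding_window, num_anchor_tokens):
    # NOTE: attribute names are hypothetical -- verify them against the
    # TokenButler attention module before relying on this sketch.
    for module in model.modules():
        if "AttentionExperimental" in module.__class__.__name__:
            module.sliding_window = sliding_window
            module.num_anchor_tokens = num_anchor_tokens
    return model

# Restore the documented defaults: a 128-token window and 8 anchor tokens.
model = set_window_and_anchors(model, sliding_window=128, num_anchor_tokens=8)
```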


# Predictor Architecture
<div align="center">
  <img src="https://github.com/abdelfattah-lab/TokenButler/blob/main/figs/mainfig.png?raw=true" width="100%" alt="TokenButlerFigure" />
</div>

# Custom Synthetic Task
<div align="center">
  <img src="https://github.com/abdelfattah-lab/TokenButler/blob/main/figs/datasetfig.png?raw=true" width="100%" alt="Synthetic Tasks" />
</div>

All of our results and experiment traces are located in `ablation_results/`.

Note: Our predictor design has improved since the arXiv paper release (we added a layer-norm to stabilize training). Further, to focus on the main predictor design and the training and eval scripts, we have removed the ablation scripts. To reproduce the original results and predictor models, please check out commit `0412fc24a3b770e4d82e6d7064a8172f24c5fcd3` and download the old models from this [Drive link](https://drive.google.com/drive/folders/1psNZ1SU0LaZJ-x5MQGH59CzYSmeT4yRf?usp=sharing).

For the latest models, use the Hugging Face integration above. Training logs for these models are available on [Weights & Biases](https://wandb.ai/akhauriyash/TrainTokenButler).