SeerAttention-Llama-3.1-70B-AttnGates
This repo contains only the AttnGate weights for the Llama-3.1-70B-Instruct model.
SeerAttention introduces learnable AttnGate modules to accelerate the computationally intensive prefill stage of long-context large language models (LLMs) via dynamic block-level sparsity. The AttnGates are trained in a parameter-efficient self-distillation framework: they learn to mimic the 2D max-pooled attention patterns of the original frozen model, preserving its integrity while avoiding costly retraining. During inference, the gates generate block-sparse binary masks by applying a threshold or TopK selection to their learned soft scores, enabling efficient computation through a custom block-sparse FlashAttention kernel.
Original GitHub repo
https://github.com/microsoft/SeerAttention
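The mask-generation idea described above can be sketched in a few lines. This is a minimal illustration, not the repo's actual implementation: the function names, shapes, and the diagonal-block safeguard are assumptions made for clarity, and real AttnGates operate per head inside a block-sparse FlashAttention kernel.

```python
import numpy as np

def block_sparse_mask(gate_scores: np.ndarray, threshold: float) -> np.ndarray:
    """Binarize soft AttnGate scores into a block-level attention mask.

    gate_scores: (num_query_blocks, num_key_blocks) soft scores from a
    learned gate (hypothetical shape/name, for illustration only).
    """
    mask = gate_scores >= threshold
    # Keep each query block's own diagonal block so every query attends to
    # its local context (a common safeguard; an assumption here).
    np.fill_diagonal(mask, True)
    return mask

def maxpool2d_target(attn: np.ndarray, block: int) -> np.ndarray:
    """2D max-pool a full attention map into block-level scores -- the kind
    of self-distillation target the gates learn to mimic."""
    q, k = attn.shape
    return attn.reshape(q // block, block, k // block, block).max(axis=(1, 3))

# Example: a 2x2 block grid of soft scores, thresholded at 2e-3
scores = np.array([[0.9, 0.001],
                   [0.5, 0.8]])
mask = block_sparse_mask(scores, 2e-3)
density = mask.mean()  # fraction of blocks actually computed
```

With this toy input, one off-diagonal block falls below the threshold, giving a density of 0.75; the density columns in the tables below are this fraction averaged over the evaluation set.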
Evaluation Results
Perplexity on PG19 (columns are context lengths in tokens; Density 1.00 is the dense baseline)

| Density | 8192 | 16384 | 32768 | 65536 | 131072 |
|---|---|---|---|---|---|
| 0.10 | 7.10 | 6.83 | 6.94 | 7.13 | 7.72 |
| 0.20 | 6.96 | 6.87 | 7.03 | 7.20 | 7.71 |
| 0.30 | 6.97 | 6.92 | 7.08 | 7.23 | 7.71 |
| 0.40 | 7.00 | 6.95 | 7.10 | 7.25 | 7.71 |
| 0.50 | 7.03 | 6.98 | 7.12 | 7.26 | 7.71 |
| 1.00 | 7.11 | 7.02 | 7.15 | 7.29 | 7.71 |
LongBench (gate threshold = 2e-3; scores reported as Dense / Sparse)
| Category | 0-4k (Dense/Sparse) | 4-8k (Dense/Sparse) | 8k+ (Dense/Sparse) |
|---|---|---|---|
| gov_report | 35.76 / 35.79 | 34.57 / 34.53 | 34.17 / 33.86 |
| repobench-p | 60.36 / 61.59 | 60.73 / 55.86 | 63.07 / 59.71 |
| passage_retrieval_en | 100.0 / 100.0 | 99.00 / 99.00 | 100.0 / 99.00 |
| trec | 73.00 / 71.00 | 74.00 / 74.00 | 76.00 / 79.00 |
| passage_count | 37.00 / 36.00 | 22.00 / 20.00 | 25.00 / 20.00 |
| qasper | 54.18 / 53.82 | 47.34 / 47.37 | 33.49 / 32.25 |
| 2wikimqa | 71.87 / 71.30 | 70.80 / 70.54 | 53.97 / 54.87 |
| lcc | 40.18 / 37.76 | 52.00 / 46.98 | 67.81 / 55.44 |
| multi_news | 27.11 / 27.12 | 24.95 / 24.64 | 24.53 / 23.54 |
| triviaqa | 90.82 / 91.15 | 94.06 / 93.72 | 93.56 / 93.56 |
| hotpotqa | 69.83 / 69.13 | 69.29 / 67.47 | 70.61 / 70.01 |
| samsum | 42.69 / 42.36 | 44.18 / 42.89 | 46.34 / 46.02 |
| multifieldqa_en | 55.37 / 54.80 | 51.79 / 51.89 | 56.65 / 55.68 |
| Averaged score | 58.32 / 57.83 | 57.29 / 56.07 | 57.32 / 55.61 |
| Averaged density | 0.754 | 0.529 | 0.318 |
Base model
meta-llama/Llama-3.1-70B