SeerAttention-Llama-3.1-70B-AttnGates

This repo contains only the AttnGates' weights for the Llama-3.1-70B-Instruct model.

SeerAttention introduces learnable AttnGate modules to accelerate the computationally intensive prefill stage of long-context large language models (LLMs) via dynamic block-level sparsity. The AttnGates are trained in a parameter-efficient self-distillation framework, where they learn to mimic the 2D max-pooled attention patterns of the original frozen model, preserving its integrity while avoiding costly retraining. During inference, these gates generate block-sparse binary masks by applying threshold/TopK to their learned soft scores, enabling efficient computation through a custom block-sparse FlashAttention kernel.
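The two key operations described above — 2D max-pooling a full attention map down to block level (the distillation target) and binarizing soft block scores with a threshold — can be sketched as follows. This is a minimal NumPy illustration of the idea, not the repo's actual implementation; the function names `block_maxpool` and `threshold_mask` are hypothetical.

```python
import numpy as np

def block_maxpool(attn, block):
    """2D max-pool a full attention map down to block granularity.
    attn: (seq_q, seq_k) attention scores; block: block size.
    Assumes both sequence lengths are multiples of `block` for simplicity."""
    q, k = attn.shape
    return attn.reshape(q // block, block, k // block, block).max(axis=(1, 3))

def threshold_mask(scores, threshold):
    """Binarize soft block scores into a block-sparse boolean mask."""
    return scores >= threshold

# Toy example: an 8x8 attention map with 4x4 blocks -> 2x2 block scores.
rng = np.random.default_rng(0)
attn = rng.random((8, 8))
scores = block_maxpool(attn, 4)
mask = threshold_mask(scores, 0.9)  # True = compute this block, False = skip
```

In SeerAttention the soft scores come from the learned AttnGates rather than from the full attention map, so the expensive full computation is never needed at inference time; the mask then drives a block-sparse FlashAttention kernel.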

Original GitHub Repo

https://github.com/microsoft/SeerAttention

Evaluation Results

Perplexity on PG19

| Density | 8192 | 16384 | 32768 | 65536 | 131072 |
|---------|------|-------|-------|-------|--------|
| 0.10 | 7.10 | 6.83 | 6.94 | 7.13 | 7.72 |
| 0.20 | 6.96 | 6.87 | 7.03 | 7.20 | 7.71 |
| 0.30 | 6.97 | 6.92 | 7.08 | 7.23 | 7.71 |
| 0.40 | 7.00 | 6.95 | 7.10 | 7.25 | 7.71 |
| 0.50 | 7.03 | 6.98 | 7.12 | 7.26 | 7.71 |
| 1.00 | 7.11 | 7.02 | 7.15 | 7.29 | 7.71 |
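Each row above fixes a target density, i.e. the fraction of key/value blocks each query block attends to, which corresponds to TopK selection over the gate scores. A minimal sketch of density-controlled TopK block selection, assuming row-wise selection over a (query blocks × key blocks) score matrix (the helper `topk_block_mask` is hypothetical, not the repo's API):

```python
import numpy as np

def topk_block_mask(scores, density):
    """Keep the top-k highest-scoring key blocks per query-block row,
    where k = ceil(density * num_key_blocks)."""
    num_q, num_k = scores.shape
    k = max(1, int(np.ceil(density * num_k)))
    mask = np.zeros_like(scores, dtype=bool)
    topk_idx = np.argsort(scores, axis=1)[:, -k:]  # indices of the k largest
    np.put_along_axis(mask, topk_idx, True, axis=1)
    return mask

scores = np.array([[0.9, 0.1, 0.4, 0.3],
                   [0.2, 0.8, 0.7, 0.1]])
mask = topk_block_mask(scores, 0.5)  # keep 2 of 4 key blocks per row
```

At density 1.00 every block is kept, which recovers dense attention and explains why that row matches the original model's perplexity.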

LongBench

Threshold=2e-3

| Category | 0-4k (Dense/Sparse) | 4-8k (Dense/Sparse) | 8k+ (Dense/Sparse) |
|----------|---------------------|---------------------|--------------------|
| gov_report | 35.76 / 35.79 | 34.57 / 34.53 | 34.17 / 33.86 |
| repobench-p | 60.36 / 61.59 | 60.73 / 55.86 | 63.07 / 59.71 |
| passage_retrieval_en | 100.0 / 100.0 | 99.00 / 99.00 | 100.0 / 99.00 |
| trec | 73.00 / 71.00 | 74.00 / 74.00 | 76.00 / 79.00 |
| passage_count | 37.00 / 36.00 | 22.00 / 20.00 | 25.00 / 20.00 |
| qasper | 54.18 / 53.82 | 47.34 / 47.37 | 33.49 / 32.25 |
| 2wikimqa | 71.87 / 71.30 | 70.80 / 70.54 | 53.97 / 54.87 |
| lcc | 40.18 / 37.76 | 52.00 / 46.98 | 67.81 / 55.44 |
| multi_news | 27.11 / 27.12 | 24.95 / 24.64 | 24.53 / 23.54 |
| triviaqa | 90.82 / 91.15 | 94.06 / 93.72 | 93.56 / 93.56 |
| hotpotqa | 69.83 / 69.13 | 69.29 / 67.47 | 70.61 / 70.01 |
| samsum | 42.69 / 42.36 | 44.18 / 42.89 | 46.34 / 46.02 |
| multifieldqa_en | 55.37 / 54.80 | 51.79 / 51.89 | 56.65 / 55.68 |
| averaged score | 58.32 / 57.83 | 57.29 / 56.07 | 57.32 / 55.61 |
| averaged density | 0.754 | 0.529 | 0.318 |
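The averaged density row reports the fraction of attention blocks the gates keep under the fixed threshold; lower density on longer inputs reflects the growing block-level sparsity of long-context attention. As a minimal illustration of the metric (toy mask, not measured data):

```python
import numpy as np

# Density = fraction of True entries in the block-sparse mask.
mask = np.array([[True, False, True, False],
                 [True, True, False, False]])
density = mask.mean()  # 4 kept blocks out of 8 -> 0.5
```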