SeerAttention-Llama-3.1-70B-AttnGates

This repo contains only the AttnGates' weights for the Llama-3.1-70B-Instruct model.

SeerAttention introduces learnable AttnGate modules to accelerate the computationally intensive prefill stage of long-context large language models (LLMs) via dynamic block-level sparsity. The AttnGates are trained in a parameter-efficient self-distillation framework, where they learn to mimic the 2D max-pooled attention patterns of the original frozen model, preserving its integrity while avoiding costly retraining. During inference, these gates generate block-sparse binary masks by applying threshold/TopK to their learned soft scores, enabling efficient computation through a custom block-sparse FlashAttention kernel.
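The two key operations described above — 2D max-pooling a full attention map down to block level (the distillation target) and binarizing soft block scores with a threshold — can be sketched as follows. This is a minimal NumPy illustration of the idea, not the repo's actual implementation; the function names `block_maxpool` and `threshold_mask` are hypothetical.

```python
import numpy as np

def block_maxpool(attn, block):
    """2D max-pool a full attention map down to block granularity.
    attn: (seq_q, seq_k) attention scores; block: block size.
    Assumes both sequence lengths are multiples of `block` for simplicity."""
    q, k = attn.shape
    return attn.reshape(q // block, block, k // block, block).max(axis=(1, 3))

def threshold_mask(scores, threshold):
    """Binarize soft block scores into a block-sparse boolean mask."""
    return scores >= threshold

# Toy example: an 8x8 attention map with 4x4 blocks -> 2x2 block scores.
rng = np.random.default_rng(0)
attn = rng.random((8, 8))
scores = block_maxpool(attn, 4)
mask = threshold_mask(scores, 0.9)  # True = compute this block, False = skip
```

In SeerAttention the soft scores come from the learned AttnGates rather than from the full attention map, so the expensive full computation is never needed at inference time; the mask then drives a block-sparse FlashAttention kernel.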

Original GitHub Repo

https://github.com/microsoft/SeerAttention

Evaluation Results

Perplexity on PG19

| Density | 8192 | 16384 | 32768 | 65536 | 131072 |
|---------|------|-------|-------|-------|--------|
| 0.10 | 7.10 | 6.83 | 6.94 | 7.13 | 7.72 |
| 0.20 | 6.96 | 6.87 | 7.03 | 7.20 | 7.71 |
| 0.30 | 6.97 | 6.92 | 7.08 | 7.23 | 7.71 |
| 0.40 | 7.00 | 6.95 | 7.10 | 7.25 | 7.71 |
| 0.50 | 7.03 | 6.98 | 7.12 | 7.26 | 7.71 |
| 1.00 | 7.11 | 7.02 | 7.15 | 7.29 | 7.71 |
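Each row above fixes a target density, i.e. the fraction of key/value blocks each query block attends to, which corresponds to TopK selection over the gate scores. A minimal sketch of density-controlled TopK block selection, assuming row-wise selection over a (query blocks × key blocks) score matrix (the helper `topk_block_mask` is hypothetical, not the repo's API):

```python
import numpy as np

def topk_block_mask(scores, density):
    """Keep the top-k highest-scoring key blocks per query-block row,
    where k = ceil(density * num_key_blocks)."""
    num_q, num_k = scores.shape
    k = max(1, int(np.ceil(density * num_k)))
    mask = np.zeros_like(scores, dtype=bool)
    topk_idx = np.argsort(scores, axis=1)[:, -k:]  # indices of the k largest
    np.put_along_axis(mask, topk_idx, True, axis=1)
    return mask

scores = np.array([[0.9, 0.1, 0.4, 0.3],
                   [0.2, 0.8, 0.7, 0.1]])
mask = topk_block_mask(scores, 0.5)  # keep 2 of 4 key blocks per row
```

At density 1.00 every block is kept, which recovers dense attention and explains why that row matches the original model's perplexity.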

LongBench

Threshold=2e-3

| Category | 0-4k (Dense/Sparse) | 4-8k (Dense/Sparse) | 8k+ (Dense/Sparse) |
|----------|---------------------|---------------------|--------------------|
| gov_report | 35.76 / 35.79 | 34.57 / 34.53 | 34.17 / 33.86 |
| repobench-p | 60.36 / 61.59 | 60.73 / 55.86 | 63.07 / 59.71 |
| passage_retrieval_en | 100.0 / 100.0 | 99.00 / 99.00 | 100.0 / 99.00 |
| trec | 73.00 / 71.00 | 74.00 / 74.00 | 76.00 / 79.00 |
| passage_count | 37.00 / 36.00 | 22.00 / 20.00 | 25.00 / 20.00 |
| qasper | 54.18 / 53.82 | 47.34 / 47.37 | 33.49 / 32.25 |
| 2wikimqa | 71.87 / 71.30 | 70.80 / 70.54 | 53.97 / 54.87 |
| lcc | 40.18 / 37.76 | 52.00 / 46.98 | 67.81 / 55.44 |
| multi_news | 27.11 / 27.12 | 24.95 / 24.64 | 24.53 / 23.54 |
| triviaqa | 90.82 / 91.15 | 94.06 / 93.72 | 93.56 / 93.56 |
| hotpotqa | 69.83 / 69.13 | 69.29 / 67.47 | 70.61 / 70.01 |
| samsum | 42.69 / 42.36 | 44.18 / 42.89 | 46.34 / 46.02 |
| multifieldqa_en | 55.37 / 54.80 | 51.79 / 51.89 | 56.65 / 55.68 |
| averaged score | 58.32 / 57.83 | 57.29 / 56.07 | 57.32 / 55.61 |
| averaged density | 0.754 | 0.529 | 0.318 |
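The averaged density row reports the fraction of attention blocks the gates keep under the fixed threshold; lower density on longer inputs reflects the growing block-level sparsity of long-context attention. As a minimal illustration of the metric (toy mask, not measured data):

```python
import numpy as np

# Density = fraction of True entries in the block-sparse mask.
mask = np.array([[True, False, True, False],
                 [True, True, False, False]])
density = mask.mean()  # 4 kept blocks out of 8 -> 0.5
```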