akhauriyash/DeepSeek-R1-Distill-Llama-8B-Butler
Text Generation
TokenButler: predicts token importance for every attention head across the transformer, using only the first layer. This enables fine-grained, per-head token sparsity.
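
To illustrate what per-head token sparsity could look like once importance scores are available, here is a minimal sketch in plain Python. It is not TokenButler's implementation: the function names `topk_token_mask` and `masked_softmax`, the top-k selection rule, and the toy scores are all assumptions made for illustration. The sketch only shows how a per-head importance vector might be turned into a keep/drop mask and applied to attention weights.

```python
import math


def topk_token_mask(importance, k):
    """Build a per-head keep mask from importance scores (hypothetical helper).

    importance: list of heads, each a list of per-token scores.
    k: number of tokens to keep per head (must be >= 1).
    Returns a list of boolean lists; True marks a token that is kept.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    mask = []
    for head_scores in importance:
        keep = min(k, len(head_scores))
        # Rank token positions by score, highest first.
        ranked = sorted(range(len(head_scores)),
                        key=lambda i: head_scores[i], reverse=True)
        kept = set(ranked[:keep])
        mask.append([i in kept for i in range(len(head_scores))])
    return mask


def masked_softmax(scores, keep):
    """Softmax over one head's attention scores, skipping dropped tokens.

    Dropped tokens receive weight 0.0; kept tokens are normalized to sum to 1.
    Assumes at least one token is kept.
    """
    peak = max(s for s, m in zip(scores, keep) if m)  # for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, keep)]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: 2 heads, 4 tokens, keep the 2 most important tokens per head.
toy_importance = [[0.1, 0.9, 0.3, 0.7],
                  [0.8, 0.2, 0.6, 0.4]]
toy_mask = topk_token_mask(toy_importance, k=2)
toy_weights = masked_softmax([1.0, 2.0, 3.0, 4.0], toy_mask[0])
```

Because each head gets its own mask, different heads can attend to different subsets of tokens, which is the sense in which the sparsity is fine-grained. In a real model the importance scores would come from the predictor rather than a hard-coded list.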