Add license and tag metadata
README.md CHANGED
````diff
@@ -1,3 +1,9 @@
+---
+license: bsd-3-clause
+tags:
+- kernel
+---
+
 # Flash Attention
 
 Flash Attention is a fast and memory-efficient implementation of the attention mechanism, designed to work with large models and long sequences. This is a Hugging Face compliant kernel build of Flash Attention.
@@ -65,6 +71,7 @@ print(f"Output: {out_kv.shape}")
 ```
 
 expected output
+
 ```txt
 Fetching 3 files: 100%|█████████████████████████████████████████████████████| 3/3 [00:00<00:00, 16384.00it/s]
 Flash Attention functions: ['mha_bwd', 'mha_fwd', 'mha_fwd_kvcache', 'mha_varlen_bwd', 'mha_varlen_fwd']
@@ -77,4 +84,5 @@ Output: torch.Size([10, 4, 8])
 
 3. KV-cache:
 Output: torch.Size([2, 2, 4, 8])
-```
+```
+
````
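The functions listed in the expected output above are the kernel's exported entry points. As a rough illustration of how a Hub kernel repo like this is typically loaded, here is a minimal sketch; the repo id `kernels-community/flash-attn` and the use of the Hugging Face `kernels` library's `get_kernel` are assumptions for illustration, not something this diff specifies, and the README's own example (partly visible in the hunk context) remains the authoritative usage.

```python
# Minimal sketch, not the README's verbatim example.
# Assumptions: the repo is published for the Hugging Face `kernels` library
# under an id like "kernels-community/flash-attn" (illustrative id).
from kernels import get_kernel

# Fetch the prebuilt kernel from the Hub and import it as a module-like object.
flash_attn = get_kernel("kernels-community/flash-attn")

# The expected output in the README lists these entry points:
# mha_bwd, mha_fwd, mha_fwd_kvcache, mha_varlen_bwd, mha_varlen_fwd
print("Flash Attention functions:",
      sorted(name for name in dir(flash_attn) if name.startswith("mha")))
```

The `kernel` tag added by this commit is what lets the repo be filtered as a kernel build on the Hub, and the `license` field surfaces the BSD-3-Clause license in the model card header.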