Add license and tag metadata
README.md CHANGED
````diff
@@ -1,3 +1,9 @@
+---
+license: bsd-3-clause
+tags:
+- kernel
+---
+
 # Flash Attention
 
 Flash Attention is a fast and memory-efficient implementation of the attention mechanism, designed to work with large models and long sequences. This is a Hugging Face compliant kernel build of Flash Attention.
@@ -65,6 +71,7 @@ print(f"Output: {out_kv.shape}")
 ```
 
 expected output
+
 ```txt
 Fetching 3 files: 100%|█████████████████████████████████████████████████████| 3/3 [00:00<00:00, 16384.00it/s]
 Flash Attention functions: ['mha_bwd', 'mha_fwd', 'mha_fwd_kvcache', 'mha_varlen_bwd', 'mha_varlen_fwd']
@@ -77,4 +84,5 @@ Output: torch.Size([10, 4, 8])
 
 3. KV-cache:
 Output: torch.Size([2, 2, 4, 8])
-```
+```
+
````
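The functions listed in the expected output above are the kernel's exported entry points. As a rough illustration of how a Hub kernel repo like this is typically loaded, here is a minimal sketch; the repo id `kernels-community/flash-attn` and the use of the Hugging Face `kernels` library's `get_kernel` are assumptions for illustration, not something this diff specifies, and the README's own example (partly visible in the hunk context) remains the authoritative usage.

```python
# Minimal sketch, not the README's verbatim example.
# Assumptions: the repo is published for the Hugging Face `kernels` library
# under an id like "kernels-community/flash-attn" (illustrative id).
from kernels import get_kernel

# Fetch the prebuilt kernel from the Hub and import it as a module-like object.
flash_attn = get_kernel("kernels-community/flash-attn")

# The expected output in the README lists these entry points:
# mha_bwd, mha_fwd, mha_fwd_kvcache, mha_varlen_bwd, mha_varlen_fwd
print("Flash Attention functions:",
      sorted(name for name in dir(flash_attn) if name.startswith("mha")))
```

The `kernel` tag added by this commit is what lets the repo be filtered as a kernel build on the Hub, and the `license` field surfaces the BSD-3-Clause license in the model card header.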