Commit bf7364f · Parent(s): 2d7348d
cross referencing other transformer-related implementations

README.md CHANGED
@@ -11,10 +11,12 @@ language: en
 license: mit
 ---
 
-# DeepSeek Multi-Latent Attention
+# DeepSeek Multi-Head Latent Attention
 
 This repository provides a PyTorch implementation of the Multi-Head Latent Attention (MLA) mechanism introduced in the DeepSeek-V2 paper. **This is not a trained model, but rather a modular attention implementation** that significantly reduces the KV cache for efficient inference while maintaining model performance through its innovative architecture. It can be used as a drop-in attention module in transformer architectures.
 
+This repository is part of a series implementing the key architectural innovations from the DeepSeek paper. See the **Related Implementations** section for the complete series.
+
 ## Key Features
 
 - **Low-Rank Key-Value Joint Compression**: Reduces memory footprint during inference
@@ -114,6 +116,18 @@ Key aspects:
 - Position encoding through decoupled RoPE pathway
 - Efficient cache management for both pathways
 
+## Related Implementations
+
+This repository is part of a series implementing the key architectural innovations from the DeepSeek paper:
+
+1. **[DeepSeek Multi-Head Latent Attention](https://huggingface.co/bird-of-paradise/deepseek-mla)** (this repository): Implementation of DeepSeek's MLA mechanism for efficient KV cache usage during inference.
+
+2. **[DeepSeek MoE](https://huggingface.co/bird-of-paradise/deepseek-moe)**: Implementation of DeepSeek's Mixture of Experts architecture that enables efficient scaling of model parameters.
+
+3. **[Transformer Implementation Tutorial](https://huggingface.co/datasets/bird-of-paradise/transformer-from-scratch-tutorial)**: A detailed tutorial on implementing the transformer architecture with explanations of key components.
+
+Together, these implementations cover the core innovations that power DeepSeek's state-of-the-art performance. By combining the MoE architecture with Multi-Head Latent Attention, you can build a complete DeepSeek-style model with improved training efficiency and inference performance.
+
 ## Contributing
 
 Contributions are welcome! Feel free to:
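
The README text in this diff mentions low-rank key-value joint compression but does not show it. Below is a minimal PyTorch sketch of the idea only; the class and parameter names (`LowRankKVCompression`, `d_latent`, `down_kv`, `up_k`, `up_v`) are illustrative assumptions, not the repository's actual API. Keys and values are jointly down-projected into a small latent, which is the only thing cached, and up-projected back when attention is computed.

```python
# Illustrative sketch only -- names and shapes are assumptions, not the repo's API.
import torch
import torch.nn as nn

class LowRankKVCompression(nn.Module):
    """Joint down-projection of keys/values into a small latent, as in MLA.

    Instead of caching full per-head K and V, only the latent c_kv
    (d_latent << num_heads * head_dim) needs to be cached during inference.
    """
    def __init__(self, d_model=512, d_latent=64, num_heads=8, head_dim=64):
        super().__init__()
        self.down_kv = nn.Linear(d_model, d_latent, bias=False)             # W^{DKV}
        self.up_k = nn.Linear(d_latent, num_heads * head_dim, bias=False)   # W^{UK}
        self.up_v = nn.Linear(d_latent, num_heads * head_dim, bias=False)   # W^{UV}

    def forward(self, hidden_states):
        # hidden_states: [batch, seq_len, d_model]
        c_kv = self.down_kv(hidden_states)   # [batch, seq_len, d_latent] -- this is what gets cached
        k = self.up_k(c_kv)                  # reconstructed keys
        v = self.up_v(c_kv)                  # reconstructed values
        return c_kv, k, v

x = torch.randn(2, 16, 512)
c_kv, k, v = LowRankKVCompression()(x)
print(c_kv.shape, k.shape, v.shape)  # [2, 16, 64], [2, 16, 512], [2, 16, 512]
```

Caching the latent instead of the full per-head K/V is what shrinks the KV cache; per the DeepSeek-V2 paper, the up-projections can additionally be absorbed into the query and output projections at inference time.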
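The "decoupled RoPE pathway" and "efficient cache management for both pathways" bullets refer to MLA keeping position information in a separate, small rotary key stored alongside the compressed latent. The following rough sketch shows how a per-step cache for the two pathways might look, again with assumed names and shapes rather than the repository's real interface.

```python
# Illustrative sketch only -- a simplified view of the decoupled RoPE pathway
# and the two-part cache; names/shapes are assumptions, not the repo's API.
import torch

def apply_rope(x, positions, base=10000.0):
    # Standard rotary embedding applied to the last dimension of x.
    # x: [batch, seq, rope_dim] with rope_dim even; positions: [seq]
    rope_dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, rope_dim, 2).float() / rope_dim))
    angles = positions.float()[:, None] * inv_freq[None, :]   # [seq, rope_dim/2]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)

# During decoding, only two small tensors per token are cached:
#   1. the compressed KV latent   (content pathway)
#   2. the RoPE-rotated key part  (decoupled position pathway)
cache = {"c_kv": [], "k_rope": []}

d_latent, rope_dim = 64, 32
for step in range(4):                                # pretend autoregressive steps
    c_kv_t = torch.randn(1, 1, d_latent)             # latent from the down-projection
    k_rope_t = apply_rope(torch.randn(1, 1, rope_dim),
                          positions=torch.tensor([step]))
    cache["c_kv"].append(c_kv_t)                     # content pathway cache
    cache["k_rope"].append(k_rope_t)                 # position pathway cache

c_kv_all = torch.cat(cache["c_kv"], dim=1)           # [1, steps, d_latent]
k_rope_all = torch.cat(cache["k_rope"], dim=1)       # [1, steps, rope_dim]
print(c_kv_all.shape, k_rope_all.shape)
```

The point of the decoupling is that RoPE is applied only to this small extra key dimension, so the rotated part stays directly cacheable while the content pathway remains a position-independent low-rank latent.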