Post 2445
The latest DeepSeek paper is now available on the Daily Papers page, where you can reach out to the authors directly.
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (2502.11089)
Post 1621
The LLaMA-3.1-8B distilled version of DeepSeek's R1 model is now available, alongside the Qwen-based one.
Notebook for using it for reasoning over series of data: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_deep_seek_7b_distill_llama3.ipynb
Loading via the pipeline API of the transformers library (see the sketch below): https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_llama.py
GPU usage: 12.3 GB (FP16/FP32 mode), which fits on a T4 (about 1.5 GB less than the Qwen-distilled version).
Performance on a T4 instance: ~0.19 tokens/sec (FP32 mode) and ~0.22-0.30 tokens/sec (FP16 mode). Should it really be that slow?
Model name: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
Framework: https://github.com/nicolay-r/bulk-chain
Notebooks and models hub: https://github.com/nicolay-r/nlp-thirdgate
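For reference, here is a minimal sketch of loading this model through the transformers pipeline API in FP16, roughly matching the T4 setup described above. The model id comes from the post; the prompt and generation settings are illustrative assumptions, not from the linked notebook.

```python
# Minimal sketch: load DeepSeek-R1-Distill-Llama-8B with the transformers pipeline API.
# FP16 keeps the memory footprint near the ~12 GB reported for a T4 above.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    torch_dtype=torch.float16,  # FP16 mode, as in the post's T4 measurements
    device_map="auto",          # place the model on the available GPU
)

# Illustrative prompt; the post's notebook applies the model to series of data instead.
prompt = "Explain step by step why the sky appears blue."
output = generator(prompt, max_new_tokens=256, do_sample=False)
print(output[0]["generated_text"])
```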