Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 9 days ago • 134
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 330
Article MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era By MiniMax-AI • Jan 15 • 41
UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages Paper • 2411.14343 • Published Nov 21, 2024 • 7
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF Text Generation • Updated Oct 25, 2024 • 116k • 2.02k
Interesting, but how does this approach generalize to arbitrary user query / document domains? Would you need to train a separate network for each domain / dataset?