- Scaling Instruction-Finetuned Language Models
  Paper • 2210.11416 • Published • 7
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 143
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 64
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 63

Collections including paper arxiv:2403.03507
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 186
- Mixture-of-Subspaces in Low-Rank Adaptation
  Paper • 2406.11909 • Published • 3
- Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
  Paper • 2406.17660 • Published • 5
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients
  Paper • 2407.11239 • Published • 8

- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 186
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 63
- RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
  Paper • 2407.02552 • Published • 4
- OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
  Paper • 2407.16741 • Published • 71

- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 186
- RAFT: Adapting Language Model to Domain Specific RAG
  Paper • 2403.10131 • Published • 70
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
  Paper • 2403.13372 • Published • 70
- InternLM2 Technical Report
  Paper • 2403.17297 • Published • 31

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 55
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 186
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 49
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 610