- Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
  Paper • 2403.06504 • Published • 53
- Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization
  Paper • 2311.10847 • Published • 2
- PERL: Parameter Efficient Reinforcement Learning from Human Feedback
  Paper • 2403.10704 • Published • 58
Collections
Discover the best community collections!
Collections including paper arxiv:2403.10704
- ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
  Paper • 2403.05135 • Published • 44
- VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
  Paper • 2403.05438 • Published • 21
- MoAI: Mixture of All Intelligence for Large Language and Vision Models
  Paper • 2403.07508 • Published • 75
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 40

- Measuring the Effects of Data Parallelism on Neural Network Training
  Paper • 1811.03600 • Published • 2
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
  Paper • 1804.04235 • Published • 2
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  Paper • 1905.11946 • Published • 3
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 63

- AtP*: An efficient and scalable method for localizing LLM behaviour to components
  Paper • 2403.00745 • Published • 14
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 610
- MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
  Paper • 2402.16840 • Published • 26
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 115

- MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
  Paper • 2402.15627 • Published • 37
- Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
  Paper • 2402.16822 • Published • 18
- FuseChat: Knowledge Fusion of Chat Models
  Paper • 2402.16107 • Published • 40
- Multi-LoRA Composition for Image Generation
  Paper • 2402.16843 • Published • 31

- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 105
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 48
- PERL: Parameter Efficient Reinforcement Learning from Human Feedback
  Paper • 2403.10704 • Published • 58
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning
  Paper • 2403.02884 • Published • 17

- Diffusion World Model
  Paper • 2402.03570 • Published • 8
- Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
  Paper • 2401.16335 • Published • 1
- Towards Efficient and Exact Optimization of Language Model Alignment
  Paper • 2402.00856 • Published
- ODIN: Disentangled Reward Mitigates Hacking in RLHF
  Paper • 2402.07319 • Published • 14

- Scaling Laws for Downstream Task Performance of Large Language Models
  Paper • 2402.04177 • Published • 18
- Offline Actor-Critic Reinforcement Learning Scales to Large Models
  Paper • 2402.05546 • Published • 5
- SaulLM-7B: A pioneering Large Language Model for Law
  Paper • 2403.03883 • Published • 80
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 610

- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 20
- Transforming and Combining Rewards for Aligning Large Language Models
  Paper • 2402.00742 • Published • 12
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 108
- Specialized Language Models with Cheap Inference from Limited Domain Data
  Paper • 2402.01093 • Published • 46