Collections including paper arxiv:2406.02900

- mDPO: Conditional Preference Optimization for Multimodal Large Language Models
  Paper • 2406.11839 • Published • 38
- Pandora: Towards General World Model with Natural Language Actions and Video States
  Paper • 2406.09455 • Published • 15
- WPO: Enhancing RLHF with Weighted Preference Optimization
  Paper • 2406.11827 • Published • 15
- In-Context Editing: Learning Knowledge from Self-Induced Distributions
  Paper • 2406.11194 • Published • 15

- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
  Paper • 2406.06525 • Published • 70
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
  Paper • 2406.06469 • Published • 28
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
  Paper • 2406.04271 • Published • 30
- Block Transformer: Global-to-Local Language Modeling for Fast Inference
  Paper • 2406.02657 • Published • 40

- Understanding the performance gap between online and offline alignment algorithms
  Paper • 2405.08448 • Published • 19
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
  Paper • 2405.19332 • Published • 22
- Offline Regularised Reinforcement Learning for Large Language Models Alignment
  Paper • 2405.19107 • Published • 14
- Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
  Paper • 2406.00888 • Published • 33

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 28
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 14
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 50
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 32

- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 65
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 41
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 48
- Best Practices and Lessons Learned on Synthetic Data for Language Models
  Paper • 2404.07503 • Published • 30

- Secrets of RLHF in Large Language Models Part II: Reward Modeling
  Paper • 2401.06080 • Published • 28
- Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
  Paper • 2406.02900 • Published • 14
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
  Paper • 2406.04151 • Published • 20
- Understanding and Diagnosing Deep Reinforcement Learning
  Paper • 2406.16979 • Published • 9