SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published 10 days ago • 100
Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published 24 days ago • 53
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code Paper • 2410.08196 • Published Oct 10, 2024 • 46
VLR-Bench: Multilingual Benchmark Dataset for Vision-Language Retrieval Augmented Generation Paper • 2412.10151 • Published Dec 13, 2024 • 5
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8, 2024 • 157
Understanding Reference Policies in Direct Preference Optimization Paper • 2407.13709 • Published Jul 18, 2024 • 17
X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment Paper • 2403.11399 • Published Mar 18, 2024 • 6
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25, 2024 • 77
BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining Paper • 2401.06443 • Published Jan 12, 2024 • 2
PERL: Parameter Efficient Reinforcement Learning from Human Feedback Paper • 2403.10704 • Published Mar 15, 2024 • 58
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14, 2024 • 76
Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance Paper • 2401.15687 • Published Jan 28, 2024 • 23
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models Paper • 2402.01118 • Published Feb 2, 2024 • 31