Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search Paper • 2503.04412 • Published Mar 2025
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization Paper • 2502.19261 • Published Feb 2025
TinySwallow Collection • Compact Japanese models trained with "TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models" • 5 items • Updated Jan 30
Agent Skill Acquisition for Large Language Models via CycleQD Paper • 2410.14735 • Published Oct 16, 2024
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs Paper • 2407.03963 • Published Jul 4, 2024
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024