Submitted by akhaliq 28 WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild · 9 authors
Submitted by akhaliq 8 Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? · 9 authors
Submitted by akhaliq 6 Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach · 25 authors