Post 5675
Benchmarked 9 models in 3 days. They mostly scored below average on AHA. p(doom) probably increased :(