🚀 The World’s First Public Diffusion Model Leaderboard (CLIP Score + FID Benchmarks!)

Community Article Published October 2, 2025

The diffusion space is moving fast, but reproducible evaluation has lagged behind. Until now.

We’re excited to announce the launch of the DreamLayer AI Diffusion Model Leaderboard, a public benchmark where top text-to-image models are scored using standardized metrics like CLIP Score, FID Score, Precision/Recall, and F1 Score.

🏆 Note: We’re hosting an Image Generation Kaggle Challenge!
Join the competition here: Kaggle.com/competitions/text-to-image-challenge

Compete for a chance to win one of 5 cash prizes


Why This Matters

Reproducibility is one of the biggest challenges in generative AI research.

Comparisons of OpenAI’s DALL·E, Stability AI’s SD Turbo, Google Gemini’s Nano Banana, Black Forest Labs’ Flux, Runway Gen-4, Ideogram, and Luma Labs’ Photon have often been inconsistent because each evaluation used different prompts, seeds, and scoring setups.

Our pipeline automates:

  • Prompt-pack setup across a fixed set of 200 prompts
  • Seed control so every model generates under identical conditions
  • Metric scoring with CLIP Score, FID, and composition correctness
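
As a rough illustration of the scoring step: CLIP Score is commonly computed as the mean cosine similarity between matched image and text embeddings. A minimal numpy sketch (the embedding arrays here are stand-ins; a real pipeline would obtain them from a CLIP model, and exact scaling conventions vary by implementation):

```python
import numpy as np

def clip_score(img_emb: np.ndarray, txt_emb: np.ndarray) -> float:
    """Mean cosine similarity between paired image/text embeddings.

    img_emb, txt_emb: arrays of shape (n_prompts, dim), row i of each
    corresponding to the same prompt.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    # Row-wise dot products of unit vectors = cosine similarities.
    return float(np.mean(np.sum(img * txt, axis=1)))
```

Averaging over the whole prompt pack (rather than cherry-picked prompts) is what makes the score comparable across models.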

✅ The result: 200 prompts benchmarked in just 45 minutes per model, with transparent, reproducible metrics anyone can trust.
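
For reference, the FID metric mentioned above compares the Gaussian statistics (mean and covariance) of real vs. generated image features. A numpy-only sketch, using the eigenvalue form of the trace term and random features as stand-ins for Inception embeddings:

```python
import numpy as np

def fid(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fit to two feature sets
    of shape (n_samples, dim). Lower is better; 0 means identical."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((C_a C_b)^(1/2)) via eigenvalues: the product of two PSD
    # covariances has nonnegative real eigenvalues.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(((mu_a - mu_b) ** 2).sum()
                 + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

A set compared against itself scores (numerically) zero, while any shift in the feature distribution pushes the score up, which is why fixed seeds and a fixed prompt pack matter for comparability.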


Early Results

  • Best CLIP Score: Luma Labs Photon (0.265)
  • Best FID Score: Ideogram V3 (305.60; lower is better)
  • Best Recall: Stability SD Turbo (0.533)
  • Best Overall F1: Luma Labs Photon (0.463)
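
For context, the F1 numbers above combine precision and recall as their harmonic mean, the standard definition:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

Because the harmonic mean is dragged down by the weaker of the two components, a model must balance precision and recall to lead the overall F1 column.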

📊 See the full leaderboard live on DreamLayer: dreamlayer.io


What’s Next

We’re inviting the community to submit their own diffusion models for evaluation.

Whether you’re a researcher, startup, or lab, this leaderboard is designed to accelerate fair, reproducible benchmarking and push the field forward.

👉 Want your model added? Contact us!
