TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation
Abstract
Text-conditioned image generation has gained significant attention in recent years, and models are processing increasingly long and comprehensive text prompts. In everyday life, dense and intricate text appears in contexts such as advertisements, infographics, and signage, where the integration of text and visuals is essential for conveying complex information. However, despite these advances, generating images that contain long-form text remains a persistent challenge, largely because existing datasets often focus on shorter and simpler text. To address this gap, we introduce TextAtlas5M, a novel dataset specifically designed to evaluate long-text rendering in text-conditioned image generation. Our dataset consists of 5 million generated and collected long-text images across diverse data types, enabling comprehensive evaluation of large-scale generative models on long-text image generation. We further curate TextAtlasEval, a human-improved test set of 3,000 samples across 3 data domains, establishing one of the most extensive benchmarks for text-conditioned generation. Evaluations suggest that the TextAtlasEval benchmark presents significant challenges even for the most advanced proprietary models (e.g., GPT4o with DallE-3), while their open-source counterparts show an even larger performance gap. This evidence positions TextAtlas5M as a valuable dataset for training and evaluating future-generation text-conditioned image generation models.
Website: https://textatlas5m.github.io