- model_hub_mixin
- pytorch_model_hub_mixin
---
- Library: https://github.com/lucasdegeorge/T2I-ImageNet

<div align="center">

# How far can we go with ImageNet for Text-to-Image generation?

<a href="https://lucasdegeorge.github.io/" >Lucas Degeorge</a>, <a href="https://arijit-hub.github.io/" >Arijit Ghosh</a>, <a href="https://nicolas-dufour.github.io/" >Nicolas Dufour</a>, <a href="https://davidpicard.github.io/" >David Picard</a>, <a href="https://vicky.kalogeiton.info/" >Vicky Kalogeiton</a>

</div>

This repository contains the code and models for the paper "How far can we go with ImageNet for Text-to-Image generation?"

Text-to-image generation models typically rely on vast datasets that prioritize quantity over quality, and the usual way to improve them is to gather ever more data. We propose a different approach: strategic data augmentation of small, well-curated datasets (here, ImageNet) to enhance the performance of these models. We show that this method improves the quality of the generated images on several benchmarks.

Paper on arXiv: https://arxiv.org/pdf/2502.21318

GitHub repository: https://github.com/lucasdegeorge/T2I-ImageNet

Project website: https://lucasdegeorge.github.io/projects/t2i_imagenet/

## Install

To install, create a virtual environment with Python 3.9 or later, clone the repository, and run:

```bash
pip install -e .
```

For more details, see the [repository README](https://github.com/lucasdegeorge/T2I-ImageNet/blob/main/README.md).
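
After installation, a quick check can confirm that the interpreter meets the Python 3.9 requirement and that the `pipe` module imported by the usage examples can be found. This is a minimal sketch; `check_install` is a hypothetical helper, not part of the repository.

```python
import importlib.util
import sys
from typing import Tuple


def check_install(module_name: str = "pipe") -> Tuple[bool, bool]:
    """Hypothetical helper: report whether the interpreter and package look usable."""
    # The install instructions require Python 3.9 or later.
    python_ok = sys.version_info >= (3, 9)
    # `pipe` is the module the usage examples import T2IPipeline from.
    module_found = importlib.util.find_spec(module_name) is not None
    return python_ok, module_found


if __name__ == "__main__":
    python_ok, module_found = check_install()
    print(f"Python >= 3.9: {python_ok}; 'pipe' importable: {module_found}")
```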

## Pretrained models

### CAD-I model

To use the pretrained model:

```python
from pipe import T2IPipeline

# Load the CAD-I pipeline from the Hugging Face Hub and move it to the GPU
pipe = T2IPipeline("Lucasdegeorge/CAD-I").to("cuda")
prompt = "An adorable otter, with its sleek, brown fur and bright, curious eyes, playfully interacts with a vibrant bunch of broccoli... "
# cfg sets the classifier-free guidance scale
image = pipe(prompt, cfg=15)
```
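
If you only want to download the model weights, without the sampling pipeline, you can do:

```python
# The original snippet imported only T2IPipeline, but this call needs the CAD class.
# Assumption: CAD is importable from the same package; adjust this import to the
# module that defines CAD in the T2I-ImageNet repository if it fails.
from pipe import CAD

model = CAD.from_pretrained("Lucasdegeorge/CAD-I")
```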

### DiT-I model

Coming soon.

## Prompts

Our models have been specifically trained to handle very long, detailed prompts. For the best results, use prompts that are rich in detail; short or vague prompts may not take full advantage of the model's capabilities.

Example prompts:

```
A vintage Polaroid camera sits inside a crinkled plastic bag, partially buried in the damp sand of a quiet beach. The transparent plastic clings to the camera, slightly fogged with condensation, distorting the faded colors of the once-pristine device. Gentle waves roll in the background, their foamy edges reaching toward the shore, while the golden hues of the setting sun cast long shadows across the sand. Small flecks of salt and tiny grains of sand stick to the bag, hinting at the passage of time and exposure to the elements.

A raccoon stands upright in a meticulously designed astronaut suit, its ringed tail swaying gracefully beneath the reinforced fabric in defiance of zero gravity. The suit is a masterwork of engineering—pristine white panels seamlessly integrated with brushed titanium accents and carbon-fiber joints, while iridescent mission patches catch starlight on its shoulders. Through the helmet's gold-tinted visor, keen eyes shine with a mix of mischief and wonder, and the reflective surface mirrors an endless ballet of celestial bodies: pinprick stars scattered like diamond dust, the marble-like swirls of a distant gas giant, and ribbons of nebulae painting the void in ethereal purples and blues. A life-support backpack hugs its frame like a technological shell, status indicators pulsing with soft blue light, while around the suited explorer, motes of cosmic dust drift like suspended glitter, catching the light of a distant sun.

A quirky crab, entirely sculpted from various types of yellow cheese, sits proudly on a white plate. Its body is a smooth, golden wheel of cheese, round and rich in color, while soft, creamy cheese legs extend outward in neatly shaped segments, each one gently curled as if the crab is about to scuttle away. The claws are crafted from sharp, crumbly yellow cheese, carefully carved to resemble the pincers, with tiny bits of grated Parmesan scattered across to give it texture. The eyes are tiny olives, carefully set in place with delicate toothpicks, adding a playful touch to the cheese creation. Surrounding the crab, fresh basil leaves are placed to resemble seaweed, completing the amusing and mouthwatering oceanic scene on the plate.

A golden retriever, its fur glowing like molten gold, sits serenely at the rim of a volcano, seemingly unbothered by the intense heat and chaos around it. The dog's soft, flowing coat contrasts sharply with the fiery landscape as it gazes calmly into the fiery abyss. Lava bubbles and pours from the crater, casting an eerie, amber glow that illuminates the dog's fur in a warm, almost ethereal light. The air shimmers with heat, while plumes of dark smoke rise into the sky, swirling like ominous clouds above. Despite the tumultuous eruption, the retriever exudes a quiet confidence, its relaxed posture and wagging tail creating an almost surreal sense of calm within the violent eruption. It's as if this brave dog has claimed the heart of the volcano as its own kingdom, a noble ruler amidst the storm of fire and ash.

An adorable otter, with its sleek, brown fur and bright, curious eyes, playfully interacts with a vibrant bunch of broccoli. The otter, balanced on its back, holds the broccoli in its small paws, gently nibbling on a floret with a look of contentment. The broccoli's deep green florets stand out against the otter's playful, natural hues, creating a fun contrast between the two. The scene unfolds by the edge of a peaceful riverbank, where the otter has momentarily paused its swimming adventure to enjoy a healthy snack. The soft ripples of the water reflect the sunlight, adding to the serene, lighthearted moment of the otter and its green, crunchy companion.

A wooden ostrich stands quietly in the center of an empty museum hall, its smooth, dark timber body gleaming under the soft, ambient light that filters through tall, arched windows. The ostrich's long, slender neck curves gracefully, its polished surface reflecting the muted tones of the space, from deep mahogany browns to hints of golden amber. The intricate grain of the wood adds texture and depth to its form, giving it a life-like quality despite its static position. Around the body, layers of carved wood feathers mimic the natural texture of plumage, adding elegance and a sculptural dimension. The bird's long, powerful legs are crafted with care, positioned in a gentle, poised stance. Its head, elongated and smooth, slopes into a sharply defined beak, with small, round insets of dark wood forming the eyes, giving the sculpture an expressive, almost lifelike gaze. The surrounding museum space is vast and quiet, with empty walls and clean, polished floors, which only amplify the serene presence of the wooden ostrich standing as a timeless piece of art in the middle of the room. The stillness of the museum creates a peaceful contrast to the life captured in the graceful figure of the ostrich, as the wooden sculpture seems to watch over the space, adding a touch of nature to the otherwise silent, empty environment.

A bold, electric green owl with oversized sunglasses lounges on a desk in the center of a bright, modern classroom. Its vivid green feathers pop against the soft, neutral tones of the room, from the dark wood of the desks to the pale blue walls adorned with educational posters. The owl's sharp eyes peek over the edge of its sleek sunglasses, exuding an aura of cool confidence, like it's ready to drop some serious wisdom. Sunlight pours through the classroom windows, making its colorful feathers shimmer and creating a playful contrast against the seriousness of the chalkboard and neatly arranged desks. The owl's wings are slightly raised, as if caught mid-gesture, adding to the sense of relaxed energy it brings to the space. This odd combination of the owl's laid-back vibe and the structured classroom setting creates an amusing, surreal scene—one that feels both out of place and effortlessly cool.

A curious fox sits comfortably in front of a glowing screen, its fluffy tail curled neatly around its paws. The soft light from the screen casts a gentle glow on the fox's bright orange fur, highlighting its pointed ears and sharp eyes that are locked intently on the display. Its fur contrasts beautifully against the dark room, where shadows stretch across the floor and walls. The fox's gaze is focused, almost as if it's trying to make sense of what's happening on the screen, its whiskers twitching with curiosity. The surrounding atmosphere is quiet, save for the soft hum of the device in front of it, adding a sense of peaceful solitude. The entire scene feels whimsical, with the fox's natural grace and the modern technology blending in an unexpectedly charming way.

The image portrays the iconic Iron Throne from *Game of Thrones*, but reimagined entirely out of jagged, crystalline ice, radiating a cold, otherworldly aura. The throne's structure is massive and imposing, with hundreds of sharp, glistening shards of ice jutting out at chaotic angles, resembling frozen blades. The pale blue hue of the ice contrasts starkly against the dark, shadowy hall surrounding it, illuminated only by faint, icy light filtering through narrow windows. Frost creeps across the stone floor, and thin tendrils of mist swirl around the base of the throne, adding an eerie, mystical atmosphere. The seat itself is smooth but treacherous, with a faint glow emanating from within the ice, as if it holds some ancient, frozen power. The scene evokes a sense of both awe and dread, as though the throne itself is alive, a symbol of dominance forged in the cold, unyielding grip of winter.

The image centers on a powerful lion, its golden fur gleaming under a soft, ethereal light as it skillfully controls a bright orange basketball. The lion's massive paws handle the ball with surprising dexterity, its sharp claws carefully tucked away to avoid damaging it. Its muscular body moves with fluid grace, every motion exuding strength and precision as it dribbles with focused intensity. The lion's piercing eyes lock onto the ball, its expression a mix of determination and wild instinct, while its majestic mane flows dynamically with each movement. The basketball, slightly frosted as if touched by the cool night air, contrasts vividly against the lion's natural, earthy tones. The scene captures a striking blend of raw animal power and unexpected playfulness, creating a surreal and captivating moment.

The image showcases a strikingly elegant black cat, its fur an inky, absolute black so deep and rich that it seems to swallow all light, creating a velvety void of darkness. Its eyes, a vivid, electric green, blaze with intensity, glowing like neon against the cat's shadowy form, creating a dramatic, high-contrast visual. The green is so vibrant it almost pulses, drawing immediate attention and giving the cat an otherworldly, hypnotic presence. The cat sits perfectly still, its posture regal and composed, with every sleek line of its body accentuated by the stark interplay of light and shadow. The background is muted and dark, almost indistinguishable, ensuring the cat's jet-black fur and luminous eyes dominate the scene entirely. This bold contrast between the absolute black and the radiant green makes the image mesmerizing, almost surreal, and impossible to look away from.
```

## Using the Pipeline

The `T2IPipeline` class is the main interface for generating images from text prompts. The sections below describe how to configure and call it.

### Basic Usage

```python
from pipe import T2IPipeline

# Initialize the pipeline and move it to the GPU
pipe = T2IPipeline("Lucasdegeorge/CAD-I").to("cuda")

# Generate an image from a prompt
prompt = "An adorable otter, with its sleek, brown fur and bright, curious eyes, playfully interacts with a vibrant bunch of broccoli... "
image = pipe(prompt, cfg=15)
```

### Advanced Configuration

The pipeline can be initialized with several customization options:

```python
from pipe import T2IPipeline

pipe = T2IPipeline(
    model_path="Lucasdegeorge/CAD-I",
    sampler="ddim",  # Options: "ddim", "ddpm", "dpm", "dpm_2S", "dpm_2M"
    scheduler="sigmoid",  # Options: "sigmoid", "cosine", "linear"
    postprocessing="sd_1_5_vae",  # Latent decoder (the name suggests the Stable Diffusion 1.5 VAE)
    scheduler_start=-3,  # Noise-schedule parameters
    scheduler_end=3,
    scheduler_tau=1.1,
    device="cuda",
)
```
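
The `scheduler_start`, `scheduler_end` and `scheduler_tau` arguments resemble the parameterization of the sigmoid and cosine noise schedules in Chen (2023), "On the Importance of Noise Scheduling for Diffusion Models". Assuming the pipeline follows that formulation (not confirmed here), the sketch below computes the signal level γ(t) for t in [0, 1] and plugs it into one deterministic DDIM update (eta = 0). All function names are hypothetical, and the `linear` schedule shown is a guess.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_schedule(t: float, start: float = -3.0, end: float = 3.0,
                     tau: float = 1.1, clip_min: float = 1e-9) -> float:
    """Signal level gamma(t): 1.0 at t=0 (clean data) down to ~0 at t=1 (pure noise)."""
    v_start = sigmoid(start / tau)
    v_end = sigmoid(end / tau)
    output = sigmoid((t * (end - start) + start) / tau)
    gamma = (v_end - output) / (v_end - v_start)
    return min(max(gamma, clip_min), 1.0)


def cosine_schedule(t: float, start: float = 0.0, end: float = 1.0,
                    tau: float = 1.0, clip_min: float = 1e-9) -> float:
    """Cosine variant with the same start/end/tau parameterization."""
    v_start = math.cos(start * math.pi / 2) ** (2 * tau)
    v_end = math.cos(end * math.pi / 2) ** (2 * tau)
    output = math.cos((t * (end - start) + start) * math.pi / 2) ** (2 * tau)
    gamma = (v_end - output) / (v_end - v_start)
    return min(max(gamma, clip_min), 1.0)


def linear_schedule(t: float, clip_min: float = 1e-9) -> float:
    """Assumption: 'linear' interpolates gamma from 1 to 0."""
    return min(max(1.0 - t, clip_min), 1.0)


def ddim_step(x_t: float, eps: float, gamma_t: float, gamma_prev: float) -> float:
    """One deterministic DDIM update for a scalar, given the predicted noise eps."""
    x0_pred = (x_t - math.sqrt(1.0 - gamma_t) * eps) / math.sqrt(gamma_t)
    return math.sqrt(gamma_prev) * x0_pred + math.sqrt(1.0 - gamma_prev) * eps


# Signal level at a few timesteps with start=-3, end=3, tau=1.1
gammas = [sigmoid_schedule(i / 4) for i in range(5)]
```

With the sigmoid schedule, `tau` controls how sharply γ(t) drops around the middle of the trajectory, while `start` and `end` select which part of the sigmoid curve is used.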

### Generation Parameters

The pipeline's `__call__` method accepts various parameters to control the generation process:

```python
image = pipe(
    cond="A beautiful landscape",  # Text prompt or list of prompts
    num_samples=4,  # Number of images to generate
    cfg=15,  # Classifier-free guidance scale
    guidance_type="constant",  # Type of guidance: "constant", "linear"
    guidance_start_step=0,  # Step to start guidance
    coherence_value=1.0,  # Coherence value for sampling
    uncoherence_value=0.0,  # Uncoherence value for sampling
    thresholding_type="clamp",  # Type of thresholding: "clamp", "dynamic_thresholding", "per_channel_dynamic_thresholding"
    clamp_value=1.0,  # Clamp value for thresholding
    thresholding_percentile=0.995,  # Percentile for thresholding
)
```
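
For reference, classifier-free guidance with scale `cfg` combines two model predictions at each step as `uncond + cfg * (cond - uncond)`, so `cfg=1` returns the conditional prediction unchanged and larger values push further away from the "unconditional" one. Given the `coherence_value` and `uncoherence_value` parameters, the two branches here are presumably predictions conditioned on a high and a low coherence score; that mapping is an assumption. A minimal element-wise sketch, with a hypothetical `classifier_free_guidance` helper:

```python
from typing import List


def classifier_free_guidance(cond_pred: List[float], uncond_pred: List[float],
                             cfg: float) -> List[float]:
    """Combine conditional and 'unconditional' predictions element-wise.

    Assumption: cond_pred comes from the coherence_value branch and uncond_pred
    from the uncoherence_value branch; the pipeline's internals may differ.
    """
    if len(cond_pred) != len(uncond_pred):
        raise ValueError("predictions must have the same length")
    return [u + cfg * (c - u) for c, u in zip(cond_pred, uncond_pred)]


# cfg=15, as in the examples, moves the result far past the conditional prediction
guided = classifier_free_guidance([0.2, -0.1], [0.1, 0.0], cfg=15)
```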

#### Guidance Types

- `constant`: Applies uniform guidance throughout the sampling process
- `linear`: Linearly increases guidance strength from start to end
- `exponential`: Exponentially increases guidance strength from start to end
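
The endpoints of the `linear` and `exponential` ramps are not specified above. One plausible reading, sketched with a hypothetical `guidance_scale` helper, ramps the scale from 1.0 (no extra guidance) at `guidance_start_step` up to `cfg` at the final step, and uses 1.0 before `guidance_start_step`. Treat these endpoints as assumptions rather than the pipeline's actual behavior.

```python
def guidance_scale(step: int, num_steps: int, cfg: float,
                   guidance_type: str = "constant",
                   guidance_start_step: int = 0) -> float:
    """Hypothetical per-step guidance scale; 1.0 means plain conditional prediction."""
    if step < guidance_start_step:
        return 1.0  # assumption: no extra guidance before guidance_start_step
    if guidance_type == "constant":
        return float(cfg)
    # progress runs from 0.0 at guidance_start_step to 1.0 at the last step
    span = max(num_steps - 1 - guidance_start_step, 1)
    progress = min((step - guidance_start_step) / span, 1.0)
    if guidance_type == "linear":
        return 1.0 + (cfg - 1.0) * progress
    if guidance_type == "exponential":
        if cfg <= 0:
            raise ValueError("the exponential ramp needs cfg > 0")
        return cfg ** progress  # geometric ramp from 1.0 to cfg
    raise ValueError(f"unknown guidance_type: {guidance_type!r}")


# Per-step scales for a 50-step run with cfg=15 and a linear ramp
schedule = [guidance_scale(i, 50, 15.0, "linear") for i in range(50)]
```

A geometric ramp spends more steps at low guidance than a linear one with the same endpoints, which is the practical difference between the two modes under this reading.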

#### Thresholding Types

- `clamp`: Clamps values to a fixed range using `clamp_value`
- `dynamic`: Dynamically adjusts thresholds based on the batch statistics
- `percentile`: Uses percentile-based thresholding with `thresholding_percentile`
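
As a point of reference, clamping and the percentile-based dynamic thresholding popularized by Imagen (Saharia et al., 2022) can be sketched on a flat list of predicted clean-image values: take the `thresholding_percentile` of the absolute values as a threshold `s` (at least 1), clip to `[-s, s]`, then divide by `s`. Whether the pipeline's modes follow exactly this recipe is an assumption; the per-channel variant named in the code comment above would apply the same rule to each channel separately. All helper names are hypothetical.

```python
import math
from typing import List


def percentile(values: List[float], q: float) -> float:
    """Linearly interpolated percentile, with q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def clamp_threshold(x0: List[float], clamp_value: float = 1.0) -> List[float]:
    """Clip every value to [-clamp_value, clamp_value]."""
    return [min(max(v, -clamp_value), clamp_value) for v in x0]


def dynamic_threshold(x0: List[float], thresholding_percentile: float = 0.995) -> List[float]:
    """Imagen-style dynamic thresholding of one sample."""
    s = percentile([abs(v) for v in x0], thresholding_percentile)
    # s >= 1: when the percentile is below 1 this reduces to a plain clamp to [-1, 1]
    s = max(s, 1.0)
    return [min(max(v, -s), s) / s for v in x0]


def per_channel_dynamic_threshold(channels: List[List[float]],
                                  thresholding_percentile: float = 0.995) -> List[List[float]]:
    """Same rule, with a separate threshold for each channel."""
    return [dynamic_threshold(ch, thresholding_percentile) for ch in channels]
```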

### Advanced Parameters

For more control over the generation process, you can also specify:

- `x_N`: Initial noise tensor
- `latents`: Previous latents for continuation
- `num_steps`: Custom number of sampling steps
- `sampler`: Custom sampler function
- `scheduler`: Custom scheduler function
- `guidance_start_step`: Step to start guidance
- `generator`: Random number generator for reproducibility
- `unconfident_prompt`: Custom unconfident prompt text
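
The `generator` and `x_N` arguments both serve reproducibility: fixing the random state fixes the initial noise, so repeated calls with the same prompt and settings should yield the same image. In the pipeline, `generator` is presumably a `torch.Generator` (not confirmed here); the stdlib-only sketch below illustrates the principle with `random.Random` and a hypothetical `initial_noise` helper.

```python
import random
from typing import List


def initial_noise(numel: int, seed: int) -> List[float]:
    """Draw a flat vector of standard-normal noise from a seeded generator."""
    rng = random.Random(seed)  # independent generator; global random state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(numel)]


x_a = initial_noise(8, seed=0)
x_b = initial_noise(8, seed=0)  # same seed, identical noise
x_c = initial_noise(8, seed=1)  # different seed, different noise
```

With PyTorch, the analogous object would be `torch.Generator(device="cuda").manual_seed(0)` passed as `generator=`, assuming the pipeline accepts a `torch.Generator`.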

## Citation

If you use this repository in your experiments, please cite our paper:

```bibtex
@article{degeorge2025farimagenettexttoimagegeneration,
  title   = {How far can we go with ImageNet for Text-to-Image generation?},
  author  = {Lucas Degeorge and Arijit Ghosh and Nicolas Dufour and David Picard and Vicky Kalogeiton},
  year    = {2025},
  journal = {arXiv},
}
```