
	---
	license: apache-2.0
	pipeline_tag: text-to-image
	---
	# Work in progress: training is ongoing!
	![image](./promo.png)

	⚡️Waifu: efficient high-resolution waifu synthesis


	Waifu is a free text-to-image model that efficiently generates images from prompts written in any of 80 languages. Our goal is to build a small model without compromising on quality.

	## Core designs include:

	(1) [AuraDiffusion/16ch-vae](https://huggingface.co/AuraDiffusion/16ch-vae): a fully open-source 16-channel VAE, natively trained in fp16. \
	(2) [Linear DiT](https://github.com/NVlabs/Sana): we use a 1.6B-parameter DiT (diffusion transformer) with linear attention. \
	(3) [MEXMA-SigLIP](https://huggingface.co/visheratin/mexma-siglip): a model that combines the [MEXMA](https://huggingface.co/facebook/MEXMA) multilingual text encoder with the image encoder from the [SigLIP](https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384) model. This gives us a high-performance CLIP-style model for 80 languages. \
	(4) Other: we train with the Flow-Euler sampler, the Adafactor-Fused optimizer, and bf16 precision, and we combine efficient caption labeling (MoonDream, CogVLM, human annotators, GPT models) with danbooru tags to accelerate convergence.
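
As a rough illustration of the Flow-Euler sampler mentioned in item (4), the Euler update used by flow-matching samplers can be sketched in plain Python. This is a toy under stated assumptions, not the model's actual training or inference code: `euler_flow_sample`, `toy_velocity`, and `target` are hypothetical names, the noise level `sigma` is assumed to run from 1 (pure noise) down to 0 (clean sample), and a closed-form velocity stands in for the DiT's prediction.

```python
def euler_flow_sample(velocity_fn, x, num_steps=8):
    """Integrate dx/dsigma = v(x, sigma) from sigma=1 to sigma=0 with Euler steps."""
    # Evenly spaced noise levels: 1.0, 0.875, ..., 0.0 (an assumed schedule)
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        v = velocity_fn(x, sigma)
        # Euler update: move along the predicted velocity by the step in sigma
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical "clean sample" that the toy velocity field points toward
target = [0.5, -1.0, 2.0]


def toy_velocity(x, sigma):
    """Stand-in for the network: exact velocity (noise - data) on a straight path."""
    return [(xi - ti) / sigma for xi, ti in zip(x, target)]


noise = [1.5, 0.3, -0.7]  # fixed starting point so the run is deterministic
sample = euler_flow_sample(toy_velocity, noise)
print(sample)
```

Because the toy velocity is exact, the sampler lands on `target` up to floating-point error; with a learned velocity the result depends on the model and the number of steps.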

	## Pros
	- Small model that can be trained on a common GPU; fast training process.
	- Supports multiple languages and demonstrates good prompt adherence.
	- Uses a fully open-source 16-channel VAE (variational autoencoder).

	## Cons
	- Trained on only 2 million images (low-budget model, approximately $3,000).
	- Training dataset consists primarily of anime and illustrations (only about 1% realistic images).
	- Low resolution only for now (512 px).
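
To make the 16-channel VAE and the 512 px limit concrete, the latent tensor size can be estimated with a few lines of arithmetic. This is only a sketch: the 8x spatial downsampling factor is an assumption (the README does not state it), and `latent_shape` is a hypothetical helper, not part of any library.

```python
def latent_shape(height, width, channels=16, downsample=8):
    """Return (channels, latent_height, latent_width) for an image of the given size.

    `downsample=8` is an assumed VAE scale factor, not a documented value.
    """
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (channels, height // downsample, width // downsample)


shape = latent_shape(512, 512)
pixel_values = 3 * 512 * 512                     # RGB values in the input image
latent_values = shape[0] * shape[1] * shape[2]   # values the DiT operates on
print(shape, pixel_values / latent_values)
```

Under these assumptions the transformer works on a 16 x 64 x 64 latent, about 12 times fewer values than the raw RGB image.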

	## Example

	```bash
	# First, install the latest diffusers from source
	pip install git+https://github.com/huggingface/diffusers
	```

	```py
	import torch
	from diffusers import DiffusionPipeline
	#from pipeline_waifu import WaifuPipeline

	pipe_id = "AiArtLab/waifu-2b"
	variant = "fp16"

	# Load the pipeline; trust_remote_code lets diffusers run the custom pipeline code from the repo
	pipeline = DiffusionPipeline.from_pretrained(
	    pipe_id,
	    variant=variant,
	    trust_remote_code=True,
	).to("cuda")
	#print(pipeline)

	# The prompt deliberately mixes Russian, English, Arabic and French
	prompt = 'аниме девушка, waifu, يبتسم جنسيا , sur le fond de la tour Eiffel'
	# Fixed seed for reproducible output
	generator = torch.Generator(device="cuda").manual_seed(42)

	# The first element of the pipeline output is the list of generated images
	images = pipeline(
	    prompt=prompt,
	    negative_prompt="",
	    generator=generator,
	)[0]

	for img in images:
	    img.show()
	    img.save('waifu.png')
	```

	![image](./waifu.png)

	## Donations

	We are a small, GPU-poor group of enthusiasts (current training budget: ~$3k).
	![image](./low.png)

	Please contact us if you can provide GPUs for training.

	DOGE: DEw2DR8C7BnF8GgcrfTzUjSnGkuMeJhg83


	## Contacts

	[recoilme](https://t.me/recoilme)

	## How to cite

	```bibtex
	@misc{Waifu,
	url = {https://huggingface.co/AiArtLab/waifu-2b},
	title = {waifu-2b},
	author = {recoilme, muinez, femboysLover}
	}
	```