---
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
datasets:
- yashvoladoddi37/kanjienglish
language:
- en
- ja
library_name: diffusers
pipeline_tag: text-to-image
---
# Kanji Diffusion v1-4 Model Card
Kanji Diffusion is a latent text-to-image diffusion model that generates invented, Kanji-style characters from an arbitrary English prompt.
## Fine-tuned Model Details
- **Developed by:** Yashpreet Voladoddi
- **Model type:** Diffusion-based text-to-image generation model, fine-tuned from Stable Diffusion v1.4 with LoRA.
### Colab
To run the pipeline and see how the model generates Kanji characters, follow the steps below on Colab. Use a T4 GPU runtime; otherwise inference takes a long time for each image.
Make sure you have your Hugging Face access token ready for the login step.
```python
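# Colab cell: mount Google Drive so generated images persist across sessions
import os
from google.colab import drive
drive.mount('/content/drive')
os.chdir("/content/drive/MyDrive")

# Install diffusers; the repository clone provides the example training
# scripts and is not required for inference
!pip install diffusers
!git clone https://github.com/huggingface/diffusers
# Log in with your Hugging Face access token
!huggingface-cli login

import torch
from diffusers import StableDiffusionPipeline

torch.cuda.empty_cache()
model_path = "yashvoladoddi37/kanji-diffusion-v1-4"

# Load the base model in half precision on the GPU, then apply the
# fine-tuned attention weights from this repository
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")
pipe.unet.load_attn_procs(model_path)

prompt = "A Kanji meaning baby robot"
image = pipe(prompt).images[0]
image.save("baby-robot-kanji-v1-4.png")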
```
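To try several concepts in one go, the prompt and file-name pattern used above can be wrapped in small helpers. The following is a minimal sketch, not part of the original notebook: `kanji_prompt`, `output_name`, and `generate_batch` are hypothetical names, and the fixed seed of 42 is only an assumption chosen to make runs repeatable.

```python
def kanji_prompt(meaning: str) -> str:
    """Format a concept in the 'A Kanji meaning ...' style used in this card."""
    return f"A Kanji meaning {meaning.strip()}"


def output_name(meaning: str) -> str:
    """Build a file name such as 'baby-robot-kanji-v1-4.png' from a concept."""
    slug = "-".join(meaning.lower().split())
    return f"{slug}-kanji-v1-4.png"


def generate_batch(pipe, meanings, seed=42):
    """Generate and save one image per concept with an already-loaded pipeline.

    `pipe` is assumed to be the StableDiffusionPipeline prepared above,
    already moved to the GPU. torch is imported here so the helpers above
    stay usable without it.
    """
    import torch

    # A seeded generator makes the sequence of images repeatable across runs.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    saved = []
    for meaning in meanings:
        image = pipe(kanji_prompt(meaning), generator=generator).images[0]
        path = output_name(meaning)
        image.save(path)
        saved.append(path)
    return saved
```

With the pipeline loaded, `generate_batch(pipe, ["baby robot", "ocean", "music"])` would write one PNG per concept into the current working directory.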
### Limitations
## Training
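**Training Data:** the `yashvoladoddi37/kanjienglish` dataset, using its `image` and `text` columns.
**Hardware:** NVIDIA GTX 1650 (4 GB VRAM) with 8 GB system RAM, and a T4 GPU on Colab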
**Training Script:**
```python
!accelerate launch train_text_to_image_lora.py \
--pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
--dataset_name="yashvoladoddi37/kanjienglish" \
--image_column="image" \
--caption_column="text" \
--resolution=512 \
--random_flip \
--train_batch_size=1 \
--num_train_epochs=1 \
--checkpointing_steps=500 \
--learning_rate=1e-04 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--seed=42 \
--output_dir="kanji-diffusion-v1-4" \
--validation_prompt="A kanji meaning Elon Musk" \
--push_to_hub
```
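Long multi-line shell commands are sensitive to stray spaces and missing line continuations, so one option is to assemble the same arguments in Python and print or run the result. This is only a sketch: `build_train_command` is a hypothetical helper, and the values simply mirror the flags listed in the training script of this card.

```python
import shlex


def build_train_command(overrides=None):
    """Return the accelerate launch command as a list of arguments.

    `overrides` is an optional dict that replaces individual option values.
    """
    options = {
        "pretrained_model_name_or_path": "CompVis/stable-diffusion-v1-4",
        "dataset_name": "yashvoladoddi37/kanjienglish",
        "image_column": "image",
        "caption_column": "text",
        "resolution": 512,
        "train_batch_size": 1,
        "num_train_epochs": 1,
        "checkpointing_steps": 500,
        "learning_rate": "1e-04",
        "lr_scheduler": "constant",
        "lr_warmup_steps": 0,
        "seed": 42,
        "output_dir": "kanji-diffusion-v1-4",
        "validation_prompt": "A kanji meaning Elon Musk",
    }
    options.update(overrides or {})
    # Boolean switches that take no value
    switches = ["random_flip", "push_to_hub"]

    command = ["accelerate", "launch", "train_text_to_image_lora.py"]
    command += [f"--{name}={value}" for name, value in options.items()]
    command += [f"--{name}" for name in switches]
    return command


# Print a shell-safe, single-line version of the command for inspection.
print(shlex.join(build_train_command()))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the directory that contains `train_text_to_image_lora.py`; passing a list rather than a string avoids shell quoting issues with the validation prompt.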