---
language:
- en
tags:
- image-generation
- text-to-image
- conditional-generation
- generative-modeling
- image-synthesis
- image-manipulation
- design-prototyping
- research
- educational
license: mit
metrics:
- FID
- KID
- HWD
- CER
---
# VATr++ (Hugging Face Version)
This is a re-upload of the **VATr++** styled handwritten text generation model to the Hugging Face Model Hub. The original code and more detailed documentation can be found in the [VATr-pp GitHub repository](https://github.com/EDM-Research/VATr-pp).
> **Note**: Please refer to the original repo for:
> - Full training instructions
> - In-depth code details
> - Extended usage and references

This Hugging Face version lets you load the **VATr++** model directly with `AutoModel.from_pretrained(...)` and use it in your pipelines or scripts without manually handling checkpoints. The usage differs slightly from the original GitHub repository because this version goes through Hugging Face's `transformers` interface.

---
## Installation
1. **Create a conda environment (recommended)**:
```bash
conda create --name vatr python=3.9
conda activate vatr
```
2. **Install PyTorch and CUDA (if available)**:
```bash
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
```
3. **Install additional requirements** (including `transformers`, `opencv`, etc.):
```bash
pip install transformers opencv-python
```
*You may need to adjust or add libraries based on your specific environment needs.*
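As an optional sanity check, you can confirm that PyTorch, `transformers`, and (if installed) CUDA are visible from the new environment; the versions shown are just the ones suggested above, not hard requirements:
```python
import torch
import torchvision
import transformers

print("torch:", torch.__version__)              # e.g. 1.13.1
print("torchvision:", torchvision.__version__)  # e.g. 0.14.1
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```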
---
## Loading the Model
#### **VATr++**
To load the **VATr++** version:
```python
from transformers import AutoModel
model_vatr_pp = AutoModel.from_pretrained(
    "blowing-up-groundhogs/vatrpp",
    trust_remote_code=True,
)
```
#### **VATr (original)**
To load the **original VATr** model (instead of VATr++), specify the `subfolder` argument:
```python
model_vatr = AutoModel.from_pretrained(
    "blowing-up-groundhogs/vatrpp",
    subfolder="vatr",
    trust_remote_code=True,
)
```
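Both variants behave like regular PyTorch modules once loaded, so generic PyTorch inspection and inference-mode handling applies (this is standard `transformers`/PyTorch usage, not anything specific to this repository):
```python
# Put the models in evaluation mode for inference
model_vatr_pp.eval()
model_vatr.eval()

# For example, count the parameters of the VATr++ variant
n_params = sum(p.numel() for p in model_vatr_pp.parameters())
print(f"VATr++ parameters: {n_params:,}")
```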
---
## Usage (Inference Example)
Below is a **minimal** usage example that demonstrates how to:
1. Load the VATr++ model from the Hugging Face Hub.
2. Preprocess a style image (an image of handwriting).
3. Generate a new handwritten line of text in the style of the provided image.
> **Important**: This model requires `trust_remote_code=True` to properly load its custom generation logic.
```python
import numpy as np
from PIL import Image
import torch
from torchvision import transforms as T
from transformers import AutoModel
# 1. Load the model (VATr++)
model = AutoModel.from_pretrained("blowing-up-groundhogs/vatrpp", trust_remote_code=True)
# 2. Helper functions to load and process style images
def load_image(img, chunk_width=192):
    # Convert to grayscale and resize to height 32
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Setup transforms: invert + normalize
    transform = T.Compose([
        T.Grayscale(num_output_channels=1),
        T.ToTensor(),
        T.Normalize((0.5,), (0.5,))
    ])

    # Pad / chunk the image to a fixed width
    arr = 255 - arr
    height, width = arr.shape
    width = min(width, chunk_width)  # guard against inputs wider than chunk_width
    out = np.zeros((height, chunk_width), dtype="float32")
    out[:, :width] = arr[:, :width]
    out = 255 - out

    # Apply transforms
    out = transform(Image.fromarray(out.astype(np.uint8)))
    return out, width

def load_image_line(img, chunk_width=192, style_imgs_count=15):
    # Convert to grayscale and resize
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Split into fixed-width chunks
    chunks = []
    for start in range(0, arr.shape[1], chunk_width):
        chunk = arr[:, start:start + chunk_width]
        chunks.append(chunk)

    # Transform each chunk
    transformed = []
    for c in chunks:
        t, _ = load_image(Image.fromarray(c), chunk_width)
        transformed.append(t)

    # If fewer than `style_imgs_count` chunks, repeat them
    while len(transformed) < style_imgs_count:
        transformed += transformed
    transformed = transformed[:style_imgs_count]

    # Combine into a single tensor of style chunks
    return torch.cat(transformed, 0)
# 3. Load a style image of your handwriting (or any handwriting sample)
style_image_path = "path/to/your_style_image.png"
img = Image.open(style_image_path)
style_imgs = load_image_line(img)
# 4. Generate text in the style of `style_image_path`
generated_pil_image = model.generate(
    gen_text="This is a test",   # Text to generate
    style_imgs=style_imgs,       # Preprocessed style chunks
    align_words=True,            # Align words at baseline
    at_once=True,                # Generate line at once
)
# 5. Save the generated image
generated_pil_image.save("generated_output.png")
```
- **`style_imgs`**: A batch of fixed-width image chunks from your style reference. In practice, you can supply multiple small style samples or a single line image split into chunks.
- **`gen_text`**: The text to render in the given style.
- **`align_words`** and **`at_once`**: Optional arguments that control how the text is laid out and generated.
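Because the style chunks can be reused, generating several lines in the same handwriting is just a matter of repeating the call above. The sketch below uses `model.generate` exactly as documented in the example; the texts and output filenames are placeholders:
```python
texts = ["First line of text", "Second line of text"]  # placeholder strings

for i, text in enumerate(texts):
    line_img = model.generate(
        gen_text=text,
        style_imgs=style_imgs,   # same preprocessed style chunks as above
        align_words=True,
        at_once=True,
    )
    line_img.save(f"generated_line_{i}.png")
```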
---
## Original Repository
This model is built upon the code from [**EDM-Research/VATr-pp**](https://github.com/EDM-Research/VATr-pp), which is itself an improvement on the [VATr](https://github.com/aimagelab/VATr) project. If you need to:
- Train your own model from scratch
- Explore advanced features (like style cycle loss, punctuation modes, or advanced augmentation)
- Examine experimental details or replicate the original paper's setup

Please visit the original GitHub repositories for comprehensive documentation and support files.

---
## License and Acknowledgments
- The original code and model are under the license found in [the GitHub repository](https://github.com/EDM-Research/VATr-pp).
- All credit goes to the original authors and maintainers for creating VATr++ and releasing it openly.
- This Hugging Face re-upload is merely intended to **simplify inference** and **model sharing**; no changes have been made to the core training code or conceptual pipeline.
---
**Enjoy generating styled handwritten text!** For any issues specific to this Hugging Face version, feel free to open an issue or pull request here. Otherwise, for deeper technical questions, please consult the original repository or its authors. |