---
language:
- en
tags:
- image-generation
- text-to-image
- conditional-generation
- generative-modeling
- image-synthesis
- image-manipulation
- design-prototyping
- research
- educational
license: mit
metrics:
- FID
- KID
- HWD
- CER
---
# VATr++ (Hugging Face Version)
This is a re-upload of the VATr++ styled handwritten text generation model to the Hugging Face Model Hub. The original code and more detailed documentation can be found in the [VATr-pp GitHub repository](https://github.com/EDM-Research/VATr-pp).

**Note:** Please refer to the original repo for:
- Full training instructions
- In-depth code details
- Extended usage and references
This Hugging Face version allows you to load the VATr++ model directly with `AutoModel.from_pretrained(...)` and use it in your pipelines or scripts without manually handling checkpoints. The usage differs slightly from the original GitHub repository, primarily because we leverage Hugging Face's `transformers` interface here.
## Installation
Create a conda environment (recommended):

```bash
conda create --name vatr python=3.9
conda activate vatr
```

Install PyTorch and CUDA (if available):

```bash
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
```

Install additional requirements (including `transformers`, `opencv`, etc.):

```bash
pip install transformers opencv-python
```
You may need to adjust or add libraries based on your specific environment needs.
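If you want to confirm the environment before loading the model, a quick check along these lines can help (a minimal sketch; the printed versions will reflect whatever you actually installed):

```python
import torch
import torchvision
import transformers

# Confirm the versions pulled in by the commands above
print("torch:", torch.__version__)              # e.g. 1.13.1
print("torchvision:", torchvision.__version__)  # e.g. 0.14.1
print("transformers:", transformers.__version__)

# A CUDA-capable GPU is optional but speeds up generation considerably
print("CUDA available:", torch.cuda.is_available())
```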
## Loading the Model

### VATr++
To load the VATr++ version:
```python
from transformers import AutoModel

model_vatr_pp = AutoModel.from_pretrained(
    "blowing-up-groundhogs/vatrpp",
    trust_remote_code=True,
)
```
### VATr (original)

To load the original VATr model (instead of VATr++), specify the `subfolder` argument:
```python
model_vatr = AutoModel.from_pretrained(
    "blowing-up-groundhogs/vatrpp",
    subfolder="vatr",
    trust_remote_code=True,
)
```
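Both calls return a standard PyTorch module, so the usual `transformers`/PyTorch idioms apply. As a minimal sketch (this assumes the custom remote code follows normal PyTorch device handling; consult the original repository if you run into device-placement issues):

```python
import torch

# Optional: move the model to a GPU if one is available and switch to inference mode.
# (Assumption: the custom generation code respects standard device placement.)
device = "cuda" if torch.cuda.is_available() else "cpu"
model_vatr_pp = model_vatr_pp.to(device)
model_vatr_pp.eval()
```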
## Usage (Inference Example)
Below is a minimal usage example that demonstrates how to:
- Load the VATr++ model from the Hugging Face Hub.
- Preprocess a style image (an image of handwriting).
- Generate a new handwritten line of text in the style of the provided image.
**Important:** This model requires `trust_remote_code=True` to properly load its custom generation logic.
```python
import numpy as np
from PIL import Image
import torch
from torchvision import transforms as T
from transformers import AutoModel

# 1. Load the model (VATr++)
model = AutoModel.from_pretrained("blowing-up-groundhogs/vatrpp", trust_remote_code=True)

# 2. Helper functions to load and process style images
def load_image(img, chunk_width=192):
    # Convert to grayscale and resize to height 32
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Set up transforms: grayscale -> tensor -> normalize to [-1, 1]
    transform = T.Compose([
        T.Grayscale(num_output_channels=1),
        T.ToTensor(),
        T.Normalize((0.5,), (0.5,))
    ])

    # Invert, then pad/crop the image to a fixed width
    arr = 255 - arr
    height, width = arr.shape
    out = np.zeros((height, chunk_width), dtype="float32")
    out[:, :width] = arr[:, :chunk_width]
    out = 255 - out

    # Apply transforms
    out = transform(Image.fromarray(out.astype(np.uint8)))
    return out, width

def load_image_line(img, chunk_width=192, style_imgs_count=15):
    # Convert to grayscale and resize
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Split into fixed-width chunks
    chunks = []
    for start in range(0, arr.shape[1], chunk_width):
        chunks.append(arr[:, start:start + chunk_width])

    # Transform each chunk
    transformed = []
    for c in chunks:
        t, _ = load_image(Image.fromarray(c), chunk_width)
        transformed.append(t)

    # If fewer than `style_imgs_count` chunks, repeat them
    while len(transformed) < style_imgs_count:
        transformed += transformed
    transformed = transformed[:style_imgs_count]

    # Combine into a single tensor of style chunks
    return torch.cat(transformed, 0)

# 3. Load a style image of your handwriting (or any handwriting sample)
style_image_path = "path/to/your_style_image.png"
img = Image.open(style_image_path)
style_imgs = load_image_line(img)

# 4. Generate text in the style of `style_image_path`
generated_pil_image = model.generate(
    gen_text="This is a test",  # Text to generate
    style_imgs=style_imgs,      # Preprocessed style chunks
    align_words=True,           # Align words at the baseline
    at_once=True,               # Generate the whole line at once
)

# 5. Save the generated image
generated_pil_image.save("generated_output.png")
```
- `style_imgs`: A batch of fixed-width image chunks from your style reference. In practice, you can supply multiple small style samples or a single line image split into chunks.
- `gen_text`: The text to render in the given style.
- `align_words` and `at_once`: Optional arguments that control how the text is laid out and generated.
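Because the style chunks only need to be computed once, you can reuse them to render several different strings in the same handwriting. A small sketch using only the `generate` call shown above (the texts and output file names are illustrative):

```python
# Reuse the same preprocessed style chunks for several lines of text.
# (Illustrative sketch: the texts and output names below are made up.)
texts = ["Handwriting is personal", "but style can be transferred"]
for i, text in enumerate(texts):
    out_img = model.generate(
        gen_text=text,
        style_imgs=style_imgs,  # computed once with load_image_line(...)
        align_words=True,
        at_once=True,
    )
    out_img.save(f"generated_line_{i}.png")
```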
## Original Repository
This model is built upon the code from [EDM-Research/VATr-pp](https://github.com/EDM-Research/VATr-pp), which is itself an improvement on the VATr project. If you need to:
- Train your own model from scratch
- Explore advanced features (such as style cycle loss, punctuation modes, or advanced augmentation)
- Examine experimental details or replicate the original paper's setup

please visit the original GitHub repositories for comprehensive documentation and support files.
## License and Acknowledgments
- The original code and model are under the license found in the GitHub repository.
- All credit goes to the original authors and maintainers for creating VATr++ and releasing it openly.
- This Hugging Face re-upload is merely intended to simplify inference and model sharing; no changes have been made to the core training code or conceptual pipeline.
Enjoy generating styled handwritten text! For any issues specific to this Hugging Face version, feel free to open an issue or pull request here. Otherwise, for deeper technical questions, please consult the original repository or its authors.