---
language:
- en
tags:
- image-generation
- text-to-image
- conditional-generation
- generative-modeling
- image-synthesis
- image-manipulation
- design-prototyping
- research
- educational
license: mit
metrics:
- FID
- KID
- HWD
- CER
---
# VATr++ (Local Clone Version)

This is a local-clone-friendly version of the VATr++ styled handwritten text generation model. If you prefer not to rely on `transformers`'s `trust_remote_code=True`, you can simply clone this repository and load the model directly.
**Note:** For the following, please see the VATr-pp GitHub repository:
- Full training instructions
- Advanced features (style cycle loss, punctuation modes, etc.)
- Original code details

This local version is intended primarily for inference and basic usage.
## Installation & Setup
Clone this repository (via Git LFS):

```bash
git clone https://huggingface.co/blowing-up-groundhogs/vatrpp
```
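If you have not used Git LFS before, you may need to enable it once so that the model weights are actually downloaded rather than left as pointer files (assumes a standard Git LFS installation):

```bash
git lfs install
```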
Create (and activate) a conda environment (recommended):

```bash
conda create --name vatr python=3.9
conda activate vatr
```
Install PyTorch (with CUDA if available):

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
```
Install additional requirements:

```bash
pip install transformers opencv-python matplotlib
```
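As an optional sanity check, you can confirm that PyTorch was installed with working CUDA support before moving on:

```python
import torch

# Should print True if the CUDA-enabled build is installed and a GPU is visible
print(torch.cuda.is_available())
```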
## Loading the Model Locally
With the repository cloned, you can load either VATr++ or the original VATr model locally.
### VATr++

```python
from vatrpp import VATrPP

model_vatr_pp = VATrPP.from_pretrained(
    "vatrpp",              # Local folder name or path
    local_files_only=True,
)
```
### VATr (original)

```python
from vatrpp import VATrPP

model_vatr = VATrPP.from_pretrained(
    "vatrpp",
    local_files_only=True,
    subfolder="vatr",      # Points to the original VATr checkpoint
)
```
## Usage (Inference Example)
Below is a minimal usage example demonstrating how to:
- Load the VATr++ model from your local clone.
- Preprocess a style image (an image of handwriting).
- Generate new handwritten text in the style of the provided image.
```python
import numpy as np
from PIL import Image
import torch
from torchvision import transforms as T

# 1. Load the model (VATr++)
from vatrpp import VATrPP

model = VATrPP.from_pretrained("vatrpp", local_files_only=True)
model.cuda()

# 2. Helper functions to load and process style images
def load_image(img, chunk_width=192):
    # Convert to grayscale and resize to height 32
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Setup transforms: to tensor + normalize
    transform = T.Compose([
        T.Grayscale(num_output_channels=1),
        T.ToTensor(),
        T.Normalize((0.5,), (0.5,)),
    ])

    # Pad / crop the image to a fixed width (padding ends up white)
    arr = 255 - arr
    height, width = arr.shape
    out = np.zeros((height, chunk_width), dtype="float32")
    out[:, :width] = arr[:, :chunk_width]
    out = 255 - out

    # Apply transforms
    out = transform(Image.fromarray(out.astype(np.uint8)))
    return out, width

def load_image_line(img, chunk_width=192, style_imgs_count=15):
    # Convert to grayscale and resize to height 32
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Split into fixed-width chunks
    chunks = []
    for start in range(0, arr.shape[1], chunk_width):
        chunk = arr[:, start:start + chunk_width]
        chunks.append(chunk)

    # Transform each chunk
    transformed = []
    for c in chunks:
        t, _ = load_image(Image.fromarray(c), chunk_width)
        transformed.append(t)

    # If fewer than `style_imgs_count` chunks, repeat them
    while len(transformed) < style_imgs_count:
        transformed += transformed
    transformed = transformed[:style_imgs_count]

    # Combine into a single tensor
    return torch.cat(transformed, 0)

# 3. Load a style image of your handwriting (or any handwriting sample)
style_image_path = "path/to/your_style_image.png"
img = Image.open(style_image_path)
style_imgs = load_image_line(img)

# 4. Generate text in the style of `style_image_path`
generated_pil_image = model.generate(
    gen_text="This is a test",  # Text to generate
    style_imgs=style_imgs,      # Preprocessed style chunks
    align_words=True,           # Align words at the baseline
    at_once=True,               # Generate the whole line at once
)

# 5. Save the generated image
generated_pil_image.save("generated_output.png")
```
- `style_imgs`: A batch of fixed-width image chunks from your style reference. In practice, you can supply multiple small style samples or a single line image split into chunks.
- `gen_text`: The text to render in the given style.
- `align_words` and `at_once`: Optional arguments controlling how the text is laid out and generated.
## Original Repository
This model is built upon the code from EDM-Research/VATr-pp, itself an improvement on the VATr project. Please visit those repositories if you need to:
- Train your own model from scratch
- Explore advanced features (like style cycle loss, punctuation modes, or advanced augmentation)
- Examine experimental details or replicate the original paper's setup
## License and Acknowledgments
- The original code and model are under the license found in the GitHub repository.
- All credit goes to the original authors and maintainers for creating VATr++ and releasing it openly.
- This local version is intended to simplify offline usage and keep everything self-contained.