---
language:
  - en
tags:
  - image-generation
  - text-to-image
  - conditional-generation
  - generative-modeling
  - image-synthesis
  - image-manipulation
  - design-prototyping
  - research
  - educational
license: mit
metrics:
  - FID
  - KID
  - HWD
  - CER
---

# VATr++ (Local Clone Version)

This is a local-clone-friendly version of the VATr++ styled handwritten text generation model. If you prefer not to rely on `transformers`' `trust_remote_code=True`, you can simply clone this repository and load the model directly.
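
For comparison, the hosted route that this clone lets you skip would look roughly like the sketch below. It assumes the Hub repository exposes the model through `AutoModel` with remote code enabled, which is an assumption rather than something documented here:

```python
# Hosted alternative (NOT needed for the local clone): pull the custom model
# code and weights straight from the Hub. Assumes the repo is wired up for
# AutoModel with trust_remote_code enabled.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "blowing-up-groundhogs/vatrpp",
    trust_remote_code=True,
)
```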

Note: for full training instructions, advanced features (style cycle loss, punctuation modes, etc.), and details of the original code, please see the VATr-pp GitHub repository. This local version is intended primarily for inference and basic usage.


## Installation & Setup

1. Clone this repository (Git LFS is required for the model weights, so run `git lfs install` once beforehand if you have not already):

   ```bash
   git clone https://huggingface.co/blowing-up-groundhogs/vatrpp
   ```

2. Create (and activate) a conda environment (recommended):

   ```bash
   conda create --name vatr python=3.9
   conda activate vatr
   ```

3. Install PyTorch (with CUDA if available):

   ```bash
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
   ```

4. Install the additional requirements:

   ```bash
   pip install transformers opencv-python matplotlib
   ```

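After installation, a quick sanity check (a convenience snippet, not part of the original instructions) confirms that the key packages import and that PyTorch can see a GPU:

```python
# Environment sanity check: verify the core dependencies import and
# report whether a CUDA device is visible to PyTorch.
import torch
import transformers
import cv2

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("opencv:", cv2.__version__)
```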
## Loading the Model Locally

With the repository cloned, you can load either VATr++ or the original VATr model locally.

### VATr++

```python
from vatrpp import VATrPP

model_vatr_pp = VATrPP.from_pretrained(
    "vatrpp",            # Local folder name or path
    local_files_only=True
)
```

### VATr (original)

```python
from vatrpp import VATrPP

model_vatr = VATrPP.from_pretrained(
    "vatrpp",
    local_files_only=True,
    subfolder="vatr"     # Points to the original VATr checkpoint
)
```
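
Either model can then be moved to the GPU and switched to evaluation mode before generating. This is a minimal sketch assuming `VATrPP` behaves like a standard PyTorch module, which the `model.cuda()` call in the usage example below suggests:

```python
# Prepare for inference: move the model to the GPU and disable
# training-time behaviour. Assumes VATrPP acts like a regular torch.nn.Module.
model_vatr_pp = model_vatr_pp.cuda().eval()
```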

## Usage (Inference Example)

Below is a minimal usage example demonstrating how to:

  1. Load the VATr++ model from your local clone.
  2. Preprocess a style image (an image of handwriting).
  3. Generate new handwritten text in the style of the provided image.

```python
import numpy as np
from PIL import Image
import torch
from torchvision import transforms as T

# 1. Load the model (VATr++)
from vatrpp import VATrPP
model = VATrPP.from_pretrained("vatrpp", local_files_only=True)
model.cuda()

# 2. Helper functions to load and process style images
def load_image(img, chunk_width=192):
    # Convert to grayscale and resize to height 32
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Setup transforms: invert + normalize
    transform = T.Compose([
        T.Grayscale(num_output_channels=1),
        T.ToTensor(),
        T.Normalize((0.5,), (0.5,))
    ])

    # Pad / chunk the image to a fixed width
    arr = 255 - arr
    height, width = arr.shape
    out = np.zeros((height, chunk_width), dtype="float32")
    out[:, :width] = arr[:, :chunk_width]
    out = 255 - out

    # Apply transforms
    out = transform(Image.fromarray(out.astype(np.uint8)))
    return out, width

def load_image_line(img, chunk_width=192, style_imgs_count=15):
    # Convert to grayscale and resize
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Split into fixed-width chunks
    chunks = []
    for start in range(0, arr.shape[1], chunk_width):
        chunk = arr[:, start:start+chunk_width]
        chunks.append(chunk)

    # Transform each chunk
    transformed = []
    for c in chunks:
        t, _ = load_image(Image.fromarray(c), chunk_width)
        transformed.append(t)

    # If fewer than `style_imgs_count` chunks, repeat them
    while len(transformed) < style_imgs_count:
        transformed += transformed
    transformed = transformed[:style_imgs_count]

    # Combine
    return torch.cat(transformed, 0)

# 3. Load a style image of your handwriting (or any handwriting sample)
style_image_path = "path/to/your_style_image.png"
img = Image.open(style_image_path)
style_imgs = load_image_line(img)

# 4. Generate text in the style of `style_image_path`
generated_pil_image = model.generate(
    gen_text="This is a test",    # Text to generate
    style_imgs=style_imgs,        # Preprocessed style chunks
    align_words=True,             # Align words at baseline
    at_once=True,                 # Generate line at once
)

# 5. Save the generated image
generated_pil_image.save("generated_output.png")
```

- `style_imgs`: a batch of fixed-width image chunks from your style reference. In practice, you can supply multiple small style samples or a single line image split into chunks.
- `gen_text`: the text to render in the given style.
- `align_words` and `at_once`: optional arguments controlling how the text is laid out and generated.
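
The preprocessed style chunks can be reused across calls, so rendering several strings in the same handwriting only needs repeated `generate` calls. A short sketch reusing `model` and `style_imgs` from the example above:

```python
# Render several strings in the same handwriting style,
# reusing the `model` and `style_imgs` prepared above.
texts = ["Hello world", "Styled handwritten text", "Generated with VATr++"]
for i, text in enumerate(texts):
    img = model.generate(
        gen_text=text,
        style_imgs=style_imgs,
        align_words=True,
        at_once=True,
    )
    img.save(f"generated_{i}.png")
```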

## Original Repository

This model is built upon the code from EDM-Research/VATr-pp, itself an improvement on the VATr project. Please visit those repositories if you need to:

- Train your own model from scratch
- Explore advanced features (like style cycle loss, punctuation modes, or advanced augmentation)
- Examine experimental details or replicate the original paper's setup

## License and Acknowledgments

- The original code and model are under the license found in the GitHub repository.
- All credit goes to the original authors and maintainers for creating VATr++ and releasing it openly.
- This local version is intended to simplify offline usage and keep everything self-contained.