---
language:
- en
tags:
- image-generation
- text-to-image
- conditional-generation
- generative-modeling
- image-synthesis
- image-manipulation
- design-prototyping
- research
- educational
license: mit
metrics:
- FID
- KID
- HWD
- CER
---
# VATr++ (Local Clone Version)
This is a local-clone-friendly version of the **VATr++** styled handwritten text generation model. If you prefer not to rely on `transformers`’s `trust_remote_code=True`, you can simply clone this repository and load the model directly.
> **Note**: For:
> - Full training instructions
> - Advanced features (style cycle loss, punctuation modes, etc.)
> - Original code details
>
> please see the [VATr-pp GitHub repository](https://github.com/EDM-Research/VATr-pp). This local version is intended primarily for inference and basic usage.
---
## Installation & Setup
1. **Clone this repository (via Git LFS)**:
```bash
git clone https://huggingface.co/blowing-up-groundhogs/vatrpp
```
2. **Create (and activate) a conda environment (recommended)**:
```bash
conda create --name vatr python=3.9
conda activate vatr
```
3. **Install PyTorch (with CUDA if available)**. The `cu126` index URL below targets CUDA 12.6; adjust it to your CUDA version, or see [pytorch.org](https://pytorch.org/get-started/locally/) for the command matching your setup:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
```
4. **Install additional requirements**:
```bash
pip install transformers opencv-python matplotlib
```
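Optionally, you can sanity-check the environment before moving on. This is a minimal sketch; the exact version numbers will differ on your machine:
```python
# Quick environment check: confirms the key dependencies import and reports CUDA availability.
import cv2
import torch
import torchvision
import transformers

print("PyTorch:", torch.__version__)
print("Torchvision:", torchvision.__version__)
print("Transformers:", transformers.__version__)
print("OpenCV:", cv2.__version__)
print("CUDA available:", torch.cuda.is_available())
```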
---
## Loading the Model Locally
With the repository cloned, you can load either **VATr++** or the **original VATr** model locally.
### **VATr++**
```python
from vatrpp import VATrPP
model_vatr_pp = VATrPP.from_pretrained(
    "vatrpp",              # Local folder name or path
    local_files_only=True,
)
```
### **VATr (original)**
```python
from vatrpp import VATrPP
model_vatr = VATrPP.from_pretrained(
    "vatrpp",
    local_files_only=True,
    subfolder="vatr",      # Points to the original VATr checkpoint
)
```
---
## Usage (Inference Example)
Below is a **minimal** usage example demonstrating how to:
1. Load the **VATr++** model from your local clone.
2. Preprocess a style image (an image of handwriting).
3. Generate new handwritten text in the style of the provided image.
```python
import numpy as np
from PIL import Image
import torch
from torchvision import transforms as T

# 1. Load the model (VATr++)
from vatrpp import VATrPP

model = VATrPP.from_pretrained("vatrpp", local_files_only=True)
model.cuda()


# 2. Helper functions to load and process style images
def load_image(img, chunk_width=192):
    # Convert to grayscale and resize to height 32
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Setup transforms: invert + normalize
    transform = T.Compose([
        T.Grayscale(num_output_channels=1),
        T.ToTensor(),
        T.Normalize((0.5,), (0.5,)),
    ])

    # Pad / chunk the image to a fixed width
    arr = 255 - arr
    height, width = arr.shape
    out = np.zeros((height, chunk_width), dtype="float32")
    out[:, :width] = arr[:, :chunk_width]
    out = 255 - out

    # Apply transforms
    out = transform(Image.fromarray(out.astype(np.uint8)))
    return out, width


def load_image_line(img, chunk_width=192, style_imgs_count=15):
    # Convert to grayscale and resize
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Split into fixed-width chunks
    chunks = []
    for start in range(0, arr.shape[1], chunk_width):
        chunk = arr[:, start:start + chunk_width]
        chunks.append(chunk)

    # Transform each chunk
    transformed = []
    for c in chunks:
        t, _ = load_image(Image.fromarray(c), chunk_width)
        transformed.append(t)

    # If fewer than `style_imgs_count` chunks, repeat them
    while len(transformed) < style_imgs_count:
        transformed += transformed
    transformed = transformed[:style_imgs_count]

    # Combine into a single tensor of style chunks
    return torch.cat(transformed, 0)


# 3. Load a style image of your handwriting (or any handwriting sample)
style_image_path = "path/to/your_style_image.png"
img = Image.open(style_image_path)
style_imgs = load_image_line(img)

# 4. Generate text in the style of `style_image_path`
generated_pil_image = model.generate(
    gen_text="This is a test",   # Text to generate
    style_imgs=style_imgs,       # Preprocessed style chunks
    align_words=True,            # Align words at the baseline
    at_once=True,                # Generate the whole line at once
)

# 5. Save the generated image
generated_pil_image.save("generated_output.png")
```
- **`style_imgs`**: A batch of fixed-width image chunks from your style reference. In practice, you can supply multiple small style samples or a single line image split into chunks (see the sketch after this list).
- **`gen_text`**: The text to render in the given style.
- **`align_words`** and **`at_once`**: Optional arguments controlling how the text is laid out and generated.
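As an illustration of the first point, here is a minimal sketch (not part of the original code) that builds `style_imgs` from several separate handwriting crops instead of one line image. It reuses the `load_image` helper and the `model` from the example above; the file names are placeholders, and the count of 15 simply mirrors the `style_imgs_count` default used there:
```python
from PIL import Image
import torch

# Hypothetical paths to a few small handwriting crops (word- or phrase-level samples).
sample_paths = ["style_sample_1.png", "style_sample_2.png", "style_sample_3.png"]

# Preprocess each crop into a fixed-width chunk with the `load_image` helper defined above.
chunks = [load_image(Image.open(p), chunk_width=192)[0] for p in sample_paths]

# Repeat the chunks until we have the 15 style images used in the example above.
while len(chunks) < 15:
    chunks += chunks
style_imgs = torch.cat(chunks[:15], 0)

generated = model.generate(
    gen_text="Written from several style samples",
    style_imgs=style_imgs,
    align_words=True,
    at_once=True,
)
generated.save("generated_multi_sample.png")
```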
---
## Original Repository
This model is built upon the code from [**EDM-Research/VATr-pp**](https://github.com/EDM-Research/VATr-pp), itself an improvement on the [VATr](https://github.com/aimagelab/VATr) project. Please visit those repositories if you need to:
- Train your own model from scratch
- Explore advanced features (like style cycle loss, punctuation modes, or advanced augmentation)
- Examine experimental details or replicate the original paper's setup
---
## License and Acknowledgments
- The original code and model are under the license found in [the GitHub repository](https://github.com/EDM-Research/VATr-pp).
- All credit goes to the original authors and maintainers for creating VATr++ and releasing it openly.
- This local version is intended to simplify offline usage and keep everything self-contained.