|
--- |
|
language: |
|
- en |
|
tags: |
|
- image-generation |
|
- text-to-image |
|
- conditional-generation |
|
- generative-modeling |
|
- image-synthesis |
|
- image-manipulation |
|
- design-prototyping |
|
- research |
|
- educational |
|
license: mit |
|
metrics: |
|
- FID |
|
- KID |
|
- HWD |
|
- CER |
|
--- |
|
|
|
# VATr++ (Local Clone Version) |
|
|
|
This is a local-clone-friendly version of the **VATr++** styled handwritten text generation model. If you prefer not to rely on `trust_remote_code=True` in `transformers`, you can simply clone this repository and load the model directly.
|
|
|
> **Note**: For:
> - Full training instructions
> - Advanced features (style cycle loss, punctuation modes, etc.)
> - Original code details
>
> please see the [VATr-pp GitHub repository](https://github.com/EDM-Research/VATr-pp). This local version is intended primarily for inference and basic usage.
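
If you would rather load the model straight from the Hugging Face Hub instead of a local clone, the remote-code path mentioned above looks roughly like the sketch below (this assumes the Hub repository `blowing-up-groundhogs/vatrpp` registers its model class with `AutoModel`; adjust if the repo documents a different entry point):

```python
# Minimal sketch: loading directly from the Hub with remote code enabled.
# Assumes blowing-up-groundhogs/vatrpp exposes its model class to AutoModel.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "blowing-up-groundhogs/vatrpp",
    trust_remote_code=True,  # runs the model code shipped with the Hub repo
)
```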
|
|
|
--- |
|
|
|
## Installation & Setup |
|
|
|
1. **Clone this repository (via Git LFS)**: |
|
   ```bash
   # Make sure Git LFS is set up so the model weights are fetched, not just pointer files
   git lfs install
   git clone https://huggingface.co/blowing-up-groundhogs/vatrpp
   ```
|
|
|
2. **Create (and activate) a conda environment (recommended)**: |
|
```bash |
|
conda create --name vatr python=3.9 |
|
conda activate vatr |
|
``` |
|
|
|
3. **Install PyTorch (with CUDA if available)**: |
|
   ```bash
   # Adjust the CUDA wheel index (here cu126) to match your system, or drop --index-url for a CPU-only install
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
   ```
|
|
|
4. **Install additional requirements**: |
|
```bash |
|
pip install transformers opencv-python matplotlib |
|
``` |
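
   Optionally, verify that the installation can see your GPU before moving on:

   ```bash
   python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
   ```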
|
|
|
--- |
|
|
|
## Loading the Model Locally |
|
|
|
With the repository cloned, you can load either **VATr++** or the **original VATr** model locally. |
|
|
|
### **VATr++** |
|
|
|
```python |
|
from vatrpp import VATrPP |
|
|
|
model_vatr_pp = VATrPP.from_pretrained( |
|
"vatrpp", # Local folder name or path |
|
local_files_only=True |
|
) |
|
``` |
|
|
|
### **VATr (original)** |
|
|
|
```python |
|
from vatrpp import VATrPP |
|
|
|
model_vatr = VATrPP.from_pretrained( |
|
"vatrpp", |
|
local_files_only=True, |
|
subfolder="vatr" # Points to the original VATr checkpoint |
|
) |
|
``` |
|
|
|
--- |
|
|
|
## Usage (Inference Example) |
|
|
|
Below is a **minimal** usage example demonstrating how to: |
|
|
|
1. Load the **VATr++** model from your local clone. |
|
2. Preprocess a style image (an image of handwriting). |
|
3. Generate new handwritten text in the style of the provided image. |
|
|
|
```python |
|
import numpy as np |
|
from PIL import Image |
|
import torch |
|
from torchvision import transforms as T |
|
|
|
# 1. Load the model (VATr++) |
|
from vatrpp import VATrPP |
|
model = VATrPP.from_pretrained("vatrpp", local_files_only=True) |
|
model.cuda() |
|
|
|
# 2. Helper functions to load and process style images |
|
def load_image(img, chunk_width=192): |
|
# Convert to grayscale and resize to height 32 |
|
img = img.convert("L") |
|
img = img.resize((img.width * 32 // img.height, 32)) |
|
arr = np.array(img) |
|
|
|
# Setup transforms: invert + normalize |
|
transform = T.Compose([ |
|
T.Grayscale(num_output_channels=1), |
|
T.ToTensor(), |
|
T.Normalize((0.5,), (0.5,)) |
|
]) |
|
|
|
# Pad / chunk the image to a fixed width |
|
arr = 255 - arr |
|
height, width = arr.shape |
|
out = np.zeros((height, chunk_width), dtype="float32") |
|
out[:, :width] = arr[:, :chunk_width] |
|
out = 255 - out |
|
|
|
# Apply transforms |
|
out = transform(Image.fromarray(out.astype(np.uint8))) |
|
return out, width |
|
|
|
def load_image_line(img, chunk_width=192, style_imgs_count=15): |
|
# Convert to grayscale and resize |
|
img = img.convert("L") |
|
img = img.resize((img.width * 32 // img.height, 32)) |
|
arr = np.array(img) |
|
|
|
# Split into fixed-width chunks |
|
chunks = [] |
|
for start in range(0, arr.shape[1], chunk_width): |
|
chunk = arr[:, start:start+chunk_width] |
|
chunks.append(chunk) |
|
|
|
# Transform each chunk |
|
transformed = [] |
|
for c in chunks: |
|
t, _ = load_image(Image.fromarray(c), chunk_width) |
|
transformed.append(t) |
|
|
|
# If fewer than `style_imgs_count` chunks, repeat them |
|
while len(transformed) < style_imgs_count: |
|
transformed += transformed |
|
transformed = transformed[:style_imgs_count] |
|
|
|
# Combine |
|
return torch.cat(transformed, 0) |
|
|
|
# 3. Load a style image of your handwriting (or any handwriting sample) |
|
style_image_path = "path/to/your_style_image.png" |
|
img = Image.open(style_image_path) |
|
style_imgs = load_image_line(img) |
|
|
|
# 4. Generate text in the style of `style_image_path` |
|
generated_pil_image = model.generate( |
|
gen_text="This is a test", # Text to generate |
|
style_imgs=style_imgs, # Preprocessed style chunks |
|
align_words=True, # Align words at baseline |
|
at_once=True, # Generate line at once |
|
) |
|
|
|
# 5. Save the generated image |
|
generated_pil_image.save("generated_output.png") |
|
``` |
|
|
|
- **`style_imgs`**: A batch of fixed-width image chunks from your style reference. In practice, you can supply multiple small style samples or a single line image split into chunks. |
|
- **`gen_text`**: The text to render in the given style. |
|
- **`align_words`** and **`at_once`**: Optional arguments controlling how the text is laid out and generated. |
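
If you have several small crops of the same handwriting rather than a single line image, you can build the style batch yourself by reusing the `load_image` helper from the example above. A minimal sketch (the file names below are placeholders):

```python
import torch
from PIL import Image

# Placeholder paths to a few crops of the same writer's handwriting
sample_paths = ["sample_word_1.png", "sample_word_2.png", "sample_word_3.png"]

# Each crop becomes one fixed-width, normalized style chunk
chunks = [load_image(Image.open(p))[0] for p in sample_paths]

# Repeat the chunks until there are 15 of them, mirroring load_image_line's default
while len(chunks) < 15:
    chunks += chunks
style_imgs = torch.cat(chunks[:15], 0)

generated = model.generate(
    gen_text="Another test",
    style_imgs=style_imgs,
    align_words=True,
    at_once=True,
)
generated.save("generated_output_2.png")
```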
|
|
|
--- |
|
|
|
## Original Repository |
|
|
|
This model is built upon the code from [**EDM-Research/VATr-pp**](https://github.com/EDM-Research/VATr-pp), itself an improvement on the [VATr](https://github.com/aimagelab/VATr) project. Please visit those repositories if you need to: |
|
|
|
- Train your own model from scratch |
|
- Explore advanced features (like style cycle loss, punctuation modes, or advanced augmentation) |
|
- Examine experimental details or replicate the original paper's setup |
|
|
|
--- |
|
|
|
## License and Acknowledgments |
|
|
|
- The original code and model are under the license found in [the GitHub repository](https://github.com/EDM-Research/VATr-pp). |
|
- All credit goes to the original authors and maintainers for creating VATr++ and releasing it openly. |
|
- This local version is intended to simplify offline usage and keep everything self-contained. |