---
language:
- en
tags:
- image-generation
- text-to-image
- conditional-generation
- generative-modeling
- image-synthesis
- image-manipulation
- design-prototyping
- research
- educational
license: mit
metrics:
- FID
- KID
- HWD
- CER
---
# VATr++ (Local Clone Version)
This is a local-clone-friendly version of the **VATr++** styled handwritten text generation model. If you prefer not to rely on `transformers`’s `trust_remote_code=True`, you can simply clone this repository and load the model directly.
> **Note**: For:
> - Full training instructions
> - Advanced features (style cycle loss, punctuation modes, etc.)
> - Original code details
>
> please see the [VATr-pp GitHub repository](https://github.com/EDM-Research/VATr-pp). This local version is intended primarily for inference and basic usage.
---
## Installation & Setup
1. **Clone this repository (via Git LFS)**:
```bash
git clone https://huggingface.co/blowing-up-groundhogs/vatrpp
```
2. **Create (and activate) a conda environment (recommended)**:
```bash
conda create --name vatr python=3.9
conda activate vatr
```
3. **Install PyTorch (with CUDA if available)**. The `cu126` index URL below targets CUDA 12.6; adjust it to your CUDA version, or see [pytorch.org](https://pytorch.org/get-started/locally/) for the command matching your setup:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
```
4. **Install additional requirements**:
```bash
pip install transformers opencv-python matplotlib
```
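Optionally, you can sanity-check the environment before moving on. This is a minimal sketch; the exact version numbers will differ on your machine:
```python
# Quick environment check: confirms the key dependencies import and reports CUDA availability.
import cv2
import torch
import torchvision
import transformers

print("PyTorch:", torch.__version__)
print("Torchvision:", torchvision.__version__)
print("Transformers:", transformers.__version__)
print("OpenCV:", cv2.__version__)
print("CUDA available:", torch.cuda.is_available())
```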
---
## Loading the Model Locally
With the repository cloned, you can load either **VATr++** or the **original VATr** model locally.
### **VATr++**
```python
from vatrpp import VATrPP
model_vatr_pp = VATrPP.from_pretrained(
    "vatrpp",              # Local folder name or path
    local_files_only=True,
)
```
### **VATr (original)**
```python
from vatrpp import VATrPP
model_vatr = VATrPP.from_pretrained(
    "vatrpp",
    local_files_only=True,
    subfolder="vatr",      # Points to the original VATr checkpoint
)
```
---
## Usage (Inference Example)
Below is a **minimal** usage example demonstrating how to:
1. Load the **VATr++** model from your local clone.
2. Preprocess a style image (an image of handwriting).
3. Generate new handwritten text in the style of the provided image.
```python
import numpy as np
from PIL import Image
import torch
from torchvision import transforms as T

# 1. Load the model (VATr++)
from vatrpp import VATrPP

model = VATrPP.from_pretrained("vatrpp", local_files_only=True)
model.cuda()


# 2. Helper functions to load and process style images
def load_image(img, chunk_width=192):
    # Convert to grayscale and resize to height 32
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Setup transforms: invert + normalize
    transform = T.Compose([
        T.Grayscale(num_output_channels=1),
        T.ToTensor(),
        T.Normalize((0.5,), (0.5,)),
    ])

    # Pad / chunk the image to a fixed width
    arr = 255 - arr
    height, width = arr.shape
    out = np.zeros((height, chunk_width), dtype="float32")
    out[:, :width] = arr[:, :chunk_width]
    out = 255 - out

    # Apply transforms
    out = transform(Image.fromarray(out.astype(np.uint8)))
    return out, width


def load_image_line(img, chunk_width=192, style_imgs_count=15):
    # Convert to grayscale and resize
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Split into fixed-width chunks
    chunks = []
    for start in range(0, arr.shape[1], chunk_width):
        chunk = arr[:, start:start + chunk_width]
        chunks.append(chunk)

    # Transform each chunk
    transformed = []
    for c in chunks:
        t, _ = load_image(Image.fromarray(c), chunk_width)
        transformed.append(t)

    # If fewer than `style_imgs_count` chunks, repeat them
    while len(transformed) < style_imgs_count:
        transformed += transformed
    transformed = transformed[:style_imgs_count]

    # Combine into a single tensor of style chunks
    return torch.cat(transformed, 0)


# 3. Load a style image of your handwriting (or any handwriting sample)
style_image_path = "path/to/your_style_image.png"
img = Image.open(style_image_path)
style_imgs = load_image_line(img)

# 4. Generate text in the style of `style_image_path`
generated_pil_image = model.generate(
    gen_text="This is a test",   # Text to generate
    style_imgs=style_imgs,       # Preprocessed style chunks
    align_words=True,            # Align words at the baseline
    at_once=True,                # Generate the whole line at once
)

# 5. Save the generated image
generated_pil_image.save("generated_output.png")
```
- **`style_imgs`**: A batch of fixed-width image chunks from your style reference. In practice, you can supply multiple small style samples or a single line image split into chunks (see the sketch after this list).
- **`gen_text`**: The text to render in the given style.
- **`align_words`** and **`at_once`**: Optional arguments controlling how the text is laid out and generated.
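As an illustration of the first point, here is a minimal sketch (not part of the original code) that builds `style_imgs` from several separate handwriting crops instead of one line image. It reuses the `load_image` helper and the `model` from the example above; the file names are placeholders, and the count of 15 simply mirrors the `style_imgs_count` default used there:
```python
from PIL import Image
import torch

# Hypothetical paths to a few small handwriting crops (word- or phrase-level samples).
sample_paths = ["style_sample_1.png", "style_sample_2.png", "style_sample_3.png"]

# Preprocess each crop into a fixed-width chunk with the `load_image` helper defined above.
chunks = [load_image(Image.open(p), chunk_width=192)[0] for p in sample_paths]

# Repeat the chunks until we have the 15 style images used in the example above.
while len(chunks) < 15:
    chunks += chunks
style_imgs = torch.cat(chunks[:15], 0)

generated = model.generate(
    gen_text="Written from several style samples",
    style_imgs=style_imgs,
    align_words=True,
    at_once=True,
)
generated.save("generated_multi_sample.png")
```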
---
## Original Repository
This model is built upon the code from [**EDM-Research/VATr-pp**](https://github.com/EDM-Research/VATr-pp), itself an improvement on the [VATr](https://github.com/aimagelab/VATr) project. Please visit those repositories if you need to:
- Train your own model from scratch
- Explore advanced features (like style cycle loss, punctuation modes, or advanced augmentation)
- Examine experimental details or replicate the original paper's setup
---
## License and Acknowledgments
- The original code and model are under the license found in [the GitHub repository](https://github.com/EDM-Research/VATr-pp).
- All credit goes to the original authors and maintainers for creating VATr++ and releasing it openly.
- This local version is intended to simplify offline usage and keep everything self-contained.