language:
- en
tags:
- image-generation
- text-to-image
- conditional-generation
- generative-modeling
- image-synthesis
- image-manipulation
- design-prototyping
- research
- educational
license: mit
metrics:
- FID
- KID
- HWD
- CER
Handwritten Text Generation from Visual Archetypes ++
This repository includes the code for training the VATr++ Styled Handwritten Text Generation model.
Installation
conda create --name vatr python=3.9
conda activate vatr
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
git clone https://github.com/aimagelab/VATr.git && cd VATr
pip install -r requirements.txt
This folder contains the regular IAM dataset IAM-32.pickle and the modified version with attached punctuation marks IAM-32-pa.pickle.
The folder also contains the synthetically pretrained weights for the encoder resnet_18_pretrained.pth.
Please download these files and place them into the files folder.
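Before starting a training run, it can help to confirm that the three files named above are actually present. The following is a minimal sketch, not part of the repository: the helper `missing_files` is hypothetical, and it assumes the `files` folder is relative to the current working directory.

```python
from pathlib import Path

# File names as listed in this README; the helper itself is a hypothetical
# convenience and does not exist in the repository.
REQUIRED_FILES = ("IAM-32.pickle", "IAM-32-pa.pickle", "resnet_18_pretrained.pth")


def missing_files(folder="files", required=REQUIRED_FILES):
    """Return the required file names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from files/:", ", ".join(missing))
    else:
        print("All required files are in place.")
```

Run it from the repository root so that the relative `files` path resolves correctly.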
Training
To train the regular VATr model, use the following command. This uses the default settings from the paper.
python train.py
Useful arguments:
python train.py
--feat_model_path PATH # path to the pretrained ResNet-18 checkpoint. By default this is the synthetically pretrained model
--is_cycle # use style cycle loss for training
--dataset DATASET # dataset to use. Default IAM
--resume # resume training from the last checkpoint with the same name
--wandb # use wandb for logging
Use the following arguments to apply the full VATr++ training:
python train.py
--d-crop-size 64 128 # Randomly crop input to discriminator to width 64 to 128
--text-augment-strength 0.4 # Text augmentation for adding more rare characters
--file-suffix pa # Use the punctuation attached version of IAM
--augment-ocr # Augment the real images used to train the OCR model
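If training is launched from a script rather than typed by hand, the VATr++ flags listed above can be assembled programmatically. This is a minimal sketch under the assumption that all four flags are passed in a single `train.py` call, as the listing suggests; the function name `vatrpp_train_command` is hypothetical, and the command is only printed, not executed.

```python
import shlex


def vatrpp_train_command(crop_min=64, crop_max=128, text_augment_strength=0.4,
                         file_suffix="pa", use_wandb=False):
    """Build the argument list for a VATr++ training run (hypothetical helper)."""
    cmd = [
        "python", "train.py",
        "--d-crop-size", str(crop_min), str(crop_max),  # discriminator crop width range
        "--text-augment-strength", str(text_augment_strength),
        "--file-suffix", file_suffix,                   # "pa" = punctuation-attached IAM
        "--augment-ocr",
    ]
    if use_wandb:
        cmd.append("--wandb")  # optional logging flag from the "Useful arguments" list
    return cmd


# Print the command as a shell-safe string; pass the list to subprocess.run to execute it.
print(shlex.join(vatrpp_train_command()))
```

Keeping the command as a list avoids shell quoting problems if it is later handed to `subprocess.run`.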
Pretraining dataset
The model resnet_18_pretrained.pth was pretrained on the Font Square dataset.
Generate Styled Handwritten Text Images
We added some utilities to generate handwritten text images using the trained model. They are used as follows:
python generate.py [ACTION] --checkpoint files/vatrpp.pth
The following actions are available with their respective arguments.
Custom Author
Generate the given text for a custom author.
text --text STRING # String to generate
--text-path PATH # Optional path to text file
--output PATH # Optional output location, default: files/output.png
--style-folder PATH # Optional style folder containing writer samples, default: 'files/style_samples/00'
Style samples for the author are needed. These can be automatically generated from an image of a page using create_style_sample.py.
python create_style_sample.py --input-image PATH # Path of the image to extract the style samples from.
--output-folder PATH # Folder where the style samples should be saved
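The custom-author workflow above has two steps: extract style samples from a page image, then generate text in that style. The sketch below chains the two commands using only the flags documented in this section. The helper name, the page image `my_page.png`, and the style folder `files/style_samples/custom` are hypothetical placeholders, and the argument order simply mirrors the usage line above; whether `generate.py` accepts other orders has not been verified.

```python
import shlex


def custom_author_commands(page_image, text,
                           style_folder="files/style_samples/custom",  # hypothetical folder
                           checkpoint="files/vatrpp.pth",
                           output="files/output.png"):
    """Return the two commands for the custom-author workflow (hypothetical helper)."""
    # Step 1: extract style samples from a scanned page.
    extract = [
        "python", "create_style_sample.py",
        "--input-image", page_image,
        "--output-folder", style_folder,
    ]
    # Step 2: generate the text using the extracted style samples.
    generate = [
        "python", "generate.py", "text",
        "--checkpoint", checkpoint,
        "--text", text,
        "--style-folder", style_folder,
        "--output", output,
    ]
    return extract, generate


# Print both commands in order; "my_page.png" is a placeholder file name.
for cmd in custom_author_commands("my_page.png", "Hello world"):
    print(shlex.join(cmd))
```

Pointing `--output-folder` and `--style-folder` at the same directory is what links the two steps together.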
All Authors
Generate some text for all authors of IAM. The output is saved to saved_images/author_samples/
authors --test-set # Generate authors of test set, otherwise training set is generated
--checkpoint PATH # Checkpoint used to generate text, files/vatr.pth by default
--align # Detect the bottom lines for each word and align them
--at-once # Generate the whole sentence at once instead of word-by-word
--output-style # Also save the style images used to generate the words
Evaluation Images
fid --target_dataset_path PATH # dataset file for which the test set will be generated
--dataset-path PATH # dataset file from which style samples will be taken, for example the attached punctuation
--output PATH # where to save the images, default is saved_images/fid
--checkpoint PATH # Checkpoint used to generate text, files/vatr.pth by default
--all-epochs # Generate evaluation images for all saved epochs available (checkpoint has to be a folder)
--fake-only # Only output fake images, no ground truth
--test-only # Only generate test set, not train set
--long-tail # Only generate words containing long tail characters