---
language:
- en
tags:
- image-generation
- text-to-image
- conditional-generation
- generative-modeling
- image-synthesis
- image-manipulation
- design-prototyping
- research
- educational
license: mit
metrics:
- FID
- KID
- HWD
- CER
---
# Handwritten Text Generation from Visual Archetypes ++
This repository includes the code for training the VATr++ Styled Handwritten Text Generation model.
## Installation
```bash
conda create --name vatr python=3.9
conda activate vatr
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
git clone https://github.com/aimagelab/VATr.git && cd VATr
pip install -r requirements.txt
```
[This folder](https://drive.google.com/drive/folders/13rJhjl7VsyiXlPTBvnp1EKkKEhckLalr?usp=sharing) contains the regular IAM dataset `IAM-32.pickle` and the modified version with attached punctuation marks `IAM-32-pa.pickle`.
The folder also contains the synthetically pretrained weights for the encoder `resnet_18_pretrained.pth`.
Please download these files and place them into the `files` folder.
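Before training, a quick check that the downloads landed where the scripts expect them can save a failed run. The helper below is a minimal sketch and not part of the repository. The function name `check_vatr_files` is hypothetical, and the helper only verifies that the three file names mentioned above exist in the given folder (by default `files`).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): verifies that the files
# downloaded from the Google Drive folder are present in DIR (default: files).
check_vatr_files() {
  local dir="${1:-files}"
  local missing=0 f
  for f in IAM-32.pickle IAM-32-pa.pickle resnet_18_pretrained.pth; do
    if [[ ! -f "$dir/$f" ]]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # Exit status is 0 only if every file was found.
  return "$missing"
}
```

Run it from the repository root, for example `check_vatr_files files && echo "all files in place"`.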
## Training
To train the regular VATr model, use the following command. This uses the default settings from the paper.
```bash
python train.py
```
Useful arguments:
```bash
# --feat_model_path PATH  Path to the pretrained ResNet-18 checkpoint (default: the synthetically pretrained model)
# --is_cycle              Use the style cycle loss for training
# --dataset DATASET       Dataset to use (default: IAM)
# --resume                Resume training from the last checkpoint with the same name
# --wandb                 Use wandb for logging
python train.py --is_cycle --wandb
```
Use the following arguments for full VATr++ training:
```bash
# --d-crop-size 64 128         Randomly crop the discriminator input to a width between 64 and 128
# --text-augment-strength 0.4  Text augmentation that adds more rare characters
# --file-suffix pa             Use the punctuation-attached version of IAM
# --augment-ocr                Augment the real images used to train the OCR model
python train.py --d-crop-size 64 128 --text-augment-strength 0.4 --file-suffix pa --augment-ocr
```
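When the VATr++ flags are combined with the other training arguments (for example `--wandb` or `--resume`), a small wrapper keeps the command readable. This is a sketch, not part of the repository. The function name `train_vatrpp` and the `DRY_RUN` variable are hypothetical, and the wrapper simply forwards any extra arguments to `train.py` after the VATr++ flags. Keeping the flags in a bash array also lets each flag carry its own comment, which a backslash-continued command does not allow.

```bash
#!/usr/bin/env bash
# VATr++ flags from this README, one per line so each can carry a comment.
VATRPP_ARGS=(
  --d-crop-size 64 128         # random discriminator crop width, 64 to 128
  --text-augment-strength 0.4  # text augmentation that adds more rare characters
  --file-suffix pa             # punctuation-attached version of IAM
  --augment-ocr                # augment the real images used to train the OCR model
)

# Hypothetical wrapper: runs train.py with the VATr++ flags plus any extra
# arguments. With DRY_RUN=1 it only prints the command.
train_vatrpp() {
  local cmd=(python train.py "${VATRPP_ARGS[@]}" "$@")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    local IFS=' '
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `train_vatrpp --wandb` starts VATr++ training with wandb logging, and `DRY_RUN=1 train_vatrpp --resume` prints the full command without running it.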
### Pretraining dataset
The encoder weights `resnet_18_pretrained.pth` were pretrained on the [Font Square](https://github.com/aimagelab/font_square) dataset.
## Generate Styled Handwritten Text Images
We added some utilities to generate handwritten text images with a trained model. They are used as follows:
```bash
python generate.py [ACTION] --checkpoint files/vatrpp.pth
```
The following actions are available with their respective arguments.
### Custom Author
Generate the given text for a custom author.
```bash
text --text STRING # String to generate
--text-path PATH # Optional path to text file
--output PATH # Optional output location, default: files/output.png
--style-folder PATH # Optional style folder containing writer samples, default: 'files/style_samples/00'
```
This requires style samples of the author, which can be generated automatically from an image of a page with `create_style_sample.py`:
```bash
# --input-image PATH    Path of the page image to extract the style samples from
# --output-folder PATH  Folder where the style samples are saved
python create_style_sample.py --input-image PATH --output-folder PATH
```
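The two steps can be chained to render text for a new writer from a single page image. The sketch below is hypothetical: the function names `render_custom_author` and `run`, their arguments and the `DRY_RUN` switch are illustrative and not part of the repository. It only uses the `create_style_sample.py` and `generate.py text` options documented above, and assumes it runs from the repository root with the checkpoint at `files/vatrpp.pth`.

```bash
#!/usr/bin/env bash
# Runs a command, or only prints it (arguments joined by spaces, quoting not
# preserved) when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    local IFS=' '
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper. Usage: render_custom_author PAGE_IMAGE STYLE_DIR TEXT OUTPUT
render_custom_author() {
  local page="$1" style_dir="$2" text="$3" output="$4"
  # Step 1: extract the writer's style samples from the page image.
  run python create_style_sample.py --input-image "$page" --output-folder "$style_dir" || return
  # Step 2: generate TEXT in that style with the `text` action.
  run python generate.py text --checkpoint files/vatrpp.pth \
    --text "$text" --style-folder "$style_dir" --output "$output"
}
```

For example, `render_custom_author my_page.png files/style_samples/my_author "Hello world" files/my_author.png` writes the generated image to `files/my_author.png`; prefix the call with `DRY_RUN=1` to inspect the commands first.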
### All Authors
Generate sample text for every author in IAM. The output is saved to `saved_images/author_samples/`.
```bash
authors --test-set # Generate authors of test set, otherwise training set is generated
--checkpoint PATH # Checkpoint used to generate text, files/vatr.pth by default
--align # Detect the bottom lines for each word and align them
--at-once # Generate the whole sentence at once instead of word-by-word
--output-style # Also save the style images used to generate the words
```
### Evaluation Images
Generate the images used to compute the evaluation metrics, such as FID.
```bash
fid --target_dataset_path PATH # dataset file for which the test set will be generated
--dataset-path PATH # dataset file from which style samples will be taken, for example the attached punctuation
--output PATH # where to save the images, default is saved_images/fid
--checkpoint PATH # Checkpoint used to generate text, files/vatr.pth by default
--all-epochs # Generate evaluation images for all saved epochs available (checkpoint has to be a folder)
--fake-only # Only output fake images, no ground truth
--test-only # Only generate test set, not train set
--long-tail # Only generate words containing long tail characters
```