---
language:
  - en
tags:
  - image-generation
  - text-to-image
  - conditional-generation
  - generative-modeling
  - image-synthesis
  - image-manipulation
  - design-prototyping
  - research
  - educational
license: mit
metrics:
  - FID
  - KID
  - HWD
  - CER
---

# Handwritten Text Generation from Visual Archetypes ++

This repository includes the code for training the VATr++ Styled Handwritten Text Generation model.

## Installation

```bash
conda create --name vatr python=3.9
conda activate vatr
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
git clone https://github.com/aimagelab/VATr.git && cd VATr
pip install -r requirements.txt
```

This folder contains the regular IAM dataset (`IAM-32.pickle`) and a modified version with attached punctuation marks (`IAM-32-pa.pickle`), as well as the synthetically pretrained weights for the encoder (`resnet_18_pretrained.pth`). Please download these files and place them in the `files` folder.

## Training

To train the regular VATr model, use the following command. It uses the default settings from the paper.

```bash
python train.py
```

Useful arguments:

```bash
python train.py
        --feat_model_path PATH  # path to the pretrained ResNet-18 checkpoint; defaults to the synthetically pretrained model
        --is_cycle              # use the style cycle loss for training
        --dataset DATASET       # dataset to use, default: IAM
        --resume                # resume training from the last checkpoint with the same name
        --wandb                 # use wandb for logging
```

Use the following arguments to apply the full VATr++ training setup:

```bash
python train.py
        --d-crop-size 64 128          # randomly crop the discriminator input to a width between 64 and 128
        --text-augment-strength 0.4   # text augmentation for adding more rare characters
        --file-suffix pa              # use the punctuation-attached version of IAM
        --augment-ocr                 # augment the real images used to train the OCR model
```
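To illustrate what a text-augmentation step with strength 0.4 could look like, here is a minimal sketch. The names `RARE_CHARS` and `augment_text` are hypothetical, and the actual augmentation in `train.py` may work differently (e.g. use a frequency-derived character list):

```python
import random

# Example set of long-tail characters (an assumption; the real list would
# likely be derived from IAM character frequencies).
RARE_CHARS = "#&@()[]!?;:"

def augment_text(word, strength=0.4, rng=None):
    """With probability `strength`, insert one rare character at a random
    position in `word`; otherwise return the word unchanged."""
    rng = rng or random.Random()
    if rng.random() >= strength:
        return word
    pos = rng.randrange(len(word) + 1)
    return word[:pos] + rng.choice(RARE_CHARS) + word[pos:]
```

With `strength=0.4`, roughly 40% of training words would receive an extra rare character, increasing the model's exposure to long-tail symbols.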

### Pretraining dataset

The model `resnet_18_pretrained.pth` was pretrained on the Font Square dataset.

## Generate Styled Handwritten Text Images

We added some utilities to generate handwritten text images with the trained model. They are used as follows:

```bash
python generate.py [ACTION] --checkpoint files/vatrpp.pth
```

The following actions are available, each with its respective arguments.

### Custom Author

Generate the given text for a custom author.

```bash
text  --text STRING          # string to generate
      --text-path PATH       # optional path to a text file
      --output PATH          # optional output location, default: files/output.png
      --style-folder PATH    # optional folder containing writer style samples, default: files/style_samples/00
```

Style samples for the author are needed. These can be generated automatically from an image of a page using `create_style_sample.py`:

```bash
python create_style_sample.py  --input-image PATH     # path of the image to extract the style samples from
                               --output-folder PATH   # folder where the style samples should be saved
```
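As a rough illustration of what style-sample extraction involves, here is a minimal word-segmentation sketch based on column-projection gaps. The function `extract_style_samples` and its parameters are hypothetical; the real script may use an entirely different method, such as a learned word detector:

```python
import numpy as np

def extract_style_samples(page, min_gap=8, threshold=128):
    """Split a grayscale line image (H x W uint8, dark ink on a light
    background) into word crops by cutting at wide ink-free column runs."""
    ink = page < threshold                      # boolean ink mask
    cols = ink.any(axis=0)                      # columns that contain ink
    samples, start, gap = [], None, 0
    for x, has_ink in enumerate(cols):
        if has_ink:
            if start is None:
                start = x                       # a new word begins
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:                  # gap wide enough: cut here
                samples.append(page[:, start:x - gap + 1])
                start, gap = None, 0
    if start is not None:                       # trailing word, if any
        samples.append(page[:, start:])
    return samples
```

Such crops could then be resized to the model's expected input height and saved to the style folder.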

### All Authors

Generate some text for all authors of IAM. The output is saved to `saved_images/author_samples/`.

```bash
authors --test-set        # generate for authors of the test set; otherwise the training set is used
        --checkpoint PATH # checkpoint used to generate text, default: files/vatr.pth
        --align           # detect the bottom line of each word and align the words on it
        --at-once         # generate the whole sentence at once instead of word by word
        --output-style    # also save the style images used to generate the words
```
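The bottom-line alignment behind `--align` can be sketched roughly as follows. The helper `align_bottoms` is hypothetical; the real implementation may detect baselines differently:

```python
import numpy as np

def align_bottoms(words, threshold=128):
    """Pad the top of each grayscale word crop (dark ink on white) so that
    the lowest ink row of every crop lands on a common baseline."""
    bottoms = []
    for w in words:
        rows = np.where((w < threshold).any(axis=1))[0]  # rows containing ink
        bottoms.append(rows[-1] if len(rows) else w.shape[0] - 1)
    target = max(bottoms)
    aligned = []
    for w, b in zip(words, bottoms):
        pad = target - b                                 # shift ink downward
        aligned.append(np.pad(w, ((pad, 0), (0, 0)), constant_values=255))
    return aligned
```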

### Evaluation Images

Generate the image sets used for evaluation (for example, to compute FID).

```bash
fid --target_dataset_path PATH  # dataset file for which the test set will be generated
    --dataset-path PATH         # dataset file from which style samples are taken, for example the punctuation-attached version
    --output PATH               # where to save the images, default: saved_images/fid
    --checkpoint PATH           # checkpoint used to generate text, default: files/vatr.pth
    --all-epochs                # generate evaluation images for all saved epochs (checkpoint has to be a folder)
    --fake-only                 # only output fake images, no ground truth
    --test-only                 # only generate the test set, not the training set
    --long-tail                 # only generate words containing long-tail characters
```
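A word filter like the one behind `--long-tail` might look like this minimal sketch. The `COMMON_CHARS` set and the function name are assumptions; the actual character list is defined by the training code:

```python
# Characters treated as "common"; anything else counts as long-tail
# (an assumption for illustration only).
COMMON_CHARS = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

def has_long_tail(word):
    """Return True if the word contains at least one long-tail character."""
    return any(c not in COMMON_CHARS for c in word)
```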