---
language:
  - en
tags:
    - image-generation
    - text-to-image
    - conditional-generation
    - generative-modeling
    - image-synthesis
    - image-manipulation
    - design-prototyping
    - research
    - educational
license: mit
metrics:
  - FID
  - KID
  - HWD
  - CER
---

# Handwritten Text Generation from Visual Archetypes ++

This repository contains the code for training the VATr++ styled handwritten text generation model.

## Installation

```bash
conda create --name vatr python=3.9
conda activate vatr
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
git clone https://github.com/aimagelab/VATr.git && cd VATr
pip install -r requirements.txt
```

[This folder](https://drive.google.com/drive/folders/13rJhjl7VsyiXlPTBvnp1EKkKEhckLalr?usp=sharing) contains the regular IAM dataset (`IAM-32.pickle`), a modified version with attached punctuation marks (`IAM-32-pa.pickle`), and the synthetically pretrained weights for the encoder (`resnet_18_pretrained.pth`).
Please download these files and place them in the `files` folder.
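Assuming the three files were downloaded into the current directory, placing them can be scripted as below. The `touch` lines only simulate the downloads so that the snippet is self-contained; remove them when working with the real files:

```shell
# Simulate the downloaded artifacts (remove these lines when you have the real files).
touch IAM-32.pickle IAM-32-pa.pickle resnet_18_pretrained.pth

# Create the expected folder and move the artifacts into place.
mkdir -p files
mv IAM-32.pickle IAM-32-pa.pickle resnet_18_pretrained.pth files/
ls files
```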

## Training

To train the regular VATr model with the default settings from the paper, run:

```bash
python train.py
```

Useful arguments:
```bash
python train.py
        --feat_model_path PATH  # Path to the pretrained ResNet-18 checkpoint (default: the synthetically pretrained model)
        --is_cycle              # Use the style cycle loss for training
        --dataset DATASET       # Dataset to use (default: IAM)
        --resume                # Resume training from the last checkpoint with the same name
        --wandb                 # Log to Weights & Biases
```
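For example, a run that uses the style cycle loss, resumes from the last checkpoint, and logs to Weights & Biases would combine the flags above as follows (an illustrative combination, not a prescribed configuration):

```bash
python train.py --is_cycle --resume --wandb
```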

To apply the full VATr++ training setup, add the following arguments:
```bash
python train.py
        --d-crop-size 64 128          # Randomly crop the discriminator input to a width between 64 and 128
        --text-augment-strength 0.4   # Strength of the text augmentation that adds more rare characters
        --file-suffix pa              # Use the punctuation-attached version of IAM
        --augment-ocr                 # Augment the real images used to train the OCR model
```
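Put together, a full VATr++ training run on the punctuation-attached IAM split looks like this (a sketch assembled from the flags above):

```bash
python train.py \
    --d-crop-size 64 128 \
    --text-augment-strength 0.4 \
    --file-suffix pa \
    --augment-ocr
```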

### Pretraining dataset
The `resnet_18_pretrained.pth` encoder was pretrained on the [Font Square](https://github.com/aimagelab/font_square) dataset.


## Generate Styled Handwritten Text Images

We provide utilities to generate handwritten text images with a trained model. They are used as follows:

```bash
python generate.py [ACTION] --checkpoint files/vatrpp.pth
```

The following actions are available with their respective arguments.

### Custom Author

Generate the given text for a custom author.

```bash
text  --text STRING        # String to generate
      --text-path PATH     # Optional path to a text file to read the text from
      --output PATH        # Optional output location (default: files/output.png)
      --style-folder PATH  # Optional folder containing writer style samples (default: files/style_samples/00)
```
Style samples of the target author are required. They can be generated automatically from an image of a handwritten page using `create_style_sample.py`:
```bash
python create_style_sample.py  --input-image PATH     # Path of the image to extract the style samples from
                               --output-folder PATH   # Folder where the style samples should be saved
```
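An end-to-end example: extract style samples from a page scan, then generate text in that writer's style. The input image, style folder, and output paths here are assumptions for illustration:

```bash
# Extract style samples from a scanned page (files/page.png is a hypothetical path).
python create_style_sample.py --input-image files/page.png --output-folder files/style_samples/my_author

# Generate text using the extracted style samples.
python generate.py text --text "The quick brown fox" \
    --style-folder files/style_samples/my_author \
    --checkpoint files/vatrpp.pth \
    --output files/my_author.png
```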

### All Authors

Generate sample text for all authors of IAM. The output is saved to `saved_images/author_samples/`.

```bash
authors --test-set        # Generate for the test-set authors; otherwise the training set is used
        --checkpoint PATH # Checkpoint used to generate the text (default: files/vatr.pth)
        --align           # Detect the bottom line of each word and align the words to it
        --at-once         # Generate the whole sentence at once instead of word by word
        --output-style    # Also save the style images used to generate the words
```
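For instance, generating aligned samples for the test-set writers with the VATr++ checkpoint could look like this (an illustrative combination of the flags above; the checkpoint path follows the earlier sections):

```bash
python generate.py authors --test-set --align --checkpoint files/vatrpp.pth
```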

### Evaluation Images

Generate the images used to compute the evaluation metrics (FID, KID, HWD, CER):

```bash
fid --target_dataset_path PATH  # Dataset file for which the test set will be generated
    --dataset-path PATH         # Dataset file from which the style samples are taken, e.g. the punctuation-attached version
    --output PATH               # Where to save the images (default: saved_images/fid)
    --checkpoint PATH           # Checkpoint used to generate the text (default: files/vatr.pth)
    --all-epochs                # Generate evaluation images for every saved epoch (checkpoint must be a folder)
    --fake-only                 # Only output fake images, no ground truth
    --test-only                 # Only generate the test set, not the training set
    --long-tail                 # Only generate words containing long-tail characters
```
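As an example, generating only the fake test-set images for evaluation could be invoked as below (an illustrative combination; using `files/IAM-32.pickle` as the target dataset is an assumption):

```bash
python generate.py fid \
    --target_dataset_path files/IAM-32.pickle \
    --checkpoint files/vatrpp.pth \
    --fake-only \
    --test-only
```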