--- language: - en tags: - image-generation - text-to-image - conditional-generation - generative-modeling - image-synthesis - image-manipulation - design-prototyping - research - educational license: mit metrics: - FID - KID - HWD - CER --- # Handwritten Text Generation from Visual Archetypes ++ This repository includes the code for training the VATr++ Styled Handwritten Text Generation model. ## Installation ```bash conda create --name vatr python=3.9 conda activate vatr conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia git clone https://github.com/aimagelab/VATr.git && cd VATr pip install -r requirements.txt ``` [This folder](https://drive.google.com/drive/folders/13rJhjl7VsyiXlPTBvnp1EKkKEhckLalr?usp=sharing) contains the regular IAM dataset `IAM-32.pickle` and the modified version with attached punctuation marks `IAM-32-pa.pickle`. The folder also contains the synthetically pretrained weights for the encoder `resnet_18_pretrained.pth`. Please download these files and place them into the `files` folder. ## Training To train the regular VATr model, use the following command. This uses the default settings from the paper. ```bash python train.py ``` Useful arguments: ```bash python train.py --feat_model_path PATH # path to the pretrained resnet 18 checkpoint. By default this is the synthetically pretrained model --is_cycle # use style cycle loss for training --dataset DATASET # dataset to use. Default IAM --resume # resume training from the last checkpoint with the same name --wandb # use wandb for logging ``` Use the following arguments to apply full VATr++ training ```bash python train.py --d-crop-size 64 128 # Randomly crop input to discriminator to width 64 to 128 --text-augment-strength 0.4 # Text augmentation for adding more rare characters --file-suffix pa # Use the punctuation attached version of IAM --augment-ocr # Augment the real images used to train the OCR model ``` ### Pretraining dataset The model `resnet_18_pretrained.pth` was pretrained by using this dataset: [Font Square](https://github.com/aimagelab/font_square) ## Generate Styled Handwritten Text Images We added some utility to generate handwritten text images using the trained model. These are used as follows: ```bash python generate.py [ACTION] --checkpoint files/vatrpp.pth ``` The following actions are available with their respective arguments. ### Custom Author Generate the given text for a custom author. ```bash text --text STRING # String to generate --text-path PATH # Optional path to text file --output PATH # Optional output location, default: files/output.png --style-folder PATH # Optional style folder containing writer samples, default: 'files/style_samples/00' ``` Style samples for the author are needed. These can be automatically generated from an image of a page using `create_style_sample.py`. ```bash python create_style_sample.py --input-image PATH # Path of the image to extract the style samples from. --output-folder PATH # Folder where the style samples should be saved ``` ### All Authors Generate some text for all authors of IAM. The output is saved to `saved_images/author_samples/` ```bash authors --test-set # Generate authors of test set, otherwise training set is generated --checkpoint PATH # Checkpoint used to generate text, files/vatr.pth by default --align # Detect the bottom lines for each word and align them --at-once # Generate the whole sentence at once instead of word-by-word --output-style # Also save the style images used to generate the words ``` ### Evaluation Images ```bash fid --target_dataset_path PATH # dataset file for which the test set will be generated --dataset-path PATH # dataset file from which style samples will be taken, for example the attached punctuation --output PATH # where to save the images, default is saved_images/fid --checkpoint PATH # Checkpoint used to generate text, files/vatr.pth by default --all-epochs # Generate evaluation images for all saved epochs available (checkpoint has to be a folder) --fake-only # Only output fake images, no ground truth --test-only # Only generate test set, not train set --long-tail # Only generate words containing long tail characters ```