vatrpp / README.md

vittoriopippi

Edit README.md

4506963 14 days ago

4.72 kB

	---
	language:
	- en
	tags:
	- image-generation
	- text-to-image
	- conditional-generation
	- generative-modeling
	- image-synthesis
	- image-manipulation
	- design-prototyping
	- research
	- educational
	license: mit
	metrics:
	- FID
	- KID
	- HWD
	- CER
	---

	# Handwritten Text Generation from Visual Archetypes ++

	This repository includes the code for training the VATr++ Styled Handwritten Text Generation model.

	## Installation

	```bash
	conda create --name vatr python=3.9
	conda activate vatr
	conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
	git clone https://github.com/aimagelab/VATr.git && cd VATr
	pip install -r requirements.txt
	```

	[This folder](https://drive.google.com/drive/folders/13rJhjl7VsyiXlPTBvnp1EKkKEhckLalr?usp=sharing) contains the regular IAM dataset `IAM-32.pickle` and the modified version with attached punctuation marks `IAM-32-pa.pickle`.
	The folder also contains the synthetically pretrained weights for the encoder `resnet_18_pretrained.pth`.
	Please download these files and place them into the `files` folder.

	## Training

	To train the regular VATr model, use the following command. This uses the default settings from the paper.

	```bash
	python train.py
	```

	Useful arguments:
	```bash
	python train.py
	--feat_model_path PATH # path to the pretrained resnet 18 checkpoint. By default this is the synthetically pretrained model
	--is_cycle # use style cycle loss for training
	--dataset DATASET # dataset to use. Default IAM
	--resume # resume training from the last checkpoint with the same name
	--wandb # use wandb for logging
	```

	Use the following arguments to apply full VATr++ training
	```bash
	python train.py
	--d-crop-size 64 128 # Randomly crop input to discriminator to width 64 to 128
	--text-augment-strength 0.4 # Text augmentation for adding more rare characters
	--file-suffix pa # Use the punctuation attached version of IAM
	--augment-ocr # Augment the real images used to train the OCR model
	```

	### Pretraining dataset
	The model `resnet_18_pretrained.pth` was pretrained by using this dataset: [Font Square](https://github.com/aimagelab/font_square)


	## Generate Styled Handwritten Text Images

	We added some utility to generate handwritten text images using the trained model. These are used as follows:

	```bash
	python generate.py [ACTION] --checkpoint files/vatrpp.pth
	```

	The following actions are available with their respective arguments.

	### Custom Author

	Generate the given text for a custom author.

	```bash
	text --text STRING # String to generate
	--text-path PATH # Optional path to text file
	--output PATH # Optional output location, default: files/output.png
	--style-folder PATH # Optional style folder containing writer samples, default: 'files/style_samples/00'
	```
	Style samples for the author are needed. These can be automatically generated from an image of a page using `create_style_sample.py`.
	```bash
	python create_style_sample.py --input-image PATH # Path of the image to extract the style samples from.
	--output-folder PATH # Folder where the style samples should be saved
	```

	### All Authors

	Generate some text for all authors of IAM. The output is saved to `saved_images/author_samples/`

	```bash
	authors --test-set # Generate authors of test set, otherwise training set is generated
	--checkpoint PATH # Checkpoint used to generate text, files/vatr.pth by default
	--align # Detect the bottom lines for each word and align them
	--at-once # Generate the whole sentence at once instead of word-by-word
	--output-style # Also save the style images used to generate the words
	```

	### Evaluation Images

	```bash
	fid --target_dataset_path PATH # dataset file for which the test set will be generated
	--dataset-path PATH # dataset file from which style samples will be taken, for example the attached punctuation
	--output PATH # where to save the images, default is saved_images/fid
	--checkpoint PATH # Checkpoint used to generate text, files/vatr.pth by default
	--all-epochs # Generate evaluation images for all saved epochs available (checkpoint has to be a folder)
	--fake-only # Only output fake images, no ground truth
	--test-only # Only generate test set, not train set
	--long-tail # Only generate words containing long tail characters
	```