vittoriopippi committed on
Commit fa25a23 · 1 Parent(s): 6c352fc

Edit README.md

Files changed (1): README.md (+148 -95)
README.md CHANGED
@@ -1,120 +1,173 @@
- ---
- language:
- - en
- tags:
- - image-generation
- - text-to-image
- - conditional-generation
- - generative-modeling
- - image-synthesis
- - image-manipulation
- - design-prototyping
- - research
- - educational
- license: mit
- metrics:
- - FID
- - KID
- - HWD
- - CER
- ---

- # Handwritten Text Generation from Visual Archetypes ++

- This repository includes the code for training the VATr++ Styled Handwritten Text Generation model.

  ## Installation

- ```bash
- conda create --name vatr python=3.9
- conda activate vatr
- conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
- git clone https://github.com/aimagelab/VATr.git && cd VATr
- pip install -r requirements.txt
- ```

- [This folder](https://drive.google.com/drive/folders/13rJhjl7VsyiXlPTBvnp1EKkKEhckLalr?usp=sharing) contains the regular IAM dataset `IAM-32.pickle` and the modified version with attached punctuation marks `IAM-32-pa.pickle`.
- The folder also contains the synthetically pretrained weights for the encoder `resnet_18_pretrained.pth`.
- Please download these files and place them into the `files` folder.

- ## Training

- To train the regular VATr model, use the following command. This uses the default settings from the paper.

- ```bash
- python train.py
- ```

- Useful arguments:
- ```bash
- python train.py
-   --feat_model_path PATH   # path to the pretrained ResNet-18 checkpoint (default: the synthetically pretrained model)
-   --is_cycle               # use style cycle loss for training
-   --dataset DATASET        # dataset to use (default: IAM)
-   --resume                 # resume training from the last checkpoint with the same name
-   --wandb                  # use wandb for logging
- ```

- Use the following arguments to apply full VATr++ training:
- ```bash
- python train.py
-   --d-crop-size 64 128          # randomly crop discriminator input to a width between 64 and 128
-   --text-augment-strength 0.4   # text augmentation for adding more rare characters
-   --file-suffix pa              # use the punctuation-attached version of IAM
-   --augment-ocr                 # augment the real images used to train the OCR model
  ```

- ### Pretraining dataset
- The model `resnet_18_pretrained.pth` was pretrained on the [Font Square](https://github.com/aimagelab/font_square) dataset.

- ## Generate Styled Handwritten Text Images

- We added some utilities to generate handwritten text images using the trained model. They are used as follows:

- ```bash
- python generate.py [ACTION] --checkpoint files/vatrpp.pth
- ```

- The following actions are available, each with its respective arguments.

- ### Custom Author

- Generate the given text for a custom author.

- ```bash
- text --text STRING     # string to generate
-   --text-path PATH     # optional path to a text file
-   --output PATH        # optional output location (default: files/output.png)
-   --style-folder PATH  # optional folder containing writer style samples (default: files/style_samples/00)
- ```

- Style samples for the author are needed. They can be generated automatically from an image of a page using `create_style_sample.py`.
- ```bash
- python create_style_sample.py
-   --input-image PATH    # path of the image to extract the style samples from
-   --output-folder PATH  # folder where the style samples should be saved
- ```

- ### All Authors

- Generate some text for all authors of IAM. The output is saved to `saved_images/author_samples/`.

- ```bash
- authors --test-set  # generate for authors of the test set; otherwise the training set is used
-   --checkpoint PATH # checkpoint used to generate text (default: files/vatr.pth)
-   --align           # detect the bottom line of each word and align them
-   --at-once         # generate the whole sentence at once instead of word by word
-   --output-style    # also save the style images used to generate the words
- ```

- ### Evaluation Images

- ```bash
- fid --target_dataset_path PATH  # dataset file for which the test set will be generated
-   --dataset-path PATH           # dataset file from which style samples will be taken, e.g. the attached-punctuation version
-   --output PATH                 # where to save the images (default: saved_images/fid)
-   --checkpoint PATH             # checkpoint used to generate text (default: files/vatr.pth)
-   --all-epochs                  # generate evaluation images for all saved epochs (checkpoint has to be a folder)
-   --fake-only                   # only output fake images, no ground truth
-   --test-only                   # only generate the test set, not the train set
-   --long-tail                   # only generate words containing long-tail characters
- ```
 
+ # VATr++ (Hugging Face Version)
+
+ This is a re-upload of the **VATr++** styled handwritten text generation model to the Hugging Face Model Hub. The original code and more detailed documentation can be found in the [VATr-pp GitHub repository](https://github.com/EDM-Research/VATr-pp).
+
+ > **Note**: Please refer to the original repo for:
+ > - Full training instructions
+ > - In-depth code details
+ > - Extended usage and references
+
+ This Hugging Face version allows you to load the **VATr++** model directly with `AutoModel.from_pretrained(...)` and use it in your pipelines or scripts without manually handling checkpoints. The usage differs slightly from the original GitHub repository, primarily because we leverage Hugging Face's `transformers` interface here.
+
+ ---

  ## Installation

+ 1. **Create a conda environment (recommended)**:
+    ```bash
+    conda create --name vatr python=3.9
+    conda activate vatr
+    ```
+
+ 2. **Install PyTorch and CUDA (if available)**:
+    ```bash
+    conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
+    ```
+
+ 3. **Install additional requirements** (including `transformers`, `Pillow`, and `numpy`):
+    ```bash
+    pip install transformers Pillow numpy
+    ```
+    *You may need to adjust or add libraries based on your specific environment needs.*
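+
+ Optionally, sanity-check the environment before moving on. This snippet is not part of the original README; it only uses standard PyTorch calls:
+
+ ```python
+ import torch
+
+ # Verify the installed version and that a CUDA device is visible
+ print(torch.__version__)          # expected: 1.13.1
+ print(torch.cuda.is_available())  # True if the GPU setup succeeded
+ ```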
+ ---

+ ## Loading the Model
+
+ #### **VATr++**
+ To load the **VATr++** version:
+ ```python
+ from transformers import AutoModel
+
+ model_vatr_pp = AutoModel.from_pretrained(
+     "blowing-up-groundhogs/vatrpp",
+     trust_remote_code=True
+ )
  ```

+ #### **VATr (original)**
+ To load the **original VATr** model (instead of VATr++), specify the `subfolder` argument:
+ ```python
+ model_vatr = AutoModel.from_pretrained(
+     "blowing-up-groundhogs/vatrpp",
+     subfolder="vatr",
+     trust_remote_code=True
+ )
+ ```
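+
+ In either case, the loaded object is a regular PyTorch module, so standard device and inference-mode handling should apply (a sketch, not from the original README; remember to move inputs to the same device as the model):
+
+ ```python
+ import torch
+
+ # Move to GPU when available and disable training-time behaviour
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model_vatr = model_vatr.to(device).eval()
+ ```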
+ ---

+ ## Usage (Inference Example)
+
+ Below is a **minimal** usage example that demonstrates how to:
+
+ 1. Load the VATr++ model from the Hugging Face Hub.
+ 2. Preprocess a style image (an image of handwriting).
+ 3. Generate a new handwritten line of text in the style of the provided image.
+
+ > **Important**: This model requires `trust_remote_code=True` to properly load its custom generation logic.
+
+ ```python
+ import numpy as np
+ from PIL import Image
+ import torch
+ from torchvision import transforms as T
+ from transformers import AutoModel
+
+ # 1. Load the model (VATr++)
+ model = AutoModel.from_pretrained("blowing-up-groundhogs/vatrpp", trust_remote_code=True)
+
+ # 2. Helper functions to load and process style images
+ def load_image(img, chunk_width=192):
+     # Convert to grayscale and resize to height 32
+     img = img.convert("L")
+     img = img.resize((img.width * 32 // img.height, 32))
+     arr = np.array(img)
+
+     # Set up transforms: to tensor + normalize to [-1, 1]
+     transform = T.Compose([
+         T.Grayscale(num_output_channels=1),
+         T.ToTensor(),
+         T.Normalize((0.5,), (0.5,))
+     ])
+
+     # Invert, pad to a fixed width with background, and invert back
+     arr = 255 - arr
+     height, width = arr.shape
+     out = np.zeros((height, chunk_width), dtype="float32")
+     out[:, :width] = arr[:, :chunk_width]
+     out = 255 - out
+
+     # Apply transforms
+     out = transform(Image.fromarray(out.astype(np.uint8)))
+     return out, width
+
+ def load_image_line(img, chunk_width=192, style_imgs_count=15):
+     # Convert to grayscale and resize
+     img = img.convert("L")
+     img = img.resize((img.width * 32 // img.height, 32))
+     arr = np.array(img)
+
+     # Split into fixed-width chunks
+     chunks = []
+     for start in range(0, arr.shape[1], chunk_width):
+         chunk = arr[:, start:start + chunk_width]
+         chunks.append(chunk)
+
+     # Transform each chunk
+     transformed = []
+     for c in chunks:
+         t, _ = load_image(Image.fromarray(c), chunk_width)
+         transformed.append(t)
+
+     # If fewer than `style_imgs_count` chunks, repeat them
+     while len(transformed) < style_imgs_count:
+         transformed += transformed
+     transformed = transformed[:style_imgs_count]
+
+     # Combine into a single tensor of style chunks
+     return torch.cat(transformed, 0)
+
+ # 3. Load a style image of your handwriting (or any handwriting sample)
+ style_image_path = "path/to/your_style_image.png"
+ img = Image.open(style_image_path)
+ style_imgs = load_image_line(img)
+
+ # 4. Generate text in the style of `style_image_path`
+ generated_pil_image = model.generate(
+     gen_text="This is a test",  # Text to generate
+     style_imgs=style_imgs,      # Preprocessed style chunks
+     align_words=True,           # Align words at the baseline
+     at_once=True,               # Generate the line at once
+ )
+
+ # 5. Save the generated image
+ generated_pil_image.save("generated_output.png")
+ ```
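+
+ The helpers above always produce a fixed-size batch of style chunks, so a quick shape check (an illustrative line, not from the original README) verifies the preprocessing:
+
+ ```python
+ # 15 style chunks, each a 32x192 grayscale crop
+ print(style_imgs.shape)  # expected: torch.Size([15, 32, 192])
+ ```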
+
+ - **`style_imgs`**: A batch of fixed-width image chunks from your style reference. In practice, you can supply multiple small style samples or a single line image split into chunks.
+ - **`gen_text`**: The text to render in the given style.
+ - **`align_words`** and **`at_once`**: Optional arguments that control how the text is laid out and generated; see the word-by-word sketch below.
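+
+ For instance, to mimic the original repository's default behaviour of generating each word separately, you can flip the two layout flags. This variation is a sketch, not from the original README; it only reuses the `model.generate` arguments shown above:
+
+ ```python
+ # Generate word by word, without forcing a shared baseline across words
+ word_by_word = model.generate(
+     gen_text="This is a test",
+     style_imgs=style_imgs,
+     align_words=False,
+     at_once=False,
+ )
+ word_by_word.save("generated_word_by_word.png")
+ ```
+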
+ ---
 
 

+ ## Original Repository
+
+ This model is built upon the code from [**EDM-Research/VATr-pp**](https://github.com/EDM-Research/VATr-pp), which is itself an improvement on the [VATr](https://github.com/aimagelab/VATr) project. Please visit the original GitHub repos for comprehensive documentation and support files if you need to:
+ - Train your own model from scratch
+ - Explore advanced features (like style cycle loss, punctuation modes, or advanced augmentation)
+ - Examine experimental details or replicate the original paper's setup
+
+ ---
+
+ ## License and Acknowledgments
+
+ - The original code and model are under the license found in [the GitHub repository](https://github.com/EDM-Research/VATr-pp).
+ - All credit goes to the original authors and maintainers for creating VATr++ and releasing it openly.
+ - This Hugging Face re-upload is intended merely to **simplify inference** and **model sharing**; no changes have been made to the core training code or conceptual pipeline.
+
+ ---
+
+ **Enjoy generating styled handwritten text!** For any issues specific to this Hugging Face version, feel free to open an issue or pull request here. Otherwise, for deeper technical questions, please consult the original repository or its authors.