Comment on image splitting (#4)
- Comment on image splitting (d6e40c5cbeb2fbec8f0463afdd1cd937f1090c79)
Co-authored-by: Hugo Laurençon
README.md
CHANGED
@@ -223,6 +223,8 @@ Given the high resolution supported, the vision part of the model can be memory
- **deactivate the image splitting.** To do so, pass `do_image_splitting=False` when initializing the processor (`AutoProcessor.from_pretrained`). No changes are required on the model side. Note that only the SFT model was trained with image splitting.
- **decrease the maximum image resolution.** To do so, pass `size={"longest_edge": 448, "shortest_edge": 378}` when initializing the processor (`AutoProcessor.from_pretrained`). In particular, the `longest_edge` value can be adjusted as needed. We recommend values that are multiples of 14. No changes are required on the model side.
+ `do_image_splitting=True` mainly helps on OCR tasks where a very large image is used as input. For regular VQA or captioning tasks, this argument can safely be set to `False` with minimal impact on performance (see the evaluation table above).
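As a rough illustration of the two memory-saving options described above, the sketch below builds the keyword arguments for `AutoProcessor.from_pretrained`. The helper names `build_processor_kwargs` and `load_processor` are hypothetical and not part of the README. Rounding `longest_edge` down to a multiple of 14 is an assumption based on the recommendation above, not a requirement of the library.

```python
from typing import Optional


def build_processor_kwargs(
    split_images: bool,
    longest_edge: Optional[int] = None,
    shortest_edge: int = 378,
) -> dict:
    """Build keyword arguments for AutoProcessor.from_pretrained (hypothetical helper)."""
    kwargs = {"do_image_splitting": split_images}
    if longest_edge is not None:
        # Assumption: round down to a multiple of 14, following the README's recommendation.
        longest_edge -= longest_edge % 14
        kwargs["size"] = {"longest_edge": longest_edge, "shortest_edge": shortest_edge}
    return kwargs


def load_processor(checkpoint: str, **kwargs):
    """Load the processor; transformers is imported lazily so the helper above has no dependencies."""
    from transformers import AutoProcessor

    return AutoProcessor.from_pretrained(checkpoint, **kwargs)


# Regular VQA or captioning: no image splitting, reduced resolution.
vqa_kwargs = build_processor_kwargs(split_images=False, longest_edge=448)

# OCR on very large images: keep image splitting enabled.
ocr_kwargs = build_processor_kwargs(split_images=True)

# processor = load_processor("<checkpoint-id>", **vqa_kwargs)  # replace the placeholder with the model id
```

The model itself is loaded unchanged in both cases, since only the processor arguments differ.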
+
**Using Flash-attention 2 to speed up generation**
<details><summary>Click to expand.</summary>