HuggingFaceM4/idefics2-8b

HugoLaurencon committed on Apr 16, 2024

Commit 2c25217 (verified) · 1 Parent(s): 59e3081

Comment on image splitting (#4)

- Comment on image splitting (d6e40c5cbeb2fbec8f0463afdd1cd937f1090c79)

Co-authored-by: Hugo Laurençon <[email protected]>

Files changed (1)

README.md CHANGED

@@ -223,6 +223,8 @@ Given the high resolution supported, the vision part of the model can be memory
 - **deactivate the image splitting.** To do so, add `do_image_splitting=False` when initializing the processor (`AutoProcessor.from_pretrained`). There are no changes required on the model side. Note that only the sft model has been trained with image splitting.
 - **decrease the maximum image resolution.** To do so, add `size= {"longest_edge": 448, "shortest_edge": 378}` when initializing the processor (`AutoProcessor.from_pretrained`). In particular, the `longest_edge` value can be adapted to fit the need. We recommend using values that are multiples of 14. There are no changes required on the model side.
+`do_image_splitting=True` is especially needed to boost performance on OCR tasks where a very large image is used as input. For the regular VQA or captioning tasks, this argument can be safely set to `False` with minimal impact on performance (see the evaluation table above).
 **Using Flash-attention 2 to speed up generation**
 <details><summary>Click to expand.</summary>
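For reference, the processor options discussed in this diff could be passed as in the minimal sketch below. `AutoProcessor.from_pretrained`, `do_image_splitting`, and `size` come from the README text itself; the helper names `build_processor_kwargs` and `load_processor` are hypothetical and exist only for illustration. The sketch combines both memory-saving options in one call, which is an assumption: the README presents them as separate choices. Because the README only recommends multiples of 14, the helper warns rather than raises.

```python
import warnings

MODEL_ID = "HuggingFaceM4/idefics2-8b"


def build_processor_kwargs(do_image_splitting=False, longest_edge=448, shortest_edge=378):
    """Collect the memory-saving processor arguments named in the README.

    Hypothetical helper: it only assembles keyword arguments and does not
    touch the model or download anything.
    """
    # The README recommends (but does not require) multiples of 14.
    for name, value in (("longest_edge", longest_edge), ("shortest_edge", shortest_edge)):
        if value % 14 != 0:
            warnings.warn(f"{name}={value} is not a multiple of 14")
    return {
        "do_image_splitting": do_image_splitting,
        "size": {"longest_edge": longest_edge, "shortest_edge": shortest_edge},
    }


def load_processor(**overrides):
    """Load the processor with the options above (requires `transformers`)."""
    # Imported lazily so the kwargs helper can be used without transformers installed.
    from transformers import AutoProcessor

    return AutoProcessor.from_pretrained(MODEL_ID, **build_processor_kwargs(**overrides))
```

For regular VQA or captioning, `load_processor()` with the defaults above keeps image splitting off; for OCR on very large images, `load_processor(do_image_splitting=True)` turns it back on, at the cost of more memory in the vision part of the model.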