ds4sd/SmolDocling-256M-preview · Layout detection mode?

prudant

12 days ago

edited 12 days ago

Can the model be used only to extract bounding boxes with layout labels, without any text? I mean, can it be used purely for layout detection?

prudant

9 days ago

u_u

reix2098

6 days ago

Hello Prudant,

It seems their intention is to integrate this model with the LayoutDetector as part of their broader project called Docling. While exploring their repositories, I came across docling-ibm-models, which I believe will eventually be merged into the main project, allowing us to extract text and categorize the layout labels at the same time.

In that repository, you can find an example demonstrating how to detect the layout of a page (bounding boxes with layout labels) without extracting text. The usage is straightforward:

python -m demo.demo_layout_predictor -i <input_dir> -v <viz_dir>

auerchristoph

Docling org 6 days ago

edited 6 days ago

Hello,
The SmolDocling model is already integrated into the docling project; please check our updated README. SmolDocling can be used as an alternative conversion path that replaces all the other specialized models in docling's standard pipeline. Using SmolDocling for layout analysis alone is possible, but it is currently inefficient for that, because the text content is always generated as well. Training SmolDocling to output only the structure tokens (including location) through a different query is part of our future plans and experimentation.
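
Until a structure-only query exists, one possible stopgap is to let the model generate its full output and discard the text afterwards. The following is only a minimal sketch: it assumes the output is a DocTags-style string in which each element's opening tag is immediately followed by four `<loc_N>` tokens (taken here as x1, y1, x2, y2 on the model's location grid). The `extract_layout` helper and the `sample` string are hypothetical illustrations, not part of docling, and this approach does not save any generation time.

```python
import re

# Assumption: each element looks like <label><loc_a><loc_b><loc_c><loc_d>content</label>.
ELEMENT_RE = re.compile(r"<(?P<label>[a-z_0-9]+)>(?P<locs>(?:<loc_\d+>){4})")
LOC_RE = re.compile(r"<loc_(\d+)>")


def extract_layout(doctags: str):
    """Return (label, (x1, y1, x2, y2)) pairs and ignore all text content."""
    boxes = []
    for match in ELEMENT_RE.finditer(doctags):
        # Pull the four integer grid coordinates out of the location tokens.
        coords = tuple(int(v) for v in LOC_RE.findall(match.group("locs")))
        boxes.append((match.group("label"), coords))
    return boxes


# Hypothetical sample output, written by hand for illustration only.
sample = (
    "<doctag>"
    "<section_header_level_1><loc_50><loc_30><loc_450><loc_60>Title</section_header_level_1>"
    "<text><loc_50><loc_80><loc_450><loc_200>Body paragraph.</text>"
    "</doctag>"
)

layout = extract_layout(sample)
for label, bbox in layout:
    print(label, bbox)
```

The coordinates are returned as raw grid values; mapping them to pixels would require knowing the grid size the model uses, which is not stated in this thread.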

prudant

6 days ago

Thanks!

prudant changed discussion status to closed 6 days ago