
LayoutXLM

Multimodal (text + layout/format + image) pre-training for document AI

LayoutXLM is a multilingual variant of LayoutLMv2.

The documentation of this model in the Transformers library can be found at https://huggingface.co/docs/transformers/model_doc/layoutxlm.
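
As a minimal usage sketch (not part of the original card): LayoutXLM ships its own processor but reuses the LayoutLMv2 model classes. LayoutLMv2 additionally requires detectron2, and the processor's built-in OCR requires pytesseract; the file name `document.png` is a placeholder for a scanned page image.

```python
from PIL import Image
from transformers import LayoutXLMProcessor, LayoutLMv2Model

# The processor pairs the LayoutXLM tokenizer with the LayoutLMv2 feature
# extractor; the model weights themselves load via the LayoutLMv2 class.
processor = LayoutXLMProcessor.from_pretrained("microsoft/layoutxlm-base")
model = LayoutLMv2Model.from_pretrained("microsoft/layoutxlm-base")

# "document.png" is a placeholder path to a document image.
image = Image.open("document.png").convert("RGB")

# By default the processor runs OCR (via pytesseract) to extract words and
# bounding boxes, then encodes text, layout, and image together.
encoding = processor(image, return_tensors="pt")

outputs = model(**encoding)
print(outputs.last_hidden_state.shape)  # hidden states for text + image tokens
```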

Microsoft Document AI | GitHub: https://github.com/microsoft/unilm/tree/master/layoutxlm

Introduction

LayoutXLM is a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding. Experimental results show that it significantly outperforms the existing state-of-the-art cross-lingual pre-trained models on the XFUND dataset.

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei. arXiv preprint arXiv:2104.08836, 2021.
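
For XFUND-style semantic entity labeling, a hedged sketch of attaching a token-classification head to this checkpoint follows. The BIO label set over HEADER/QUESTION/ANSWER reflects the usual XFUND convention and is illustrative; it is not bundled with the checkpoint.

```python
from transformers import LayoutLMv2ForTokenClassification

# Illustrative XFUND-style label set (an assumption, not part of this model).
labels = ["O", "B-HEADER", "I-HEADER", "B-QUESTION", "I-QUESTION", "B-ANSWER", "I-ANSWER"]

model = LayoutLMv2ForTokenClassification.from_pretrained(
    "microsoft/layoutxlm-base",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),  # the classification head is newly initialized
    label2id={label: i for i, label in enumerate(labels)},
)
# Fine-tune on XFUND annotations; inputs are produced by LayoutXLMProcessor
# exactly as in the example above, plus per-token `labels`.
```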
