
LayoutXLM

Multimodal (text + layout/format + image) pre-training for document AI

LayoutXLM is a multilingual variant of LayoutLMv2.

The documentation of this model in the Transformers library can be found at https://huggingface.co/docs/transformers/model_doc/layoutxlm.
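
As a minimal usage sketch (not part of the original card): LayoutXLM ships its own processor but reuses the LayoutLMv2 model classes. LayoutLMv2 additionally requires detectron2, and the processor's built-in OCR requires pytesseract; the file name `document.png` is a placeholder for a scanned page image.

```python
from PIL import Image
from transformers import LayoutXLMProcessor, LayoutLMv2Model

# The processor pairs the LayoutXLM tokenizer with the LayoutLMv2 feature
# extractor; the model weights themselves load via the LayoutLMv2 class.
processor = LayoutXLMProcessor.from_pretrained("microsoft/layoutxlm-base")
model = LayoutLMv2Model.from_pretrained("microsoft/layoutxlm-base")

# "document.png" is a placeholder path to a document image.
image = Image.open("document.png").convert("RGB")

# By default the processor runs OCR (via pytesseract) to extract words and
# bounding boxes, then encodes text, layout, and image together.
encoding = processor(image, return_tensors="pt")

outputs = model(**encoding)
print(outputs.last_hidden_state.shape)  # hidden states for text + image tokens
```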

Microsoft Document AI | GitHub: https://github.com/microsoft/unilm/tree/master/layoutxlm

Introduction

LayoutXLM is a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding. Experimental results show that it significantly outperforms the existing state-of-the-art cross-lingual pre-trained models on the XFUND dataset.

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei. arXiv preprint arXiv:2104.08836, 2021.
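
For XFUND-style semantic entity labeling, a hedged sketch of attaching a token-classification head to this checkpoint follows. The BIO label set over HEADER/QUESTION/ANSWER reflects the usual XFUND convention and is illustrative; it is not bundled with the checkpoint.

```python
from transformers import LayoutLMv2ForTokenClassification

# Illustrative XFUND-style label set (an assumption, not part of this model).
labels = ["O", "B-HEADER", "I-HEADER", "B-QUESTION", "I-QUESTION", "B-ANSWER", "I-ANSWER"]

model = LayoutLMv2ForTokenClassification.from_pretrained(
    "microsoft/layoutxlm-base",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),  # the classification head is newly initialized
    label2id={label: i for i, label in enumerate(labels)},
)
# Fine-tune on XFUND annotations; inputs are produced by LayoutXLMProcessor
# exactly as in the example above, plus per-token `labels`.
```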
