Collections
Collections including paper arxiv:2212.02623

- Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images
  Paper • 2310.16186 • Published • 2
- H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes
  Paper • 1709.07330 • Published • 2
- Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans
  Paper • 1801.08599 • Published • 2
- RTSeg: Real-time Semantic Segmentation Comparative Study
  Paper • 1803.02758 • Published • 2

- Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
  Paper • 2310.16527 • Published • 2
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 181
- Unifying Vision, Text, and Layout for Universal Document Processing
  Paper • 2212.02623 • Published • 10

- Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks
  Paper • 2311.17128 • Published • 2
- Data Generation for Post-OCR correction of Cyrillic handwriting
  Paper • 2311.15896 • Published • 3
- An End-to-End OCR Framework for Robust Arabic-Handwriting Recognition using a Novel Transformers-based Model and an Innovative 270 Million-Words Multi-Font Corpus of Classical Arabic with Diacritics
  Paper • 2208.11484 • Published • 3
- Transformer based Urdu Handwritten Text Optical Character Reader
  Paper • 2206.04575 • Published • 2

- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 181
- Unifying Vision, Text, and Layout for Universal Document Processing
  Paper • 2212.02623 • Published • 10
- Grounded Language-Image Pre-training
  Paper • 2112.03857 • Published • 3
- ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model
  Paper • 2404.07773 • Published • 1

- TinyLLaVA: A Framework of Small-scale Large Multimodal Models
  Paper • 2402.14289 • Published • 19
- ImageBind: One Embedding Space To Bind Them All
  Paper • 2305.05665 • Published • 5
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 181
- Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
  Paper • 2206.02770 • Published • 3

- Unifying Vision, Text, and Layout for Universal Document Processing
  Paper • 2212.02623 • Published • 10
- microsoft/udop-large
  Image-Text-to-Text • Updated • 5.96k • 111
- microsoft/udop-large-512
  Image-Text-to-Text • Updated • 168 • 5
- microsoft/udop-large-512-300k
  Image-Text-to-Text • Updated • 1.16k • 31
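
For the UDOP checkpoints listed just above, here is a minimal usage sketch. It assumes transformers v4.39 or later (the release that added UdopProcessor and UdopForConditionalGeneration) and that OCR words and boxes were already extracted for the page; the file name "invoice.png", the words, the boxes, and the question are hypothetical placeholders, not material from this collection.

```python
# Minimal sketch: document question answering with a UDOP checkpoint from the
# collection above. Assumes transformers >= 4.39; inputs below are placeholders.
from PIL import Image
from transformers import AutoProcessor, UdopForConditionalGeneration

# apply_ocr=False because we supply our own OCR words and boxes below
processor = AutoProcessor.from_pretrained("microsoft/udop-large", apply_ocr=False)
model = UdopForConditionalGeneration.from_pretrained("microsoft/udop-large")

# Hypothetical document image plus pre-extracted OCR results;
# boxes are assumed to be normalized to the 0-1000 range used by the LayoutLM family.
image = Image.open("invoice.png").convert("RGB")
words = ["Invoice", "Date:", "2024-01-15", "Total:", "$1,250.00"]
boxes = [
    [80, 40, 220, 70],
    [80, 100, 160, 130],
    [170, 100, 320, 130],
    [80, 160, 150, 190],
    [160, 160, 300, 190],
]

# UDOP is prompted with a task prefix followed by the question
prompt = "Question answering. What is the invoice date?"
encoding = processor(image, prompt, words, boxes=boxes, return_tensors="pt")

generated_ids = model.generate(**encoding, max_new_tokens=20)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

Passing apply_ocr=False keeps the sketch free of an OCR dependency; alternatively, omitting the words and boxes and letting the processor run its own OCR should also work for these checkpoints, at the cost of requiring pytesseract.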

- DocGraphLM: Documental Graph Language Model for Information Extraction
  Paper • 2401.02823 • Published • 36
- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 63
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 181
- Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
  Paper • 2309.01131 • Published • 1