Collections
Collections including paper arxiv:2212.02623

- Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images
  Paper • 2310.16186 • Published • 2
- H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes
  Paper • 1709.07330 • Published • 2
- Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans
  Paper • 1801.08599 • Published • 2
- RTSeg: Real-time Semantic Segmentation Comparative Study
  Paper • 1803.02758 • Published • 2

- Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
  Paper • 2310.16527 • Published • 2
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 181
- Unifying Vision, Text, and Layout for Universal Document Processing
  Paper • 2212.02623 • Published • 10

- Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks
  Paper • 2311.17128 • Published • 2
- Data Generation for Post-OCR correction of Cyrillic handwriting
  Paper • 2311.15896 • Published • 3
- An End-to-End OCR Framework for Robust Arabic-Handwriting Recognition using a Novel Transformers-based Model and an Innovative 270 Million-Words Multi-Font Corpus of Classical Arabic with Diacritics
  Paper • 2208.11484 • Published • 3
- Transformer based Urdu Handwritten Text Optical Character Reader
  Paper • 2206.04575 • Published • 2

- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 181
- Unifying Vision, Text, and Layout for Universal Document Processing
  Paper • 2212.02623 • Published • 10
- Grounded Language-Image Pre-training
  Paper • 2112.03857 • Published • 3
- ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model
  Paper • 2404.07773 • Published • 1

- TinyLLaVA: A Framework of Small-scale Large Multimodal Models
  Paper • 2402.14289 • Published • 19
- ImageBind: One Embedding Space To Bind Them All
  Paper • 2305.05665 • Published • 5
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 181
- Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
  Paper • 2206.02770 • Published • 3

- Unifying Vision, Text, and Layout for Universal Document Processing
  Paper • 2212.02623 • Published • 10
- microsoft/udop-large
  Image-Text-to-Text • Updated • 5.96k • 111
- microsoft/udop-large-512
  Image-Text-to-Text • Updated • 168 • 5
- microsoft/udop-large-512-300k
  Image-Text-to-Text • Updated • 1.16k • 31
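
For the UDOP checkpoints listed just above, here is a minimal usage sketch. It assumes transformers v4.39 or later (the release that added UdopProcessor and UdopForConditionalGeneration) and that OCR words and boxes were already extracted for the page; the file name "invoice.png", the words, the boxes, and the question are hypothetical placeholders, not material from this collection.

```python
# Minimal sketch: document question answering with a UDOP checkpoint from the
# collection above. Assumes transformers >= 4.39; inputs below are placeholders.
from PIL import Image
from transformers import AutoProcessor, UdopForConditionalGeneration

# apply_ocr=False because we supply our own OCR words and boxes below
processor = AutoProcessor.from_pretrained("microsoft/udop-large", apply_ocr=False)
model = UdopForConditionalGeneration.from_pretrained("microsoft/udop-large")

# Hypothetical document image plus pre-extracted OCR results;
# boxes are assumed to be normalized to the 0-1000 range used by the LayoutLM family.
image = Image.open("invoice.png").convert("RGB")
words = ["Invoice", "Date:", "2024-01-15", "Total:", "$1,250.00"]
boxes = [
    [80, 40, 220, 70],
    [80, 100, 160, 130],
    [170, 100, 320, 130],
    [80, 160, 150, 190],
    [160, 160, 300, 190],
]

# UDOP is prompted with a task prefix followed by the question
prompt = "Question answering. What is the invoice date?"
encoding = processor(image, prompt, words, boxes=boxes, return_tensors="pt")

generated_ids = model.generate(**encoding, max_new_tokens=20)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

Passing apply_ocr=False keeps the sketch free of an OCR dependency; alternatively, omitting the words and boxes and letting the processor run its own OCR should also work for these checkpoints, at the cost of requiring pytesseract.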

- DocGraphLM: Documental Graph Language Model for Information Extraction
  Paper • 2401.02823 • Published • 36
- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 63
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 181
- Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
  Paper • 2309.01131 • Published • 1