Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2409.01704

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 83

Computer Vision

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Paper • 2409.02095 • Published Sep 3, 2024 • 36
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 83
CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation

Paper • 2409.03643 • Published Sep 5, 2024 • 19
UniDet3D: Multi-dataset Indoor 3D Object Detection

Paper • 2409.04234 • Published Sep 6, 2024 • 8

Vision-Language

EVLM: An Efficient Vision-Language Model for Visual Understanding

Paper • 2407.14177 • Published Jul 19, 2024 • 43
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

Paper • 2407.04172 • Published Jul 4, 2024 • 23
facebook/chameleon-7b

Image-Text-to-Text • Updated Jul 23, 2024 • 22.7k • 173
vidore/colpali

Updated 7 days ago • 47.3k • 418

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24, 2024 • 60
Understanding Alignment in Multimodal LLMs: A Comprehensive Study

Paper • 2407.02477 • Published Jul 2, 2024 • 22
LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 51
Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 125

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87
SHIC: Shape-Image Correspondences with no Keypoint Supervision

Paper • 2407.18907 • Published Jul 26, 2024 • 41
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 83

databricks/dbrx-instruct

Text Generation • Updated Apr 19, 2024 • 8.22k • • 1.11k
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 83

Image Processing

briaai/RMBG-1.4

Image Segmentation • Updated Jan 7 • 605k • 1.7k
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 83

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Paper • 2403.09029 • Published Mar 14, 2024 • 55
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

Paper • 2403.12968 • Published Mar 19, 2024 • 25
RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15, 2024 • 69
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14, 2024 • 76

LocalMamba: Visual State Space Model with Windowed Selective Scan

Paper • 2403.09338 • Published Mar 14, 2024 • 8
GiT: Towards Generalist Vision Transformer through Universal Language Interface

Paper • 2403.09394 • Published Mar 14, 2024 • 26
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29, 2024 • 33
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16, 2024 • 28

Speculative Streaming: Fast LLM Inference without Auxiliary Models

Paper • 2402.11131 • Published Feb 16, 2024 • 43
Generative Representational Instruction Tuning

Paper • 2402.09906 • Published Feb 15, 2024 • 54
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 105
BitDelta: Your Fine-Tune May Only Be Worth One Bit

Paper • 2402.10193 • Published Feb 15, 2024 • 20

Previous
1
2
3
4
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs