Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2311.06242

Computer Vision

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Paper • 2311.06242 • Published Nov 10, 2023 • 91
FMViT: A multiple-frequency mixing Vision Transformer

Paper • 2311.05707 • Published Nov 9, 2023 • 9
PolyMaX: General Dense Prediction with Mask Transformer

Paper • 2311.05770 • Published Nov 9, 2023 • 11

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Paper • 2311.06242 • Published Nov 10, 2023 • 91

Video Understanding

MM-VID: Advancing Video Understanding with GPT-4V(ision)

Paper • 2310.19773 • Published Oct 30, 2023 • 20
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

Paper • 2310.05863 • Published Oct 9, 2023 • 1
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Paper • 2311.06242 • Published Nov 10, 2023 • 91
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization

Paper • 2311.10126 • Published Nov 16, 2023 • 10

Inject Semantic Concepts into Image Tagging for Open-Set Recognition

Paper • 2310.15200 • Published Oct 23, 2023 • 6
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Paper • 2311.06242 • Published Nov 10, 2023 • 91

Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 50
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities

Paper • 2311.05698 • Published Nov 9, 2023 • 14
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Paper • 2311.06242 • Published Nov 10, 2023 • 91
PolyMaX: General Dense Prediction with Mask Transformer

Paper • 2311.05770 • Published Nov 9, 2023 • 11

Previous
1
...
3
4
5
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs