-
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Paper • 2403.09626 • Published • 15 -
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517 • Published • 35 -
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Paper • 2403.13501 • Published • 9 -
LITA: Language Instructed Temporal-Localization Assistant
Paper • 2403.19046 • Published • 19
Collections
Discover the best community collections!
Collections including paper arxiv:2403.10517
-
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Paper • 2403.06775 • Published • 4 -
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper • 2010.11929 • Published • 8 -
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2
-
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper • 2402.14289 • Published • 20 -
ImageBind: One Embedding Space To Bind Them All
Paper • 2305.05665 • Published • 5 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 180 -
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Paper • 2206.02770 • Published • 3
-
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
Paper • 2402.10986 • Published • 78 -
Aria Everyday Activities Dataset
Paper • 2402.13349 • Published • 31 -
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Paper • 2403.04132 • Published • 39 -
SaulLM-7B: A pioneering Large Language Model for Law
Paper • 2403.03883 • Published • 80
-
World Model on Million-Length Video And Language With RingAttention
Paper • 2402.08268 • Published • 38 -
Improving Text Embeddings with Large Language Models
Paper • 2401.00368 • Published • 80 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 105 -
FiT: Flexible Vision Transformer for Diffusion Model
Paper • 2402.12376 • Published • 48
-
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
Paper • 2401.12474 • Published • 36 -
More Agents Is All You Need
Paper • 2402.05120 • Published • 53 -
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517 • Published • 35 -
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 117