ByteTrack: Multi-Object Tracking by Associating Every Detection Box Paper • 2110.06864 • Published Oct 13, 2021
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion Paper • 2111.14690 • Published Nov 29, 2021
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model Paper • 2407.07577 • Published Jul 10, 2024
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals Paper • 2011.12450 • Published Nov 25, 2020
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM Paper • 2412.15156 • Published Dec 19, 2024
Language as Queries for Referring Video Object Segmentation Paper • 2201.00487 • Published Jan 3, 2022
Perception Encoder: The best visual embeddings are not at the output of the network Paper • 2504.13181 • Published Apr 17, 2025 • 35
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding Paper • 2504.13180 • Published Apr 17, 2025 • 18
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation Paper • 2502.05179 • Published Feb 7, 2025 • 24
Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment Paper • 2410.09347 • Published Oct 12, 2024 • 5
ControlAR: Controllable Image Generation with Autoregressive Models Paper • 2410.02705 • Published Oct 3, 2024 • 11
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10, 2024 • 72
Semantic-SAM: Segment and Recognize Anything at Any Granularity Paper • 2307.04767 • Published Jul 10, 2023 • 22
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest Paper • 2307.03601 • Published Jul 7, 2023 • 12