ReferEverything: Towards Segmenting Everything We Can Speak of in Videos • Paper • 2410.23287 • Published Oct 30, 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding • Paper • 2409.03757 • Published Sep 5, 2024
Situational Awareness Matters in 3D Vision Language Reasoning • Paper • 2406.07544 • Published Jun 11, 2024
Frozen Transformers in Language Models Are Effective Visual Encoder Layers • Paper • 2310.12973 • Published Oct 19, 2023
Floating No More: Object-Ground Reconstruction from a Single Image • Paper • 2407.18914 • Published Jul 26, 2024