Submitted by akhaliq 52 MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? · 11 authors 3
Submitted by akhaliq 35 Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference · 6 authors 2
Submitted by akhaliq 25 AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks · 5 authors 1
Submitted by akhaliq 20 Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition · 6 authors 1
Submitted by akhaliq 16 GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation · 8 authors 2
Submitted by akhaliq 14 Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering · 2 authors 1
Submitted by akhaliq 10 StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN · 4 authors 1
Submitted by akhaliq 8 Recourse for reclamation: Chatting with generative language models · 4 authors 1