Collections
Discover the best community collections!
Collections including paper arxiv:2508.14879
-
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
Paper • 2508.14879 • Published • 63 -
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
Paper • 2508.19247 • Published • 38 -
Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels
Paper • 2508.17437 • Published • 33 -
Multi-View 3D Point Tracking
Paper • 2508.21060 • Published • 18
-
CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
Paper • 2411.00771 • Published • 9 -
SynCity: Training-Free Generation of 3D Worlds
Paper • 2503.16420 • Published • 26 -
CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities
Paper • 2501.08983 • Published • 20 -
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion
Paper • 2407.13759 • Published • 18
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 35 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 28 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 127 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 23
-
Describe Anything: Detailed Localized Image and Video Captioning
Paper • 2504.16072 • Published • 63 -
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment
Paper • 2410.09604 • Published -
Geospatial Mechanistic Interpretability of Large Language Models
Paper • 2505.03368 • Published • 10 -
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
Paper • 2505.02836 • Published • 7
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper • 2403.05530 • Published • 66 -
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Paper • 2404.00399 • Published • 43 -
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 94 -
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper • 2406.08464 • Published • 71
-
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
Paper • 2508.14879 • Published • 63 -
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
Paper • 2508.19247 • Published • 38 -
Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels
Paper • 2508.17437 • Published • 33 -
Multi-View 3D Point Tracking
Paper • 2508.21060 • Published • 18
-
CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
Paper • 2411.00771 • Published • 9 -
SynCity: Training-Free Generation of 3D Worlds
Paper • 2503.16420 • Published • 26 -
CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities
Paper • 2501.08983 • Published • 20 -
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion
Paper • 2407.13759 • Published • 18
-
Describe Anything: Detailed Localized Image and Video Captioning
Paper • 2504.16072 • Published • 63 -
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment
Paper • 2410.09604 • Published -
Geospatial Mechanistic Interpretability of Large Language Models
Paper • 2505.03368 • Published • 10 -
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
Paper • 2505.02836 • Published • 7
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 35 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 28 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 127 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 23
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper • 2403.05530 • Published • 66 -
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Paper • 2404.00399 • Published • 43 -
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 94 -
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper • 2406.08464 • Published • 71