Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels
Abstract
PIXIE, a neural network method, predicts physical properties of 3D scenes from visual features, enabling fast and realistic physics simulation using supervised learning and pretrained visual features.
Inferring the physical properties of 3D scenes from visual information is a critical yet challenging task for creating interactive and realistic virtual worlds. While humans intuitively grasp material characteristics such as elasticity or stiffness, existing methods often rely on slow, per-scene optimization, limiting their generalizability and application. To address this problem, we introduce PIXIE, a novel method that trains a generalizable neural network to predict physical properties across multiple scenes from 3D visual features using purely supervised losses. Once trained, our feed-forward network can perform fast inference of plausible material fields, which, coupled with a learned static scene representation like Gaussian Splatting, enables realistic physics simulation under external forces. To facilitate this research, we also collected PIXIEVERSE, one of the largest known datasets of paired 3D assets and physics material annotations. Extensive evaluations demonstrate that PIXIE is about 1.46-4.39x better and orders of magnitude faster than test-time optimization methods. By leveraging pretrained visual features like CLIP, our method can also generalize zero-shot to real-world scenes despite only ever being trained on synthetic data. https://pixie-3d.github.io/
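To make "purely supervised losses" concrete, here is a minimal sketch of what a per-point training loss for a material field could look like. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the output heads, so the split into a discrete material class plus a vector of continuous parameters, the `weight` term, and all function names below are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def supervised_material_loss(pred_logits, pred_params, true_class, true_params, weight=1.0):
    """Loss for one 3D point (hypothetical formulation).

    pred_logits: raw scores over material classes (assumed discrete head)
    pred_params: predicted continuous parameters (assumed regression head)
    true_class:  index of the annotated material class
    true_params: annotated continuous parameters
    weight:      assumed balance between the two terms
    """
    probs = softmax(pred_logits)
    cross_entropy = -math.log(probs[true_class])
    mse = sum((p - t) ** 2 for p, t in zip(pred_params, true_params)) / len(true_params)
    return cross_entropy + weight * mse

def mean_field_loss(points):
    """Average the per-point loss over an iterable of
    (pred_logits, pred_params, true_class, true_params) tuples."""
    losses = [supervised_material_loss(*p) for p in points]
    return sum(losses) / len(losses)

# Usage: two points with three hypothetical material classes and two parameters.
example_points = [
    ([2.0, 0.1, -1.0], [0.5, 0.3], 0, [0.5, 0.3]),
    ([0.0, 0.0, 0.0], [0.2, 0.4], 1, [0.0, 0.4]),
]
example_loss = mean_field_loss(example_points)
```

Because every term compares a prediction directly against an annotation, a loss of this shape needs no simulation or rendering in the training loop, which is consistent with the fast feed-forward inference described above.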
Community
TL;DR: A fast and generalized feed-forward model of 3D Physics for interactable world reconstruction.
Project page: https://pixie-3d.github.io/
Code: https://github.com/vlongle/pixie
Pretrained models: https://huggingface.co/datasets/vlongle/pixie