SpatialGen
Layout-guided 3D Indoor Scene Generation
SpatialGen produces multi-view, multi-modal information from a semantic layout using a multi-view, multi-modal diffusion model.
Model | Download |
---|---|
SpatialGen-1.0 | 🤗 HuggingFace |
Tested with the following environment:
```bash
# clone the repository
git clone https://github.com/manycore-research/SpatialGen.git
cd SpatialGen

# create and activate a virtual environment, then install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Optional: fix the flux inference bug (https://github.com/vllm-project/vllm/issues/4392)
pip install nvidia-cublas-cu12==12.4.5.8
```
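After installation, it can be useful to confirm that the optional cuBLAS pin actually took effect in the active virtual environment. The following is a minimal sketch that uses only the Python standard library; the helper names `installed_version` and `check_pin` are hypothetical and are not part of SpatialGen.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pin(package: str, expected: str) -> bool:
    """Report whether `package` is installed at exactly the `expected` version."""
    found = installed_version(package)
    if found is None:
        print(f"{package}: not installed")
        return False
    print(f"{package}: found {found}, expected {expected}")
    return found == expected


if __name__ == "__main__":
    # Package name and version are taken from the optional pip command above.
    check_pin("nvidia-cublas-cu12", "12.4.5.8")
```

Run it with the virtual environment activated; a `False` result only means the pin is absent or different, which matters only if you hit the linked inference bug.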
We provide SpatialGen-Testset for multi-view diffusion (MVD) inference. It contains 48 rooms, each labeled with a 3D layout, and 4.8K rendered images (48 rooms x 100 views, including RGB, normal, depth, and semantic maps).
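The test set size follows directly from the room and view counts. The short sketch below works through that arithmetic so a downloaded copy can be sanity-checked. The constant and function names are hypothetical, and the per-modality total assumes that every view stores one map per modality, which the description above does not state explicitly.

```python
ROOMS = 48
VIEWS_PER_ROOM = 100
MODALITIES = ("rgb", "normal", "depth", "semantic")


def total_views(rooms: int = ROOMS, views_per_room: int = VIEWS_PER_ROOM) -> int:
    """Number of rendered views across all rooms."""
    return rooms * views_per_room


def total_maps(rooms: int = ROOMS, views_per_room: int = VIEWS_PER_ROOM) -> int:
    """Number of individual maps, ASSUMING one map per modality per view."""
    return total_views(rooms, views_per_room) * len(MODALITIES)


if __name__ == "__main__":
    print(total_views())  # 4800, i.e. the 4.8K figure quoted above
    print(total_maps())   # 19200 under the one-map-per-modality assumption
```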
```bash
# Single image-to-3D Scene
bash scripts/infer_spatialgen_i2s.sh

# Text-to-image-to-3D Scene
bash scripts/infer_spatialgen_t2s.sh
```
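If you prefer to launch inference from Python, for example inside a batch job, the two scripts can be wrapped with `subprocess`. This is only a sketch: the `SCRIPTS` mapping, the mode names `i2s` and `t2s`, and the functions `build_command` and `run_inference` are hypothetical conveniences, and the wrapper passes no arguments because the options the scripts accept are not documented here.

```python
import subprocess
from pathlib import Path
from typing import List

# Script paths as listed above, relative to the repository root.
SCRIPTS = {
    "i2s": "scripts/infer_spatialgen_i2s.sh",  # single image-to-3D scene
    "t2s": "scripts/infer_spatialgen_t2s.sh",  # text-to-image-to-3D scene
}


def build_command(mode: str) -> List[str]:
    """Return the shell command for the given (hypothetical) mode name."""
    if mode not in SCRIPTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(SCRIPTS)}")
    return ["bash", SCRIPTS[mode]]


def run_inference(mode: str, repo_root: str = ".") -> int:
    """Run the chosen script from the repository root and return its exit code."""
    root = Path(repo_root)
    command = build_command(mode)
    script = root / command[1]
    if not script.is_file():
        raise FileNotFoundError(f"{script} not found; is {root} the SpatialGen checkout?")
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    return subprocess.run(command, cwd=root, check=True).returncode
```

For example, `run_inference("i2s", repo_root="SpatialGen")` runs the image-to-scene script from a checkout in `./SpatialGen`.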
SpatialGen-1.0 is derived from Stable-Diffusion-v2.1, which is licensed under the CreativeML Open RAIL++-M License.
We would like to thank the following projects that made this work possible:
Base model: stabilityai/stable-diffusion-2-1