|
|
--- |
|
|
tags: |
|
|
- Human Mesh Recovery |
|
|
- Human Pose and Shape Estimation |
|
|
- Multi-Person Mesh Recovery |
|
|
arxiv: '2411.19824' |
|
|
license: apache-2.0 |
|
|
--- |
|
|
|
|
|
# SAT-HMR |
|
|
|
|
|
Official [PyTorch](https://pytorch.org/) implementation of our paper:
|
|
|
|
|
<h3 align="center">SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens <br> (CVPR 2025)</h3> |
|
|
|
|
|
<h4 align="center" style="text-decoration: none;"> |
|
|
<a href="https://github.com/ChiSu001/", target="_blank"><b>Chi Su</b></a> |
|
|
, |
|
|
<a href="https://shirleymaxx.github.io/", target="_blank"><b>Xiaoxuan Ma</b></a> |
|
|
, |
|
|
<a href="https://scholar.google.com/citations?user=DoUvUz4AAAAJ&hl=en", target="_blank"><b>Jiajun Su</b></a> |
|
|
, |
|
|
<a href="https://cfcs.pku.edu.cn/english/people/faculty/yizhouwang/index.htm", target="_blank"><b>Yizhou Wang</b></a> |
|
|
|
|
|
</h4> |
|
|
|
|
|
<h3 align="center"> |
|
|
<a href="https://arxiv.org/abs/2411.19824", target="_blank">Paper</a> | |
|
|
<a href="https://ChiSu001.github.io/SAT-HMR", target="_blank">Project Page</a> | |
|
|
<a href="https://youtu.be/wLfNrDYFAns", target="_blank">Video</a> | |
|
|
<a href="https://github.com/ChiSu001/SAT-HMR", target="_blank">GitHub</a> |
|
|
</h3> |
|
|
|
|
|
<!-- <div align="center"> |
|
|
<img src="figures/results.png" width="70%"> |
|
|
<img src="figures/results_3d.gif" width="29%"> |
|
|
</div> --> |
|
|
|
|
|
|
|
|
<!-- <h3> Overview of SAT-HMR </h3> --> |
|
|
|
|
|
<p align="center"> |
|
|
<img src="figures/pipeline.png"/> |
|
|
</p> |
|
|
|
|
|
<!-- <p align="center"> |
|
|
<img src="figures/pipeline.png" style="height: 300px; object-fit: cover;"/> |
|
|
</p> --> |
|
|
|
|
|
## Installation |
|
|
|
|
|
We tested with Python 3.11, PyTorch 2.4.1, and CUDA 12.1.
|
|
|
|
|
1. Create a conda environment. |
|
|
```bash |
|
|
conda create -n sathmr python=3.11 -y |
|
|
conda activate sathmr |
|
|
``` |
|
|
|
|
|
2. Install [PyTorch](https://pytorch.org/) and [xFormers](https://github.com/facebookresearch/xformers). |
|
|
```bash |
|
|
# Install PyTorch. We recommend following the official instructions (https://pytorch.org/) and adapting the CUDA version to your setup.
|
|
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.1 -c pytorch -c nvidia |
|
|
|
|
|
# Install xFormers. We recommend following the official instructions (https://github.com/facebookresearch/xformers) and adapting the CUDA version to your setup.
|
|
pip install -U xformers==0.0.28.post1 --index-url https://download.pytorch.org/whl/cu121 |
|
|
``` |
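After installation, a quick check like the one below (a suggestion, not part of the original instructions) confirms that PyTorch sees your GPU and was built against the expected CUDA version:

```bash
# Print the PyTorch version, the CUDA version it was built with, and whether a GPU is visible.
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```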
|
|
|
|
|
3. Install other dependencies. |
|
|
```bash |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
4. You may need to modify the `chumpy` package to avoid errors. For detailed instructions, please check [this guidance](docs/fix_chumpy.md); a rough sketch of where to look follows below.
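The errors typically come from `chumpy` importing NumPy aliases (`bool`, `int`, `float`, ...) that NumPy 1.24+ removed. As a sketch only (the linked guidance is authoritative), you can locate the file to edit with:

```bash
# Print the path of chumpy's __init__.py, where the deprecated NumPy imports usually live.
python -c "import chumpy, os; print(os.path.join(os.path.dirname(chumpy.__file__), '__init__.py'))"
```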
|
|
|
|
|
## Download Models & Weights |
|
|
|
|
|
1. Download SMPL-related weights. |
|
|
- Download `basicModel_f_lbs_10_207_0_v1.0.0.pkl`, `basicModel_m_lbs_10_207_0_v1.0.0.pkl`, and `basicModel_neutral_lbs_10_207_0_v1.0.0.pkl` from [here](https://smpl.is.tue.mpg.de/) (female & male) and [here](http://smplify.is.tue.mpg.de/) (neutral), and place them in `${Project}/weights/smpl_data/smpl`. Rename them to `SMPL_FEMALE.pkl`, `SMPL_MALE.pkl`, and `SMPL_NEUTRAL.pkl`, respectively.
|
|
- Download the remaining files from [Google Drive](https://drive.google.com/drive/folders/1wmd_pjmmDn3eSl3TLgProgZgCQZgtZIC?usp=sharing) and put them in `${Project}/weights/smpl_data/smpl`.
|
|
|
|
|
2. Download the DINOv2 pretrained weights from [the official repository](https://github.com/facebookresearch/dinov2?tab=readme-ov-file#pretrained-models). We use `ViT-B/14 distilled (without registers)`. Put `dinov2_vitb14_pretrain.pth` in `${Project}/weights/dinov2`. These weights are used to initialize our encoder. **You can skip this step if you are not going to train SAT-HMR.**
|
|
|
|
|
3. Download the pretrained weights for inference and evaluation from [Google Drive](https://drive.google.com/file/d/12tGbqcrJ8YACcrfi5qslZNEciIHxcScZ/view?usp=sharing) or [🤗 HuggingFace](https://huggingface.co/ChiSu001/SAT-HMR/blob/main/weights/sat_hmr/sat_644.pth). Put the checkpoint in `${Project}/weights/sat_hmr`.
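The commands below sketch one way to lay out these files from the project root. The paths are illustrative, the SMPL files still have to be downloaded manually after registering on the official sites, and the Hugging Face in-repo path is assumed from the link above:

```bash
# Create the expected directories.
mkdir -p weights/smpl_data/smpl weights/dinov2 weights/sat_hmr

# Rename the SMPL model files (adjust the source paths to wherever you downloaded them).
mv basicModel_f_lbs_10_207_0_v1.0.0.pkl       weights/smpl_data/smpl/SMPL_FEMALE.pkl
mv basicModel_m_lbs_10_207_0_v1.0.0.pkl       weights/smpl_data/smpl/SMPL_MALE.pkl
mv basicModel_neutral_lbs_10_207_0_v1.0.0.pkl weights/smpl_data/smpl/SMPL_NEUTRAL.pkl

# Optionally fetch the SAT-HMR checkpoint from the Hugging Face Hub instead of Google Drive
# (requires the huggingface_hub package; the repo path is preserved under the current directory).
huggingface-cli download ChiSu001/SAT-HMR weights/sat_hmr/sat_644.pth --local-dir .
```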
|
|
|
|
|
The `weights` directory should now look like this:
|
|
|
|
|
```
${Project}
|-- weights
    |-- dinov2
    |   `-- dinov2_vitb14_pretrain.pth
    |-- sat_hmr
    |   `-- sat_644.pth
    `-- smpl_data
        `-- smpl
            |-- body_verts_smpl.npy
            |-- J_regressor_h36m_correct.npy
            |-- SMPL_FEMALE.pkl
            |-- SMPL_MALE.pkl
            |-- smpl_mean_params.npz
            `-- SMPL_NEUTRAL.pkl
```
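Before running inference, a quick check like this (a throwaway helper, not part of the repo) can confirm the key files are in place:

```bash
# Report whether the main weight files are where SAT-HMR expects them.
for f in weights/dinov2/dinov2_vitb14_pretrain.pth \
         weights/sat_hmr/sat_644.pth \
         weights/smpl_data/smpl/SMPL_NEUTRAL.pkl; do
  if [ -f "$f" ]; then echo "OK       $f"; else echo "MISSING  $f"; fi
done
```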
|
|
|
|
|
## Inference on Images |
|
|
<h4>Inference with 1 GPU</h4>
|
|
|
|
|
We provide some demo images in `${Project}/demo`. You can run SAT-HMR on all images on a single GPU via: |
|
|
|
|
|
|
|
|
```bash |
|
|
python main.py --mode infer --cfg demo |
|
|
``` |
|
|
|
|
|
Results with overlaid meshes will be saved in `${Project}/demo_results`.
|
|
|
|
|
You can specify your own inference configuration by modifying `${Project}/configs/run/demo.yaml`:
|
|
|
|
|
- `input_dir` specifies the input image folder. |
|
|
- `output_dir` specifies the output folder. |
|
|
- `conf_thresh` specifies a list of confidence thresholds used for detection. SAT-HMR will run inference once with each threshold in the list.
|
|
- `infer_batch_size` specifies the batch size used for inference (on a single GPU). |
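For example, a minimal custom config might look like the sketch below; the key names come from the list above, the values are placeholders, and `configs/run/demo.yaml` remains the reference for the exact format and any additional fields:

```bash
# Write a hypothetical custom inference config.
cat > configs/run/my_infer.yaml << 'EOF'
input_dir: my_images
output_dir: my_results
conf_thresh: [0.5]
infer_batch_size: 8
EOF

# Then point --cfg at its name (the `demo` config maps to configs/run/demo.yaml,
# so `my_infer` is assumed to map the same way).
python main.py --mode infer --cfg my_infer
```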
|
|
|
|
|
<h4>Inference with Multiple GPUs</h4>
|
|
|
|
|
You can also try distributed inference on multiple GPUs if your input folder contains a large number of images. |
|
|
Since we use [🤗 Accelerate](https://huggingface.co/docs/accelerate/index) to launch distributed runs, you first need to configure Accelerate for how your system is set up for distributed processing. To do so, run the following command and answer the prompts:
|
|
|
|
|
```bash |
|
|
accelerate config |
|
|
``` |
|
|
|
|
|
Then run: |
|
|
```bash |
|
|
accelerate launch main.py --mode infer --cfg demo |
|
|
``` |
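Launch options can also be passed directly on the command line; for example, to run on 4 GPUs of a single machine (standard 🤗 Accelerate flags; adjust the process count to your hardware):

```bash
# Distributed inference on 4 local GPUs without relying on a saved `accelerate config`.
accelerate launch --multi_gpu --num_processes 4 main.py --mode infer --cfg demo
```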
|
|
|
|
|
<!-- ## Datasets Preparation |
|
|
|
|
|
Coming soon. |
|
|
|
|
|
## Training and Evaluation |
|
|
|
|
|
Coming soon. --> |
|
|
|
|
|
## Citing |
|
|
|
|
|
If you find this code useful for your research, please consider citing our paper: |
|
|
```bibtex |
|
|
@InProceedings{Su_2025_CVPR, |
|
|
author = {Su, Chi and Ma, Xiaoxuan and Su, Jiajun and Wang, Yizhou}, |
|
|
title = {SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens}, |
|
|
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)}, |
|
|
month = {June}, |
|
|
year = {2025}, |
|
|
pages = {16796-16806} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Acknowledgement |
|
|
This repo is built on the excellent work of [DINOv2](https://github.com/facebookresearch/dinov2), [DAB-DETR](https://github.com/IDEA-Research/DAB-DETR), [DINO](https://github.com/IDEA-Research/DINO), and [🤗 Accelerate](https://huggingface.co/docs/accelerate/index). Thanks to the authors of these great projects.