|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- qizekun/ShapeLLM |
|
language: |
|
- en |
|
--- |
|
|
|
## ShapeLLM model |
|
|
|
This repository contains the ShapeLLM-13B model presented in [ShapeLLM: Universal 3D Object Understanding for Embodied Interaction](https://huggingface.co/papers/2402.17766). |
|
|
|
## Install |
|
|
|
|
|
|
1. Clone this repository and navigate to the ShapeLLM folder
|
```Shell |
|
git clone https://github.com/qizekun/ShapeLLM.git |
|
cd ShapeLLM |
|
``` |
|
2. Install the package
|
```Shell |
|
conda create -n shapellm python=3.10 -y |
|
conda activate shapellm |
|
pip install --upgrade pip # enable PEP 660 support |
|
pip install -e . |
|
``` |
|
3. Install additional packages for training
|
```Shell |
|
pip install -e ".[train]" |
|
pip install flash-attn --no-build-isolation |
|
``` |
|
4. Install PointNet++ |
|
```Shell |
|
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib" |
|
``` |
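After completing the steps above, a quick sanity check can confirm that the editable install and the PointNet++ ops are importable. This is a minimal sketch; `llava` and `pointnet2_ops` are the package names installed by the commands above.
```Shell
# Verify that the ShapeLLM package and the PointNet++ ops import cleanly
python -c "import llava, pointnet2_ops; print('ShapeLLM environment OK')"
```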
|
|
|
|
|
## ShapeLLM |
|
### Model Weights
|
Please check out our [Model Zoo](https://github.com/qizekun/ShapeLLM/blob/main/docs/MODEL_ZOO.md) for all public ShapeLLM checkpoints. |
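If you prefer to fetch a checkpoint locally instead of letting it download on first use, one option is the `huggingface_hub` CLI. This is a sketch: the repository ID below is the 13B checkpoint used in the demo, and the `--local-dir` target is an arbitrary choice.
```Shell
# Download the ShapeLLM-13B checkpoint to a local directory (requires `pip install huggingface_hub`)
huggingface-cli download qizekun/ShapeLLM_13B_general_v1.0 --local-dir checkpoints/ShapeLLM_13B_general_v1.0
```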
|
|
|
### Demo |
|
#### CLI Inference |
|
Chat about point clouds through the CLI interface. It also supports multi-GPU, 4-bit, and 8-bit quantized inference.
|
```Shell |
|
python -m llava.serve.cli \ |
|
--model-path qizekun/ShapeLLM_13B_general_v1.0 \ |
|
--pts-file assets/instrument.npy |
|
``` |
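For example, quantized inference can be enabled with a single extra flag. This is a sketch assuming the fork keeps LLaVA's `--load-4bit` / `--load-8bit` options; check `llava/serve/cli.py` for the exact arguments.
```Shell
# 4-bit quantized inference on a single GPU
python -m llava.serve.cli \
    --model-path qizekun/ShapeLLM_13B_general_v1.0 \
    --pts-file assets/instrument.npy \
    --load-4bit
```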
|
|
|
### Training |
|
Consistent with LLaVA, we adopt a two-stage training approach. In the first stage, we fine-tune only the projector for semantic alignment. In the second stage, we conduct full fine-tuning on instruction-following data.
|
Download the data following [DATA](https://github.com/qizekun/ShapeLLM/blob/main/docs/DATA.md) and organize it as follows in `./playground/data/shapellm/`:
|
``` |
|
playground/data/shapellm/
├── cap3d_objaverse_785k.json
├── cap3d_objaverse_sft_45k.json
├── gapartnet_sft_27k_openai.json
├── gapartnet_pcs
│   ├── Box_100129_0_0.npy
│   └── ...
└── cap3d_pcs
    ├── 00000054c36d44a2a483bdbff31d8edf.pt
    └── ...
|
``` |
|
Furthermore, ShapeLLM utilizes the Large version of [ReCon++](https://github.com/qizekun/ShapeLLM/blob/main/ReConV2/cfgs/pretrain/large/openshape.yaml) as the point encoder. |
|
You need to download the [ReCon++ weight](https://huggingface.co/qizekun/ReConV2/blob/main/zeroshot/large/best_lvis.pth) and save it to `./checkpoints/recon/large.pth`. |
|
``` |
|
checkpoints/recon/
└── large.pth
|
``` |
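For example, the weight can be fetched directly from the Hugging Face Hub. This is a sketch; the `resolve/` URL below serves the raw file behind the `blob/` link above.
```Shell
# Place the ReCon++ Large checkpoint where the training scripts expect it
mkdir -p checkpoints/recon
wget https://huggingface.co/qizekun/ReConV2/resolve/main/zeroshot/large/best_lvis.pth -O checkpoints/recon/large.pth
```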
|
**1. Feature Alignment Stage** |
|
``` |
|
sh scripts/pretrain.sh |
|
``` |
|
**2. Visual Instruction Tuning Stage** |
|
``` |
|
sh scripts/finetune.sh |
|
``` |
|
The training takes around 14 hours for ShapeLLM-13B on 8x A100 (80G). It takes around 7 hours for ShapeLLM-7B. |
|
|
|
### Zero-shot Understanding on 3D MM-Vet |
|
To evaluate 3D MLLMs for integrated capabilities and embodied interaction capabilities, run the script:
|
``` |
|
sh scripts/eval/mmvet.sh |
|
``` |
|
Use GPT-4 to calculate the 3D MM-Vet score:
|
``` |
|
sh scripts/eval/eval_mmvet.sh |
|
``` |
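Note that the scoring step queries the OpenAI API. Below is a minimal sketch assuming the script reads the standard `OPENAI_API_KEY` environment variable; check the script for the exact variable or argument it expects.
```Shell
# Provide an OpenAI API key before scoring (placeholder value shown)
export OPENAI_API_KEY="sk-..."
sh scripts/eval/eval_mmvet.sh
```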
|
|
|
### Visual Grounding on GAPartNet
|
To evaluate the performance of ShapeLLM on the GAPartNet dataset, run the script:
|
``` |
|
sh scripts/eval/gapartnet_ref.sh |
|
``` |
|
Calculate the generative 3D visual grounding accuracy:
|
``` |
|
sh scripts/eval/eval_gapartnet.sh |
|
``` |