|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- qizekun/ShapeLLM |
|
language: |
|
- en |
|
--- |
|
|
|
## ShapeLLM model |
|
|
|
This repository contains the ShapeLLM-13B model presented in [ShapeLLM: Universal 3D Object Understanding for Embodied Interaction](https://huggingface.co/papers/2402.17766). |
|
|
|
## Install |
|
|
|
|
|
|
1. Clone this repository and navigate to the ShapeLLM folder
|
```Shell |
|
git clone https://github.com/qizekun/ShapeLLM.git |
|
cd ShapeLLM |
|
``` |
|
2. Install the package
|
```Shell |
|
conda create -n shapellm python=3.10 -y |
|
conda activate shapellm |
|
pip install --upgrade pip # enable PEP 660 support |
|
pip install -e . |
|
``` |
|
3. Install additional packages for training
|
```Shell |
|
pip install -e ".[train]" |
|
pip install flash-attn --no-build-isolation |
|
``` |
|
4. Install PointNet++ |
|
```Shell |
|
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib" |
|
``` |
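After completing the steps above, a quick sanity check can confirm that the editable install and the PointNet++ ops are importable. This is a minimal sketch; `llava` and `pointnet2_ops` are the package names installed by the commands above.
```Shell
# Verify that the ShapeLLM package and the PointNet++ ops import cleanly
python -c "import llava, pointnet2_ops; print('ShapeLLM environment OK')"
```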
|
|
|
|
|
## ShapeLLM |
|
### Model Weights
|
Please check out our [Model Zoo](https://github.com/qizekun/ShapeLLM/blob/main/docs/MODEL_ZOO.md) for all public ShapeLLM checkpoints. |
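If you prefer to fetch a checkpoint locally instead of letting it download on first use, one option is the `huggingface_hub` CLI. This is a sketch: the repository ID below is the 13B checkpoint used in the demo, and the `--local-dir` target is an arbitrary choice.
```Shell
# Download the ShapeLLM-13B checkpoint to a local directory (requires `pip install huggingface_hub`)
huggingface-cli download qizekun/ShapeLLM_13B_general_v1.0 --local-dir checkpoints/ShapeLLM_13B_general_v1.0
```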
|
|
|
### Demo |
|
#### CLI Inference |
|
Chat about point clouds through the CLI interface. It also supports multi-GPU, 4-bit, and 8-bit quantized inference.
|
```Shell |
|
python -m llava.serve.cli \ |
|
--model-path qizekun/ShapeLLM_13B_general_v1.0 \ |
|
--pts-file assets/instrument.npy |
|
``` |
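For example, quantized inference can be enabled with a single extra flag. This is a sketch assuming the fork keeps LLaVA's `--load-4bit` / `--load-8bit` options; check `llava/serve/cli.py` for the exact arguments.
```Shell
# 4-bit quantized inference on a single GPU
python -m llava.serve.cli \
    --model-path qizekun/ShapeLLM_13B_general_v1.0 \
    --pts-file assets/instrument.npy \
    --load-4bit
```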
|
|
|
### Training |
|
Consistent with LLaVA, we adopt a two-stage training approach. In the first stage, we fine-tune only the projector for semantic alignment. In the second stage, we conduct full fine-tuning on instruction-following data.
|
Download the data following [DATA](https://github.com/qizekun/ShapeLLM/blob/main/docs/DATA.md) and organize it as follows in `./playground/data/shapellm/`:
|
``` |
|
playground/data/shapellm/
├── cap3d_objaverse_785k.json
├── cap3d_objaverse_sft_45k.json
├── gapartnet_sft_27k_openai.json
├── gapartnet_pcs
│   ├── Box_100129_0_0.npy
│   └── ...
└── cap3d_pcs
    ├── 00000054c36d44a2a483bdbff31d8edf.pt
    └── ...
|
``` |
|
Furthermore, ShapeLLM utilizes the Large version of [ReCon++](https://github.com/qizekun/ShapeLLM/blob/main/ReConV2/cfgs/pretrain/large/openshape.yaml) as the point encoder. |
|
You need to download the [ReCon++ weight](https://huggingface.co/qizekun/ReConV2/blob/main/zeroshot/large/best_lvis.pth) and save it to `./checkpoints/recon/large.pth`. |
|
``` |
|
checkpoints/recon/
└── large.pth
|
``` |
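For example, the weight can be fetched directly from the Hugging Face Hub. This is a sketch; the `resolve/` URL below serves the raw file behind the `blob/` link above.
```Shell
# Place the ReCon++ Large checkpoint where the training scripts expect it
mkdir -p checkpoints/recon
wget https://huggingface.co/qizekun/ReConV2/resolve/main/zeroshot/large/best_lvis.pth -O checkpoints/recon/large.pth
```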
|
**1. Feature Alignment Stage** |
|
``` |
|
sh scripts/pretrain.sh |
|
``` |
|
**2. Visual Instruction Tuning Stage** |
|
``` |
|
sh scripts/finetune.sh |
|
``` |
|
The training takes around 14 hours for ShapeLLM-13B on 8x A100 (80G). It takes around 7 hours for ShapeLLM-7B. |
|
|
|
### Zero-shot Understanding on 3D MM-Vet |
|
To evaluate 3D MLLMs for integrated capabilities and embodied interaction capabilities, run the script:
|
``` |
|
sh scripts/eval/mmvet.sh |
|
``` |
|
Use GPT-4 to calculate the 3D MM-Vet score:
|
``` |
|
sh scripts/eval/eval_mmvet.sh |
|
``` |
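Note that the scoring step queries the OpenAI API. Below is a minimal sketch assuming the script reads the standard `OPENAI_API_KEY` environment variable; check the script for the exact variable or argument it expects.
```Shell
# Provide an OpenAI API key before scoring (placeholder value shown)
export OPENAI_API_KEY="sk-..."
sh scripts/eval/eval_mmvet.sh
```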
|
|
|
### Visual Grounding on GAPartNet
|
To evaluate the performance of ShapeLLM on the GAPartNet dataset, run the script:
|
``` |
|
sh scripts/eval/gapartnet_ref.sh |
|
``` |
|
Calculate the generative 3D visual grounding accuracy:
|
``` |
|
sh scripts/eval/eval_gapartnet.sh |
|
``` |