BlenderGym Benchmark [CVPR 2025 Highlight]
Homepage | arXiv | Leaderboard | Hugging Face
This repo contains the evaluation code for the paper "BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing".
News
- 2025-4-11: We release BlenderGym, the first VLM system benchmark for 3D graphics editing!
- 2025-4-4: BlenderGym is accepted as a Highlight at CVPR 2025 (top 387 of 13,008 valid submissions)!
- 2025-4-2: Our paper is now accessible at https://arxiv.org/abs/2504.01786!
- 2025-2-26: BlenderGym is accepted to CVPR 2025!
BlenderGym Usage
- Jump to Installation to set up the Conda environment and download the benchmark data.
- After installation, jump to VLM Evaluation right away to benchmark your VLM!
Installation
# Clone BlenderGym
git clone https://github.com/richard-guyunqi/BlenderGym-Open.git
cd BlenderGym-Open
# Create the conda environment for BlenderGym
conda create -n blendergym python=3.10
conda activate blendergym
# Install environment and download benchmark data
bash starter_setup.sh
VLM Evaluation
VLM Setup
This section sets your VLM up for inference on BlenderGym.
We provide a list of BlenderGym-supported models in Supported Models. To run the open-source ones among them, jump directly to Inference on BlenderGym. To run the API-required ones among them, jump to API Key Plug-in to enter your API key first.
If the VLM you want to test is not supported, jump to Custom VLM Plug-in.
[Optional] Test your VLM setup
To sanity-check your API / local implementation, you can optionally jump to Testing VLM Setup for some simple tests.
Inference on BlenderGym
This section introduces how to run your VLM on BlenderGym data to generate output edits.
python inference.py --task placement --vlm_only --generator_type [model_id] --evaluator_type [model_id]
# Minimal example:
# python inference.py --task test --vlm_only --generator_type qwen --evaluator_type qwen
where:
- --task: the task your VLM is evaluated on. You may enter individual task names, or one of "all", "subset", or "test".
- --generator_type: model_id of the generator VLM.
- --verifier_type: model_id of the verifier VLM.
More details about the format of these arguments can be found in inference.py. Models in Supported Models are listed with their model_id. For custom VLMs, you may use the model_id you defined in Custom VLM Plug-in.
Running the command above generates all the proposal edits and renders in system/outputs/. Metadata for all those edit scripts is saved under infosaved/ by default.
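When you move on to the evaluation step below, you will need the path to one of these metadata files. A small sketch like the following can help locate it; it only assumes that infosaved/ holds the JSON files mentioned above and that the most recently written one belongs to your latest run.
# Sketch: find the newest inference-metadata JSON under infosaved/
# (the "newest file = latest run" heuristic is an assumption, not part of BlenderGym itself)
from pathlib import Path

meta_files = sorted(Path("infosaved").glob("*.json"), key=lambda p: p.stat().st_mtime)
if meta_files:
    print("Latest inference metadata:", meta_files[-1])
else:
    print("No metadata found yet -- run inference.py first.")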
Evaluation of VLM-generated results
This section introduces how to evaluate the edits your VLM generated in the previous step.
python evaluation.py --inference_metadata_saved_path [path_to_the_json_metadata]
where:
- --inference_metadata_saved_path: path to the inference metadata (paths of proposal edit scripts, winner information, etc.). By default, it is a JSON file under infosaved/.
More details about the arguments can be found in evaluation.py.
You can check eval_renders/overall_scores.json for the evaluation scores. Evaluation renders are saved under eval_renders/.
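For a quick look at the aggregated scores without opening the file by hand, a sketch like the one below is enough. The path is the one mentioned above; the exact key layout inside the JSON depends on which tasks you ran, so the snippet simply pretty-prints whatever is there.
# Sketch: pretty-print the aggregated evaluation scores written by evaluation.py
import json

with open("eval_renders/overall_scores.json") as f:
    scores = json.load(f)
print(json.dumps(scores, indent=2))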
Utilities
Testing VLM setup
This section provides unit tests for your VLM setup. We recommend running them after an API plug-in or a custom VLM plug-in. The tests start with text-only and vision-language inputs for your VLM, and then run your VLM on a single instance of BlenderGym.
- We recommend starting by testing a single query through your VLM pipeline. To do that, we offer two test scripts under Tasksolver/test_scripts/: text_only.py and vlm.py. With them, you can test text-only input and vision-language input for your VLM, respectively. Follow the TODOs in them to set up the tests.
python Tasksolver/test_scripts/text_only.py # Test on language-only inputs
python Tasksolver/test_scripts/vision_language.py # Test on vision-language inputs
- After testing a single query, you can run your VLM on a single instance of BlenderGym with:
cd system
python vlm_single_edit.py --model_id [your_model_id]
You should see one tree of edits for one task instance under system/outputs/.
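If you want to confirm that the edit tree was actually written, a quick listing of system/outputs works; the sketch below assumes nothing about the directory layout beneath it.
# Sketch: list everything the single-instance test wrote under system/outputs
from pathlib import Path

for path in sorted(Path("system/outputs").rglob("*")):
    print(path)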
After you are done testing the VLM setup, you may jump directly to Inference on BlenderGym.
API Key Plug-in
This section introduces how to plug in your API key for proprietary models.
You can add your OpenAI/Anthropic/Google/other API key to system/credentials/{API-name}.txt. An example is provided in system/credentials/api_exmaple.txt.
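If you prefer to create the credential file programmatically, a sketch like the one below works. The file name openai.txt is only an assumed example following the {API-name}.txt convention above; check the example file for the exact naming and format your setup expects.
# Sketch: write an API key into the credentials folder
# ("openai.txt" is an assumed file name following the {API-name}.txt convention;
#  check system/credentials/api_exmaple.txt for the exact expected format)
from pathlib import Path

Path("system/credentials/openai.txt").write_text("sk-...your-key-here...\n")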
After you are done entering your API keys, you may jump back to VLM Setup.
Custom VLM Plug-in
This section introduces how to plug in your VLM if it is not listed in Supported Models. It applies to models accessed through API calls as well as models run with local inference.
1. Create a TaskSolver/tasksolver/{your_model}.py which contains a class YourModel, similar to ClaudeModel in TaskSolver/tasksolver/claude.py (API calling) or InternModel in TaskSolver/tasksolver/intern.py (local inference). You only have to change self.ask() and self.prepare_payload() to fit the format (a minimal sketch follows this list).
2. Add YourModel to Tasksolver/tasksolver/agent.py by following the TODOs. Note that you will be required to name your model with a model_id, which is crucial for later usage.
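For orientation, here is a minimal, non-authoritative sketch of what such a class can look like. The constructor arguments, payload format, endpoint, and return type below are illustrative assumptions; mirror the real signatures from TaskSolver/tasksolver/claude.py or TaskSolver/tasksolver/intern.py when writing your own.
# Sketch of a custom model wrapper -- the method names follow the README
# (self.prepare_payload() and self.ask()), but their signatures, the payload
# structure, and the HTTP call are illustrative assumptions, not TaskSolver's API.
import requests

class YourModel:
    def __init__(self, api_key: str, endpoint: str = "https://example.com/v1/chat"):
        self.api_key = api_key          # hypothetical credential
        self.endpoint = endpoint        # hypothetical inference endpoint

    def prepare_payload(self, prompt: str, image_paths: list[str] | None = None) -> dict:
        # Convert the prompt (and optional images) into whatever your backend expects.
        return {"prompt": prompt, "images": image_paths or []}

    def ask(self, prompt: str, image_paths: list[str] | None = None) -> str:
        payload = self.prepare_payload(prompt, image_paths)
        response = requests.post(
            self.endpoint,
            json=payload,
            headers={"Authorization": f"Bearer {self.api_key}"},
            timeout=60,
        )
        response.raise_for_status()
        return response.json()["text"]  # adjust to your backend's response schema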
After you are done with the two steps above, you may jump back to VLM Setup.
Supported Models
Supported Model Name | model_id
---|---
GPT-4o | gpt-4o
GPT-4 Turbo | gpt-4-turbo
GPT-4o-mini | gpt-4o-mini
Claude 3.7 Sonnet | claude-3.7-sonnet-latest
Claude 3.5 Sonnet (v2) | claude-3-5-sonnet-latest
Claude 3.5 Sonnet | claude-3-5-sonnet-20240620
Claude 3.5 Haiku | claude-3-5-haiku-latest
Claude 3 Opus | claude-3-opus-latest
Gemini 2.0 Flash | gemini-2.0-flash
Gemini 1.5 Flash | gemini-1.5-flash
InternVL2 (8B) | intern
InternLlama | internllama
MiniCPM-V 2.6 | minicpm
MiniCPMLlama | minicpmllama
Phi-3.5-vision | phi
PhiLlama | phillama
Qwen2-VL (7B AWQ) | qwen
QwenLlama | qwenllama
Llama 3.1 (8B) | llama
Citation
If you find our work useful, please cite the BibTeX entry below:
@misc{gu2025blendergymbenchmarkingfoundationalmodel,
title={BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing},
author={Yunqi Gu and Ian Huang and Jihyeon Je and Guandao Yang and Leonidas Guibas},
year={2025},
eprint={2504.01786},
archivePrefix={arXiv},
primaryClass={cs.GR},
url={https://arxiv.org/abs/2504.01786},
}