# GameWorld: A Unified Benchmark for Minecraft World Models

## :mega: Overview

With the rise of world models, a growing number of studies target the Minecraft environment, using video generation models to produce videos that both follow user action inputs and obey the physical rules of the game. However, existing research lacks a unified benchmark for consistently measuring and comparing model performance in the action-conditioned setting. To address this, we propose **GameWorld**, a unified benchmark that evaluates not only the perceptual quality of generated videos, but also their *controllability* and *physical plausibility*.

## :mortar_board: Evaluation Results

#### Overall Performance

| Model     | Image Quality ↑ | Aesthetic Quality ↑ | Temporal Cons. ↑ | Motion Smooth. ↑ | Keyboard Acc. ↑ | Mouse Acc. ↑ | 3D Cons. ↑ |
|-----------|-----------------|---------------------|------------------|------------------|-----------------|--------------|------------|
| Oasis     | 0.65            | 0.48                | 0.94             | **0.98**         | 0.77            | 0.56         | 0.56       |
| MineWorld | 0.69            | 0.47                | 0.95             | **0.98**         | 0.86            | 0.64         | 0.51       |
| **Ours**  | **0.72**        | **0.49**            | **0.97**         | **0.98**         | **0.95**        | **0.95**     | **0.76**   |

## :hammer: Installation

### Environment Setup

First, create an environment (optional) and install the required dependencies:

```shell
# Create the environment (example command)
conda create -n GameWorld python=3.10
# Activate the environment
conda activate GameWorld
# Install dependencies
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.4 -c pytorch -c nvidia
pip install -r requirements.txt
```

To install DROID-SLAM, run:

```shell
cd GameWorld/third_party/DROID-SLAM
python setup.py install
```

`torch_scatter` is also required; install the build that matches your own PyTorch and CUDA versions.
In our setup (PyTorch 2.4.1 with CUDA 12.4), the command is:

```shell
# Template (substitute your own PyTorch and CUDA versions):
# pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.4.1+cu124.html
```

### Dependencies Setup

```shell
# imaging_quality
mkdir -p ~/.cache/GameWorld_bench/pyiqa_model
wget https://github.com/chaofengc/IQA-PyTorch/releases/download/v0.1-weights/musiq_spaq_ckpt-358bb6af.pth -P ~/.cache/GameWorld_bench/pyiqa_model

# motion_smoothness
mkdir -p ~/.cache/GameWorld_bench/amt_model
wget https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth -P ~/.cache/GameWorld_bench/amt_model

# action_control
# Note: the path is left unquoted so that the shell expands "~"
mkdir -p ~/.cache/GameWorld_bench/IDM
wget https://openaipublic.blob.core.windows.net/minecraft-rl/idm/4x_idm.model -P ~/.cache/GameWorld_bench/IDM
wget https://openaipublic.blob.core.windows.net/minecraft-rl/idm/4x_idm.weights -P ~/.cache/GameWorld_bench/IDM

# 3d_consistency
mkdir -p ~/.cache/GameWorld_bench/droid_model
gdown 1PpqVt1H4maBa_GbPJp4NwxRsd9jk-elh -O ~/.cache/GameWorld_bench/droid_model/droid.pth
```

## ✅ Usage

#### World Generation

Prepare the generated videos before running the evaluation. Each video file must be named `{prefix}_{action_name}.mp4`. If you want to evaluate per environment, the prefix must be an index; otherwise it can be anything (e.g. the basename of the init image).

In our benchmark, we test 76 actions on 32 init images, which yields 76 × 32 = 2432 videos. They are laid out as follows, with _videos of the same action grouped together_:

- data
  - 0000_attack.mp4
  - 0001_attack.mp4
  - ...
  - 0031_attack.mp4
  - 0032_attack_camera_dl.mp4
  - ...
  - 2431_right_jump.mp4

#### Evaluation

To compute the overall metrics, run:

```shell
bash scripts/evaluate_Matrix.sh
```

To get results for each scene (optional), run:

```shell
bash scripts/evaluate_Matrix_per_scene.sh
```

To get results for each action (optional), run:

```shell
bash scripts/evaluate_Matrix_per_action.sh
```

## 🤗 Acknowledgments

Parts of our code are based on [VBench](https://github.com/Vchitect/VBench) and [VPT](https://github.com/openai/Video-Pre-Training). We thank their authors for their efforts and innovations, and we thank everyone who contributed their wisdom and effort to this project.
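
#### Appendix: Checking Video File Names

The `{prefix}_{action_name}.mp4` convention from the World Generation section can be checked before launching the evaluation scripts. The following is a minimal sketch and is not part of the GameWorld codebase: `parse_video_name` and `count_videos_per_action` are hypothetical helpers, and splitting on the first underscore assumes that the prefix itself contains no underscore (as with the zero-padded indices shown in the layout). With the benchmark layout, each of the 76 actions would be expected to appear 32 times.

```python
from collections import Counter
from pathlib import Path


def parse_video_name(filename: str) -> tuple[str, str]:
    """Split '{prefix}_{action_name}.mp4' into (prefix, action_name).

    Assumption (not from the GameWorld code): the prefix contains no
    underscore, e.g. a zero-padded index such as '0032'. Action names
    may contain underscores, e.g. 'attack_camera_dl'.
    """
    path = Path(filename)
    if path.suffix != ".mp4":
        raise ValueError(f"not an .mp4 file: {filename}")
    # Everything before the first underscore is the prefix,
    # everything after it is the action name.
    prefix, sep, action = path.stem.partition("_")
    if not sep or not prefix or not action:
        raise ValueError(f"expected '{{prefix}}_{{action_name}}.mp4': {filename}")
    return prefix, action


def count_videos_per_action(filenames) -> Counter:
    """Count how many videos exist for each action name."""
    return Counter(parse_video_name(name)[1] for name in filenames)


if __name__ == "__main__":
    # Hypothetical file names following the layout shown in this README.
    names = [
        "0000_attack.mp4",
        "0031_attack.mp4",
        "0032_attack_camera_dl.mp4",
        "2431_right_jump.mp4",
    ]
    for action, count in sorted(count_videos_per_action(names).items()):
        print(action, count)
```

To check a real directory, the file names could be collected with `sorted(p.name for p in Path("data").glob("*.mp4"))`, where `data` is the folder holding the generated videos.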