---
title: "Matrix"
emoji: π
colorFrom: blue
colorTo: blue
sdk: docker
app_file: server.py
pinned: true
short_description: AI Gaming server
app_port: 8080
disable_embedding: false
---
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
# Matrix-Game: Interactive World Foundation Model
<font size=7><div align='center' > [[Huggingface](https://huggingface.co/Skywork/Matrix-Game)] [[Technical Report](https://github.com/SkyworkAI/Matrix-Game/blob/main/assets/report.pdf)] [[Project Website](https://matrix-game-homepage.github.io/)] </div></font>
<div align="center">
<img src="assets/videos/demo.gif" alt="teaser" />
</div>
## Overview
**Matrix-Game** is a 17B-parameter interactive world foundation model for controllable game world generation.
## Key Features
- **Feature 1**: **Interactive Generation.** A diffusion-based image-to-world model that generates high-quality videos conditioned on keyboard and mouse inputs, enabling fine-grained control and dynamic scene evolution.
- **Feature 2**: **GameWorld Score.** A comprehensive benchmark that evaluates Minecraft world models along four dimensions: visual quality, temporal quality, action controllability, and physical rule understanding.
- **Feature 3**: **Matrix-Game Dataset.** A large-scale Minecraft dataset with fine-grained action annotations, supporting scalable training for interactive and physically grounded world modeling.
## Latest Updates
* [2025-05] Initial release of Matrix-Game Model
## Performance Comparison
### GameWorld Score Benchmark Comparison
| Model | Image Quality ↑ | Aesthetic Quality ↑ | Temporal Cons. ↑ | Motion Smooth. ↑ | Keyboard Acc. ↑ | Mouse Acc. ↑ | 3D Cons. ↑ |
|-----------|------------------|-------------|-------------------|-------------------|------------------|---------------|-------------|
| Oasis | 0.65 | 0.48 | 0.94 | **0.98** | 0.77 | 0.56 | 0.56 |
| MineWorld | 0.69 | 0.47 | 0.95 | **0.98** | 0.86 | 0.64 | 0.51 |
| **Ours** | **0.72** | **0.49** | **0.97** | **0.98** | **0.95** | **0.95** | **0.76** |
**Metric Descriptions**:
- **Image Quality** / **Aesthetic Quality**: Visual fidelity and perceptual appeal of generated frames
- **Temporal Consistency** / **Motion Smoothness**: Temporal coherence and smoothness between frames
- **Keyboard Accuracy** / **Mouse Accuracy**: Accuracy in following user control signals
- **3D Consistency**: Geometric stability and physical plausibility over time
Please check our [GameWorld](https://github.com/SkyworkAI/Matrix-Game/tree/main/GameWorldScore) benchmark for detailed implementation.
### Human Evaluation

> Double-blind human evaluation by two independent groups across four key dimensions: **Overall Quality**, **Controllability**, **Visual Quality**, and **Temporal Consistency**.
> Scores represent the percentage of pairwise comparisons in which each method was preferred. Matrix-Game consistently outperforms prior models across all metrics and both groups.
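
As a rough illustration of how such a preference score can be computed, the sketch below counts, for each method, the share of its pairwise comparisons that it won. This is one plausible reading of the description above, not the evaluation code used for the study; the function name `preference_rates` and the sample comparisons are hypothetical.

```python
from collections import Counter


def preference_rates(comparisons):
    """Return, per method, the percentage of its pairwise comparisons it won.

    `comparisons` is an iterable of (winner, loser) pairs, one per rating.
    """
    wins = Counter()
    appearances = Counter()
    for winner, loser in comparisons:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    # A method that never won still appears here with a rate of 0.0.
    return {m: 100.0 * wins[m] / appearances[m] for m in appearances}


# Hypothetical ratings for illustration only; not data from the study.
example = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
rates = preference_rates(example)
print({m: round(r, 1) for m, r in rates.items()})
```

Normalizing by each method's own number of appearances keeps the scores comparable when methods take part in different numbers of comparisons.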
## Quick Start
```bash
# clone the repository:
git clone https://github.com/SkyworkAI/Matrix-Game.git
cd Matrix-Game
# install dependencies:
pip install -r requirements.txt
# This project also depends on apex and FlashAttention-3; install them separately:
#   apex: https://github.com/NVIDIA/apex
#   FlashAttention-3: https://github.com/Dao-AILab/flash-attention
# Run batch inference to generate videos
bash run_inference.sh
# Run interactive websocket server
python server.py --model_root ./models/matrixgame
```
## Interactive WebSocket Server
We've implemented a real-time interactive WebSocket server that uses the Matrix-Game model to generate game frames from keyboard and mouse inputs.
### Features:
- **Real-time Generation**: Frames are generated on the fly based on user inputs
- **Keyboard & Mouse Control**: Move through the virtual world using WASD keys and mouse movements
- **Multiple Scenes**: Choose from different environments (forest, desert, beach, hills, etc.)
- **Fallback Mode**: Automatically falls back to demo mode when GPU resources are unavailable
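
The fallback behaviour listed above could be decided at startup with a check along these lines. This is a minimal sketch under stated assumptions, not the logic in `server.py`: the helper names `cuda_available` and `choose_mode` and the mode labels `"model"` and `"demo"` are hypothetical, and the only library call used is PyTorch's `torch.cuda.is_available()`.

```python
def cuda_available():
    """Return True if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return False
    return torch.cuda.is_available()


def choose_mode(gpu_ok):
    """Pick full model inference when a GPU is usable, otherwise demo mode.

    The mode labels are hypothetical names used only in this sketch.
    """
    return "model" if gpu_ok else "demo"


if __name__ == "__main__":
    print(f"starting in {choose_mode(cuda_available())} mode")
```

Keeping the decision in a small pure function makes it easy to test without a GPU.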
### Usage:
```bash
# Basic startup
python server.py
# With custom model paths
python server.py --model_root ./models/matrixgame --port 8080
# With individual model component paths
python server.py --dit_path ./custom/dit --vae_path ./custom/vae --textenc_path ./custom/textenc
```
### Connection:
- WebSocket endpoint: ws://localhost:8080/ws
- Web client: http://localhost:8080/
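
A client could talk to the endpoint above roughly as follows. This is a sketch only: the message schema (`type`, `keys`, `mouse`) is a hypothetical placeholder because the server's actual protocol is not documented here, and the third-party `websockets` package is assumed to be installed.

```python
import asyncio
import json

WS_URL = "ws://localhost:8080/ws"  # endpoint listed above


def build_input_message(keys, mouse_dx=0.0, mouse_dy=0.0):
    """Serialize one control input as JSON. The field names are hypothetical."""
    return json.dumps({
        "type": "input",
        "keys": sorted(keys),  # e.g. ["a", "w"] for moving forward-left
        "mouse": {"dx": mouse_dx, "dy": mouse_dy},
    })


async def send_one_input():
    """Connect, send a single input message, and wait for one reply."""
    import websockets  # third-party: pip install websockets

    async with websockets.connect(WS_URL) as ws:
        await ws.send(build_input_message({"w"}, mouse_dx=5.0))
        reply = await ws.recv()  # str or bytes, depending on the server
        print(f"received a reply of length {len(reply)}")


if __name__ == "__main__":
    asyncio.run(send_one_input())
```

Check the message handling in `server.py` for the real field names before relying on this format.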
### System Requirements:
- NVIDIA GPU with CUDA support
- 24GB+ VRAM recommended for smooth frame generation
## Hardware Requirements
- **GPU**:
- NVIDIA A100/H100
- **VRAM**:
- Requires **≥80GB of GPU memory** for a single 65-frame video inference.
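
To check a machine against the memory figure above before launching inference, something like the following sketch may help. It parses the output of `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which reports one total in MiB per GPU. The helper names are hypothetical, and the sketch assumes that "GB" means 10^9 bytes.

```python
import subprocess

REQUIRED_GB = 80  # figure stated above for a single 65-frame video


def parse_memory_mib(nvidia_smi_output):
    """Parse per-GPU memory totals in MiB, one integer per non-empty line."""
    return [int(line) for line in nvidia_smi_output.splitlines() if line.strip()]


def gpus_meeting_requirement(totals_mib, required_gb=REQUIRED_GB):
    """Return indices of GPUs with at least `required_gb` GB (10**9 bytes)."""
    required_bytes = required_gb * 10**9
    return [i for i, mib in enumerate(totals_mib) if mib * 1024**2 >= required_bytes]


def query_total_memory_mib():
    """Ask nvidia-smi for each GPU's total memory (requires an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_mib(out)


if __name__ == "__main__":
    print("GPUs meeting the requirement:", gpus_meeting_requirement(query_total_memory_mib()))
```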
## Acknowledgements
We would like to express our gratitude to:
- [Diffusers](https://github.com/huggingface/diffusers) for their excellent diffusion model framework
- [HunyuanVideo](https://github.com/Tencent/HunyuanVideo) for their strong base model
- [MineDojo](https://minedojo.org/knowledge_base) for their Minecraft video dataset
- [MineRL](https://github.com/minerllabs/minerl) for their excellent gym framework
- [Video-Pre-Training](https://github.com/openai/Video-Pre-Training) for their accurate Inverse Dynamics Model
- [GameFactory](https://github.com/KwaiVGI/GameFactory) for the idea behind the action control module
We are grateful to the broader research community for their open exploration and contributions to the field of interactive world generation.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Citation
If you find this project useful, please cite our paper:
```bibtex
@article{zhang2025matrixgame,
title = {Matrix-Game: Interactive World Foundation Model},
author = {Yifan Zhang and Chunli Peng and Boyang Wang and Puyi Wang and Qingcheng Zhu and Zedong Gao and Eric Li and Yang Liu and Yahui Zhou},
journal = {arXiv},
year = {2025}
}
```