---
title: Matrix
emoji: 🐟
colorFrom: blue
colorTo: blue
sdk: docker
app_file: server.py
pinned: true
short_description: AI Gaming server
app_port: 8080
disable_embedding: false
---

Matrix-Game: Interactive World Foundation Model


📝 Overview

Matrix-Game is a 17B-parameter interactive world foundation model for controllable game world generation.

✨ Key Features

  • 🎯 Feature 1: Interactive Generation. A diffusion-based image-to-world model that generates high-quality videos conditioned on keyboard and mouse inputs, enabling fine-grained control and dynamic scene evolution.
  • 🚀 Feature 2: GameWorld Score. A comprehensive benchmark for evaluating Minecraft world models across four key dimensions: visual quality, temporal quality, action controllability, and physical rule understanding.
  • 💡 Feature 3: Matrix-Game Dataset. A large-scale Minecraft dataset with fine-grained action annotations, supporting scalable training for interactive and physically grounded world modeling.

🔥 Latest Updates

  • [2025-05] 🎉 Initial release of the Matrix-Game model

🚀 Performance Comparison

GameWorld Score Benchmark Comparison

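| Model | Image Quality ↑ | Aesthetic Quality ↑ | Temporal Consistency ↑ | Motion Smoothness ↑ | Keyboard Accuracy ↑ | Mouse Accuracy ↑ | 3D Consistency ↑ |
|---|---|---|---|---|---|---|---|
| Oasis | 0.65 | 0.48 | 0.94 | 0.98 | 0.77 | 0.56 | 0.56 |
| MineWorld | 0.69 | 0.47 | 0.95 | 0.98 | 0.86 | 0.64 | 0.51 |
| Matrix-Game (Ours) | 0.72 | 0.49 | 0.97 | 0.98 | 0.95 | 0.95 | 0.76 |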

Metric Descriptions:

  • Image Quality / Aesthetic Quality: Visual fidelity and perceptual appeal of generated frames

  • Temporal Consistency / Motion Smoothness: Temporal coherence and smoothness between frames

  • Keyboard Accuracy / Mouse Accuracy: Accuracy in following user control signals

  • 3D Consistency: Geometric stability and physical plausibility over time

See our GameWorld benchmark for implementation details.
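For a quick read of the table, the Python sketch below copies the published scores and computes, for each metric, Matrix-Game's margin (the Ours row) over the stronger of the two baselines. It only redoes arithmetic on the numbers above; the `SCORES` layout and the `margin_over_best_baseline` helper are illustrative and are not part of the GameWorld benchmark code.

```python
# Scores copied from the GameWorld Score table above (higher is better).
METRICS = [
    "Image Quality", "Aesthetic Quality", "Temporal Consistency",
    "Motion Smoothness", "Keyboard Accuracy", "Mouse Accuracy", "3D Consistency",
]
SCORES = {
    "Oasis":       [0.65, 0.48, 0.94, 0.98, 0.77, 0.56, 0.56],
    "MineWorld":   [0.69, 0.47, 0.95, 0.98, 0.86, 0.64, 0.51],
    "Matrix-Game": [0.72, 0.49, 0.97, 0.98, 0.95, 0.95, 0.76],
}


def margin_over_best_baseline(scores, ours="Matrix-Game"):
    """Return {metric: ours - max(baselines)} for every metric column."""
    baselines = [name for name in scores if name != ours]
    margins = {}
    for i, metric in enumerate(METRICS):
        best = max(scores[name][i] for name in baselines)
        # Round to 2 decimals to hide floating-point noise (e.g. 0.95 - 0.86).
        margins[metric] = round(scores[ours][i] - best, 2)
    return margins


if __name__ == "__main__":
    for metric, margin in margin_over_best_baseline(SCORES).items():
        print(f"{metric:<22} {margin:+.2f}")
```

On the table's numbers, the largest margins are Mouse Accuracy (+0.31), 3D Consistency (+0.20), and Keyboard Accuracy (+0.09), while Motion Smoothness is tied at 0.98.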

Human Evaluation

Figure: Human Win Rate

We ran a double-blind human evaluation with two independent groups of evaluators across four key dimensions: Overall Quality, Controllability, Visual Quality, and Temporal Consistency.
Scores represent the percentage of pairwise comparisons in which each method was preferred. Matrix-Game consistently outperforms prior models on all four dimensions and in both groups.
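The win-rate definition above can be made concrete with a small sketch. Assuming each judgment is recorded as a `(method_a, method_b, preferred_method)` triple, a method's win rate is the share of the comparisons it took part in that it won. The record format, the `win_rates` helper, and the sample judgments are hypothetical and do not reflect the actual study data; ties are not modeled.

```python
from collections import Counter

# Hypothetical record format: (method_a, method_b, preferred_method).
Judgment = tuple[str, str, str]


def win_rates(judgments: list[Judgment]) -> dict[str, float]:
    """Percentage of the pairwise comparisons each method took part in that it won."""
    appearances = Counter()
    wins = Counter()
    for a, b, winner in judgments:
        if winner not in (a, b):
            raise ValueError(f"winner {winner!r} is not one of {a!r}, {b!r}")
        appearances[a] += 1
        appearances[b] += 1
        wins[winner] += 1
    return {m: 100.0 * wins[m] / appearances[m] for m in appearances}


if __name__ == "__main__":
    # Made-up judgments for illustration only (not results from the paper).
    sample = [
        ("Matrix-Game", "Oasis", "Matrix-Game"),
        ("Matrix-Game", "MineWorld", "Matrix-Game"),
        ("Matrix-Game", "Oasis", "Oasis"),
        ("Matrix-Game", "MineWorld", "Matrix-Game"),
    ]
    print(win_rates(sample))
```

With this made-up sample, Matrix-Game wins 3 of its 4 comparisons (75.0), Oasis 1 of 2 (50.0), and MineWorld 0 of 2 (0.0). Computing the rate separately per evaluator group and per dimension would reproduce the layout described above.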

🚀 Quick Start

```bash
# Clone the repository
git clone https://github.com/SkyworkAI/Matrix-Game.git
cd Matrix-Game

# Install Python dependencies
pip install -r requirements.txt

# The project also depends on apex and FlashAttention-3; install them by following their own instructions:
#   https://github.com/NVIDIA/apex
#   https://github.com/Dao-AILab/flash-attention

# Run batch inference to generate videos
bash run_inference.sh

# Run the interactive WebSocket server
python server.py --model_root ./models/matrixgame
```
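Because apex and FlashAttention-3 are installed as a separate step, a quick import check before running inference can save a failed launch. This is a sketch under assumptions: the import names `apex` and `flash_attn_interface` (the module exposed by FlashAttention-3's Hopper build) may differ in your environment, and `missing_modules` is a hypothetical helper, not part of the repository.

```python
import importlib.util

# Assumed import names; adjust them to match how apex and FlashAttention-3 were built.
REQUIRED_MODULES = {
    "apex": "NVIDIA apex",
    "flash_attn_interface": "FlashAttention-3 (Hopper build)",
}


def missing_modules(modules=REQUIRED_MODULES):
    """Return the labels of top-level modules that cannot be found (nothing is imported)."""
    return [label for name, label in modules.items()
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All extra dependencies found.")
```

`importlib.util.find_spec` only locates a top-level module without executing it, so the check stays cheap even when the CUDA extensions are present.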

Interactive WebSocket Server

We've implemented a real-time interactive WebSocket server that uses the Matrix-Game model to generate game frames based on keyboard and mouse inputs:

Features:

  • Real-time Generation: Frames are generated on-the-fly based on user inputs
  • Keyboard & Mouse Control: Move through the virtual world using WASD keys and mouse movements
  • Multiple Scenes: Choose from different environments (forest, desert, beach, hills, etc.)
  • Fallback Mode: Automatically falls back to demo mode when GPU resources are unavailable
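To make the keyboard-and-mouse conditioning concrete, here is a hypothetical sketch of how a client's input state could be turned into a per-frame control signal: a multi-hot vector over the WASD keys plus a scaled, clamped (dx, dy) mouse delta. The key order, the `Action` type, the `encode_action` function, and the `mouse_scale` sensitivity parameter are illustrative assumptions; the actual server may define its action space differently.

```python
from dataclasses import dataclass

# Hypothetical key order for the multi-hot keyboard vector.
KEYS = ("w", "a", "s", "d")


@dataclass(frozen=True)
class Action:
    keyboard: tuple[int, ...]   # 1 if the key is held during this frame, else 0
    mouse: tuple[float, float]  # scaled and clamped (dx, dy) camera movement


def _clamp(value: float, limit: float = 1.0) -> float:
    return max(-limit, min(limit, value))


def encode_action(pressed: set[str], dx: float, dy: float,
                  mouse_scale: float = 0.01) -> Action:
    """Map held keys and a raw mouse delta (in pixels) to a fixed-size control signal."""
    held = {key.lower() for key in pressed}
    keyboard = tuple(1 if key in held else 0 for key in KEYS)
    # Scale pixels to a small range, then clamp so a fast flick stays bounded.
    mouse = (_clamp(dx * mouse_scale), _clamp(dy * mouse_scale))
    return Action(keyboard=keyboard, mouse=mouse)


if __name__ == "__main__":
    # Holding W and D while moving the mouse 40 pixels to the right.
    print(encode_action({"w", "d"}, dx=40, dy=0))
```

A fixed key order and a bounded mouse range keep the signal the same shape and scale on every frame, which is generally convenient for a model conditioned on per-frame actions.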

Usage:

```bash
# Basic startup
python server.py

# With a custom model root and port
python server.py --model_root ./models/matrixgame --port 8080

# With individual model component paths
python server.py --dit_path ./custom/dit --vae_path ./custom/vae --textenc_path ./custom/textenc
```
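The flags above suggest a simple resolution rule: `--model_root` points at a directory of model components, and `--dit_path`, `--vae_path`, and `--textenc_path` each override one component. The sketch below mirrors that interface with `argparse`. It is not the actual `server.py` parser: the subdirectory names (`dit`, `vae`, `textenc`) and the `resolve_component_paths` helper are hypothetical, and the defaults (`./models/matrixgame`, port 8080) are taken from the example commands and the Space's `app_port`.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Matrix-Game WebSocket server (sketch)")
    parser.add_argument("--model_root", default="./models/matrixgame",
                        help="directory containing all model components")
    parser.add_argument("--port", type=int, default=8080)
    # Optional per-component overrides; None means "derive from --model_root".
    parser.add_argument("--dit_path", default=None)
    parser.add_argument("--vae_path", default=None)
    parser.add_argument("--textenc_path", default=None)
    return parser


def resolve_component_paths(args: argparse.Namespace) -> dict[str, Path]:
    """Use an explicit path when given, else a (hypothetical) subdirectory of model_root."""
    root = Path(args.model_root)
    return {
        "dit": Path(args.dit_path) if args.dit_path else root / "dit",
        "vae": Path(args.vae_path) if args.vae_path else root / "vae",
        "textenc": Path(args.textenc_path) if args.textenc_path else root / "textenc",
    }


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"port={args.port}", resolve_component_paths(args))
```

Letting an explicit component path win over the root makes it possible to swap in a single custom component without duplicating the whole model directory.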

Connection:

System Requirements:

  • NVIDIA GPU with CUDA support
  • 24GB+ VRAM recommended for smooth frame generation
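Together with the fallback mode described above, these requirements amount to a startup check. A minimal sketch, assuming PyTorch is the runtime: run the model when a CUDA device with at least the recommended 24 GB is present, otherwise switch to demo mode. The `select_mode` function, its return values, and its parameters are illustrative, not the server's actual logic.

```python
def select_mode(min_vram_gb: float = 24.0, device_index: int = 0) -> str:
    """Return "model" if a CUDA GPU with enough memory is available, else "demo"."""
    try:
        import torch  # imported lazily so the check also runs where PyTorch is absent
    except ImportError:
        return "demo"
    if not torch.cuda.is_available() or torch.cuda.device_count() <= device_index:
        return "demo"
    total_bytes = torch.cuda.get_device_properties(device_index).total_memory
    # Decimal gigabytes, the unit GPU vendors use for advertised capacity.
    total_gb = total_bytes / 1e9
    return "model" if total_gb >= min_vram_gb else "demo"


if __name__ == "__main__":
    print(f"Starting in {select_mode()} mode")
```

Passing `min_vram_gb=80` would mirror the ≥80GB requirement for 65-frame video inference listed under Hardware Requirements.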

🔧 Hardware Requirements

  • GPU:
    • NVIDIA A100/H100
  • VRAM:
    • Requires ≥80GB of GPU memory for a single 65-frame video inference.

⭐ Acknowledgements

We would like to express our gratitude to:

We are grateful to the broader research community for their open exploration and contributions to the field of interactive world generation.

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.

📎 Citation

If you find this project useful, please cite our paper:

```bibtex
@article{zhang2025matrixgame,
  title     = {Matrix-Game: Interactive World Foundation Model},
  author    = {Yifan Zhang and Chunli Peng and Boyang Wang and Puyi Wang and Qingcheng Zhu and Zedong Gao and Eric Li and Yang Liu and Yahui Zhou},
  journal   = {arXiv},
  year      = {2025}
}
```