minecraft-learning-distributed_470k

A Minecraft RL agent trained with PPO (Proximal Policy Optimization) using Stable-Baselines3.

This agent was trained to gather resources in Minecraft.

Training Details

Metric         Value
-------------  ---------
Total Steps    483,923
Episodes       56
Mean Reward    0.64
Best Reward    26.20
Reward Scheme  gathering
Learning Rate  0.0003

Hardware

  • Training: NVIDIA RTX 5090 (32GB VRAM)
  • Environment: NVIDIA Jetson Orin AGX (64GB RAM)
  • LLM Server: NVIDIA DGX Spark - GPT-OSS-20B (vLLM)

Architecture

  • Algorithm: PPO (Proximal Policy Optimization)
  • Policy: MLP with [512, 512] hidden layers
  • Observation Space: 82 dimensions (position, velocity, vitals, hotbar, craftable flags)
  • Action Space: 37 discrete actions (movement, mining, crafting, inventory)
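A rough sketch of the actor network that Stable-Baselines3 builds for an MlpPolicy with net_arch=[512, 512] and these space sizes (the value head, built the same way with a scalar output, is omitted; SB3 uses Tanh activations by default):

```python
import torch
from torch import nn

# Sketch of the SB3 actor MLP for net_arch=[512, 512]:
# 82-dim observation in, logits over 37 discrete actions out.
actor = nn.Sequential(
    nn.Linear(82, 512), nn.Tanh(),   # observation -> first hidden layer
    nn.Linear(512, 512), nn.Tanh(),  # second hidden layer
    nn.Linear(512, 37),              # action logits
)

logits = actor(torch.zeros(1, 82))
print(logits.shape)  # torch.Size([1, 37])
```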

Usage

from huggingface_hub import hf_hub_download
from stable_baselines3 import PPO

# Download the model checkpoint
model_path = hf_hub_download(
    repo_id='cahlen/minecraft-learning-distributed_470k',
    filename='model.zip',
    local_dir='./models'
)

# Load the trained policy
model = PPO.load(model_path)

# Run inference (`env` must be the custom Gymnasium-wrapped Minecraft
# environment; it is not bundled with the model)
obs, info = env.reset()
action, _ = model.predict(obs, deterministic=True)
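For a full evaluation episode, the predict call goes in a loop over the Gymnasium step API. A minimal sketch (`run_episode` is a helper name chosen here for illustration):

```python
def run_episode(model, env, deterministic=True):
    """Roll out one episode and return the total reward.

    `model` is any object with an SB3-style predict(obs) -> (action, state)
    method; `env` follows the Gymnasium 5-tuple step API.
    """
    obs, info = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action, _ = model.predict(obs, deterministic=deterministic)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        done = terminated or truncated
    return total_reward
```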

Environment Setup

This model was trained on a custom Minecraft environment using:

  • Mineflayer for bot control
  • Custom Gymnasium wrapper for RL interface
  • Vision features extracted from game data (not computer vision)

Training Configuration

PPO(
    "MlpPolicy",
    env,
    learning_rate=1e-3,
    n_steps=256,
    batch_size=256,
    n_epochs=15,
    gamma=0.99,
    gae_lambda=0.95,
    ent_coef=0.02,
    clip_range=0.2,
    policy_kwargs={"net_arch": [512, 512]},
)

License

MIT

Citation

If you use this model, please cite:

@misc{minecraft_learning_distributed_470k,
  author = {cahlen},
  title = {minecraft-learning-distributed_470k},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/cahlen/minecraft-learning-distributed_470k}}
}