# minecraft-learning-distributed_470k
A Minecraft reinforcement-learning agent trained with PPO (Proximal Policy Optimization) using Stable-Baselines3. The agent was trained to gather resources in the game.
## Training Details
| Metric | Value |
|---|---|
| Total Steps | 483,923 |
| Episodes | 56 |
| Mean Reward | 0.64 |
| Best Reward | 26.20 |
| Reward Scheme | gathering |
| Learning Rate | 0.0003 |
## Hardware
- Training: NVIDIA RTX 5090 (32GB VRAM)
- Environment: NVIDIA Jetson AGX Orin (64GB RAM)
- LLM Server: NVIDIA DGX Spark - GPT-OSS-20B (vLLM)
## Architecture
- Algorithm: PPO (Proximal Policy Optimization)
- Policy: MLP with [512, 512] hidden layers
- Observation Space: 82 dimensions (position, velocity, vitals, hotbar, craftable flags)
- Action Space: 37 discrete actions (movement, mining, crafting, inventory); both spaces are sketched below
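For orientation, these spaces correspond to standard Gymnasium definitions. The sketch below only fixes the sizes reported in this card; the exact feature layout inside the 82-dimensional vector is an assumption.

```python
import numpy as np
from gymnasium import spaces

# Sizes taken from this card; the per-feature layout is an assumption.
observation_space = spaces.Box(
    low=-np.inf, high=np.inf, shape=(82,), dtype=np.float32
)  # position, velocity, vitals, hotbar, craftable flags (flattened)
action_space = spaces.Discrete(37)  # movement, mining, crafting, inventory
```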
## Usage
```python
from huggingface_hub import hf_hub_download
from stable_baselines3 import PPO

# Download the trained policy from the Hub
model_path = hf_hub_download(
    repo_id='cahlen/minecraft-learning-distributed_470k',
    filename='model.zip',
    local_dir='./models',
)

# Load the policy
model = PPO.load(model_path)

# Run inference (env must be the custom Minecraft environment described below)
obs, info = env.reset()
action, _ = model.predict(obs, deterministic=True)
```
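The snippet above predicts a single action. A full episode rollout, assuming `env` is an instance of the custom Gymnasium-wrapped Minecraft environment (not included in this repository), would look roughly like this:

```python
# Sketch of an episode loop; `env` stands in for the custom
# Mineflayer-backed Gymnasium environment described below.
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
```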
## Environment Setup
This model was trained on a custom Minecraft environment using:
- Mineflayer for bot control
- Custom Gymnasium wrapper for the RL interface (a minimal skeleton is sketched below)
- "Vision" features extracted directly from game state rather than from pixels (no computer vision)
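The wrapper itself is not distributed with this model. The skeleton below is only a hypothetical outline of what the Gymnasium side of such a bridge looks like, with all observation and reward values stubbed out; the class name and internals are placeholders.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class MinecraftGatherEnv(gym.Env):
    """Hypothetical outline of the Mineflayer-backed Gymnasium wrapper;
    the real environment queries a Mineflayer bot for game state."""

    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(82,), dtype=np.float32)
        self.action_space = spaces.Discrete(37)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        obs = np.zeros(82, dtype=np.float32)  # would be built from the bot's game state
        return obs, {}

    def step(self, action):
        # The real wrapper forwards `action` to the Mineflayer bot and
        # derives a gathering reward from the resulting game state.
        obs = np.zeros(82, dtype=np.float32)
        reward = 0.0
        terminated, truncated = False, False
        return obs, reward, terminated, truncated, {}
```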
## Training Configuration
```python
model = PPO(
    "MlpPolicy",
    env,                     # custom Minecraft environment
    learning_rate=1e-3,
    n_steps=256,
    batch_size=256,
    n_epochs=15,
    gamma=0.99,
    gae_lambda=0.95,
    ent_coef=0.02,
    clip_range=0.2,
    policy_kwargs={"net_arch": [512, 512]},
)
```
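Given an environment instance, training and saving follow the standard Stable-Baselines3 API; the step count below is illustrative and simply mirrors the ~480k steps reported in this card.

```python
model.learn(total_timesteps=480_000)  # illustrative; the card reports 483,923 steps
model.save("model")                   # writes model.zip
```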
## License
MIT
## Citation
If you use this model, please cite:
```bibtex
@misc{minecraft_learning_distributed_470k,
  author = {cahlen},
  title = {minecraft-learning-distributed_470k},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/cahlen/minecraft-learning-distributed_470k}}
}
```