PPO - Ant-v4 🌟

A Proximal Policy Optimization (PPO) agent trained with stable-baselines3 on the MuJoCo Ant-v4 environment.

Details
Environment gymnasium==0.29 & mujoco==2.3 (Ant-v4)
Algorithm PPO (stable-baselines3==2.3.0)
Timesteps 100 000
Policy MlpPolicy (2 × 64 hidden, tanh)
Return (mean ± std) ~ 964
Seed 0
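
The reported return could be re-measured with something like the sketch below. It is not the evaluation script used for this card: the checkpoint name `ppo_ant_v4.zip` and the choice of 10 deterministic episodes are assumptions made for illustration. The heavy imports sit inside `evaluate` so that the small `summarize_returns` helper can be run on its own.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, population std) of a list of per-episode returns."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return mean, std


def evaluate(model_path="ppo_ant_v4.zip", n_episodes=10):
    """Roll out a saved PPO agent on Ant-v4 and summarize its returns.

    model_path and n_episodes are assumptions, not values from this card.
    """
    # Imported lazily: these need stable-baselines3, gymnasium and mujoco.
    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    env = gym.make("Ant-v4")
    model = PPO.load(model_path)
    # With return_episode_rewards=True the call yields per-episode lists
    # (rewards, lengths) instead of an aggregated (mean, std) pair.
    returns, _lengths = evaluate_policy(
        model,
        env,
        n_eval_episodes=n_episodes,
        deterministic=True,
        return_episode_rewards=True,
    )
    return summarize_returns(returns)
```

Calling `evaluate()` with the path of a downloaded checkpoint would give a `(mean, std)` pair that can be compared against the figure above; results will vary with the number of episodes and the evaluation seed.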

Hyper-parameters

{
  "n_steps": 128,
  "batch_size": 64,
  "n_epochs": 20,
  "gamma": 0.99,
  "learning_rate": 3e-4,     
  "ent_coef": 0.0,           
  "clip_range": 0.2
}
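
As a rough illustration of how these settings plug into stable-baselines3, the sketch below passes them to `PPO` and also works out the update budget they imply. It assumes a single, non-vectorized environment (`n_envs=1`), which the card does not state, and it is not the original training script. The `update_budget` helper is a hypothetical name introduced here.

```python
import math

# Hyper-parameters copied from the block above.
PPO_KWARGS = {
    "n_steps": 128,
    "batch_size": 64,
    "n_epochs": 20,
    "gamma": 0.99,
    "learning_rate": 3e-4,
    "ent_coef": 0.0,
    "clip_range": 0.2,
}
TOTAL_TIMESTEPS = 100_000
SEED = 0


def update_budget(total_timesteps, n_steps, batch_size, n_epochs, n_envs=1):
    """Derive rollout and gradient-step counts implied by the settings.

    n_envs=1 is an assumption; the card does not say how many
    environments were run in parallel.
    """
    rollout_size = n_steps * n_envs
    # Rollouts are collected until the timestep budget is reached,
    # so the last one may overshoot it.
    n_rollouts = math.ceil(total_timesteps / rollout_size)
    minibatches = rollout_size // batch_size
    return {
        "rollouts": n_rollouts,
        "minibatches_per_epoch": minibatches,
        "gradient_steps": n_rollouts * minibatches * n_epochs,
    }


def train():
    """Train PPO on Ant-v4 with the settings above (needs SB3 and MuJoCo)."""
    # Imported lazily so update_budget can run without MuJoCo installed.
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make("Ant-v4")
    # MlpPolicy defaults to two 64-unit tanh layers, matching the card.
    model = PPO("MlpPolicy", env, seed=SEED, verbose=1, **PPO_KWARGS)
    model.learn(total_timesteps=TOTAL_TIMESTEPS)
    return model
```

Under the single-environment assumption, each rollout of 128 steps is split into 2 minibatches of 64 and reused for 20 epochs, so every rollout contributes 40 gradient steps.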