PPO - Ant-v4
A Proximal Policy Optimization (PPO) agent trained with stable-baselines3 on the MuJoCo Ant-v4
environment.
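PPO updates the policy by maximizing a clipped surrogate objective. The sketch below is a minimal, framework-free illustration of that objective, assuming the standard per-sample form min(r·A, clip(r, 1 - ε, 1 + ε)·A), where r is the probability ratio between the new and old policy, A is the advantage estimate and ε is `clip_range`. The function names are hypothetical. stable-baselines3 computes the equivalent quantity on PyTorch tensors over minibatches, and it also normalizes advantages by default, which this sketch omits.

```python
from statistics import fmean


def clipped_surrogate(ratio: float, advantage: float, clip_range: float = 0.2) -> float:
    """PPO clipped surrogate for one sample (higher is better).

    ratio is pi_new(a|s) / pi_old(a|s); advantage is the advantage estimate.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
    # Taking the pessimistic minimum means the objective gains nothing from
    # pushing the ratio beyond [1 - clip_range, 1 + clip_range].
    return min(ratio * advantage, clipped_ratio * advantage)


def policy_loss(ratios, advantages, clip_range: float = 0.2) -> float:
    """Negated mean surrogate over a minibatch, i.e. a loss to minimize."""
    return -fmean(clipped_surrogate(r, a, clip_range) for r, a in zip(ratios, advantages))


# A ratio of 1.5 on a positive advantage is capped at 1 + 0.2 = 1.2.
print(clipped_surrogate(1.5, 1.0))  # 1.2
```

With `clip_range` = 0.2, as in this run, a single update can raise an action's probability ratio by at most 20% before the objective stops rewarding further change.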
Detail | Value |
---|---|
Environment | gymnasium==0.29 & mujoco==2.3 (Ant-v4) |
Algorithm | PPO (stable-baselines3==2.3.0) |
Timesteps | 100 000 |
Policy | MlpPolicy (2 × 64 hidden, tanh) |
Return (mean) | ~964 |
Seed | 0 |
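To make the "2 × 64 hidden, tanh" entry concrete, the sketch below counts the trainable parameters of such an actor-critic. It assumes stable-baselines3's default `MlpPolicy` layout (separate 64-64 actor and critic networks, a linear action-mean head, a state-independent log-std vector and a scalar value head) and the gymnasium 0.29 `Ant-v4` defaults of a 27-dimensional observation and an 8-dimensional action. Both dimensions are assumptions here, and the helper names are hypothetical; on a loaded model, `sum(p.numel() for p in model.policy.parameters())` gives the authoritative count.

```python
def mlp_params(sizes):
    """Weights + biases of a fully connected net with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def actor_critic_params(obs_dim: int, act_dim: int, hidden=(64, 64)) -> int:
    # Actor: obs -> 64 -> 64 -> action mean, plus one learned log-std per
    # action dimension (tanh activations add no parameters).
    actor = mlp_params([obs_dim, *hidden, act_dim]) + act_dim
    # Critic: a separate obs -> 64 -> 64 -> scalar value network.
    critic = mlp_params([obs_dim, *hidden, 1])
    return actor + critic


# Assumed Ant-v4 defaults: 27-dim observation, 8-dim action.
print(actor_critic_params(27, 8))  # 12497
```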
Hyper-parameters
{
"n_steps": 128,
"batch_size": 64,
"n_epochs": 20,
"gamma": 0.99,
"learning_rate": 3e-4,
"ent_coef": 0.0,
"clip_range": 0.2
}
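These settings determine how much data and how many gradient steps the 100 000-step run involves. The helper below (a hypothetical name, not part of stable-baselines3) does that arithmetic, assuming SB3-style behaviour: a single environment, rollouts of `n_steps` transitions collected until the step counter reaches the budget, and `n_epochs` full passes over each rollout in minibatches of `batch_size`, with no KL-based early stopping.

```python
import math


def ppo_update_budget(total_timesteps, n_steps, batch_size, n_epochs, n_envs=1):
    """Return (rollouts, env steps collected, gradient steps) for a PPO run."""
    buffer_size = n_steps * n_envs
    # Learning continues while the step counter is below the budget, and each
    # rollout collects a full buffer, so the last rollout may overshoot.
    rollouts = math.ceil(total_timesteps / buffer_size)
    # Every epoch sweeps the whole buffer in minibatches of batch_size.
    minibatches_per_epoch = math.ceil(buffer_size / batch_size)
    gradient_steps = rollouts * n_epochs * minibatches_per_epoch
    return rollouts, rollouts * buffer_size, gradient_steps


print(ppo_update_budget(100_000, n_steps=128, batch_size=64, n_epochs=20))
# (782, 100096, 31280)
```

Under these assumptions each 128-step rollout is reused for 20 × 2 = 40 minibatch updates, and the final rollout overshoots the nominal budget by 96 steps.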