ppo-Pixelcopter-PLE-v0 / config.json

Commit History

SB3 PPO. Vectorized across 16 envs. ~9_000_000 timesteps of training. mean_reward=163 +/- 103. Training for an additional 50_000_000 timesteps resulted in a worse reward at evaluation.
28a0b97

CoreyMorris committed on