lunar lander model #5, using PPO trained with learning rate 0.0005, gamma 0.995, for 1M timesteps 57e96c5 bguan committed on May 10, 2022
lunar lander model #5, using PPO trained with learning rate 0.0005, gamma 0.995, for 1M timesteps 1e0b940 bguan committed on May 10, 2022