metadata
tags:
- LunarLander-v2
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- custom-implementation
- deep-rl-course
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: 35.48 +/- 126.92
      name: mean_reward
      verified: false
PPO Agent Playing LunarLander-v2
A PPO agent that plays LunarLander-v2 but decides to go for a walk instead.
Do not download it if you are looking for an agent that follows the plan.
Hyperparameters
{"exp_name": "ppo",
"seed": 1,
"torch_deterministic": true,
"cuda": true,
"track": false,
"wandb_project_name": "cleanRL",
"wandb_entity": null,
"capture_video": false,
"env_id": "LunarLander-v2",
"total_timesteps": 1000000,
"learning_rate": 0.00025,
"num_envs": 4,
"num_steps": 128,
"anneal_lr": true,
"gae": true,
"gamma": 0.99,
"gae_lambda": 0.95,
"num_minibatches": 4,
"update_epochs": 4,
"norm_adv": true,
"clip_coef": 0.2,
"clip_vloss": true,
"ent_coef": 0.01,
"vf_coef": 0.5,
"max_grad_norm": 0.5,
"target_kl": null,
"repo_id": "salym/PPO-CleanRL-LunarLander-v2",
"batch_size": 512,
"minibatch_size": 128}
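
The last two entries are not independent settings: they appear to follow from the rollout shape. The sketch below shows how `batch_size` and `minibatch_size` can be derived from `num_envs`, `num_steps`, and `num_minibatches`, and how many policy updates `total_timesteps` allows. The linear learning-rate decay in `annealed_lr` is an assumption about what `anneal_lr: true` means, modeled on the usual CleanRL-style PPO script; the training code itself is not part of this card, and `annealed_lr` is a hypothetical helper name.

```python
# Values copied from the hyperparameter list above.
num_envs = 4
num_steps = 128
num_minibatches = 4
total_timesteps = 1_000_000
learning_rate = 0.00025

# One rollout collects num_steps transitions from each parallel environment.
batch_size = num_envs * num_steps               # 512
# Each update epoch splits the rollout into equally sized minibatches.
minibatch_size = batch_size // num_minibatches  # 128
# Number of rollout/update cycles that fit into the timestep budget.
num_updates = total_timesteps // batch_size     # 1953


def annealed_lr(update: int) -> float:
    """Linearly decayed learning rate for a 1-indexed update (assumed schedule)."""
    frac = 1.0 - (update - 1.0) / num_updates
    return frac * learning_rate


print(batch_size, minibatch_size, num_updates)
print(annealed_lr(1), annealed_lr(num_updates))
```

Under this assumed schedule, the first update uses the full learning rate of 0.00025 and the last update uses a small fraction of it.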