### The environment 🎮

- https://ale.farama.org/environments/tetris/

### The library used 📚

- [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/)

## Install dependencies and create a virtual screen 🔽


In [None]:
!apt install swig cmake

In [None]:
!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt

During the notebook, we'll need to generate a replay video. In Colab, **we need a virtual screen to render the environment** (and thus record the frames).

The following cell installs the virtual screen libraries. The virtual screen itself is created and started after the runtime restart further down 🖥

In [None]:
!sudo apt-get update
!sudo apt-get install -y python3-opengl
!apt install ffmpeg
!apt install xvfb
!pip3 install pyvirtualdisplay

To make sure the newly installed libraries are used, **it's sometimes necessary to restart the notebook runtime**. The next cell forces the **runtime to crash, so you'll need to reconnect and continue running the code from the cell after it**. Thanks to this trick, **we will be able to run our virtual screen.**

In [None]:
import os
os.kill(os.getpid(), 9)

In [None]:
# Virtual display
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

## Import the packages 📦




In [None]:
import gymnasium as gym

from huggingface_sb3 import load_from_hub, package_to_hub
from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub.

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

In [None]:
import gymnasium as gym

# First, we create our environment
env = gym.make("ALE/Tetris-v5")

# Then we reset this environment
observation, info = env.reset()

for _ in range(20):
    # Take a random action
    action = env.action_space.sample()
    print("Action taken:", action)

    # Do this action in the environment and get
    # next_state, reward, terminated, truncated and info
    observation, reward, terminated, truncated, info = env.step(action)

    # If the episode is terminated (in our case, game over) or truncated (timeout)
    if terminated or truncated:
        # Reset the environment
        print("Environment is reset")
        observation, info = env.reset()

env.close()

Let's see what the environment looks like:


In [None]:
# We create our environment with gym.make()
env = gym.make("ALE/Tetris-v5")
env.reset()
print("_____OBSERVATION SPACE_____ \n")
print("Observation Space Shape", env.observation_space.shape)
print("Sample observation", env.observation_space.sample()) # Get a random observation

In [None]:
print("\n _____ACTION SPACE_____ \n")
print("Action Space Size", env.action_space.n)
print("Action Space Sample", env.action_space.sample()) # Take a random action

#### Vectorized Environment

- We create a vectorized environment (a method for stacking multiple independent environments into a single environment) of 16 environments. This way, **we'll have more diverse experiences during the training.**

In [None]:
# Create the environment
env = make_vec_env('ALE/Tetris-v5', n_envs=16)
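
To make the idea of stacking environments concrete, here is a minimal pure-Python sketch. `ToyEnv` and `ToyVecEnv` are hypothetical stand-ins invented for this illustration, not Stable-Baselines3 classes, and the toy API is simplified (a single `done` flag, no `info`). The sketch only shows the general pattern: step every copy with its own action, and reset a copy as soon as its episode ends.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a real environment: an episode lasts `length` steps."""

    def __init__(self, length, seed):
        self.length = length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # The observation is just the step counter

    def step(self, action):
        self.t += 1
        reward = self.rng.random()
        done = self.t >= self.length
        return self.t, reward, done


class ToyVecEnv:
    """Steps several independent environments in lockstep (illustrative only)."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        observations, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done = env.step(action)
            if done:
                # Auto-reset so the batch always holds a valid observation
                obs = env.reset()
            observations.append(obs)
            rewards.append(reward)
            dones.append(done)
        return observations, rewards, dones


# Episodes of different lengths, so the copies drift out of sync
vec_env = ToyVecEnv([ToyEnv(length=3 + i, seed=i) for i in range(4)])
observations = vec_env.reset()
for _ in range(5):
    observations, rewards, dones = vec_env.step([0] * 4)
print(observations, dones)
```

Because the copies finish their episodes at different times, a single batch mixes early-episode and late-episode states, which is where the more diverse experience comes from.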

#### Model and hyperparameters

In [None]:
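model = PPO(
    # Note: ALE observations are image frames by default; "MlpPolicy" flattens them,
    # while "CnnPolicy" is the usual Stable-Baselines3 choice for image inputs
    policy="MlpPolicy",
    env=env,
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.99,
    gae_lambda=0.98,
    ent_coef=0.01,
    verbose=1,
)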
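
The `gamma` and `gae_lambda` values feed generalized advantage estimation (GAE), which PPO uses to score actions. The sketch below follows the standard GAE recursion rather than Stable-Baselines3's exact buffer code. `compute_gae` is a hypothetical helper, and the rewards and value estimates are made-up numbers.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.99, gae_lambda=0.98):
    """Generalized advantage estimation over one rollout (illustrative sketch).

    dones[t] is True when the episode ended at step t, so nothing is
    bootstrapped across that boundary.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    next_advantage = 0.0
    # Walk the rollout backwards so each step can reuse the next step's advantage
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        advantages[t] = delta + gamma * gae_lambda * not_done * next_advantage
        next_value = values[t]
        next_advantage = advantages[t]
    return advantages


# Made-up three-step episode that ends with a reward of 1
rewards = [0.0, 0.0, 1.0]
values = [0.5, 0.6, 0.7]
advantages = compute_gae(rewards, values, last_value=0.0, dones=[False, False, True])
print([round(a, 4) for a in advantages])
```

With `gae_lambda` close to 1, as here, the final reward propagates far back along the episode. Smaller values rely more on the value estimates and less on the observed rewards.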

## Train the PPO agent 🏃


In [None]:
model.learn(total_timesteps=100000)
# Save the model
model_name = "Tetris-v5"
model.save(model_name)
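
A quick back-of-the-envelope check shows how `total_timesteps` relates to the hyperparameters. It assumes the 16 environments and PPO settings used in this notebook, and that training always finishes a full rollout before checking the timestep budget, so the real number of steps can exceed the requested one.

```python
import math

n_envs = 16
n_steps = 1024
batch_size = 64
n_epochs = 4
total_timesteps = 100_000

# Transitions collected before each policy update
rollout_size = n_envs * n_steps
# Minibatches in one pass over the rollout
minibatches_per_epoch = rollout_size // batch_size
# Gradient steps taken per policy update
gradient_steps_per_update = minibatches_per_epoch * n_epochs
# Rollouts needed to reach the requested budget
num_updates = math.ceil(total_timesteps / rollout_size)
actual_timesteps = num_updates * rollout_size

print(rollout_size, minibatches_per_epoch, gradient_steps_per_update)
print(num_updates, actual_timesteps)
```

Under these assumptions the agent is updated only a handful of times, which is worth keeping in mind when judging the evaluation score.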

#### Evaluate

In [None]:
eval_env = Monitor(gym.make("ALE/Tetris-v5"))
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
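
To read the two numbers: `evaluate_policy` returns the mean and the standard deviation of the per-episode returns. The sketch below reproduces that summary in plain Python. The episode returns are made-up values, not results from this agent.

```python
import statistics

# Hypothetical returns from 10 evaluation episodes (made-up numbers)
episode_returns = [0.0, 1.0, 0.0, 2.0, 1.0, 0.0, 0.0, 1.0, 3.0, 2.0]

mean_reward = statistics.fmean(episode_returns)
# Population standard deviation, matching NumPy's default for np.std
std_reward = statistics.pstdev(episode_returns)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

A standard deviation that is large relative to the mean suggests the agent's performance varies a lot between episodes, so more evaluation episodes may be needed for a stable estimate.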

#### Upload to hub

In [None]:
notebook_login()
!git config --global credential.helper store

In [None]:
import gymnasium as gym

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.env_util import make_vec_env

from huggingface_sb3 import package_to_hub

# Define the name of the environment
env_id = "ALE/Tetris-v5"

# Define the model architecture we used
model_architecture = "PPO"

# Define a repo_id
# repo_id is the id of the model repository on the Hugging Face Hub (repo_id = {organization}/{repo_name})
# CHANGE WITH YOUR REPO ID
repo_id = "chirbard/ppo-Tetris-v5"  # Change with your repo id, you can't push with mine 😄

# Define the commit message
commit_message = "Upload PPO Tetris-v5 trained agent"

# Create the evaluation env and set render_mode="rgb_array"
eval_env = DummyVecEnv([lambda: gym.make(env_id, render_mode="rgb_array")])

# Package the trained model and push it to the Hub
package_to_hub(
    model=model,  # Our trained model
    model_name=model_name,  # The name of our trained model
    model_architecture=model_architecture,  # The model architecture we used: in our case PPO
    env_id=env_id,  # Name of the environment
    eval_env=eval_env,  # Evaluation environment
    repo_id=repo_id,  # id of the model repository on the Hugging Face Hub (repo_id = {organization}/{repo_name})
    commit_message=commit_message,
)
