{ "cells": [ { "cell_type": "markdown", "source": [ "### The environment 🎮\n", "\n", "- https://gymnasium.farama.org/environments/classic_control/mountain_car/\n", "\n", "### The library used 📚\n", "\n", "- [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/)" ], "metadata": { "id": "x7oR6R-ZIbeS" } }, { "cell_type": "markdown", "metadata": { "id": "jeDAH0h0EBiG" }, "source": [ "## Install dependencies and create a virtual screen 🔽\n" ] }, { "cell_type": "code", "source": [ "!apt install swig cmake" ], "metadata": { "id": "yQIGLPDkGhgG" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9XaULfDZDvrC" }, "outputs": [], "source": [ "!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt" ] }, { "cell_type": "markdown", "source": [ "During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames).\n", "\n", "Hence the following cell will install virtual screen libraries and create and run a virtual screen 🖥" ], "metadata": { "id": "BEKeXQJsQCYm" } }, { "cell_type": "code", "source": [ "!sudo apt-get update\n", "!sudo apt-get install -y python3-opengl\n", "!apt install ffmpeg\n", "!apt install xvfb\n", "!pip3 install pyvirtualdisplay" ], "metadata": { "id": "j5f2cGkdP-mb" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "To make sure the new installed libraries are used, **sometimes it's required to restart the notebook runtime**. The next cell will force the **runtime to crash, so you'll need to connect again and run the code starting from here**. Thanks to this trick, **we will be able to run our virtual screen.**" ], "metadata": { "id": "TCwBTAwAW9JJ" } }, { "cell_type": "code", "source": [ "import os\n", "os.kill(os.getpid(), 9)" ], "metadata": { "id": "cYvkbef7XEMi" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Virtual display\n", "from pyvirtualdisplay import Display\n", "\n", "virtual_display = Display(visible=0, size=(1400, 900))\n", "virtual_display.start()" ], "metadata": { "id": "BE5JWP5rQIKf" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "wrgpVFqyENVf" }, "source": [ "## Import the packages 📦\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cygWLPGsEQ0m" }, "outputs": [], "source": [ "import gymnasium\n", "\n", "from huggingface_sb3 import load_from_hub, package_to_hub\n", "from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub.\n", "\n", "from stable_baselines3 import PPO\n", "from stable_baselines3.common.env_util import make_vec_env\n", "from stable_baselines3.common.evaluation import evaluate_policy\n", "from stable_baselines3.common.monitor import Monitor" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "w7vOFlpA_ONz" }, "outputs": [], "source": [ "import gymnasium as gym\n", "\n", "# First, we create our environment\n", "env = gym.make(\"ALE/Tetris-v5\")\n", "\n", "# Then we reset this environment\n", "observation, info = env.reset()\n", "\n", "for _ in range(20):\n", " # Take a random action\n", " action = env.action_space.sample()\n", " print(\"Action taken:\", action)\n", "\n", " # Do this action in the environment and get\n", " # next_state, reward, terminated, 
truncated and info\n", " observation, reward, terminated, truncated, info = env.step(action)\n", "\n", " # If the game is terminated (in our case we land, crashed) or truncated (timeout)\n", " if terminated or truncated:\n", " # Reset the environment\n", " print(\"Environment is reset\")\n", " observation, info = env.reset()\n", "\n", "env.close()" ] }, { "cell_type": "markdown", "metadata": { "id": "poLBgRocF9aT" }, "source": [ "Let's see what the Environment looks like:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ZNPG0g_UGCfh" }, "outputs": [], "source": [ "# We create our environment with gym.make(\"\")\n", "env = gym.make(\"ALE/Tetris-v5\")\n", "env.reset()\n", "print(\"_____OBSERVATION SPACE_____ \\n\")\n", "print(\"Observation Space Shape\", env.observation_space.shape)\n", "print(\"Sample observation\", env.observation_space.sample()) # Get a random observation" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "We5WqOBGLoSm" }, "outputs": [], "source": [ "print(\"\\n _____ACTION SPACE_____ \\n\")\n", "print(\"Action Space Shape\", env.action_space.n)\n", "print(\"Action Space Sample\", env.action_space.sample()) # Take a random action" ] }, { "cell_type": "markdown", "metadata": { "id": "dFD9RAFjG8aq" }, "source": [ "#### Vectorized Environment\n", "\n", "- We create a vectorized environment (a method for stacking multiple independent environments into a single environment) of 16 environments, this way, **we'll have more diverse experiences during the training.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "99hqQ_etEy1N" }, "outputs": [], "source": [ "# Create the environment\n", "env = make_vec_env('ALE/Tetris-v5', n_envs=16)" ] }, { "cell_type": "markdown", "metadata": { "id": "QAN7B0_HCVZC" }, "source": [ "#### Model and hyperparameters" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "543OHYDfcjK4" }, "outputs": [], "source": [ "model = PPO(\n", " policy = 'MlpPolicy',\n", " env = env,\n", " n_steps = 1024,\n", " batch_size = 64,\n", " n_epochs = 4,\n", " gamma = 0.99,\n", " gae_lambda = 0.98,\n", " ent_coef = 0.01,\n", " verbose=1)" ] }, { "cell_type": "markdown", "metadata": { "id": "ClJJk88yoBUi" }, "source": [ "## Train the PPO agent 🏃\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "poBCy9u_csyR" }, "outputs": [], "source": [ "model.learn(total_timesteps=100000)\n", "# Save the model\n", "model_name = \"Tetris-v5\"\n", "model.save(model_name)" ] }, { "cell_type": "markdown", "metadata": { "id": "BqPKw3jt_pG5" }, "source": [ "#### Evaluate" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "zpz8kHlt_a_m" }, "outputs": [], "source": [ "#@title\n", "eval_env = Monitor(gym.make(\"ALE/Tetris-v5\"))\n", "mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)\n", "print(f\"mean_reward={mean_reward:.2f} +/- {std_reward}\")" ] }, { "cell_type": "markdown", "source": [ "#### Upload to hub" ], "metadata": { "id": "7YFBLHXDPuH5" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "GZiFBBlzxzxY" }, "outputs": [], "source": [ "notebook_login()\n", "!git config --global credential.helper store" ] }, { "cell_type": "code", "source": [ "import gymnasium as gym\n", "\n", "from stable_baselines3 import PPO\n", "from stable_baselines3.common.vec_env import DummyVecEnv\n", "from stable_baselines3.common.env_util import make_vec_env\n", "\n", "from huggingface_sb3 import 
package_to_hub\n", "\n", "# PLACE the variables you've just defined two cells above\n", "# Define the name of the environment\n", "env_id = \"ALE/Tetris-v5\"\n", "\n", "# TODO: Define the model architecture we used\n", "model_architecture = \"PPO\"\n", "\n", "## Define a repo_id\n", "## repo_id is the id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name}\n", "## CHANGE WITH YOUR REPO ID\n", "repo_id = \"chirbard/ppo-Tetris-v5\" # Change with your repo id, you can't push with mine 😄\n", "\n", "## Define the commit message\n", "commit_message = \"Upload PPO Tetris-v5 trained agent\"\n", "\n", "# Create the evaluation env and set the render_mode=\"rgb_array\"\n", "eval_env = DummyVecEnv([lambda: gym.make(env_id, render_mode=\"rgb_array\")])\n", "\n", "# PLACE the package_to_hub function you've just filled here\n", "package_to_hub(model=model, # Our trained model\n", " model_name=model_name, # The name of our trained model\n", " model_architecture=model_architecture, # The model architecture we used: in our case PPO\n", " env_id=env_id, # Name of the environment\n", " eval_env=eval_env, # Evaluation Environment\n", " repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name}\n", " commit_message=commit_message)\n" ], "metadata": { "id": "I2E--IJu8JYq" }, "execution_count": null, "outputs": [] } ], "metadata": { "accelerator": "GPU", "colab": { "private_outputs": true, "provenance": [], "collapsed_sections": [ "QAN7B0_HCVZC", "BqPKw3jt_pG5" ] }, "gpuClass": "standard", "kernelspec": { "display_name": "Python 3.9.7", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.9.7" }, "vscode": { "interpreter": { "hash": "ed7f8024e43d3b8f5ca3c5e1a8151ab4d136b3ecee1e3fd59e0766ccc55e1b10" } } }, "nbformat": 4, "nbformat_minor": 0 }