File size: 11,755 Bytes

1ec1c03

{
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "### The environment 🎮\n",
        "\n",
        "- https://gymnasium.farama.org/environments/classic_control/mountain_car/\n",
        "\n",
        "### The library used 📚\n",
        "\n",
        "- [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/)"
      ],
      "metadata": {
        "id": "x7oR6R-ZIbeS"
      }
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jeDAH0h0EBiG"
      },
      "source": [
        "## Install dependencies and create a virtual screen 🔽\n"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "!apt install swig cmake"
      ],
      "metadata": {
        "id": "yQIGLPDkGhgG"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "9XaULfDZDvrC"
      },
      "outputs": [],
      "source": [
        "!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames).\n",
        "\n",
        "Hence the following cell will install virtual screen libraries and create and run a virtual screen 🖥"
      ],
      "metadata": {
        "id": "BEKeXQJsQCYm"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!sudo apt-get update\n",
        "!sudo apt-get install -y python3-opengl\n",
        "!apt install ffmpeg\n",
        "!apt install xvfb\n",
        "!pip3 install pyvirtualdisplay"
      ],
      "metadata": {
        "id": "j5f2cGkdP-mb"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "To make sure the new installed libraries are used, **sometimes it's required to restart the notebook runtime**. The next cell will force the **runtime to crash, so you'll need to connect again and run the code starting from here**. Thanks to this trick, **we will be able to run our virtual screen.**"
      ],
      "metadata": {
        "id": "TCwBTAwAW9JJ"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "os.kill(os.getpid(), 9)"
      ],
      "metadata": {
        "id": "cYvkbef7XEMi"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Virtual display\n",
        "from pyvirtualdisplay import Display\n",
        "\n",
        "virtual_display = Display(visible=0, size=(1400, 900))\n",
        "virtual_display.start()"
      ],
      "metadata": {
        "id": "BE5JWP5rQIKf"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wrgpVFqyENVf"
      },
      "source": [
        "## Import the packages 📦\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "cygWLPGsEQ0m"
      },
      "outputs": [],
      "source": [
        "import gymnasium\n",
        "\n",
        "from huggingface_sb3 import load_from_hub, package_to_hub\n",
        "from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub.\n",
        "\n",
        "from stable_baselines3 import PPO\n",
        "from stable_baselines3.common.env_util import make_vec_env\n",
        "from stable_baselines3.common.evaluation import evaluate_policy\n",
        "from stable_baselines3.common.monitor import Monitor"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "w7vOFlpA_ONz"
      },
      "outputs": [],
      "source": [
        "import gymnasium as gym\n",
        "\n",
        "# First, we create our environment\n",
        "env = gym.make(\"ALE/Tetris-v5\")\n",
        "\n",
        "# Then we reset this environment\n",
        "observation, info = env.reset()\n",
        "\n",
        "for _ in range(20):\n",
        "  # Take a random action\n",
        "  action = env.action_space.sample()\n",
        "  print(\"Action taken:\", action)\n",
        "\n",
        "  # Do this action in the environment and get\n",
        "  # next_state, reward, terminated, truncated and info\n",
        "  observation, reward, terminated, truncated, info = env.step(action)\n",
        "\n",
        "  # If the game is terminated (in our case we land, crashed) or truncated (timeout)\n",
        "  if terminated or truncated:\n",
        "      # Reset the environment\n",
        "      print(\"Environment is reset\")\n",
        "      observation, info = env.reset()\n",
        "\n",
        "env.close()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "poLBgRocF9aT"
      },
      "source": [
        "Let's see what the Environment looks like:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ZNPG0g_UGCfh"
      },
      "outputs": [],
      "source": [
        "# We create our environment with gym.make(\"<name_of_the_environment>\")\n",
        "env = gym.make(\"ALE/Tetris-v5\")\n",
        "env.reset()\n",
        "print(\"_____OBSERVATION SPACE_____ \\n\")\n",
        "print(\"Observation Space Shape\", env.observation_space.shape)\n",
        "print(\"Sample observation\", env.observation_space.sample()) # Get a random observation"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "We5WqOBGLoSm"
      },
      "outputs": [],
      "source": [
        "print(\"\\n _____ACTION SPACE_____ \\n\")\n",
        "print(\"Action Space Shape\", env.action_space.n)\n",
        "print(\"Action Space Sample\", env.action_space.sample()) # Take a random action"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dFD9RAFjG8aq"
      },
      "source": [
        "#### Vectorized Environment\n",
        "\n",
        "- We create a vectorized environment (a method for stacking multiple independent environments into a single environment) of 16 environments, this way, **we'll have more diverse experiences during the training.**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "99hqQ_etEy1N"
      },
      "outputs": [],
      "source": [
        "# Create the environment\n",
        "env = make_vec_env('ALE/Tetris-v5', n_envs=16)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QAN7B0_HCVZC"
      },
      "source": [
        "#### Model and hyperparameters"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "543OHYDfcjK4"
      },
      "outputs": [],
      "source": [
        "model = PPO(\n",
        "    policy = 'MlpPolicy',\n",
        "    env = env,\n",
        "    n_steps = 1024,\n",
        "    batch_size = 64,\n",
        "    n_epochs = 4,\n",
        "    gamma = 0.99,\n",
        "    gae_lambda = 0.98,\n",
        "    ent_coef = 0.01,\n",
        "    verbose=1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ClJJk88yoBUi"
      },
      "source": [
        "## Train the PPO agent 🏃\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "poBCy9u_csyR"
      },
      "outputs": [],
      "source": [
        "model.learn(total_timesteps=100000)\n",
        "# Save the model\n",
        "model_name = \"Tetris-v5\"\n",
        "model.save(model_name)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BqPKw3jt_pG5"
      },
      "source": [
        "#### Evaluate"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "zpz8kHlt_a_m"
      },
      "outputs": [],
      "source": [
        "#@title\n",
        "eval_env = Monitor(gym.make(\"ALE/Tetris-v5\"))\n",
        "mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)\n",
        "print(f\"mean_reward={mean_reward:.2f} +/- {std_reward}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Upload to hub"
      ],
      "metadata": {
        "id": "7YFBLHXDPuH5"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "GZiFBBlzxzxY"
      },
      "outputs": [],
      "source": [
        "notebook_login()\n",
        "!git config --global credential.helper store"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import gymnasium as gym\n",
        "\n",
        "from stable_baselines3 import PPO\n",
        "from stable_baselines3.common.vec_env import DummyVecEnv\n",
        "from stable_baselines3.common.env_util import make_vec_env\n",
        "\n",
        "from huggingface_sb3 import package_to_hub\n",
        "\n",
        "# PLACE the variables you've just defined two cells above\n",
        "# Define the name of the environment\n",
        "env_id = \"ALE/Tetris-v5\"\n",
        "\n",
        "# TODO: Define the model architecture we used\n",
        "model_architecture = \"PPO\"\n",
        "\n",
        "## Define a repo_id\n",
        "## repo_id is the id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name}\n",
        "## CHANGE WITH YOUR REPO ID\n",
        "repo_id = \"chirbard/ppo-Tetris-v5\" # Change with your repo id, you can't push with mine 😄\n",
        "\n",
        "## Define the commit message\n",
        "commit_message = \"Upload PPO Tetris-v5 trained agent\"\n",
        "\n",
        "# Create the evaluation env and set the render_mode=\"rgb_array\"\n",
        "eval_env = DummyVecEnv([lambda: gym.make(env_id, render_mode=\"rgb_array\")])\n",
        "\n",
        "# PLACE the package_to_hub function you've just filled here\n",
        "package_to_hub(model=model, # Our trained model\n",
        "               model_name=model_name, # The name of our trained model\n",
        "               model_architecture=model_architecture, # The model architecture we used: in our case PPO\n",
        "               env_id=env_id, # Name of the environment\n",
        "               eval_env=eval_env, # Evaluation Environment\n",
        "               repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name}\n",
        "               commit_message=commit_message)\n"
      ],
      "metadata": {
        "id": "I2E--IJu8JYq"
      },
      "execution_count": null,
      "outputs": []
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "private_outputs": true,
      "provenance": [],
      "collapsed_sections": [
        "QAN7B0_HCVZC",
        "BqPKw3jt_pG5"
      ]
    },
    "gpuClass": "standard",
    "kernelspec": {
      "display_name": "Python 3.9.7",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.9.7"
    },
    "vscode": {
      "interpreter": {
        "hash": "ed7f8024e43d3b8f5ca3c5e1a8151ab4d136b3ecee1e3fd59e0766ccc55e1b10"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}