---
title: Negatively Correlated Ensemble RL
emoji: 🌹
colorFrom: red
colorTo: yellow
sdk: gradio
python_version: 3.9
app_file: app.py
pinned: false
---

# Negatively Correlated Ensemble RL
## Installation
Create the conda environment:
```bash
conda create -n ncerl python=3.9
```
Activate the conda environment:
```bash
conda activate ncerl
```
Install the dependencies:
```bash
pip install -r requirements.txt
```
Note: this program does not require a GPU, but PyTorch must be installed. If your GPU supports CUDA, install the CUDA build of PyTorch; otherwise install the CPU build. The CUDA build speeds up inference.
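To confirm which PyTorch build is installed, a quick sanity check (assuming the `ncerl` environment is active) is:
```bash
# Prints the installed PyTorch version and whether a CUDA-capable GPU is usable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```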
## Quick Start
To try the demo interactively, run
```bash
python app.py
```
and open the link printed in the terminal.
Alternatively, run
```bash
python generate_and_play.py
```
and check `models/example_policy/samples.png` to see the generated results.
## Training
All training runs are launched via `train.py` with an option and arguments. For example, executing `python train.py ncesac --lbd 0.3 --m 5` trains NCERL with hyperparameters $\lambda = 0.3$ and $m = 5$.
The plotting script is `plots.py`.
* `python train.py gan`: to train a decoder which maps a continuous action to a game level segment.
* `python train.py sac`: to train a standard SAC as the policy for online game level generation
* `python train.py asyncsac`: to train a SAC with an asynchronous evaluation environment as the policy for online game level generation
* `python train.py ncesac`: to train an NCERL based on SAC as the policy for online game level generation
* `python train.py egsac`: to train an episodic generative SAC (see paper [*The fun facets of Mario: Multifaceted experience-driven PCG via reinforcement learning*](https://dl.acm.org/doi/abs/10.1145/3555858.3563282)) as the policy for online game level generation
* `python train.py pmoe`: to train a PMOE (see paper [*Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning*](https://arxiv.org/abs/2104.09122)) as the policy for online game level generation
* `python train.py sunrise`: to train a SUNRISE (see paper [*SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning*](https://proceedings.mlr.press/v139/lee21g.html)) as the policy for online game level generation
* `python train.py dvd`: to train a DvD-SAC (see paper [*Effective Diversity in Population Based Reinforcement Learning*](https://proceedings.neurips.cc/paper_files/paper/2020/hash/d1dc3a8270a6f9394f88847d7f0050cf-Abstract.html)) as the policy for online game level generation

For the training arguments, please refer to the help: `python train.py [option] --help`
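As a quick reference, a typical session reusing the example command above (hyperparameter values are illustrative; see `--help` for the full argument list) looks like:
```bash
# Train NCERL (SAC-based) with lambda = 0.3 and m = 5
python train.py ncesac --lbd 0.3 --m 5

# Show all arguments accepted by the ncesac option
python train.py ncesac --help
```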
## Directory Structure
```
NCERL-DIVERSE-PCG/
* analysis/
* generate.py  unused
* tests.py  used for evaluation
* media/  assets for the markdown files
* models/
* example_policy/  used for the generation demo
* smb/  Mario simulator and image assets
* src/
* ddpm/  DDPM model code
* drl/  DRL models and training code
* env/  Mario gym environment and reward functions
* gan/  GAN model and training code
* olgen/  online generation environment and policy code
* rlkit/  reinforcement learning building blocks
* smb/  components for interacting with the Mario simulator and the multi-process asynchronous pool
* utils/  utility functions
* training_data/  training data
* README.md  this file
* app.py  entry point for the Gradio demo
* generate_and_play.py  entry point for the non-Gradio demo
* train.py  training entry point
* test_ddpm.py  script for testing DDPM training
* requirements.txt  dependency list
```