---
title: Negatively Correlated Ensemble RL
emoji: 🌹
colorFrom: red
colorTo: yellow
sdk: gradio
python_version: 3.9
app_file: app.py
pinned: false
---

# Negatively Correlated Ensemble RL
## Environment Setup
Create a conda environment:
```bash
conda create -n ncerl python=3.9
```
Activate it:
```bash
conda activate ncerl
```
Install the dependencies:
```bash
pip install -r requirements.txt
```
Note: no GPU is required to run this program, but PyTorch must be installed. If your GPU supports CUDA, install the CUDA build of PyTorch; otherwise install the CPU build. The CUDA build speeds up inference.
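If you are unsure which build was installed, PyTorch can report it directly (a minimal sanity check, not specific to this repository):
```bash
# Prints the PyTorch version and whether the CUDA build is usable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```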
## Quick Start
To try the interactive demo, run
```bash
python app.py
```
and open the URL printed in the terminal.
Alternatively, run
```bash
python generate_and_play.py
```
and open `models/example_policy/samples.png` to view the generated results.
## Training
All training runs are launched via `train.py` with an option and arguments. For example, executing `python train.py ncesac --lbd 0.3 --m 5` trains NCERL with hyperparameters $\lambda = 0.3$ and $m = 5$.
The plotting script is `plots.py`.
* `python train.py gan`: to train a decoder which maps a continuous action to a game level segment.
* `python train.py sac`: to train a standard SAC as the policy for online game level generation
* `python train.py asyncsac`: to train a SAC with an asynchronous evaluation environment as the policy for online game level generation
* `python train.py ncesac`: to train an NCERL based on SAC as the policy for online game level generation
* `python train.py egsac`: to train an episodic generative SAC (see paper [*The fun facets of Mario: Multifaceted experience-driven PCG via reinforcement learning*](https://dl.acm.org/doi/abs/10.1145/3555858.3563282)) as the policy for online game level generation
* `python train.py pmoe`: to train a PMOE (see paper [*Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning*](https://arxiv.org/abs/2104.09122)) as the policy for online game level generation
* `python train.py sunrise`: to train a SUNRISE (see paper [*SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning*](https://proceedings.mlr.press/v139/lee21g.html)) as the policy for online game level generation
* `python train.py dvd`: to train a DvD-SAC (see paper [*Effective Diversity in Population Based Reinforcement Learning*](https://proceedings.neurips.cc/paper_files/paper/2020/hash/d1dc3a8270a6f9394f88847d7f0050cf-Abstract.html)) as the policy for online game level generation

For the full list of training arguments, run `python train.py [option] --help`.
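For reference, a few invocations assembled from the options above; the `--lbd` and `--m` values mirror the earlier example and are illustrative, not tuned recommendations:
```bash
# Train the decoder that maps continuous actions to level segments
python train.py gan

# Train NCERL (SAC-based) with lambda = 0.3 and m = 5 ensemble members
python train.py ncesac --lbd 0.3 --m 5

# List every argument accepted by a given option
python train.py ncesac --help
```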
## Directory Structure
```
NCERL-DIVERSE-PCG/
* analysis/
  * generate.py          unused
  * tests.py             used for evaluation
* media/                 asset files for the markdown documents
* models/
  * example_policy/      example policy used for the generation demo
* smb/                   Mario simulator and image resources
* src/
  * ddpm/                DDPM model code
  * drl/                 DRL models and training code
  * env/                 Mario gym environment and reward functions
  * gan/                 GAN model and training code
  * olgen/               online-generation environment and policies
  * rlkit/               reinforcement-learning components
  * smb/                 components for interacting with the Mario simulator, and the multi-process asynchronous pool
  * utils/               utility modules
* training_data/         training data
* README.md              this file
* app.py                 entry point for the Gradio demo
* generate_and_play.py   demo script without Gradio
* train.py               training entry point
* test_ddpm.py           script for testing DDPM training
* requirements.txt       dependency list
```