---
license: mit
pipeline_tag: reinforcement-learning
---
# RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research

1️⃣ First work to incorporate an end-to-end vehicle routing model into a modern RL platform (CleanRL)

⚡ Speeds up training of the Attention Model by 8 times (25 hours → 3 hours)

🔎 A flexible framework for developing *model*, *algorithm*, *environment*, and *search* for operations research

## News

- 24/03/2023: We release our paper on [arXiv](https://arxiv.org/abs/2303.13117)!
- 20/03/2023: We release the demo and pretrained checkpoints!
- 10/03/2023: We release our codebase!


## Demo
We provide inference demos as Colab notebooks:

| Environment | Search       | Demo                                                         |
| ----------- | ------------ | ------------------------------------------------------------ |
| TSP         | Greedy       | <a target="_blank" href="https://colab.research.google.com/github/cpwan/RLOR/blob/main/demo/tsp_search.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |
| CVRP        | Multi-Greedy | <a target="_blank" href="https://colab.research.google.com/github/cpwan/RLOR/blob/main/demo/cvrp_search.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |


## Installation
### Conda
```shell
conda env create -n <env name> -f environment.yml
# The environment.yml was generated from
# conda env export --no-builds > environment.yml
```
It can take a few minutes.
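Once created, activate the environment before running any of the commands below. The import check is optional and assumes the environment provides PyTorch and Gym, which this codebase builds on:
```shell
conda activate <env name>
# optional sanity check: confirm the core dependencies resolved
python -c "import torch, gym; print(torch.__version__, gym.__version__)"
```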
### Optional dependency
`wandb` (for experiment tracking)

Refer to the WandB [quick start guide](https://docs.wandb.ai/quickstart) for installation.
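If you prefer to install it separately, the standard setup from the WandB docs is:
```shell
pip install wandb
wandb login  # paste your API key from wandb.ai/authorize when prompted
```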

## File structures
All the major implementations are under the [rlor](./rlor) folder.
```shell
./rlor
├── envs
│   ├── tsp_data.py # load pre-generated data for evaluation
│   ├── tsp_vector_env.py # define the (vectorized) gym environment
│   ├── cvrp_data.py
│   └── cvrp_vector_env.py
├── models
│   ├── attention_model_wrapper.py # wrap the refactored attention model for CleanRL
│   └── nets # contains the refactored attention model
└── ppo_or.py # implementation of PPO with the attention model for operations research problems
```

The [ppo_or.py](./ppo_or.py) was modified from [cleanrl/ppo.py](https://github.com/vwxyzjn/cleanrl/blob/28fd178ca182bd83c75ed0d49d52e235ca6cdc88/cleanrl/ppo.py). To see what changed, use `diff`:
```shell
# if diff is missing: apt install diffutils
diff --color ppo.py ppo_or.py
```

## Training OR model with PPO
### TSP
```shell
python ppo_or.py --num-steps 51 --env-id tsp-v0 --env-entry-point envs.tsp_vector_env:TSPVectorEnv --problem tsp
```
### CVRP
```shell
python ppo_or.py --num-steps 60 --env-id cvrp-v0 --env-entry-point envs.cvrp_vector_env:CVRPVectorEnv --problem cvrp
```
### Enable WandB
```shell
python ppo_or.py ... --track
```
Add the `--track` argument to any of the training commands above to enable experiment tracking with WandB.
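For example, a tracked TSP run might look like the sketch below. The `--seed` and `--total-timesteps` flags are assumed to carry over from CleanRL's `ppo.py` (which `ppo_or.py` was modified from); check `python ppo_or.py --help` for the exact argument set.
```shell
# --seed and --total-timesteps are CleanRL-style flags assumed to still exist in ppo_or.py
python ppo_or.py --num-steps 51 --env-id tsp-v0 --env-entry-point envs.tsp_vector_env:TSPVectorEnv \
    --problem tsp --track --seed 1 --total-timesteps 1000000
```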

### Where is the TSP data?
It can be generated with the [official repo](https://github.com/wouterkool/attention-learn-to-route) of the attention-learn-to-route paper. You may modify [./envs/tsp_data.py](./envs/tsp_data.py) to update the path to the data accordingly.
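A rough sketch of generating evaluation instances with that repo (the `generate_data.py` script and the flags below follow the attention-learn-to-route README at the time of writing; verify them against the current version):
```shell
git clone https://github.com/wouterkool/attention-learn-to-route
cd attention-learn-to-route
# generate TSP test instances; see that repo's README for all options (e.g. graph sizes)
python generate_data.py --problem tsp --name test --seed 1234
```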

## Acknowledgements
The neural network model is refactored and developed from [Attention, Learn to Solve Routing Problems!](https://github.com/wouterkool/attention-learn-to-route).

The idea of multi-trajectory training/inference is from [POMO: Policy Optimization with Multiple Optima for Reinforcement Learning](https://proceedings.neurips.cc/paper/2020/hash/f231f2107df69eab0a3862d50018a9b2-Abstract.html).

The RL environments are defined with [OpenAI Gym](https://github.com/openai/gym/tree/0.23.1).

The PPO algorithm implementation is based on [CleanRL](https://github.com/vwxyzjn/cleanrl).