Commit 011b48c by cpwan (parent: 52933b5)

Update README.md

Files changed (1): README.md (+85, -0)

---
license: mit
pipeline_tag: reinforcement-learning
---
# RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research

1️⃣ First work to incorporate an end-to-end vehicle routing model into a modern RL platform (CleanRL)

⚑ Speeds up training of the Attention Model by 8 times (25 hours β†’ 3 hours)

πŸ”Ž A flexible framework for developing *model*, *algorithm*, *environment*, and *search* for operations research

## News

- 24/03/2023: We release our paper on [arXiv](https://arxiv.org/abs/2303.13117)!
- 20/03/2023: We release demo and pretrained checkpoints!
- 10/03/2023: We release our codebase!

## Demo
We provide inference demos as Colab notebooks:

| Environment | Search | Demo |
| ----------- | ------------ | ------------------------------------------------------------ |
| TSP | Greedy | <a target="_blank" href="https://colab.research.google.com/github/cpwan/RLOR/blob/main/demo/tsp_search.ipynb"><br/> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/><br/></a> |
| CVRP | Multi-Greedy | <a target="_blank" href="https://colab.research.google.com/github/cpwan/RLOR/blob/main/demo/cvrp_search.ipynb"><br/> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/><br/></a> |

## Installation
### Conda
```shell
conda env create -n <env name> -f environment.yml
# The environment.yml was generated from
# conda env export --no-builds > environment.yml
```
It can take a few minutes.
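
After creating the environment, activate it before running the commands below. A quick sanity check (assuming the environment provides `torch` and `gym`, as the acknowledgements suggest) could be:
```shell
# Activate the environment created above (use the same <env name>)
conda activate <env name>
# Optional sanity check: confirm that the core dependencies import
python -c "import torch, gym; print(torch.__version__, gym.__version__)"
```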

### Optional dependency
`wandb`

Refer to the wandb [quick start guide](https://docs.wandb.ai/quickstart) for installation.
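
In short, the quick start amounts to installing the package and logging in:
```shell
pip install wandb
wandb login  # prompts for your API key
```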

## File structure
All the major implementations are under the [rlor](./rlor) folder.
```shell
./rlor
β”œβ”€β”€ envs
β”‚   β”œβ”€β”€ tsp_data.py        # loads pre-generated data for evaluation
β”‚   β”œβ”€β”€ tsp_vector_env.py  # defines the (vectorized) gym environment
β”‚   β”œβ”€β”€ cvrp_data.py
β”‚   └── cvrp_vector_env.py
β”œβ”€β”€ models
β”‚   β”œβ”€β”€ attention_model_wrapper.py  # wraps the refactored attention model for CleanRL
β”‚   └── nets                        # contains the refactored attention model
└── ppo_or.py              # implementation of PPO with the attention model for operations research problems
```

[ppo_or.py](./ppo_or.py) was modified from [cleanrl/ppo.py](https://github.com/vwxyzjn/cleanrl/blob/28fd178ca182bd83c75ed0d49d52e235ca6cdc88/cleanrl/ppo.py). To see what changed, run diff:
```shell
# apt install diffutils
diff --color ppo.py ppo_or.py
```

## Training OR model with PPO
### TSP
```shell
python ppo_or.py --num-steps 51 --env-id tsp-v0 --env-entry-point envs.tsp_vector_env:TSPVectorEnv --problem tsp
```
### CVRP
```shell
python ppo_or.py --num-steps 60 --env-id cvrp-v0 --env-entry-point envs.cvrp_vector_env:CVRPVectorEnv --problem cvrp
```
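
The `--env-id` and `--env-entry-point` flags suggest that the custom environments are registered with gym at runtime. A minimal, illustrative sketch of that pattern with the gym 0.23 API (the exact wiring inside `ppo_or.py` may differ):
```python
import gym
from gym.envs.registration import register

# Register the custom TSP environment under an id, pointing at its class;
# this mirrors --env-id tsp-v0 --env-entry-point envs.tsp_vector_env:TSPVectorEnv.
register(id="tsp-v0", entry_point="envs.tsp_vector_env:TSPVectorEnv")

# Illustrative only: ppo_or.py may pass problem-specific kwargs here.
env = gym.make("tsp-v0")
obs = env.reset()
```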

### Enable WandB
```shell
python ppo_or.py ... --track
```
Add the `--track` argument to enable experiment tracking with WandB.
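
Under the hood, CleanRL-style scripts typically gate the WandB setup behind this flag. A rough, self-contained sketch of that convention (argument and project names here are illustrative, not necessarily those used in `ppo_or.py`):
```python
import argparse

import wandb

# Illustrative CleanRL-style convention: only initialize WandB when --track is passed.
parser = argparse.ArgumentParser()
parser.add_argument("--track", action="store_true", help="enable WandB experiment tracking")
parser.add_argument("--wandb-project-name", default="rlor", help="illustrative project name")
args = parser.parse_args()

if args.track:
    # Log hyperparameters and sync TensorBoard scalars to WandB
    wandb.init(project=args.wandb_project_name, config=vars(args), sync_tensorboard=True)
```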

### Where is the TSP data?
It can be generated with the [official repo](https://github.com/wouterkool/attention-learn-to-route) of the attention-learn-to-route paper. You may modify [./envs/tsp_data.py](./envs/tsp_data.py) to update the data path accordingly.
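
For reference, the datasets produced by that repo's `generate_data.py` are plain pickle files; a minimal loading sketch (file name and path below are illustrative) could look like:
```python
import pickle

import numpy as np

# Illustrative path; point this at wherever you generated or downloaded the data.
with open("data/tsp/tsp50_test_seed1234.pkl", "rb") as f:
    instances = pickle.load(f)

# For TSP, each instance is expected to be a list of 2D node coordinates.
coords = np.array(instances[0])
print(coords.shape)  # e.g. (50, 2)
```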

## Acknowledgements
The neural network model is refactored from and builds on [Attention, Learn to Solve Routing Problems!](https://github.com/wouterkool/attention-learn-to-route).

The idea of multiple-trajectory training/inference is from [POMO: Policy Optimization with Multiple Optima for Reinforcement Learning](https://proceedings.neurips.cc/paper/2020/hash/f231f2107df69eab0a3862d50018a9b2-Abstract.html).

The RL environments are defined with [OpenAI Gym](https://github.com/openai/gym/tree/0.23.1).

The PPO algorithm implementation is based on [CleanRL](https://github.com/vwxyzjn/cleanrl).