[DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras](
Zachary Teed and Jia Deng
```
@article{teed2021droid,
  title={{DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras}},
  author={Teed, Zachary and Deng, Jia},
  journal={Advances in neural information processing systems},
  year={2021}
}
```
**Initial Code Release:** This repo currently provides a single-GPU implementation of our monocular, stereo, and RGB-D SLAM systems. It contains demos as well as training and evaluation scripts.
## Requirements
To run the code you will need:
* **Inference:** Running the demos requires a GPU with at least 11 GB of memory.
* **Training:** Training requires a GPU with at least 24 GB of memory. We train on 4 x RTX-3090 GPUs.
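
As a quick pre-flight check, the memory requirements above can be compared against what `nvidia-smi` reports. The sketch below is not part of this repo: the helper names are hypothetical, and the thresholds are an assumption about how to read "11 GB" and "24 GB" (the requirements do not state exact units). It parses the output of `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which prints one total-memory value in MiB per GPU.

```python
import subprocess


def parse_memory_totals(smi_output):
    """Parse one integer MiB value per non-empty line of nvidia-smi output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpus_with_enough_memory(smi_output, min_mib):
    """Return the indices of GPUs whose total memory is at least min_mib."""
    totals = parse_memory_totals(smi_output)
    return [i for i, mib in enumerate(totals) if mib >= min_mib]


def query_nvidia_smi():
    """Run nvidia-smi and return its raw text output (needs an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

On a machine with an NVIDIA driver, `gpus_with_enough_memory(query_nvidia_smi(), 11_000)` would list the GPUs that look usable for the demos, and a threshold of `24_000` would do the same for training. Both thresholds are illustrative, so adjust them if your card reports slightly less than its nominal size.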
## Getting Started
1. Clone the repo using the `--recursive` flag
```Bash
git clone --recursive
```
2. Create a new Anaconda environment using the provided .yaml file. Use `environment_novis.yaml` if you do not want to use the visualization.
```Bash
conda env create -f environment.yaml
pip install evo --upgrade --no-binary evo
pip install gdown
```
3. Compile the extensions (takes about 10 minutes)
```Bash
python install
```
## Demos
1. Download the model from Google Drive: [droid.pth](
2. Download some sample videos using the provided script.
```Bash
./tools/
```
Run the demo on any of the samples (all demos can be run on a GPU with 11 GB of memory). While running, press the "s" key to raise the filtering threshold (more points) or the "a" key to lower it (fewer points). To save the reconstruction with full-resolution depth maps, use the `--reconstruction_path` flag.
```Bash
python --imagedir=data/abandonedfactory --calib=calib/tartan.txt --stride=2
```
```Bash
python --imagedir=data/sfm_bench/rgb --calib=calib/eth.txt
```
```Bash
python --imagedir=data/Barn --calib=calib/barn.txt --stride=1 --backend_nms=4
```
```Bash
python --imagedir=data/mav0/cam0/data --calib=calib/euroc.txt --t0=150
```
```Bash
python --imagedir=data/rgbd_dataset_freiburg3_cabinet/rgb --calib=calib/tum3.txt
```
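
The `--stride` and `--t0` flags in the commands above control which frames are processed. The sketch below illustrates one plausible reading of them: `--stride` keeps every N-th image of the sorted directory listing, and `--t0` skips frames whose index in that subsampled sequence is below the given value. These semantics are an assumption (the demo script itself is the authority), and the function name `select_frames` is hypothetical.

```python
def select_frames(filenames, stride=1, t0=0):
    """Return (index, filename) pairs for the frames that would be processed.

    Assumed semantics (not confirmed by this README): files are sorted,
    every `stride`-th file is kept, and frames with index < t0 in the
    subsampled sequence are skipped.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    subsampled = sorted(filenames)[::stride]
    return [(t, name) for t, name in enumerate(subsampled) if t >= t0]
```

Under this reading, a directory of ten images with `stride=2` yields five frames, so a larger stride trades temporal resolution for speed and memory.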
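**Running on your own data:** All you need is a calibration file. Calibration files have the form
```
fx fy cx cy [k1 k2 p1 p2 [ k3 [ k4 k5 k6 ]]]
```
where `fx fy` are the focal lengths and `cx cy` the principal point in pixels, and the bracketed parameters (radial `k` and tangential `p` distortion coefficients) are optional.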
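
To make the format concrete, here is a minimal sketch of a parser for such a file. It is not the loader used by the demo script; the function name `parse_calibration` is hypothetical, and the accepted value counts (4, 8, 9, or 12) are simply read off the nesting of the optional brackets above.

```python
def parse_calibration(text):
    """Parse 'fx fy cx cy [k1 k2 p1 p2 [k3 [k4 k5 k6]]]' into (K, distortion)."""
    values = [float(token) for token in text.split()]
    # The nested optional groups allow exactly these lengths.
    if len(values) not in (4, 8, 9, 12):
        raise ValueError(f"expected 4, 8, 9, or 12 values, got {len(values)}")
    fx, fy, cx, cy = values[:4]
    # 3x3 pinhole intrinsics matrix as nested lists.
    K = [
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ]
    distortion = values[4:]  # empty list when no distortion terms are given
    return K, distortion
```

For a file holding only four numbers, `parse_calibration` returns the intrinsics matrix and an empty distortion list. A file with a partial distortion group (for example six values) is rejected rather than silently padded.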
## Evaluation
We provide evaluation scripts for TartanAir, EuRoC, TUM-RGBD, and ETH3D. EuRoC and TUM-RGBD can be run on a 1080Ti. TartanAir and ETH3D require 24 GB of memory.
### TartanAir (Mono + Stereo)
Download the [TartanAir]( dataset using the script `thirdparty/tartanair_tools/` and put it in `datasets/TartanAir`
```Bash
./tools/ --plot_curve # monocular eval
./tools/ --plot_curve --stereo # stereo eval
```
### EuRoC (Mono + Stereo)
Download the [EuRoC]( sequences (ASL format) and put them in `datasets/EuRoC`
```Bash
./tools/ # monocular eval
./tools/ --stereo # stereo eval
```
### TUM-RGBD (Mono)
Download the fr1 sequences from [TUM-RGBD]( and put them in `datasets/TUM-RGBD`
```Bash
./tools/ # monocular eval
```
### ETH3D (RGB-D)
Download the [ETH3D]( dataset
```Bash
./tools/ # RGB-D eval
```
## Training
First download the TartanAir dataset. The download script can be found in `thirdparty/tartanair_tools/`. You will only need the `rgb` and `depth` data.
```Bash
python --rgb --depth
```
You can then run the training script. We use 4 x RTX-3090 GPUs for training, which takes approximately 1 week. If you use a different number of GPUs, adjust the learning rate accordingly.
**Note:** On the first training run, covisibility is computed between all pairs of frames. This can take several hours, but the results are cached so that future training runs start immediately.
```Bash
python --datapath=<path to tartanair> --gpus=4 --lr=0.00025
```
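
This README does not say how the learning rate should be adjusted. One common heuristic is linear scaling, where the learning rate changes in proportion to the number of GPUs (and hence the effective batch size). The sketch below applies that heuristic to the reference configuration above (`--gpus=4 --lr=0.00025`). The rule and the helper name `scaled_lr` are assumptions, not the authors' stated method.

```python
def scaled_lr(num_gpus, base_lr=0.00025, base_gpus=4):
    """Scale the learning rate linearly with the GPU count (a heuristic).

    The defaults mirror the reference command: 4 GPUs at lr=0.00025.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    return base_lr * num_gpus / base_gpus
```

Under this assumption, training on two GPUs would use `scaled_lr(2)`, i.e. `--gpus=2 --lr=0.000125`. Treat the result as a starting point and validate it against your own training curves.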
## Acknowledgements
Data from [TartanAir]( was used to train our model. We additionally use evaluation tools from [evo]( and [tartanair_tools](