<div align="center">
<h1>Depth Anything V2</h1>

[**Lihe Yang**](https://liheyoung.github.io/)<sup>1</sup> · [**Bingyi Kang**](https://bingykang.github.io/)<sup>2†</sup> · [**Zilong Huang**](http://speedinghzl.github.io/)<sup>2</sup>
<br>
[**Zhen Zhao**](http://zhaozhen.me/) · [**Xiaogang Xu**](https://xiaogang00.github.io/) · [**Jiashi Feng**](https://sites.google.com/site/jshfeng/)<sup>2</sup> · [**Hengshuang Zhao**](https://hszhao.github.io/)<sup>1*</sup>

<sup>1</sup>HKU   <sup>2</sup>TikTok
<br>
†project lead *corresponding author

<a href="https://arxiv.org/abs/2406.09414"><img src='https://img.shields.io/badge/arXiv-Depth Anything V2-red' alt='Paper PDF'></a>
<a href='https://depth-anything-v2.github.io'><img src='https://img.shields.io/badge/Project_Page-Depth Anything V2-green' alt='Project Page'></a>
<a href='https://huggingface.co/spaces/depth-anything/Depth-Anything-V2'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
<a href='https://huggingface.co/datasets/depth-anything/DA-2K'><img src='https://img.shields.io/badge/Benchmark-DA--2K-yellow' alt='Benchmark'></a>
</div>

This work presents Depth Anything V2. It significantly outperforms [V1](https://github.com/LiheYoung/Depth-Anything) in fine-grained details and robustness. Compared with SD-based models, it offers faster inference, fewer parameters, and higher depth accuracy.

## News

- **2024-06-14:** Paper, project page, code, models, demo, and benchmark are all released.

## Pre-trained Models

We provide **four models** of varying scales for robust relative depth estimation:

| Model | Params | Checkpoint |
|:-|-:|:-:|
| Depth-Anything-V2-Small | 24.8M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth?download=true) |
| Depth-Anything-V2-Base | 97.5M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Base/resolve/main/depth_anything_v2_vitb.pth?download=true) |
| Depth-Anything-V2-Large | 335.3M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true) |
| Depth-Anything-V2-Giant | 1.3B | Coming soon |

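The checkpoint links in the table above follow a regular pattern, so a small helper can build them from the encoder short name. This is only a convenience sketch: `checkpoint_url` and `_MODEL_NAMES` are hypothetical names that are not part of the repository, and the Giant model is left out because its checkpoint is not yet listed.

```python
# Hypothetical helper (not part of the repository): rebuilds the download
# links shown in the table from the encoder short name.
_MODEL_NAMES = {"vits": "Small", "vitb": "Base", "vitl": "Large"}


def checkpoint_url(encoder: str) -> str:
    """Return the Hugging Face download URL for a released checkpoint."""
    if encoder not in _MODEL_NAMES:
        raise ValueError(f"No released checkpoint listed for encoder {encoder!r}")
    name = _MODEL_NAMES[encoder]
    return (
        f"https://huggingface.co/depth-anything/Depth-Anything-V2-{name}"
        f"/resolve/main/depth_anything_v2_{encoder}.pth?download=true"
    )


if __name__ == "__main__":
    for enc in _MODEL_NAMES:
        print(enc, checkpoint_url(enc))
```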
### Code snippet to use our models

```python
import cv2
import torch

from depth_anything_v2.dpt import DepthAnythingV2

# Use Depth-Anything-V2-Large as an example
model = DepthAnythingV2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024])
model.load_state_dict(torch.load('checkpoints/depth_anything_v2_vitl.pth', map_location='cpu'))
model.eval()

# cv2.imread returns a BGR array, or None if the path cannot be read
raw_img = cv2.imread('your/image/path')
depth = model.infer_image(raw_img)  # HxW raw depth map
```

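The raw depth map returned by the snippet above is not directly viewable as an image. A minimal sketch of one way to turn it into an 8-bit grayscale image follows, assuming `depth` is a floating-point HxW NumPy array. The min-max normalization and the name `depth_to_uint8` are our assumptions for illustration, not necessarily what the repository scripts do.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max normalize a raw depth map to the 0..255 range (assumed scheme)."""
    depth = depth.astype(np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-8:
        # Constant map: avoid division by zero and return an all-black image.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min) * 255.0
    return scaled.astype(np.uint8)  # truncates fractional values


if __name__ == "__main__":
    fake_depth = np.array([[0.0, 1.0], [2.0, 4.0]])  # stand-in for model output
    print(depth_to_uint8(fake_depth))
```

The resulting array can be written with `cv2.imwrite` or passed to a color map for visualization.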
## Usage

### Installation

```bash
git clone https://github.com/DepthAnything/Depth-Anything-V2
cd Depth-Anything-V2
pip install -r requirements.txt
```

### Running

```bash
python run.py --encoder <vits | vitb | vitl | vitg> --img-path <path> --outdir <outdir> [--input-size <size>] [--pred-only] [--grayscale]
```

Options:
- `--img-path`: Path to 1) a directory containing the images of interest, 2) a single image, or 3) a text file listing image paths.
- `--input-size` (optional): Input size used for model inference; defaults to `518`. **Increase it for more fine-grained results.**
- `--pred-only` (optional): Save only the predicted depth map, without the raw image.
- `--grayscale` (optional): Save the depth map in grayscale, without applying a color palette.

For example:

```bash
python run.py --encoder vitl --img-path assets/examples --outdir depth_vis
```

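The three forms accepted by `--img-path` could be resolved along the following lines. This is an illustrative sketch only: `resolve_image_paths`, the `.txt` check, and the extension list are assumptions made for the example, and the actual logic in `run.py` may differ.

```python
import os
from typing import List

# Assumed set of image extensions, for illustration only.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def resolve_image_paths(img_path: str) -> List[str]:
    """Expand a directory, a text file of paths, or a single image into a list."""
    if os.path.isdir(img_path):
        # Case 1: a directory; keep files that look like images.
        return sorted(
            os.path.join(img_path, name)
            for name in os.listdir(img_path)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS
        )
    if img_path.lower().endswith(".txt"):
        # Case 3: a text file with one image path per line.
        with open(img_path, "r", encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
    # Case 2: a single image.
    return [img_path]
```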
**If you want to use Depth Anything V2 on videos:**

```bash
python run_video.py --encoder vitl --video-path assets/examples_video --outdir video_depth_vis
```

*Note that our larger models have better temporal consistency on videos.*

### Gradio demo

To run our Gradio demo locally:

```bash
python app.py
```

You can also try our [online demo](https://huggingface.co/spaces/Depth-Anything/Depth-Anything-V2).

**Note:** Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this [issue](https://github.com/LiheYoung/Depth-Anything/issues/81)). In V1, we *unintentionally* used features from the last four layers of DINOv2 for decoding. In V2, we use [intermediate features](https://github.com/DepthAnything/Depth-Anything-V2/blob/2cbc36a8ce2cec41d38ee51153f112e87c8e42d8/depth_anything_v2/dpt.py#L164-L169) instead. Although this modification did not improve details or accuracy, we decided to follow this common practice.

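To make the difference concrete, the sketch below compares the two ways of picking four encoder layers for a 24-block ViT-L. The evenly spaced rule and both function names are illustrative assumptions; the exact indices used in V2 are defined in the linked `dpt.py` and should be checked there.

```python
from typing import List


def last_n_layers(depth: int, n: int = 4) -> List[int]:
    """V1-style selection: the final n transformer blocks."""
    return list(range(depth - n, depth))


def evenly_spaced_layers(depth: int, n: int = 4) -> List[int]:
    """Illustrative 'intermediate features' selection: n blocks spread over
    the encoder, always ending at the last block (an assumed rule)."""
    step = depth // n
    return [step * (i + 1) - 1 for i in range(n)]


if __name__ == "__main__":
    print(last_n_layers(24))         # [20, 21, 22, 23]
    print(evenly_spaced_layers(24))  # [5, 11, 17, 23]
```

With the last-four selection, all decoder inputs come from the deepest part of the encoder, whereas the spread selection also feeds shallower, more local features to the decoder.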
## Fine-tuned to Metric Depth Estimation

Please refer to [metric depth estimation](./metric_depth).

## DA-2K Evaluation Benchmark

Please refer to [DA-2K benchmark](./DA-2K.md).

## LICENSE

The Depth-Anything-V2-Small model is released under the Apache-2.0 license. The Depth-Anything-V2-Base/Large/Giant models are released under the CC-BY-NC-4.0 license.

## Citation

If you find this project useful, please consider citing:

```bibtex
@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}

@inproceedings{depth_anything_v1,
  title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  booktitle={CVPR},
  year={2024}
}
```