Add comprehensive model card for Multi-View 3D Point Tracking (#1)
- Add comprehensive model card for Multi-View 3D Point Tracking (027783913e6d93e789b7fe23cd1263b14a415f44)
Co-authored-by: Niels Rogge <[email protected]>
README.md
ADDED
@@ -0,0 +1,98 @@
---
license: apache-2.0
pipeline_tag: keypoint-detection
library_name: pytorch
---

<div align="center" style="line-height:1.2; margin:0; padding:0;">
<h1 style="margin-bottom:0em;">Multi-View 3D Point Tracking</h1>

<a href="https://huggingface.co/papers/2508.21060"><img src="https://img.shields.io/badge/Paper-2508.21060-b31b1b" alt="Paper"></a>
<a href="https://ethz-vlg.github.io/mvtracker/"><img src="https://img.shields.io/badge/Project%20Page-009688?logo=internetcomputer&logoColor=white" alt="Project Page"></a>
<a href="https://github.com/ethz-vlg/mvtracker"><img src="https://img.shields.io/badge/GitHub-Code-blue.svg?logo=github&logoColor=white" alt="GitHub"></a>
<br>
[**Frano Rajič**](https://m43.github.io/)<sup>1</sup> ·
[**Haofei Xu**](https://haofeixu.github.io/)<sup>1</sup> ·
[**Marko Mihajlovic**](https://markomih.github.io/)<sup>1</sup> ·
[**Siyuan Li**](https://siyuanliii.github.io/)<sup>1</sup> ·
[**Irem Demir**](https://github.com/iremddemir)<sup>1</sup> ·
[**Emircan Gündoğdu**](https://github.com/emircangun)<sup>1</sup> ·
[**Lei Ke**](https://www.kelei.site/)<sup>2</sup> ·
[**Sergey Prokudin**](https://vlg.inf.ethz.ch/team/Dr-Sergey-Prokudin.html)<sup>1,3</sup> ·
[**Marc Pollefeys**](https://people.inf.ethz.ch/marc.pollefeys/)<sup>1,4</sup> ·
[**Siyu Tang**](https://vlg.inf.ethz.ch/team/Prof-Dr-Siyu-Tang.html)<sup>1</sup>
<br>
<sup>1</sup>[ETH Zürich](https://vlg.inf.ethz.ch/)&nbsp;&nbsp;
<sup>2</sup>[Carnegie Mellon University](https://www.cmu.edu/)&nbsp;&nbsp;
<sup>3</sup>[Balgrist University Hospital](https://www.balgrist.ch/)&nbsp;&nbsp;
<sup>4</sup>[Microsoft](https://www.microsoft.com/)
</div>

MVTracker is the first **data-driven multi-view 3D point tracker** for tracking arbitrary 3D points across multiple cameras. It fuses multi-view features into a unified 3D feature point cloud, within which it leverages kNN-based correlation to capture spatiotemporal relationships across views. A transformer then iteratively refines the point tracks, handling occlusions and adapting to varying camera setups without per-sequence optimization.
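
To make the kNN-based correlation step more concrete, here is a minimal NumPy sketch of the general idea: look up the `k` nearest points of a fused feature point cloud around a track estimate and correlate their features with the track feature. It is not the MVTracker implementation; the function name `knn_correlation`, the brute-force search, and the dot-product similarity are all assumptions chosen for illustration.

```python
import numpy as np


def knn_correlation(points, feats, query_xyz, query_feat, k=4):
    """Illustrative kNN correlation (hypothetical helper, not the MVTracker code).

    points:     (M, 3) fused point cloud positions
    feats:      (M, C) per-point features
    query_xyz:  (3,)   current 3D estimate of one tracked point
    query_feat: (C,)   feature vector of that tracked point
    """
    # Squared Euclidean distance from the query to every cloud point.
    d2 = np.sum((points - query_xyz) ** 2, axis=1)
    # Indices of the k closest points, nearest first (brute force).
    idx = np.argsort(d2, kind="stable")[:k]
    # Dot-product similarity between the track feature and each neighbor.
    corr = feats[idx] @ query_feat
    # Relative offsets tell a downstream model where each neighbor sits.
    offsets = points[idx] - query_xyz
    return idx, corr, offsets


# Toy usage with four points and two-dimensional features.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [5.0, 5.0, 5.0]])
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
idx, corr, offsets = knn_correlation(
    points, feats, np.array([0.1, 0.0, 0.0]), np.array([1.0, 0.0]), k=2
)
```

In a sketch like this, the correlation values and offsets would be the inputs that a refinement model (the transformer in MVTracker's case) uses to update the track. The optional `pointops` dependency mentioned under Quick Start exists to make the neighbor search fast on GPU, which a brute-force `argsort` does not attempt.
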

<p float="left">
  <img alt="selfcap" src="https://github.com/user-attachments/assets/b502d193-c37c-43be-af6c-653b5de7597e" width="48%" />
  <img alt="dexycb" src="https://github.com/user-attachments/assets/d14d4c6c-152e-4040-b29b-3da4b7e8b913" width="48%" />
  <img alt="4d-dress-stretching" src="https://github.com/user-attachments/assets/f3eabdda-59e1-4032-b345-c4603ea86fc0" width="48%" />
  <img alt="4d-dress-avatarmove" src="https://github.com/user-attachments/assets/3fef9924-84ad-4295-95e2-5b82ae7c3053" width="48%" />
</p>

## Abstract
We introduce the first data-driven multi-view 3D point tracker, designed to track arbitrary points in dynamic scenes using multiple camera views. Unlike existing monocular trackers, which struggle with depth ambiguities and occlusion, or prior multi-camera methods that require over 20 cameras and tedious per-sequence optimization, our feed-forward model directly predicts 3D correspondences using a practical number of cameras (e.g., four), enabling robust and accurate online tracking. Given known camera poses and either sensor-based or estimated multi-view depth, our tracker fuses multi-view features into a unified point cloud and applies k-nearest-neighbors correlation alongside a transformer-based update to reliably estimate long-range 3D correspondences, even under occlusion. We train on 5K synthetic multi-view Kubric sequences and evaluate on two real-world benchmarks: Panoptic Studio and DexYCB, achieving median trajectory errors of 3.1 cm and 2.0 cm, respectively. Our method generalizes well to diverse camera setups of 1-8 views with varying vantage points and video lengths of 24-150 frames. By releasing our tracker alongside training and evaluation datasets, we aim to set a new standard for multi-view 3D tracking research and provide a practical tool for real-world applications.
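
As a rough illustration of how known camera poses and per-view depth can be fused into a single point cloud, the sketch below unprojects each depth map into world coordinates and concatenates the results. The conventions are assumptions, not taken from the MVTracker code: pinhole intrinsics `K` (3x3), world-to-camera extrinsics `[R|t]` (3x4) with `x_cam = R @ x_world + t`, z-depth, and pixel coordinates at integer positions. The helper names `unproject_depth` and `fuse_views` are hypothetical.

```python
import numpy as np


def unproject_depth(depth, K, extr_w2c):
    """Lift an (H, W) z-depth map to (H*W, 3) world-space points.

    Assumes pinhole intrinsics K (3x3) and world-to-camera extrinsics
    extr_w2c = [R|t] (3x4), i.e. x_cam = R @ x_world + t.
    """
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Homogeneous pixel coordinates, flattened row by row.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Back-project to camera-frame rays with z = 1, then scale by depth.
    rays = pix @ np.linalg.inv(K).T
    pts_cam = rays * depth.reshape(-1, 1)
    # Invert the rigid transform: x_world = R^T (x_cam - t).
    R, t = extr_w2c[:, :3], extr_w2c[:, 3]
    return (pts_cam - t) @ R


def fuse_views(depths, intrs, extrs):
    """Concatenate the unprojected points of all views into one cloud."""
    clouds = [unproject_depth(d, K, E) for d, K, E in zip(depths, intrs, extrs)]
    return np.concatenate(clouds, axis=0)


# Toy usage: two 2x2 views sharing intrinsics, differing only in translation.
K = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [0.0, 0.0, 1.0]])
E0 = np.hstack([np.eye(3), np.array([[0.0], [0.0], [1.0]])])
E1 = np.hstack([np.eye(3), np.zeros((3, 1))])
depth = np.full((2, 2), 2.0)
cloud = fuse_views([depth, depth], [K, K], [E0, E1])
```

In the actual model the fused cloud carries learned feature vectors rather than bare coordinates; this sketch covers only the geometric part. Check the repository for the conventions the released model expects before reusing any of it.
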

## Quick Start

This repo was validated on **Python 3.10.12**, **PyTorch 2.3.0** (CUDA 12.1), **cuDNN 8903**, and **gcc 11.3.0**. To create a fresh minimal environment that runs the Hub demo and `demo.py`:
```bash
conda create -n 3dpt python=3.10.12 -y
conda activate 3dpt
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
pip install -r https://raw.githubusercontent.com/ethz-vlg/mvtracker/refs/heads/main/requirements.txt

# Optional: speed up the model
# flash-attn speeds up attention
pip install --upgrade --no-build-isolation flash-attn==2.5.8
# pointops speeds up kNN search; building it may require gcc 11.3.0:
#   conda install -c conda-forge gcc_linux-64=11.3.0 gxx_linux-64=11.3.0 gcc=11.3.0 gxx=11.3.0
pip install "git+https://github.com/ethz-vlg/pointcept.git@2082918#subdirectory=libs/pointops"
```
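
After installation, a quick check of which packages are importable can save a failed first run. The sketch below uses only the standard library. The module names `flash_attn` and `pointops` are assumed import names for the two optional packages, and `check_environment` is a hypothetical helper, not part of the repository.

```python
import importlib.util
import platform


def check_environment(
    required=("torch", "numpy", "huggingface_hub"),
    optional=("flash_attn", "pointops"),  # assumed import names
):
    """Report the Python version and whether each module can be found."""
    report = {"python": platform.python_version()}
    for name in tuple(required) + tuple(optional):
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `False` next to an optional module only means the corresponding speed-up is unavailable; the install comments above mark both as optional.
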

With the minimal dependencies in place, you can try MVTracker directly via **PyTorch Hub**:
```python
import torch
import numpy as np
from huggingface_hub import hf_hub_download

device = "cuda" if torch.cuda.is_available() else "cpu"
mvtracker = torch.hub.load("ethz-vlg/mvtracker", "mvtracker", pretrained=True, device=device)

# Example input from demo sample (downloaded automatically)
sample = np.load(hf_hub_download("ethz-vlg/mvtracker", "data_sample.npz"))
rgbs = torch.from_numpy(sample["rgbs"]).float()
depths = torch.from_numpy(sample["depths"]).float()
intrs = torch.from_numpy(sample["intrs"]).float()
extrs = torch.from_numpy(sample["extrs"]).float()
query_points = torch.from_numpy(sample["query_points"]).float()

# Add a batch dimension with [None]; RGB values are scaled to [0, 1]
with torch.no_grad():
    results = mvtracker(
        rgbs=rgbs[None].to(device) / 255.0,
        depths=depths[None].to(device),
        intrs=intrs[None].to(device),
        extrs=extrs[None].to(device),
        query_points_3d=query_points[None].to(device),
    )

pred_tracks = results["traj_e"].cpu()  # [T,N,3]
pred_vis = results["vis_e"].cpu()  # [T,N]
print(pred_tracks.shape, pred_vis.shape)
```
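
Once the tracks are available, a small summary can help sanity-check them. The sketch below is a hypothetical post-processing helper, not part of MVTracker. It assumes tracks of shape `[T,N,3]` and visibility of shape `[T,N]`, as in the comments above, and treats visibility as either boolean or a score thresholded at 0.5, which is an assumption about the output format.

```python
import numpy as np


def summarize_tracks(tracks, vis, threshold=0.5):
    """Per-point visible fraction and total 3D path length.

    tracks: (T, N, 3) predicted 3D positions over time
    vis:    (T, N)    visibility flags or scores (threshold is an assumption)
    """
    tracks = np.asarray(tracks, dtype=np.float64)
    # Works for boolean flags as well as scores in [0, 1].
    vis_mask = np.asarray(vis) > threshold
    visible_fraction = vis_mask.mean(axis=0)
    # Length of each frame-to-frame step, summed over time.
    steps = np.linalg.norm(np.diff(tracks, axis=0), axis=-1)
    path_length = steps.sum(axis=0)
    return visible_fraction, path_length


# Toy usage: point 0 moves 1 unit then 2 units; point 1 stays put.
toy_tracks = np.array(
    [
        [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
        [[1.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
        [[1.0, 2.0, 0.0], [1.0, 1.0, 1.0]],
    ]
)
toy_vis = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 0.0]])
visible_fraction, path_length = summarize_tracks(toy_tracks, toy_vis)
```

With the Quick Start outputs, a call along the lines of `summarize_tracks(pred_tracks.numpy(), pred_vis.numpy())` should work if the tensors have the shapes noted in the comments; if the model returns an extra batch dimension, index it away first.
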

## Citation
If you find our repository useful, please consider giving it a star ⭐ and citing our work:
```bibtex
@inproceedings{rajic2025mvtracker,
  title = {Multi-View 3D Point Tracking},
  author = {Raji{\v{c}}, Frano and Xu, Haofei and Mihajlovic, Marko and Li, Siyuan and Demir, Irem and G{\"u}ndo{\u{g}}du, Emircan and Ke, Lei and Prokudin, Sergey and Pollefeys, Marc and Tang, Siyu},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year = {2025}
}
```