Pretrained Weights of NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation (RSS 2024)

The model is trained on samples collected from the training splits of VLN-CE R2R and RxR.

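| Evaluation Benchmark | TL (m) | NE (m) | OS (%) | SR (%) | SPL (%) |
| --- | --- | --- | --- | --- | --- |
| VLN-CE R2R Val. | 10.7 | 5.65 | 49.2 | 41.9 | 36.5 |
| VLN-CE R2R Test | 11.3 | 5.39 | 52 | 45 | 39 |
| VLN-CE RxR Val. | 15.4 | 5.72 | 55.6 | 45.7 | 38.2 |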
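
For readers unfamiliar with the column abbreviations, the following is a minimal sketch of how these five metrics are conventionally computed in vision-and-language navigation: TL is trajectory length, NE is navigation error, OS is oracle success rate, SR is success rate, and SPL is success weighted by path length. It is not the evaluation code used to produce the numbers above. The `Episode` class, its field names, and the `summarize` function are hypothetical names chosen for illustration, and the 3 m success radius is an assumption based on the usual VLN-CE convention.

```python
from dataclasses import dataclass

# Assumed success radius in metres (the usual VLN-CE convention, not taken from this card).
SUCCESS_RADIUS_M = 3.0


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are illustrative only."""
    path_length: float     # metres travelled by the agent
    final_distance: float  # distance from the stop position to the goal, in metres
    min_distance: float    # closest distance to the goal at any point of the trajectory
    shortest_path: float   # length of the shortest path from start to goal, in metres


def summarize(episodes):
    """Average the per-episode metrics; OS, SR and SPL are returned as percentages."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")

    spl_sum = 0.0
    for e in episodes:
        success = e.final_distance < SUCCESS_RADIUS_M
        denom = max(e.path_length, e.shortest_path)
        # Guard against a degenerate episode whose start is already the goal.
        ratio = e.shortest_path / denom if denom > 0 else 1.0
        spl_sum += ratio if success else 0.0

    return {
        "TL": sum(e.path_length for e in episodes) / n,
        "NE": sum(e.final_distance for e in episodes) / n,
        "OS": 100.0 * sum(e.min_distance < SUCCESS_RADIUS_M for e in episodes) / n,
        "SR": 100.0 * sum(e.final_distance < SUCCESS_RADIUS_M for e in episodes) / n,
        "SPL": 100.0 * spl_sum / n,
    }
```

Because SPL multiplies each success by the ratio of the shortest path to the path actually taken, it can never exceed SR, which matches the ordering of the SR and SPL columns in the table.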

The related inference code can be found here.
