Pretrained Weights of NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation (RSS 2024)
The model is trained on samples collected from the training splits of VLN-CE R2R and RxR.
Evaluation Benchmark | TL (m) | NE (m) | OS (%) | SR (%) | SPL (%) |
---|---|---|---|---|---|
VLN-CE R2R Val. | 10.7 | 5.65 | 49.2 | 41.9 | 36.5 |
VLN-CE R2R Test | 11.3 | 5.39 | 52 | 45 | 39 |
VLN-CE RxR Val. | 15.4 | 5.72 | 55.6 | 45.7 | 38.2 |

TL: trajectory length; NE: navigation error; OS: oracle success rate; SR: success rate; SPL: success weighted by path length.
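For reference (not part of the original card), SPL follows the standard definition from Anderson et al. (2018), weighting success by path efficiency:

$$\mathrm{SPL} = \frac{1}{N}\sum_{i=1}^{N} S_i \,\frac{\ell_i}{\max(p_i, \ell_i)}$$

where $S_i$ indicates success on episode $i$, $\ell_i$ is the shortest-path length from start to goal, and $p_i$ is the length of the agent's actual path.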
The related inference code can be found here.
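As a minimal sketch of fetching these weights for use with that inference code, assuming the `huggingface_hub` Python package; the repo id below is a placeholder, not confirmed by this card:

```python
# Minimal sketch: download the NaVid pretrained weights from the Hugging Face Hub.
# NOTE: repo_id is a placeholder assumption; replace it with this model card's
# actual repo id. Loading the checkpoint itself should follow the NaVid
# inference code referenced above.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="your-org/NaVid-VLN-CE",  # placeholder repo id (assumption)
    local_dir="./navid_weights",      # where the checkpoint files are stored
)
print(f"Weights downloaded to: {local_path}")
```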