Pretrained Weights of NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation (RSS 2024)

The model is trained on samples collected from the training splits of VLN-CE R2R and RxR.

Evaluation Benchmark	TL	NE	OS	SR	SPL
VLN-CE R2R Val.	10.7	5.65	49.2	41.9	36.5
VLN-CE R2R Test	11.3	5.39	52	45	39
VLN-CE RxR Val.	15.4	5.72	55.6	45.7	38.2
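
For readers unfamiliar with the column abbreviations, the sketch below shows how the five metrics are conventionally computed in vision-and-language navigation: trajectory length (TL), navigation error (NE), oracle success rate (OS), success rate (SR), and success weighted by path length (SPL). It is an illustration, not the NaVid evaluation code. The `Episode` record, the `summarize` function, and the 3 m success radius are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are illustrative."""
    path_length: float     # metres the agent actually travelled
    shortest_path: float   # geodesic start-to-goal distance in metres
    final_distance: float  # distance to the goal where the agent stopped
    min_distance: float    # closest distance to the goal along the trajectory


def summarize(episodes: List[Episode], success_radius: float = 3.0) -> Dict[str, float]:
    """Average the standard VLN metrics over a list of episodes.

    success_radius is an assumed threshold (3 m is the common choice).
    OS, SR, and SPL are returned as percentages.
    """
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")

    spl_total = 0.0
    for e in episodes:
        success = e.final_distance <= success_radius
        denom = max(e.path_length, e.shortest_path)
        # SPL term: success indicator times shortest / max(travelled, shortest).
        if success and denom > 0:
            spl_total += e.shortest_path / denom

    return {
        "TL": sum(e.path_length for e in episodes) / n,
        "NE": sum(e.final_distance for e in episodes) / n,
        "OS": 100.0 * sum(e.min_distance <= success_radius for e in episodes) / n,
        "SR": 100.0 * sum(e.final_distance <= success_radius for e in episodes) / n,
        "SPL": 100.0 * spl_total / n,
    }


if __name__ == "__main__":
    demo = [
        Episode(path_length=10.0, shortest_path=8.0, final_distance=1.0, min_distance=0.5),
        Episode(path_length=12.0, shortest_path=10.0, final_distance=5.0, min_distance=2.0),
    ]
    print(summarize(demo))
```

Lower is better for NE. Higher is better for OS, SR, and SPL. SPL can never exceed SR, because each successful episode contributes at most 1 to the sum.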

The related inference code can be found here.
