NAVCON
NAVCON is a large‑scale Vision‑Language Navigation (VLN) corpus created at the University of Pennsylvania, built on top of the R2R and RxR datasets. It contains 30,815 instructions with 236,316 concept annotations, aligned with 2.7 million paired images that illustrate the visual context an agent encounters while following the instructions. The corpus was generated using cognitive heuristics and language foundation models, producing silver‑standard annotations that were subsequently human‑validated for quality. NAVCON is intended primarily for language‑guided navigation tasks, with the aim of improving models' ability to understand and execute natural‑language commands, particularly through better cross‑modal alignment and concept recognition.
Description
Vision-and-Language Navigation in Continuous Environments (VLN‑CE)
Dataset Overview
VLN‑CE is an instruction‑driven navigation benchmark featuring crowd‑sourced instructions, real‑world environments, and unrestricted agent navigation. The benchmark supports the Room‑to‑Room (R2R) and Room‑Across‑Room (RxR) datasets.
Scene Data
- Matterport3D (MP3D): Uses reconstructions from the Matterport3D dataset. Scenes can be downloaded via the official Matterport3D script and extracted to data/scene_datasets/mp3d/{scene}/{scene}.glb. There are 90 scenes in total.
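After extraction, the expected {scene}/{scene}.glb layout can be sanity‑checked with the standard library. The sketch below is illustrative: the helper name is hypothetical, and the demo runs against a small synthetic directory tree rather than the real 90 scenes.

```python
import tempfile
from pathlib import Path

def list_mp3d_scenes(root="data/scene_datasets/mp3d"):
    """Return scene IDs that follow the expected {scene}/{scene}.glb layout."""
    root = Path(root)
    return sorted(
        d.name for d in root.iterdir()
        if d.is_dir() and (d / f"{d.name}.glb").exists()
    )

# Demo against a synthetic tree (the real dataset has 90 such scene folders).
demo_root = Path(tempfile.mkdtemp()) / "mp3d"
for scene in ["17DRP5sb8fy", "1LXtFkjw3qL"]:
    (demo_root / scene).mkdir(parents=True)
    (demo_root / scene / f"{scene}.glb").touch()

print(list_mp3d_scenes(demo_root))  # ['17DRP5sb8fy', '1LXtFkjw3qL']
```

A check like this catches a common mistake where scenes are extracted one directory level too deep or too shallow, which the simulator then fails to load.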
Task Data
Room‑to‑Room (R2R)
- R2R_VLNCE_v1‑3: A port of the R2R dataset (originally built for the Matterport3D Simulator, MP3D‑Sim) to continuous environments. Two variants are provided:
  - R2R_VLNCE_v1‑3.zip (3 MB), extracts to data/datasets/R2R_VLNCE_v1-3.
  - R2R_VLNCE_v1‑3_preprocessed.zip (250 MB), extracts to data/datasets/R2R_VLNCE_v1-3_preprocessed.
Room‑Across‑Room (RxR)
- RxR_VLNCE_v0.zip: Contains multilingual instructions (English, Hindi, Telugu) and diverse trajectories for continuous environments. Splits include train, val_seen, val_unseen, and test_challenge, with the following structure:

  data/datasets
  ├─ RxR_VLNCE_v0
  │ ├─ train
  │ │ ├─ train_guide.json.gz
  │ │ ├─ train_guide_gt.json.gz
  │ │ ├─ train_follower.json.gz
  │ │ ├─ train_follower_gt.json.gz
  │ ├─ val_seen
  │ │ ├─ val_seen_guide.json.gz
  │ │ ├─ val_seen_guide_gt.json.gz
  │ │ ├─ val_seen_follower.json.gz
  │ │ ├─ val_seen_follower_gt.json.gz
  │ ├─ val_unseen
  │ │ ├─ val_unseen_guide.json.gz
  │ │ ├─ val_unseen_guide_gt.json.gz
  │ │ ├─ val_unseen_follower.json.gz
  │ │ ├─ val_unseen_follower_gt.json.gz
  │ ├─ test_challenge
  │ │ ├─ test_challenge_guide.json.gz
  │ ├─ text_features
  │ │ └─ ...
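These split files are gzipped JSON and can be read with Python's standard gzip and json modules. The sketch below is a minimal example; the episodes, episode_id, and instruction field names follow the common Habitat dataset convention and are assumptions here, so verify them against the actual files. The demo loads a tiny synthetic split rather than the real data.

```python
import gzip
import json
import tempfile
from pathlib import Path

def load_episodes(path):
    """Load a gzipped JSON split file and return its episode list."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    return data["episodes"]

# Demo: write a tiny synthetic split file mirroring the layout above.
split_dir = Path(tempfile.mkdtemp()) / "RxR_VLNCE_v0" / "val_seen"
split_dir.mkdir(parents=True)
sample = {"episodes": [
    {"episode_id": "0", "instruction": {"language": "en-US"}},
    {"episode_id": "1", "instruction": {"language": "hi-IN"}},
]}
path = split_dir / "val_seen_guide.json.gz"
with gzip.open(path, "wt", encoding="utf-8") as f:
    json.dump(sample, f)

episodes = load_episodes(path)
print(len(episodes))  # 2
```

Opening in text mode ("rt"/"wt") lets json handle the stream directly without a separate decode step.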
Pre‑trained Model Weights
- ResNet Pre‑trained Weights: ResNet weights used to encode depth observations can be downloaded here and should be extracted to data/ddppo-models/{model}.pth.
Dataset Usage
Installation
- Python 3.6: Creating a dedicated conda (or miniconda) environment is recommended.
- Habitat‑Sim 0.1.7: Install via conda or build from source.
- Habitat‑Lab 0.1.7: Install from source.
Data Download
- Matterport3D: Use download_mp.py to fetch scene data.
- R2R_VLNCE_v1‑3: Download with the gdown command.
- RxR_VLNCE_v0.zip: Direct download.
Dataset Structure
- R2R_VLNCE_v1‑3: Contains training, validation, and test splits.
- RxR_VLNCE_v0: Provides multilingual instructions and trajectory data.
Citation
If you use the VLN‑CE dataset, please cite the following paper:
@inproceedings{krantz_vlnce_2020,
title={Beyond the Nav‑Graph: Vision and Language Navigation in Continuous Environments},
author={Jacob Krantz and Erik Wijmans and Arjun Majumdar and Dhruv Batra and Stefan Lee},
booktitle={European Conference on Computer Vision (ECCV)},
year={2020}
}
If you also use the RxR‑Habitat data, cite additionally:
@inproceedings{ku2020room,
title={Room‑Across‑Room: Multilingual Vision‑and‑Language Navigation with Dense Spatiotemporal Grounding},
author={Ku, Alexander and Anderson, Peter and Patel, Roma and Ie, Eugene and Baldridge, Jason},
booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
pages={4392--4412},
year={2020}
}
Source
Organization: arXiv
Created: 12/17/2024