
NAVCON

NAVCON is a large‑scale Vision‑Language Navigation (VLN) corpus created at the University of Pennsylvania, built on top of the R2R and RxR datasets. It contains 30,815 instructions with 236,316 concept annotations, aligned with 2.7 million paired images that illustrate the visual context an agent encounters while following the instructions. The corpus was generated using cognitive heuristics and language foundation models, producing silver‑standard annotations that were subsequently human‑validated for quality. NAVCON is intended primarily for language‑guided navigation tasks, with the goal of improving models' ability to comprehend and execute natural‑language commands, especially cross‑modal alignment and concept recognition.

Updated 12/18/2024
arXiv

Description

Vision-and-Language Navigation in Continuous Environments (VLN‑CE)

Dataset Overview

VLN‑CE is an instruction‑driven navigation benchmark featuring crowd‑sourced instructions, real‑world environments, and unrestricted agent navigation. The benchmark supports the Room‑to‑Room (R2R) and Room‑Across‑Room (RxR) datasets.

Scene Data

  • Matterport3D (MP3D): Utilizes reconstructions from the Matterport3D dataset. Scenes can be downloaded via the official Matterport3D script and extracted to data/scene_datasets/mp3d/{scene}/{scene}.glb. There are 90 scenes in total.
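Once the scenes are extracted to the path above, the layout can be sanity‑checked with a short Python sketch (list_mp3d_scenes is a hypothetical helper, not part of any dataset tooling):

```python
from pathlib import Path

def list_mp3d_scenes(root="data/scene_datasets/mp3d"):
    """Return IDs of scenes whose {scene}/{scene}.glb mesh is present."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        d.name for d in root.iterdir()
        if d.is_dir() and (d / f"{d.name}.glb").exists()
    )

# A complete download should yield all 90 MP3D scenes.
print(f"{len(list_mp3d_scenes())} scenes found")
```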

Task Data

Room‑to‑Room (R2R)

  • R2R_VLNCE_v1‑3: A port of the R2R dataset from the Matterport3D Simulator (MP3D‑Sim) to continuous environments. Two variants are provided:
    • R2R_VLNCE_v1‑3.zip – 3 MB, extracts to data/datasets/R2R_VLNCE_v1-3.
    • R2R_VLNCE_v1‑3_preprocessed.zip – 250 MB, extracts to data/datasets/R2R_VLNCE_v1-3_preprocessed.
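The archives can be unpacked with any zip tool; a minimal Python sketch (extract_dataset is a hypothetical helper) that reproduces the expected data/datasets layout:

```python
import zipfile
from pathlib import Path

def extract_dataset(zip_path, dest="data/datasets"):
    """Unpack a dataset archive into dest, creating it if needed."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)

# e.g. extract_dataset("R2R_VLNCE_v1-3.zip")
```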

Room‑Across‑Room (RxR)

  • RxR_VLNCE_v0.zip: Contains multilingual instructions (English, Hindi, Telugu) and diverse trajectories for continuous environments. Splits include train, val_seen, val_unseen, and test_challenge with the following structure:
    data/datasets
    └─ RxR_VLNCE_v0
        ├─ train
        │    ├─ train_guide.json.gz
        │    ├─ train_guide_gt.json.gz
        │    ├─ train_follower.json.gz
        │    └─ train_follower_gt.json.gz
        ├─ val_seen
        │    ├─ val_seen_guide.json.gz
        │    ├─ val_seen_guide_gt.json.gz
        │    ├─ val_seen_follower.json.gz
        │    └─ val_seen_follower_gt.json.gz
        ├─ val_unseen
        │    ├─ val_unseen_guide.json.gz
        │    ├─ val_unseen_guide_gt.json.gz
        │    ├─ val_unseen_follower.json.gz
        │    └─ val_unseen_follower_gt.json.gz
        ├─ test_challenge
        │    └─ test_challenge_guide.json.gz
        └─ text_features
             └─ ...
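Each split file is gzip‑compressed JSON. A minimal loading sketch (assuming, as in standard Habitat dataset files, a top‑level "episodes" list; the field names in the comment are illustrative):

```python
import gzip
import json

def load_split(path):
    """Load a gzip-compressed JSON split file (e.g. train_guide.json.gz)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

# data = load_split("data/datasets/RxR_VLNCE_v0/train/train_guide.json.gz")
# for episode in data["episodes"]: ...
```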

Pre‑trained Model Weights

  • ResNet Pre‑trained Weights: ResNet weights for encoding depth observations can be downloaded via the link in the VLN‑CE repository and should be extracted to data/ddppo-models/{model}.pth.

Dataset Usage

Installation

  • Python 3.6: Creating a dedicated conda (or miniconda) environment is recommended.
  • Habitat‑Sim 0.1.7: Install via conda or build from source.
  • Habitat‑Lab 0.1.7: Install from source.
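After installation, a small self‑check can be sketched as follows (check_vlnce_env is a hypothetical helper; it only reports what is present, without verifying the exact 0.1.7 versions):

```python
import importlib.util
import sys

def check_vlnce_env():
    """Report whether the recommended interpreter and packages are present."""
    report = {"python_3.6": sys.version_info[:2] == (3, 6)}
    for mod in ("habitat_sim", "habitat"):  # Habitat-Sim / Habitat-Lab
        report[mod] = importlib.util.find_spec(mod) is not None
    return report

print(check_vlnce_env())
```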

Data Download

  • Matterport3D: Use download_mp.py to fetch scene data.
  • R2R_VLNCE_v1‑3: Download with the gdown command.
  • RxR_VLNCE_v0.zip: Direct download.

Dataset Structure

  • R2R_VLNCE_v1‑3: Contains training, validation, and test splits.
  • RxR_VLNCE_v0: Provides multilingual instructions and trajectory data.

Citation

If you use the VLN‑CE dataset, please cite the following paper:

@inproceedings{krantz_vlnce_2020,
  title={Beyond the Nav‑Graph: Vision and Language Navigation in Continuous Environments},
  author={Jacob Krantz and Erik Wijmans and Arjun Majumdar and Dhruv Batra and Stefan Lee},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
}

If you also use the RxR‑Habitat data, please additionally cite:

@inproceedings{ku2020room,
  title={Room‑Across‑Room: Multilingual Vision‑and‑Language Navigation with Dense Spatiotemporal Grounding},
  author={Ku, Alexander and Anderson, Peter and Patel, Roma and Ie, Eugene and Baldridge, Jason},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages={4392--4412},
  year={2020}
}


Topics

Vision‑Language Navigation
Robotic Navigation

Source

Organization: arXiv

Created: 12/17/2024
