
NAVCON

NAVCON is a large‑scale Vision‑Language Navigation (VLN) corpus created at the University of Pennsylvania, built on top of the R2R and RxR datasets. It contains 30,815 instructions with 236,316 concept annotations, aligned with 2.7 million paired images that illustrate the visual context an agent encounters while following the instructions. The corpus was generated using cognitive heuristics and language foundation models, producing silver‑standard annotations that were subsequently human‑validated for quality. NAVCON is intended primarily for language‑guided navigation tasks, with the goal of improving models' ability to comprehend and execute natural‑language commands, especially cross‑modal alignment and concept recognition.

Updated 12/18/2024
arXiv

Description

Vision-and-Language Navigation in Continuous Environments (VLN‑CE)

Dataset Overview

VLN‑CE is an instruction‑driven navigation benchmark featuring crowd‑sourced instructions, real‑world environments, and unrestricted agent navigation. The benchmark supports the Room‑to‑Room (R2R) and Room‑Across‑Room (RxR) datasets.

Scene Data

  • Matterport3D (MP3D): Utilizes reconstructions from the Matterport3D dataset. Scenes can be downloaded via the official Matterport3D script and extracted to data/scene_datasets/mp3d/{scene}/{scene}.glb. There are 90 scenes in total.
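Once the scenes are extracted to the path above, the layout can be sanity‑checked with a short Python sketch (list_mp3d_scenes is a hypothetical helper, not part of any dataset tooling):

```python
from pathlib import Path

def list_mp3d_scenes(root="data/scene_datasets/mp3d"):
    """Return IDs of scenes whose {scene}/{scene}.glb mesh is present."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        d.name for d in root.iterdir()
        if d.is_dir() and (d / f"{d.name}.glb").exists()
    )

# A complete download should yield all 90 MP3D scenes.
print(f"{len(list_mp3d_scenes())} scenes found")
```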

Task Data

Room‑to‑Room (R2R)

  • R2R_VLNCE_v1‑3: A port of the R2R dataset from the Matterport3D Simulator (MP3D‑Sim) to continuous environments. Two variants are provided:
    • R2R_VLNCE_v1‑3.zip – 3 MB, extracts to data/datasets/R2R_VLNCE_v1-3.
    • R2R_VLNCE_v1‑3_preprocessed.zip – 250 MB, extracts to data/datasets/R2R_VLNCE_v1-3_preprocessed.
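The archives can be unpacked with any zip tool; a minimal Python sketch (extract_dataset is a hypothetical helper) that reproduces the expected data/datasets layout:

```python
import zipfile
from pathlib import Path

def extract_dataset(zip_path, dest="data/datasets"):
    """Unpack a dataset archive into dest, creating it if needed."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)

# e.g. extract_dataset("R2R_VLNCE_v1-3.zip")
```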

Room‑Across‑Room (RxR)

  • RxR_VLNCE_v0.zip: Contains multilingual instructions (English, Hindi, Telugu) and diverse trajectories for continuous environments. Splits include train, val_seen, val_unseen, and test_challenge with the following structure:
    data/datasets
    └─ RxR_VLNCE_v0
        ├─ train
        │    ├─ train_guide.json.gz
        │    ├─ train_guide_gt.json.gz
        │    ├─ train_follower.json.gz
        │    └─ train_follower_gt.json.gz
        ├─ val_seen
        │    ├─ val_seen_guide.json.gz
        │    ├─ val_seen_guide_gt.json.gz
        │    ├─ val_seen_follower.json.gz
        │    └─ val_seen_follower_gt.json.gz
        ├─ val_unseen
        │    ├─ val_unseen_guide.json.gz
        │    ├─ val_unseen_guide_gt.json.gz
        │    ├─ val_unseen_follower.json.gz
        │    └─ val_unseen_follower_gt.json.gz
        ├─ test_challenge
        │    └─ test_challenge_guide.json.gz
        └─ text_features
             └─ ...
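Each split file is gzip‑compressed JSON. A minimal loading sketch (assuming, as in standard Habitat dataset files, a top‑level "episodes" list; the field names in the comment are illustrative):

```python
import gzip
import json

def load_split(path):
    """Load a gzip-compressed JSON split file (e.g. train_guide.json.gz)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

# data = load_split("data/datasets/RxR_VLNCE_v0/train/train_guide.json.gz")
# for episode in data["episodes"]: ...
```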

Pre‑trained Model Weights

  • ResNet Pre‑trained Weights: ResNet weights for encoding depth observations can be downloaded via the link in the VLN‑CE repository and should be extracted to data/ddppo-models/{model}.pth.

Dataset Usage

Installation

  • Python 3.6: Creating a dedicated conda (or miniconda) environment is recommended.
  • Habitat‑Sim 0.1.7: Install via conda or build from source.
  • Habitat‑Lab 0.1.7: Install from source.
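After installation, a small self‑check can be sketched as follows (check_vlnce_env is a hypothetical helper; it only reports what is present, without verifying the exact 0.1.7 versions):

```python
import importlib.util
import sys

def check_vlnce_env():
    """Report whether the recommended interpreter and packages are present."""
    report = {"python_3.6": sys.version_info[:2] == (3, 6)}
    for mod in ("habitat_sim", "habitat"):  # Habitat-Sim / Habitat-Lab
        report[mod] = importlib.util.find_spec(mod) is not None
    return report

print(check_vlnce_env())
```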

Data Download

  • Matterport3D: Use download_mp.py to fetch scene data.
  • R2R_VLNCE_v1‑3: Download with the gdown command.
  • RxR_VLNCE_v0.zip: Direct download.

Dataset Structure

  • R2R_VLNCE_v1‑3: Contains training, validation, and test splits.
  • RxR_VLNCE_v0: Provides multilingual instructions and trajectory data.

Citation

If you use the VLN‑CE dataset, please cite the following paper:

@inproceedings{krantz_vlnce_2020,
  title={Beyond the Nav‑Graph: Vision and Language Navigation in Continuous Environments},
  author={Jacob Krantz and Erik Wijmans and Arjun Majumdar and Dhruv Batra and Stefan Lee},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
}

If you also use the RxR‑Habitat data, please additionally cite:

@inproceedings{ku2020room,
  title={Room‑Across‑Room: Multilingual Vision‑and‑Language Navigation with Dense Spatiotemporal Grounding},
  author={Ku, Alexander and Anderson, Peter and Patel, Roma and Ie, Eugene and Baldridge, Jason},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages={4392--4412},
  year={2020}
}


Topics

Vision‑Language Navigation
Robotic Navigation

Source

Organization: arXiv

Created: 12/17/2024
