
osv5m/osv5m

OpenStreetView‑5M is the first large‑scale open benchmark for street‑view image geolocation. It poses a global visual geolocation task, and the supplied demo lets users experience the benchmark's difficulty firsthand. The dataset is split into training and test sets, both downloadable from the Hugging Face Hub.

Source
hugging_face
Created
Nov 28, 2025
Updated
Apr 27, 2024
OpenStreetView‑5M

Dataset Overview

OpenStreetView‑5M is a large‑scale open street‑view image geolocation benchmark dataset.

Structure

  • Config Name: default
  • Data Files:
    • Training Set:
      • File path: "train.csv"
      • Image directory: "images/train"
    • Test Set:
      • File path: "test.csv"
      • Image directory: "images/test"
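
The layout above can be captured in a small path helper, which is handy for checking a local copy before use. This is a minimal sketch, assuming the dataset sits under a local root directory such as "datasets/osv5m"; the names `SPLITS` and `split_paths` are hypothetical and not part of the dataset's own tooling.

```python
from pathlib import Path

# Split layout as listed in the Structure section above.
SPLITS = {
    "train": ("train.csv", "images/train"),
    "test": ("test.csv", "images/test"),
}


def split_paths(root, split):
    """Return (metadata CSV path, image directory) for a split under root.

    Hypothetical helper: it only joins paths and never touches the filesystem.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(SPLITS)}")
    csv_name, image_dir = SPLITS[split]
    root = Path(root)
    return root / csv_name, root / image_dir
```

Calling `split_paths("datasets/osv5m", "test")` yields the paths of `test.csv` and `images/test` under that root, so the same code can serve either split.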

Download

Full Dataset

from huggingface_hub import snapshot_download
# repo_type must be passed as the string "dataset".
snapshot_download(repo_id="osv5m/osv5m", local_dir="datasets/osv5m", repo_type="dataset")

Extract

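import os
import zipfile

# Walk the download directory and unpack every .zip archive in place.
for root, dirs, files in os.walk("datasets/osv5m"):
    for file in files:
        if file.endswith(".zip"):
            zip_path = os.path.join(root, file)
            with zipfile.ZipFile(zip_path, "r") as zip_ref:
                zip_ref.extractall(root)
            # Delete the archive only after it has been closed.
            os.remove(zip_path)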
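
Because the extraction step deletes each archive once it is unpacked, it can be useful to list the archives first as a dry run. This is a minimal sketch of one way to do that; `find_archives` is a hypothetical helper name, and it only reads the directory tree.

```python
import os


def find_archives(root):
    """Return sorted paths of all .zip files under root, without modifying anything."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".zip"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Running `find_archives("datasets/osv5m")` before extraction shows which files would be unpacked and removed.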

Load Directly

from datasets import load_dataset
dataset = load_dataset("osv5m/osv5m", full=False)

The full flag indicates whether to load the complete metadata (default False).

Download Test Set Only

from huggingface_hub import hf_hub_download
for i in range(5):
    # Fetch the test image archives 00.zip through 04.zip.
    hf_hub_download(repo_id="osv5m/osv5m", filename=str(i).zfill(2) + ".zip", subfolder="images/test", repo_type="dataset", local_dir="datasets/OpenWorld")
# The README only needs to be fetched once, so it sits outside the loop.
hf_hub_download(repo_id="osv5m/osv5m", filename="README.md", repo_type="dataset", local_dir="datasets/OpenWorld")
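
The loop above builds each archive name by zero-padding the index to two digits. The sketch below shows that naming on its own, with no network access; `shard_filenames` is a hypothetical helper, and the default of five shards simply mirrors `range(5)` in the snippet above.

```python
def shard_filenames(num_shards=5):
    """Zero-padded archive names, equivalent to str(i).zfill(2) + ".zip"."""
    return [f"{i:02d}.zip" for i in range(num_shards)]


print(shard_filenames())
# → ['00.zip', '01.zip', '02.zip', '03.zip', '04.zip']
```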

Citation

@article{osv5m,
    title = {{OpenStreetView‑5M}: {T}he Many Roads to Global Visual Geolocation},
    author = {Astruc, Guillaume and Dufour, Nicolas and Siglidis, Ioannis and Aronssohn, Constantin and Bouia, Nacim and Fu, Stephanie and Loiseau, Romain and Nguyen, Van Nguyen and Raude, Charles and Vincent, Elliot and Xu, Lintao and Zhou, Hongyu and Landrieu, Loic},
    journal = {CVPR},
    year = {2024},
}