Tags: Open Source Community, Street View Images, Visual Geolocation
osv5m/osv5m
OpenStreetView‑5M is the first large‑scale open street‑view image geolocation benchmark. It poses a global visual geolocation challenge, and the supplied demo lets users experience the difficulty of the benchmark first-hand. The dataset contains training and test splits, both downloadable via the Hugging Face Hub.
Source
hugging_face
Created
Nov 28, 2025
Updated
Apr 27, 2024
OpenStreetView‑5M
Dataset Overview
OpenStreetView‑5M is a large‑scale open street‑view image geolocation benchmark dataset.
Structure
- Config Name: default
- Data Files:
  - Training Set:
    - File path: "train.csv"
    - Image directory: "images/train"
  - Test Set:
    - File path: "test.csv"
    - Image directory: "images/test"
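The layout above pairs each CSV file with an image directory. As a minimal sketch of joining the two after download and extraction, the snippet below indexes image files by name and matches them against CSV rows. It assumes the CSV has an `id` column whose values equal the image file names without extension, and that images may sit in nested subfolders; neither is stated on this page, and `index_images` and `attach_paths` are hypothetical helper names.

```python
import csv
import os


def index_images(image_dir):
    """Map file stem -> full path for every image under image_dir (recursive)."""
    index = {}
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext.lower() in {".jpg", ".jpeg", ".png"}:
                index[stem] = os.path.join(root, name)
    return index


def attach_paths(csv_path, image_dir, id_column="id"):
    """Yield (row, image_path) for each CSV row that has a matching image.

    id_column is an assumption; adjust it to the actual CSV header.
    """
    index = index_images(image_dir)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            path = index.get(row[id_column])
            if path is not None:
                yield row, path
```

Under those assumptions, `attach_paths("datasets/osv5m/train.csv", "datasets/osv5m/images/train")` would iterate over training rows together with their image paths, skipping rows whose image is missing on disk.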
Download
Full Dataset
from huggingface_hub import snapshot_download
# Download the whole dataset repository into datasets/osv5m
snapshot_download(repo_id="osv5m/osv5m", local_dir="datasets/osv5m", repo_type="dataset")
Extract
import os
import zipfile

# Unpack every .zip archive in place, then delete it to free disk space
for root, dirs, files in os.walk("datasets/osv5m"):
    for file in files:
        if file.endswith(".zip"):
            with zipfile.ZipFile(os.path.join(root, file), "r") as zip_ref:
                zip_ref.extractall(root)
            os.remove(os.path.join(root, file))
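After the extraction loop finishes, a quick sanity check can confirm that no archives remain and show how many files of each type were unpacked. The following is a minimal sketch using only the standard library; `count_by_extension` is a hypothetical helper name, not part of the dataset tooling.

```python
import os
from collections import Counter


def count_by_extension(root_dir):
    """Count files under root_dir (recursive), grouped by lowercased extension."""
    counts = Counter()
    for root, _dirs, files in os.walk(root_dir):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            counts[ext] += 1
    return counts
```

Calling `count_by_extension("datasets/osv5m")` after extraction should report zero `.zip` entries if every archive was unpacked and removed.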
Load Directly
from datasets import load_dataset
dataset = load_dataset("osv5m/osv5m", full=False)
The full flag indicates whether to load the complete metadata (default False).
Download Test Set Only
from huggingface_hub import hf_hub_download
# Fetch the five test image archives, 00.zip through 04.zip
for i in range(5):
    hf_hub_download(repo_id="osv5m/osv5m", filename=str(i).zfill(2) + ".zip", subfolder="images/test", repo_type="dataset", local_dir="datasets/OpenWorld")
# Fetch the dataset card once, alongside the images
hf_hub_download(repo_id="osv5m/osv5m", filename="README.md", repo_type="dataset", local_dir="datasets/OpenWorld")
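The download loop above builds each archive name by zero-padding the loop index to two digits. The short sketch below isolates that naming step so the requested file names can be inspected before any network call; `shard_filenames` is a hypothetical helper, and the default of five shards is taken from the loop's `range(5)`.

```python
def shard_filenames(num_shards=5):
    """Return the zero-padded archive names used by the download loop."""
    return [str(i).zfill(2) + ".zip" for i in range(num_shards)]


print(shard_filenames())
# → ['00.zip', '01.zip', '02.zip', '03.zip', '04.zip']
```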
Citation
@article{osv5m,
  title = {{OpenStreetView‑5M}: {T}he Many Roads to Global Visual Geolocation},
  author = {Astruc, Guillaume and Dufour, Nicolas and Siglidis, Ioannis and Aronssohn, Constantin and Bouia, Nacim and Fu, Stephanie and Loiseau, Romain and Nguyen, Van Nguyen and Raude, Charles and Vincent, Elliot and Xu, Lintao and Zhou, Hongyu and Landrieu, Loic},
  journal = {CVPR},
  year = {2024},
}