Tags: Open Source Community, Machine Learning, Urban Air Quality Prediction

links-ads/mil-qualair

The MIL‑QUALAIR dataset was constructed for predicting urban air pollution in the Milan metropolitan area. It includes Sentinel‑5P satellite observations, meteorological conditions, terrain features, and ground‑station measurements. The dataset spans 2018‑2023 and supports prediction of five major pollutants: PM10, PM2.5, NO2, O3, and SO2. It was compiled by the LINKS Foundation, funded by the UP2030 project, and released under the MIT license.

Source: hugging_face
Created: Nov 28, 2025
Updated: May 31, 2024
Dataset Overview

Name: MIL‑QUALAIR

Purpose: Urban air pollution prediction for the Milan metropolitan area, integrating Sentinel‑5P satellite observations, meteorological data, terrain features, and ground‑station measurements.

Temporal Range: 2018‑2023

Coverage Area: Milan metropolitan region

Primary Pollutants: PM10, PM2.5, NO2, O3, SO2

Dataset Details

Description

The dataset combines multiple sources (Sentinel‑5P satellite observations, Digital Elevation Model (DEM) data, land‑cover information, weather records, and ground measurements) to facilitate prediction of the five main pollutants.

Sources

  • Sentinel 5P: ESA Copernicus Sentinel‑5P mission
  • DEM: Copernicus
  • Weather: Visual Crossing Weather
  • Land Cover: Copernicus Land Monitoring Service – Urban Atlas
  • Ground Observations: Milan Open Data Portal

Structure

  • Sentinel 5P: sentinel5.csv – Daily Sentinel‑5P satellite band readings
  • DEM: dem.tiff – 10 m resolution digital elevation model for the Milan area
  • Weather: weather.csv – Daily weather variable measurements
  • Land Cover: land_cover/land_cover.tiff – Land‑cover classification map; land_cover/land_cover_taxonomy.json – Class‑label mapping; land_cover/land_cover_mapping.json – Land‑cover class mapping
  • Ground Observations: stations.csv – Daily station measurements for the five supported pollutants
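
As a quick sanity check after downloading, the file layout listed above can be verified before any modelling starts. The sketch below is only an illustration: the relative paths are copied from the list above, while the helper name `missing_files` and the assumption that all files sit under a single local root directory are hypothetical.

```python
from pathlib import Path

# Relative paths as listed in the Structure section above.
EXPECTED_FILES = {
    "sentinel5p": "sentinel5.csv",
    "dem": "dem.tiff",
    "weather": "weather.csv",
    "land_cover": "land_cover/land_cover.tiff",
    "land_cover_taxonomy": "land_cover/land_cover_taxonomy.json",
    "land_cover_mapping": "land_cover/land_cover_mapping.json",
    "stations": "stations.csv",
}


def missing_files(root):
    """Return the sorted relative paths that are not present under `root`.

    `root` is a hypothetical local directory holding a copy of the dataset.
    """
    root = Path(root)
    return sorted(
        rel for rel in EXPECTED_FILES.values() if not (root / rel).is_file()
    )
```

Calling `missing_files("path/to/mil-qualair")` on a complete local copy should return an empty list; any path it does return points to a file that still needs to be fetched.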

Use Cases

The dataset is primarily used to develop air‑pollution prediction models. By integrating the various data sources, users can build a comprehensive feature set for more accurate forecasting of the five supported pollutants.
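
One way to integrate the daily tables is to join them on their date. The following is a minimal sketch using only the Python standard library. It is not the authors' pipeline, and the column names (`date`, `temp`, `pm10`) and the toy rows are hypothetical stand-ins, since the actual schema of the CSV files is not described here. It also assumes the right-hand table (for example, weather) has at most one row per date.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_date(left, right, key="date"):
    """Inner-join two lists of dict rows on a shared date column.

    Assumes `right` holds at most one row per date; `left` may hold several
    (for example, one row per station per day).
    """
    right_by_date = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_date.get(row[key])
        if match is not None:
            # Values from the left row win if a column name appears in both.
            joined.append({**match, **row})
    return joined


# Toy stand-ins: hypothetical columns and values, not the dataset's real schema.
weather_csv = "date,temp\n2020-01-01,3.5\n2020-01-02,4.1\n"
stations_csv = "date,pm10\n2020-01-02,41\n2020-01-03,38\n"

features = join_on_date(read_rows(stations_csv), read_rows(weather_csv))
```

Only dates present in both tables survive the inner join, so here `features` keeps the single shared day. The same pattern could be repeated to attach the Sentinel‑5P readings, provided that table also carries a matching date column.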
