NINA Dataset

The NINA dataset is a collection of in‑vehicle and out‑of‑vehicle sounds (e.g., electric‑vehicle alarm sounds) intended for research. The sounds were recorded with dashcams or smartphone microphones; because the recording conditions are uncontrolled, the dataset provides no vehicle speed, recording device model, or microphone details.

Source

GitHub

Created

Dec 9, 2019

Updated

Sep 12, 2023

Dataset Overview

Dataset Name

Naturalistic IN‑vehicle Audio Dataset (NINA)

Dataset Content

The NINA dataset contains sounds generated inside and outside vehicles (e.g., electric‑vehicle alarm sounds). Most were recorded with dashcams or smartphone microphones. Because the recording environment is uncontrolled, the dataset does not include vehicle speed, recording device model, or microphone details.

Dataset Classification

Category	Segments	Total Duration (s)
Crash	751	865
Driving	295	1086
Tire skidding	186	208
Horn	261	314
Harsh acceleration	22	63
Talking	265	653
Screaming	157	113
Music	198	821
Pothole	144	138
Meteo (strong rain/hail)	94	3613
Police siren	39	288
Ambulance siren	159	1253
Firetruck siren	76	822
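
The table can be turned into a quick sanity check on class balance. The sketch below copies the rows verbatim and computes the overall totals plus the mean segment length per category; the names ROWS and summarize are illustrative and not part of the dataset.

```python
# Rows copied from the table above: (category, segments, total duration in seconds).
ROWS = [
    ("Crash", 751, 865),
    ("Driving", 295, 1086),
    ("Tire skidding", 186, 208),
    ("Horn", 261, 314),
    ("Harsh acceleration", 22, 63),
    ("Talking", 265, 653),
    ("Screaming", 157, 113),
    ("Music", 198, 821),
    ("Pothole", 144, 138),
    ("Meteo (strong rain/hail)", 94, 3613),
    ("Police siren", 39, 288),
    ("Ambulance siren", 159, 1253),
    ("Firetruck siren", 76, 822),
]


def summarize(rows):
    """Return (total segments, total seconds, mean seconds per segment by category)."""
    total_segments = sum(segments for _, segments, _ in rows)
    total_seconds = sum(seconds for _, _, seconds in rows)
    mean_len = {name: seconds / segments for name, segments, seconds in rows}
    return total_segments, total_seconds, mean_len


if __name__ == "__main__":
    total_segments, total_seconds, mean_len = summarize(ROWS)
    print(f"{total_segments} segments, {total_seconds} s in total")
    # List categories from shortest to longest average segment.
    for name, mean in sorted(mean_len.items(), key=lambda kv: kv[1]):
        print(f"{name:26s} {mean:6.2f} s per segment")
```

The rows sum to 2647 segments and 10237 s (about 2.8 hours). Segment counts and durations are both strongly skewed: Crash has the most segments, while Meteo accounts for the largest share of the audio, so a training pipeline would likely need some form of class weighting or resampling.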

Dataset Files

datasetCreation.sh: main script that builds the dataset
youtube_IDs.csv: list of the source YouTube videos
labels: folder of txt files, each holding annotations in the form [start time] [end time] [category]
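
A minimal sketch of reading one annotation file, assuming one annotation per line, whitespace- or tab-separated fields, and times given in seconds (the layout Audacity uses when exporting a label track). The source does not state these details, so check them against an actual file in labels. Annotation, parse_label_line, and parse_labels are hypothetical helper names.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    start: float   # assumed to be seconds from the start of the video
    end: float     # assumed to be seconds from the start of the video
    category: str


def parse_label_line(line: str) -> Annotation:
    """Parse '[start time] [end time] [category]' from one line."""
    # Split at most twice so multi-word categories ("Tire skidding") stay whole.
    start, end, category = line.strip().split(None, 2)
    annotation = Annotation(float(start), float(end), category.strip())
    if annotation.end < annotation.start:
        raise ValueError(f"end before start in line: {line!r}")
    return annotation


def parse_labels(text: str) -> List[Annotation]:
    """Parse a whole label file, skipping blank lines."""
    return [parse_label_line(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    sample = "12.5\t14.0\tTire skidding\n30.0\t31.2\tHorn\n"
    for annotation in parse_labels(sample):
        print(annotation)
```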

Dataset Usage

Run the main script with the labels folder and an output folder as arguments:

bash datasetCreation.sh ./labels/ ./output

This creates the output folder with one subfolder per category, each populated with wav files.
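
Once the script has finished, the result can be checked against the classification table. The sketch below walks the output folder and reports the number of wav files and the total duration per category. It assumes the layout described above (one subfolder per category) and that the files are uncompressed PCM wav, which is the only kind Python's standard wave module reads; summarize_output is a hypothetical helper name.

```python
import wave
from pathlib import Path


def summarize_output(output_dir):
    """Map each category subfolder to (number of wav files, total seconds)."""
    summary = {}
    for category_dir in sorted(Path(output_dir).iterdir()):
        if not category_dir.is_dir():
            continue
        count, seconds = 0, 0.0
        for wav_path in sorted(category_dir.glob("*.wav")):
            # Duration = number of frames / frames per second.
            with wave.open(str(wav_path), "rb") as wav:
                seconds += wav.getnframes() / wav.getframerate()
            count += 1
        summary[category_dir.name] = (count, seconds)
    return summary


if __name__ == "__main__":
    # "./output" matches the second argument of the command above.
    for category, (count, seconds) in summarize_output("./output").items():
        print(f"{category:26s} {count:5d} files {seconds:9.1f} s")
```

Counts lower than those in the table may simply mean that some source videos are no longer available.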

Contribution / Extension

Contributors can extend the dataset by adding new YouTube video IDs and their titles to the youtube_IDs.csv file and by annotating the audio in Audacity or a similar tool.
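
A minimal sketch of the first step, appending a video to youtube_IDs.csv without creating duplicates. The column order (video ID first, then title) and the absence of a header row are assumptions, so inspect the existing file before using anything like this. add_video is a hypothetical helper and the ID and title shown are placeholders.

```python
import csv
from pathlib import Path


def add_video(csv_path, video_id, title):
    """Append (video_id, title) unless the ID is already listed. Returns True if added."""
    path = Path(csv_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Assumption: the video ID is the first column of every row.
    known_ids = {row[0] for row in csv.reader(text.splitlines()) if row}
    if video_id in known_ids:
        return False
    with path.open("a", newline="", encoding="utf-8") as f:
        # Make sure the new row starts on its own line.
        if text and not text.endswith("\n"):
            f.write("\n")
        csv.writer(f, lineterminator="\n").writerow([video_id, title])
    return True


if __name__ == "__main__":
    # Placeholder values, not a real video.
    added = add_video("youtube_IDs.csv", "VIDEO_ID_HERE", "Example dashcam clip")
    print("added" if added else "already listed")
```

The matching annotation file can then be produced in Audacity by marking regions on a label track and exporting the labels, which yields tab-separated start time, end time, and label text.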
