Dataset asset · Open Source Community · Audio Classification · Bioacoustics

DBD-research-group/BirdSet

BirdSet is a large-scale audio classification dataset focused on bird vocalizations. It contains over 6,800 hours of recordings, providing training data for nearly 10,000 classes and over 400 hours of evaluation data across eight strongly labeled evaluation sets. BirdSet supports audio classification research such as multi-label classification, robustness under covariate shift, and self-supervised learning.

Source: hugging_face
Created: Nov 28, 2025
Updated: Sep 10, 2025
Signals: 292 views
Availability: Linked source ready
Overview

Dataset description and usage context

Dataset Description

Dataset Overview

  • Task Category: Audio Classification
  • License: cc
  • Tags: Bird Classification, Passive Acoustic Monitoring

Dataset Details

Dataset Composition

  • Training and Test Sets: Eight subsets with paired training and test sets, plus two Xeno-Canto training collections (XCM, XCL) that list no test samples.
  • Data Format: .ogg files at 32 kHz sampling rate.
  • Data Source: Recordings obtained from Xeno‑Canto (XC), excluding recordings with CC‑ND licenses.
  • Labels: Scientific bird names converted to ebird_code tags.

Dataset Statistics

Dataset Name | Train Samples | Test Samples | 5-second Test Samples | Size (GB) | Number of Classes
PER (Amazon Basin) | 16,802 | 14,798 | 15,120 | 10.5 | 132
NES (Colombia Costa Rica) | 16,117 | 6,952 | 24,480 | 14.2 | 89
UHH (Hawaiian Islands) | 3,626 | 59,583 | 36,637 | 4.92 | 25 tr, 27 te
HSN (high_sierras) | 5,460 | 10,296 | 12,000 | 5.92 | 21
NBP (NIPS4BPlus) | 24,327 | 5,493 | 563 | 29.9 | 51
POW (Powdermill Nature) | 14,911 | 16,052 | 4,560 | 15.7 | 48
SSW (Sapsucker Woods) | 28,403 | 50,760 | 205,200 | 35.2 | 81
SNE (Sierra Nevada) | 19,390 | 20,147 | 23,756 | 20.8 | 56
XCM (Xenocanto Subset M) | 89,798 | x | x | 89.3 | 409 (411)
XCL (Xenocanto Complete) | 528,434 | x | x | 484 | 9,735
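
As a quick consistency check, the 5-second test sample counts for the eight evaluation sets can be converted into hours of audio. The sketch below restates those counts from the table and assumes every test segment is exactly 5 seconds long; the function name is illustrative only.

```python
# 5-second test segment counts per evaluation set, copied from the table above.
FIVE_SECOND_TEST_SEGMENTS = {
    "PER": 15_120,
    "NES": 24_480,
    "UHH": 36_637,
    "HSN": 12_000,
    "NBP": 563,
    "POW": 4_560,
    "SSW": 205_200,
    "SNE": 23_756,
}

SEGMENT_SECONDS = 5  # assumption: every segment is exactly 5 seconds long


def total_eval_hours(segment_counts, segment_seconds=SEGMENT_SECONDS):
    """Return the total evaluation audio duration in hours."""
    total_segments = sum(segment_counts.values())
    return total_segments * segment_seconds / 3600


hours = total_eval_hours(FIVE_SECOND_TEST_SEGMENTS)
print(f"{hours:.1f} hours")  # prints "447.7 hours"
```

Under that assumption the eight sets add up to 322,316 segments, or roughly 448 hours, which agrees with the "over 400 hours of evaluation data" figure in the description.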

Dataset Usage

  • Training Set: Uses focus audio from XC with quality ratings A, B, C, excluding CC‑ND licensed recordings.
  • Test Set: Provides full recordings and label sets, excluding recordings without bird vocalizations.
  • 5-second Test Set: Multi-label task; each recording is split into 5-second segments, and a segment with no labeled vocalization receives an all-zero label vector ([0]).
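
The 5-second multi-label setup can be sketched as follows. This is a minimal illustration, not BirdSet's own preprocessing: it assumes an annotated event is assigned to every segment it overlaps, and the function name, event tuples, and class codes (`sp_a`, `sp_b`) are hypothetical placeholders.

```python
import math

SEGMENT_SECONDS = 5.0


def segment_labels(duration, events, classes, segment_seconds=SEGMENT_SECONDS):
    """Build one multi-hot label vector per fixed-length segment.

    duration: recording length in seconds.
    events:   list of (start_s, end_s, code) annotations.
    classes:  ordered list of class codes defining vector positions.
    """
    index = {code: i for i, code in enumerate(classes)}
    n_segments = math.ceil(duration / segment_seconds)
    # Every segment starts as an all-zero vector (no vocalization).
    labels = [[0] * len(classes) for _ in range(n_segments)]
    for start, end, code in events:
        first = int(start // segment_seconds)
        # An event ending exactly on a boundary does not spill into the next segment.
        last = max(first, math.ceil(end / segment_seconds) - 1)
        last = min(last, n_segments - 1)
        for seg in range(first, last + 1):
            labels[seg][index[code]] = 1
    return labels


# Hypothetical 20-second recording with two annotated events.
classes = ["sp_a", "sp_b"]
events = [(1.0, 3.5, "sp_a"), (4.0, 11.0, "sp_b")]
labels = segment_labels(20.0, events, classes)
print(labels)  # [[1, 1], [0, 1], [0, 1], [0, 0]]
```

The last segment contains no event, so it keeps the all-zero vector.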

Metadata

  • Audio Format: 32 kHz mono.
  • Labels: ebird_code, ebird_code_multilabel, ebird_code_secondary.
  • Additional Metadata: Start time, end time, frequency range, geographic location, etc.
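
At 32 kHz mono, one 5-second segment spans 160,000 samples. The sketch below computes sample-index boundaries for consecutive segments of a decoded waveform; the function name is illustrative, and leaving the final segment shorter (rather than padding it) is an assumption, since the source does not say how trailing audio is handled.

```python
SAMPLE_RATE_HZ = 32_000  # from the metadata above
SEGMENT_SECONDS = 5      # length of one evaluation segment


def segment_bounds(num_samples, sample_rate=SAMPLE_RATE_HZ,
                   segment_seconds=SEGMENT_SECONDS):
    """Return (start, end) sample indices for consecutive fixed-length segments."""
    step = sample_rate * segment_seconds  # 160,000 samples per segment
    return [(start, min(start + step, num_samples))
            for start in range(0, num_samples, step)]


# A 12-second mono recording at 32 kHz has 384,000 samples.
bounds = segment_bounds(12 * SAMPLE_RATE_HZ)
print(bounds)  # [(0, 160000), (160000, 320000), (320000, 384000)]
```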

Citation

@misc{birdset,
  title={BirdSet: A Multi-Task Benchmark for Classification in Avian Bioacoustics},
  author={Lukas Rauch and Raphael Schwinger and Moritz Wirth and René Heinrich and Jonas Lange and Stefan Kahl and Bernhard Sick and Sven Tomforde and Christoph Scholz},
  year={2024},
  eprint={2403.10380},
  archivePrefix={arXiv},
  primaryClass={cs.SD}
}
