DBD-research-group/BirdSet
BirdSet is a large-scale audio classification dataset focused on bird vocalizations. It contains over 6,800 hours of recordings, providing training data for nearly 10,000 classes, plus over 400 hours of evaluation data across eight strongly labeled evaluation sets. BirdSet supports audio classification research such as multi-label classification, robustness to covariate shift, and self-supervised learning.
Dataset Overview
- Task Category: Audio Classification
- License: cc
- Tags: Bird Classification, Passive Acoustic Monitoring
Dataset Details
- Dataset Name: BirdSet
- Contact: Lukas Rauch (lukas.rauch@uni-kassel.de)
Dataset Composition
- Training and Test Sets: Eight evaluation datasets, each with a training and a test split, plus two training-only Xeno-Canto collections (XCM and XCL).
- Data Format: .ogg files at 32 kHz sampling rate.
- Data Source: Recordings obtained from Xeno‑Canto (XC), excluding recordings with CC‑ND licenses.
- Labels: Scientific bird names converted to ebird_code tags.
Dataset Statistics
| Dataset Name | Train Samples | Test Samples | 5‑second Test Samples | Size (GB) | Number of Classes |
|---|---|---|---|---|---|
| PER (Amazon Basin) | 16,802 | 14,798 | 15,120 | 10.5 | 132 |
| NES (Colombia and Costa Rica) | 16,117 | 6,952 | 24,480 | 14.2 | 89 |
| UHH (Hawaiian Islands) | 3,626 | 59,583 | 36,637 | 4.92 | 25 (train), 27 (test) |
| HSN (High Sierras) | 5,460 | 10,296 | 12,000 | 5.92 | 21 |
| NBP (NIPS4BPlus) | 24,327 | 5,493 | 563 | 29.9 | 51 |
| POW (Powdermill Nature) | 14,911 | 16,052 | 4,560 | 15.7 | 48 |
| SSW (Sapsucker Woods) | 28,403 | 50,760 | 205,200 | 35.2 | 81 |
| SNE (Sierra Nevada) | 19,390 | 20,147 | 23,756 | 20.8 | 56 |
| XCM (Xenocanto Subset M) | 89,798 | n/a | n/a | 89.3 | 409 (411) |
| XCL (Xenocanto Complete) | 528,434 | n/a | n/a | 484 | 9,735 |
Dataset Usage
- Training Set: Focal recordings from XC with quality ratings A, B, or C; CC-ND licensed recordings are excluded.
- Test Set: Provides full recordings and label sets, excluding recordings without bird vocalizations.
- 5-second Test Set: Multi-label task; each recording is segmented into 5-second intervals, and segments without an annotated vocalization receive an all-zero label vector.
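The 5-second segmentation can be illustrated with a minimal sketch. This is not the authors' preprocessing code: the `segment_labels` function, the `(start_time, end_time, class_index)` event tuples, the overlap rule, and the choice to drop a trailing partial segment are all assumptions made for illustration.

```python
from typing import List, Tuple

SEGMENT_SECONDS = 5.0  # segment length stated in the dataset card


def segment_labels(
    duration: float,
    events: List[Tuple[float, float, int]],
    num_classes: int,
) -> List[List[int]]:
    """Return one multi-hot label vector per 5-second segment.

    `events` holds hypothetical (start_time, end_time, class_index)
    annotations in seconds. A trailing partial segment is dropped
    (an assumption, not documented behavior).
    """
    num_segments = int(duration // SEGMENT_SECONDS)
    labels = [[0] * num_classes for _ in range(num_segments)]
    for start, end, cls in events:
        for i in range(num_segments):
            seg_start = i * SEGMENT_SECONDS
            seg_end = seg_start + SEGMENT_SECONDS
            # Assumed rule: an event labels every segment it overlaps.
            if start < seg_end and end > seg_start:
                labels[i][cls] = 1
    return labels


# A 15-second recording with two annotated events and three classes.
example = segment_labels(15.0, [(1.0, 2.5, 0), (4.0, 6.0, 2)], num_classes=3)
print(example)  # [[1, 0, 1], [0, 0, 1], [0, 0, 0]]
```

The last segment contains no event, so its label vector stays all zeros, which matches the handling of silent segments described above.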
Metadata
- Audio Format: 32 kHz mono.
- Labels: ebird_code, ebird_code_multilabel, ebird_code_secondary.
- Additional Metadata: Start time, end time, frequency range, geographic location, etc.
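As a rough sketch of how the start and end times relate to the 32 kHz audio, the snippet below crops one annotated event from a waveform by converting times to sample indices. The `crop_event` helper is hypothetical, and it assumes that start and end times are given in seconds, which the card does not state.

```python
SAMPLE_RATE = 32_000  # 32 kHz mono, as stated in the dataset card


def crop_event(waveform, start_time, end_time, sample_rate=SAMPLE_RATE):
    """Return the samples between start_time and end_time.

    Times are assumed to be in seconds; indices are clamped to the
    bounds of the waveform.
    """
    start = max(0, int(round(start_time * sample_rate)))
    end = min(len(waveform), int(round(end_time * sample_rate)))
    return waveform[start:end]


# Ten seconds of silence stands in for a decoded recording.
waveform = [0.0] * (SAMPLE_RATE * 10)
clip = crop_event(waveform, 1.5, 4.0)
print(len(clip))  # 80000 samples, i.e. 2.5 seconds at 32 kHz
```

At this sampling rate, a 5-second test segment corresponds to 160,000 samples.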
Citation
@misc{birdset,
  title={BirdSet: A Multi-Task Benchmark for Classification in Avian Bioacoustics},
  author={Lukas Rauch and Raphael Schwinger and Moritz Wirth and René Heinrich and Jonas Lange and Stefan Kahl and Bernhard Sick and Sven Tomforde and Christoph Scholz},
  year={2024},
  eprint={2403.10380},
  archivePrefix={arXiv},
  primaryClass={cs.SD}
}
Source
Organization: hugging_face