Dataset asset | Open Source Community | Audio Classification | Bird Species Recognition

Syoy/birdclef_2023_train

The birdclef_2023_train dataset contains bird audio recordings with associated labels and metadata. Each record includes an audio file, a primary label, secondary labels, type, latitude, longitude, scientific name, common name, author, license, rating, URL, and an embedding vector. The only split listed is train, with 16,941 samples, a dataset size of 5,388,534,029.882 bytes (about 5.39 GB), and a download size of 5,367,714,895 bytes (about 5.37 GB).
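Because the listing names Hugging Face as the source, the data can presumably be read with the `datasets` library. The sketch below is a minimal example under stated assumptions: the repository ID is taken to be the page title, `Syoy/birdclef_2023_train`; the third-party `datasets` package is installed; and network access is available. `first_rows` is a hypothetical helper name, not part of any library.

```python
from itertools import islice

# Assumption: the Hugging Face repository ID matches the page title.
REPO_ID = "Syoy/birdclef_2023_train"


def first_rows(n=3):
    """Stream the first n rows of the train split without a full download."""
    # Imported lazily so this module loads even when `datasets` is absent.
    from datasets import load_dataset

    # streaming=True returns an iterable that fetches rows on demand.
    stream = load_dataset(REPO_ID, split="train", streaming=True)
    return list(islice(stream, n))
```

Calling `first_rows(3)` should return a list of three dictionaries keyed by the feature names. Decoding the `audio` column may need an additional audio backend, depending on the installed `datasets` version.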

Source: hugging_face
Created: Nov 28, 2025
Updated: Mar 21, 2023
Signals: 299 views
Availability: Linked source ready

Dataset Overview

Dataset Name

  • Name: birdclef_2023_train

Dataset Features

  • audio: Audio data
  • primary_label: Primary label, one of 202 distinct category names
  • secondary_labels: Secondary labels, string type
  • type: Sound type (for example, call or song), string type
  • latitude: Latitude, float type
  • longitude: Longitude, float type
  • scientific_name: Scientific name, string type
  • common_name: Common name, string type
  • author: Author, string type
  • license: License, string type
  • rating: Rating, float type
  • url: URL link, string type
  • embeddings: Embedding vectors, sequence of floats
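
The feature list above can be turned into a small row check, which may help when inspecting records after loading. This is a sketch, not the dataset's official schema: the column names and the float and string groupings come from the list, while `check_row`, the decision to accept `None` as a missing value, and the placeholder `SAMPLE_ROW` are assumptions made for illustration. The `audio` and `primary_label` columns are only checked for presence, since the listing does not state their in-memory types.

```python
from typing import Any, Dict, List

# Column names copied from the feature list above.
EXPECTED_COLUMNS = (
    "audio", "primary_label", "secondary_labels", "type", "latitude",
    "longitude", "scientific_name", "common_name", "author", "license",
    "rating", "url", "embeddings",
)
# Columns the listing describes as float type.
FLOAT_COLUMNS = ("latitude", "longitude", "rating")
# Columns the listing describes as string type.
STRING_COLUMNS = (
    "secondary_labels", "type", "scientific_name", "common_name",
    "author", "license", "url",
)


def check_row(row: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one row (empty if none)."""
    problems = [f"missing column: {c}" for c in EXPECTED_COLUMNS if c not in row]
    for name in FLOAT_COLUMNS:
        value = row.get(name)
        # Assumption: None marks a missing value and is accepted.
        if value is not None and not isinstance(value, float):
            problems.append(f"{name} should be float, got {type(value).__name__}")
    for name in STRING_COLUMNS:
        value = row.get(name)
        if value is not None and not isinstance(value, str):
            problems.append(f"{name} should be str, got {type(value).__name__}")
    emb = row.get("embeddings")
    if emb is not None:
        is_float_seq = isinstance(emb, (list, tuple)) and all(
            isinstance(x, float) for x in emb
        )
        if not is_float_seq:
            problems.append("embeddings should be a sequence of floats")
    return problems


# Made-up placeholder row for illustration; not taken from the dataset.
SAMPLE_ROW = {
    "audio": None,
    "primary_label": "species_a",
    "secondary_labels": "[]",
    "type": "['call']",
    "latitude": 1.5,
    "longitude": 36.8,
    "scientific_name": "Genus species",
    "common_name": "Example Bird",
    "author": "Example Author",
    "license": "Example License",
    "rating": 4.0,
    "url": "https://example.org/recording",
    "embeddings": [0.1, 0.2, 0.3],
}
```

`check_row(SAMPLE_ROW)` returns an empty list, while a row that lacks `url` or stores `rating` as a string yields one message per problem.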

Dataset Split

  • train: Training set
    • num_bytes: 5388534029.882 bytes
    • num_examples: 16941 samples
    • download_size: 5367714895 bytes
    • dataset_size: 5388534029.882 bytes
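
The split figures above are easier to reason about after some simple arithmetic. The following sketch converts the byte counts to GiB and derives an average size per sample; the constants are copied from the listing, and `summarize_split` is a hypothetical helper name.

```python
# Figures copied from the split metadata above.
NUM_EXAMPLES = 16_941
DATASET_SIZE_BYTES = 5_388_534_029.882
DOWNLOAD_SIZE_BYTES = 5_367_714_895

BYTES_PER_GIB = 2**30


def summarize_split() -> dict:
    """Derive human-readable size figures from the raw byte counts."""
    return {
        "dataset_gib": DATASET_SIZE_BYTES / BYTES_PER_GIB,
        "download_gib": DOWNLOAD_SIZE_BYTES / BYTES_PER_GIB,
        "avg_bytes_per_example": DATASET_SIZE_BYTES / NUM_EXAMPLES,
        # Fraction of the prepared dataset size that must be downloaded.
        "download_ratio": DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES,
    }


summary = summarize_split()
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```

With these figures the prepared dataset is roughly 5.02 GiB, each sample averages about 318 kB, and the download is about 0.4% smaller than the prepared dataset.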