Dataset asset · Open Source Community · Signal Processing · Audio Captioning

Clotho

Clotho is an audio-captioning dataset that pairs audio clips with text captions for developing and evaluating audio captioning methods. The paper describing the dataset was accepted at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020.

Source
github
Created
Mar 25, 2020
Updated
Mar 25, 2020
Dataset Overview

Dataset Name

Clotho Dataset

Dataset Purpose

Used for the development and evaluation of audio‑captioning methods.

Dataset Content

  • Data Splits: The dataset is divided into a development set and an evaluation set. Each set consists of a 7z archive of audio clips and a CSV file of captions.
    • Development set: clotho_audio_development.7z and clotho_captions_development.csv
    • Evaluation set: clotho_audio_evaluation.7z and clotho_captions_evaluation.csv
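
As a minimal sketch of inspecting one of the caption files listed above, the following reads a CSV into a list of dictionaries keyed by its header row. The column layout of the Clotho CSV files is not described here, so the code makes no assumption about column names and simply reports whatever header it finds. The function names and the `data/` path are illustrative, not part of the dataset's own tooling.

```python
import csv
from pathlib import Path


def load_caption_rows(csv_path):
    """Read a caption CSV into a list of dicts keyed by the header row."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def summarize(rows):
    """Return the row count and the column names found in the file."""
    columns = list(rows[0].keys()) if rows else []
    return {"n_rows": len(rows), "columns": columns}


if __name__ == "__main__":
    # Assumed location: the CSV sits in a local "data" folder.
    path = Path("data") / "clotho_captions_development.csv"
    if path.is_file():
        print(summarize(load_caption_rows(path)))
    else:
        print(f"{path} not found; download it first.")
```

Printing the summary before any further processing is a quick way to confirm the header matches what downstream code expects.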

Dataset Processing

  • Data Download: Download required files from Zenodo.
  • Data Setup: Extract downloaded files into the data folder of the project directory.
  • Code Setup: Clone the code repository, then create a Conda environment and install the dependencies.
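
Before extracting anything, it can help to confirm that all four downloads are present. The sketch below checks a folder for the file names listed under Dataset Content. The folder name `data` and the helper `missing_files` are assumptions made for illustration; the actual layout expected by the project code may differ.

```python
from pathlib import Path

# File names as listed in the dataset description.
EXPECTED_FILES = (
    "clotho_audio_development.7z",
    "clotho_captions_development.csv",
    "clotho_audio_evaluation.7z",
    "clotho_captions_evaluation.csv",
)


def missing_files(download_dir="data"):
    """Return the expected file names that are absent from download_dir."""
    root = Path(download_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("data")  # assumed download location
    if missing:
        print("Still to download:", ", ".join(missing))
    else:
        print("All four Clotho files are present.")
```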

Dataset Usage

  • Data Processing: Use the provided code to create NumPy objects containing audio and corresponding captions, and extract features from the audio.
  • Feature Extraction: By default, the code extracts 64 log mel filter-bank energies per frame; users may supply their own feature-extraction function instead.
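
To illustrate what 64 log mel filter-bank energies look like in practice, here is a NumPy-only sketch. It is not the project's extractor: the FFT size, hop length, Hann window, and mel formula are assumptions chosen for the example, and the interface the project expects from a custom feature-extraction function is not specified here.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(sr, n_fft, n_mels=64):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_energies(audio, sr, n_fft=1024, hop=512, n_mels=64):
    """Return log mel-band energies, shape (n_frames, n_mels)."""
    audio = np.asarray(audio, dtype=float)
    if audio.size < n_fft:
        # Zero-pad very short signals to one full frame.
        audio = np.pad(audio, (0, n_fft - audio.size))
    n_frames = 1 + (audio.size - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [audio[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filter_bank(sr, n_fft, n_mels).T
    # A small offset avoids log(0) in empty or silent bands.
    return np.log(mel + np.finfo(float).eps)


if __name__ == "__main__":
    sr = 44100
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz test tone
    print(log_mel_energies(tone, sr).shape)
```

Each row of the result is one frame and each column is one mel band, so the second dimension is always `n_mels`.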

Citation Requirement

When using the Clotho dataset, cite the following paper: K. Drossos, S. Lipping, and T. Virtanen, "Clotho: An Audio Captioning Dataset," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 4-8, 2020.
