Tags: Dataset asset, Open Source Community, Signal Processing, Audio Captioning
Clotho
Clotho is an audio-captioning dataset for developing and evaluating audio-captioning methods. The paper describing the dataset was accepted and published at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020.
Source: GitHub
Created: Mar 25, 2020
Updated: Mar 25, 2020
Dataset Overview
Dataset Name
Clotho Dataset
Dataset Purpose
Used for the development and evaluation of audio‑captioning methods.
Dataset Content
- Audio Data: Divided into a development set and an evaluation set. Each set consists of a 7z archive containing the audio files and a CSV file containing the captions.
  - Development set: clotho_audio_development.7z and clotho_captions_development.csv
  - Evaluation set: clotho_audio_evaluation.7z and clotho_captions_evaluation.csv
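As a minimal sketch of reading one of the caption files, the Python below maps each audio file name to its list of captions. It assumes a header row of file_name, caption_1 through caption_5 (five captions per clip) and UTF-8 encoding; neither the column names nor the encoding is stated above, so check both against the CSV you download. load_captions is a hypothetical helper, not part of the Clotho code.

```python
import csv

# Assumed header layout (verify against your copy of the CSV):
# file_name, caption_1, caption_2, caption_3, caption_4, caption_5
CAPTION_COLUMNS = [f"caption_{i}" for i in range(1, 6)]


def load_captions(csv_path, encoding="utf-8"):
    """Return {audio file name: [caption, ...]} (hypothetical helper)."""
    captions = {}
    with open(csv_path, newline="", encoding=encoding) as f:
        for row in csv.DictReader(f):
            # Keep only caption cells that exist and are non-empty.
            captions[row["file_name"]] = [
                row[col].strip() for col in CAPTION_COLUMNS if row.get(col)
            ]
    return captions
```

For example, load_captions("data/clotho_captions_development.csv") would return the development captions keyed by file name, assuming the file sits in a data folder.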
Dataset Processing
- Data Download: Download the required files from Zenodo.
- Data Setup: Extract the downloaded files into the "data" folder of the project directory.
- Code Setup: Clone the code repository and configure the environment: create a Conda environment and install the dependencies.
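The extraction step can be scripted. This sketch assumes the 7-Zip command-line tool (7z) is installed and on the PATH, that the archives were downloaded into a folder named downloads, and that the target is the project's data folder; extraction_commands and run_extraction are hypothetical helpers, not part of the Clotho code.

```python
import subprocess
from pathlib import Path

# Archive names come from the listing above; folder names are assumptions.
ARCHIVES = ["clotho_audio_development.7z", "clotho_audio_evaluation.7z"]


def extraction_commands(download_dir="downloads", data_dir="data"):
    """Build one `7z x` command per archive (hypothetical helper).

    `x` extracts with full paths, `-o<dir>` (no space) sets the output
    folder, and `-y` answers yes to any overwrite prompts.
    """
    download_dir, data_dir = Path(download_dir), Path(data_dir)
    return [["7z", "x", str(download_dir / name), f"-o{data_dir}", "-y"]
            for name in ARCHIVES]


def run_extraction(download_dir="downloads", data_dir="data"):
    """Create the data folder and run each extraction command (needs 7z)."""
    Path(data_dir).mkdir(parents=True, exist_ok=True)
    for cmd in extraction_commands(download_dir, data_dir):
        subprocess.run(cmd, check=True)
```

The caption CSVs are not archived, so they only need to be copied into the same folder.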
Dataset Usage
- Data Processing: Use the provided code to create NumPy objects containing audio and corresponding captions, and extract features from the audio.
- Feature Extraction: By default, 64 log‑Mel‑filter‑bank energy features are extracted; users may provide custom feature‑extraction functions.
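To illustrate the default feature type and the pluggable design, here is a NumPy sketch that computes 64 log-mel filter-bank energies and exposes an extract_features function accepting any feature callable. The window length (n_fft=1024), hop size (512), Hann window, and HTK-style mel formula are assumptions, not the repository's documented settings, and all function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)  # HTK-style mel scale (assumed)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_energies(audio, sr, n_fft=1024, hop=512, n_mels=64):
    """Frame with a Hann window, take |FFT|^2, apply mel filters, then log."""
    if len(audio) < n_fft:
        audio = np.pad(audio, (0, n_fft - len(audio)))
    n_frames = 1 + (len(audio) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([audio[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_bins)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T      # (n_frames, n_mels)
    return np.log(mel + 1e-10)                             # avoid log(0)


def extract_features(audio, sr, feature_fn=log_mel_energies):
    """Default: 64 log-mel energies; any callable(audio, sr) can replace it."""
    return feature_fn(np.asarray(audio, dtype=np.float64), sr)
```

Passing a different callable, for example extract_features(audio, sr, feature_fn=my_features) with a hypothetical my_features, changes the feature type without touching the rest of the pipeline.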
Citation Requirement
When using the Clotho dataset, cite the following paper: K. Drossos, S. Lipping, and T. Virtanen, "Clotho: An Audio Captioning Dataset," accepted at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 4-8, 2020.