Clotho
Clotho is an audio‑captioning dataset: its audio clips serve as the input, and its captions as the target output, of audio‑captioning methods. The paper describing the dataset was accepted at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Description
Dataset Overview
Dataset Name
Clotho Dataset
Dataset Purpose
Used for the development and evaluation of audio‑captioning methods.
Dataset Content
- Audio Data: Divided into development and evaluation sets. Each set includes a 7z‑compressed audio file and a CSV caption file.
  - Development set: clotho_audio_development.7z and clotho_captions_development.csv
  - Evaluation set: clotho_audio_evaluation.7z and clotho_captions_evaluation.csv
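As a rough illustration of how a caption file could be read, the sketch below maps each audio file name to its captions using only the Python standard library. The column names file_name and caption_* are assumptions about the CSV header, not something stated here, and read_captions is a hypothetical helper rather than part of the Clotho code.

```python
import csv
import io


def read_captions(csv_file):
    """Map each audio file name to its list of captions.

    Assumes a header row with a `file_name` column and one or more
    `caption_*` columns (an assumption; adjust if the header differs).
    """
    captions = {}
    for row in csv.DictReader(csv_file):
        captions[row["file_name"]] = [
            text
            for column, text in row.items()
            if column and column.startswith("caption_") and text
        ]
    return captions


# Tiny made-up sample standing in for clotho_captions_development.csv
sample = io.StringIO(
    "file_name,caption_1,caption_2\n"
    "rain.wav,Rain falls on a roof,Water drips steadily\n"
)
captions = read_captions(sample)
print(captions["rain.wav"])
```

For a real file, the same function could be given a handle opened with open(path, newline="", encoding="utf-8").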
Dataset Processing
- Data Download: Download required files from Zenodo.
- Data Setup: Extract downloaded files into the data folder of the project directory.
- Code Setup: Clone the code repository and configure the environment; create a Conda environment and install dependencies.
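Before extracting anything, it can help to confirm that all four files listed under Dataset Content were actually downloaded. The following is a minimal sketch of such a check; missing_files is a hypothetical helper, and the directory passed to it (here data) is an assumption about where the downloads were saved.

```python
from pathlib import Path

# File names as listed in the dataset description
EXPECTED_FILES = (
    "clotho_audio_development.7z",
    "clotho_captions_development.csv",
    "clotho_audio_evaluation.7z",
    "clotho_captions_evaluation.csv",
)


def missing_files(download_dir):
    """Return the expected file names that are not present in download_dir."""
    folder = Path(download_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # "data" is an assumed location; point this at your download folder
    missing = missing_files("data")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```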
Dataset Usage
- Data Processing: Use the provided code to create NumPy objects containing audio and corresponding captions, and extract features from the audio.
- Feature Extraction: By default, 64 log‑Mel‑filter‑bank energy features are extracted; users may provide custom feature‑extraction functions.
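To make the default features more concrete, here is a minimal NumPy sketch of computing 64 log mel‑band energies from a mono waveform. It is not the code shipped with the dataset: the sampling rate, FFT size, hop length, Hann window, and mel formula are all assumptions chosen for illustration, and log_mel_energies and its helpers are hypothetical names.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(sr, n_fft, n_mels=64):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 band edges, equally spaced on the mel scale
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_energies(audio, sr=44100, n_fft=2048, hop=1024, n_mels=64):
    """Return an array of shape (n_frames, n_mels) of log mel-band energies.

    All default parameter values are illustrative assumptions.
    """
    audio = np.asarray(audio, dtype=float)
    if audio.size < n_fft:
        # Zero-pad short clips so that at least one frame exists
        audio = np.pad(audio, (0, n_fft - audio.size))
    n_frames = 1 + (audio.size - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [audio[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filter_bank(sr, n_fft, n_mels).T
    # Small offset avoids log(0) on silent frames
    return np.log(mel_energy + 1e-10)


# One second of a 440 Hz tone as a stand-in for a real audio clip
t = np.arange(44100) / 44100.0
features = log_mel_energies(np.sin(2.0 * np.pi * 440.0 * t))
print(features.shape)  # (42, 64)
```

A custom feature extractor could follow the same shape contract, taking a 1‑D waveform and returning a 2‑D array of frames by features; how such a function is passed to the provided code depends on that code's own interface.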
Citation Requirement
When using the Clotho dataset, the following paper should be cited: K. Drossos, S. Lipping, and T. Virtanen, "Clotho: An Audio Captioning Dataset," accepted in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 4‑8, 2020
Source
Organization: github
Created: 3/25/2020