Clotho

Dataset Overview

Dataset Name

Clotho Dataset

Dataset Purpose

Used for the development and evaluation of audio‑captioning methods.

Dataset Content

Audio Data: The dataset is split into a development set and an evaluation set. Each set consists of a 7z‑compressed audio archive and a CSV file of captions.
- Development set: clotho_audio_development.7z and clotho_captions_development.csv
- Evaluation set: clotho_audio_evaluation.7z and clotho_captions_evaluation.csv
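As a minimal sketch of how a caption file could be read, the snippet below maps each audio file name to its captions. The column names (`file_name` and `caption_*`) are assumptions about the CSV header, and `load_captions` is a hypothetical helper rather than part of the provided code; check the header of the downloaded file and adjust accordingly.

```python
import csv
from pathlib import Path


def load_captions(csv_path):
    """Map each audio file name to its list of captions.

    Assumes a header row with a `file_name` column and one or more
    `caption_*` columns; these names are assumptions, not confirmed
    by the dataset description.
    """
    captions = {}
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Keep only non-empty caption cells, in column order.
            captions[row["file_name"]] = [
                text
                for column, text in row.items()
                if column and column.startswith("caption_") and text
            ]
    return captions
```

For example, `load_captions("data/clotho_captions_development.csv")` would return a dictionary keyed by file name, assuming the CSV was placed in the data folder.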

Dataset Processing

Data Download: Download required files from Zenodo.
Data Setup: Extract downloaded files into the data folder of the project directory.
Code Setup: Clone the code repository, then create a Conda environment and install the dependencies.
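Before extracting the archives, it can help to confirm that all four files were downloaded. The sketch below checks a folder for the file names listed under Dataset Content; `missing_downloads` and the folder passed to it are hypothetical and not part of the provided code.

```python
from pathlib import Path

# File names as listed in the dataset description.
EXPECTED_FILES = (
    "clotho_audio_development.7z",
    "clotho_captions_development.csv",
    "clotho_audio_evaluation.7z",
    "clotho_captions_evaluation.csv",
)


def missing_downloads(download_dir):
    """Return the expected file names that are not present in download_dir."""
    root = Path(download_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list from `missing_downloads("data")` would indicate that every expected file is in place, assuming the files were saved to a folder named `data`.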

Dataset Usage

Data Processing: Use the provided code to create NumPy objects containing audio and corresponding captions, and extract features from the audio.
Feature Extraction: By default, 64 log mel filter bank energies are extracted; users may supply a custom feature‑extraction function instead.
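To illustrate what log mel filter bank energies are, here is a minimal NumPy sketch. It is not the provided code: the function names, the 44.1 kHz sample rate, the FFT size, the hop length, and the Hann window are all assumptions chosen for illustration. Only the count of 64 mel bands comes from the description above. The `extract_features` wrapper shows one way a custom extractor could be swapped in.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(sample_rate, n_fft, n_mels=64):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    # n_mels + 2 band edges, equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, centre, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel_energies(audio, sample_rate=44100, n_fft=2048, hop=1024, n_mels=64):
    """Return log mel-band energies with shape (n_frames, n_mels).

    sample_rate, n_fft, and hop are illustrative assumptions.
    """
    audio = np.asarray(audio, dtype=float)
    if audio.size < n_fft:
        # Zero-pad short clips so at least one frame exists.
        audio = np.pad(audio, (0, n_fft - audio.size))
    n_frames = 1 + (audio.size - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [audio[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filter_bank(sample_rate, n_fft, n_mels).T
    # Small offset avoids log(0) for silent frames.
    return np.log(mel_energy + 1e-10)


def extract_features(audio, extractor=log_mel_energies, **kwargs):
    """Apply the default extractor, or any callable taking an audio array."""
    return extractor(audio, **kwargs)
```

Passing a different callable as `extractor` replaces the default, which mirrors the option of supplying a custom feature‑extraction function.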

Citation Requirement

When using the Clotho dataset, cite the following paper: K. Drossos, S. Lipping, and T. Virtanen, "Clotho: An Audio Captioning Dataset," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 4‑8, 2020.
