Codecfake Dataset
Codecfake dataset and countermeasures for universal detection of deep‑fake audio. Because of Zenodo repository size limits, the dataset is split into multiple subsets, including training, development, and test sets.
Description
Dataset Overview
Dataset Name
- Codecfake Dataset
Dataset Description
- Codecfake Dataset is a dataset for universal detection of deep‑fake audio, composed of multiple subsets including training, development, and test sets.
Dataset Subsets
| Subset Name | Description | Link |
|---|---|---|
| Training set (part 1 of 3) & Labels | train_split.zip & train_split.z01 - train_split.z06 | Link |
| Training set (part 2 of 3) | train_split.z07 - train_split.z14 | Link |
| Training set (part 3 of 3) | train_split.z15 - train_split.z19 | Link |
| Development set | dev_split.zip & dev_split.z01 - dev_split.z02 | Link |
| Test set (part 1 of 2) | Codec test: C1.zip - C6.zip & ALM test: A1.zip - A3.zip | Link |
| Test set (part 2 of 2) | Codec unseen test: C7.zip | Link |
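Because the archives are spread over several downloads, it can help to confirm that every part is present before extracting. The following is a minimal sketch, not part of the official tooling. It assumes all parts were saved into one flat directory and that the codec test archives are named C1.zip through C6.zip. The names `split_archive_parts`, `EXPECTED`, and `missing_parts` are hypothetical, and the label file is left out because its file name is not given in the table.

```python
from pathlib import Path


def split_archive_parts(stem: str, last_part: int) -> list[str]:
    """Return the file names of a split zip: stem.zip plus stem.z01 .. stem.zNN."""
    return [f"{stem}.zip"] + [f"{stem}.z{i:02d}" for i in range(1, last_part + 1)]


# Archive names taken from the subset table above.
EXPECTED = {
    "train": split_archive_parts("train_split", 19),
    "dev": split_archive_parts("dev_split", 2),
    "test": [f"C{i}.zip" for i in range(1, 8)] + [f"A{i}.zip" for i in range(1, 4)],
}


def missing_parts(download_dir: str) -> dict[str, list[str]]:
    """List, per subset, the expected archive files not found in download_dir.

    Assumption: every part was downloaded into the same flat directory.
    """
    root = Path(download_dir)
    return {
        subset: [name for name in names if not (root / name).is_file()]
        for subset, names in EXPECTED.items()
    }
```

Calling `missing_parts("downloads")` on a hypothetical `downloads` folder returns an empty list for each subset once all parts are in place.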
Dataset License
- CC BY‑NC‑ND 4.0
Dataset Structure
- The dataset should be organized as follows:

      ├── Codecfake
      │   ├── label
      │   │   └── *.txt
      │   ├── train
      │   │   └── *.wav (740,747 samples)
      │   ├── dev
      │   │   └── *.wav (92,596 samples)
      │   ├── test
      │   │   ├── C1
      │   │   │   └── *.wav (26,456 samples)
      │   │   ├── C2
      │   │   │   └── *.wav (26,456 samples)
      │   │   ├── C3
      │   │   │   └── *.wav (26,456 samples)
      │   │   ├── C4
      │   │   │   └── *.wav (26,456 samples)
      │   │   ├── C5
      │   │   │   └── *.wav (26,456 samples)
      │   │   ├── C6
      │   │   │   └── *.wav (26,456 samples)
      │   │   ├── C7
      │   │   │   └── *.wav (145,505 samples)
      │   │   ├── A1
      │   │   │   └── *.wav (8,902 samples)
      │   │   ├── A2
      │   │   │   └── *.wav (8,902 samples)
      │   │   └── A3
      │   │       └── *.wav (99,112 samples)
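After extraction, the sample counts listed in the layout above can serve as a quick integrity check. The sketch below is only an illustration under stated assumptions: the path passed in is the `Codecfake` directory itself, the `.wav` files sit directly inside each folder, and the helper names `EXPECTED_COUNTS`, `count_wavs`, and `mismatches` are hypothetical.

```python
from pathlib import Path

# Expected number of .wav files per folder, copied from the layout above.
EXPECTED_COUNTS = {
    "train": 740_747,
    "dev": 92_596,
    **{f"test/C{i}": 26_456 for i in range(1, 7)},
    "test/C7": 145_505,
    "test/A1": 8_902,
    "test/A2": 8_902,
    "test/A3": 99_112,
}


def count_wavs(codecfake_root: str) -> dict[str, int]:
    """Count the .wav files directly inside each expected folder (0 if absent)."""
    root = Path(codecfake_root)
    counts = {}
    for rel in EXPECTED_COUNTS:
        folder = root / rel
        counts[rel] = sum(1 for _ in folder.glob("*.wav")) if folder.is_dir() else 0
    return counts


def mismatches(codecfake_root: str) -> dict[str, tuple[int, int]]:
    """Map each folder whose count differs to (actual, expected)."""
    actual = count_wavs(codecfake_root)
    return {
        rel: (actual[rel], expected)
        for rel, expected in EXPECTED_COUNTS.items()
        if actual[rel] != expected
    }
```

An empty result from `mismatches` suggests that every folder holds the listed number of files. It does not verify the audio content itself.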
Usage Recommendations
- If you wish to jointly train with the ASVspoof2019 dataset, first download the corresponding training, development, and evaluation sets from the ASVspoof2019 LA Database.
Pre‑trained Models
- Several pre‑trained models are provided, including Vocoder‑trained ADD, Codec‑trained ADD, and Co‑trained ADD models, stored in the `./pretrained_model` directory.
Citation
- When using this dataset, please cite:

      @article{xie2024codecfake,
        title={The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio},
        author={Xie, Yuankun and Lu, Yi and Fu, Ruibo and Wen, Zhengqi and Wang, Zhiyong and Tao, Jianhua and Qi, Xin and Wang, Xiaopeng and Liu, Yukun and Cheng, Haonan and others},
        journal={arXiv preprint arXiv:2405.04880},
        year={2024}
      }
Source
Organization: github
Created: 5/8/2024