Tags: Classic Dataset, Speech Recognition, Speech Synthesis
VCTK Corpus
This repository provides full-context label files for the VCTK corpus, created by following the preprocessing steps in r9y9/deepvoice3_pytorch. It contains two kinds of label files: full-context labels and monophone (mono) labels. Both give the time-aligned phoneme segmentation of each utterance's audio; the full-context labels additionally encode each phoneme's linguistic context.
Source: GitHub
Created: Mar 8, 2020
Updated: Jun 23, 2022
Dataset Overview
Dataset Name
- Full‑context label for VCTK‑Corpus
Dataset Content
- Provides full‑context label files for the VCTK‑Corpus.
Dataset Structure
├── lab
│ ├── full
│ │ ├── p225
│ │ │ ├── p225_001.lab
│ │ │ ├── p225_002.lab
│ │ │ ├── p225_003.lab
│ │ │ ├── p225_004.lab
│ │ │ ├── p225_005.lab
│ │ │ ...
│ ├── mono
│ │ ├── p225
│ │ │ ├── p225_001.lab
│ │ │ ├── p225_002.lab
│ │ │ ├── p225_003.lab
│ │ │ ├── p225_004.lab
│ │ │ ├── p225_005.lab
│ │ │ ...
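A minimal Python sketch for walking the tree above and pairing each full-context label with its mono counterpart, assuming the repository is checked out locally with `lab/full/<speaker>/` and `lab/mono/<speaker>/` laid out as shown. `list_utterances` and `paired_labels` are hypothetical helper names, not part of the repository.

```python
from __future__ import annotations

from pathlib import Path


def list_utterances(root: Path, kind: str) -> dict[str, list[Path]]:
    """Hypothetical helper: map each speaker ID (e.g. "p225") to its sorted
    .lab files under root/lab/<kind>/, where kind is "full" or "mono"."""
    base = root / "lab" / kind
    utterances: dict[str, list[Path]] = {}
    for speaker_dir in sorted(p for p in base.iterdir() if p.is_dir()):
        utterances[speaker_dir.name] = sorted(speaker_dir.glob("*.lab"))
    return utterances


def paired_labels(root: Path) -> list[tuple[Path, Path]]:
    """Hypothetical helper: return (full, mono) path pairs for utterances
    that have a label file in both trees; unmatched files are skipped."""
    pairs = []
    for speaker, full_files in list_utterances(root, "full").items():
        for full_path in full_files:
            # Mono files use the same speaker directory and file name.
            mono_path = root / "lab" / "mono" / speaker / full_path.name
            if mono_path.exists():
                pairs.append((full_path, mono_path))
    return pairs
```

For example, `paired_labels(Path("path/to/checkout"))` yields one pair per utterance that has both label types; utterances listed under Missing Files are skipped rather than raising an error.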
Missing Files
- lab/*/p315/*.lab (speaker p315 has no txt transcripts)
- lab/mono/p295/p295_047.lab (alignment failed)
- lab/mono/p305/p305_423.lab (alignment failed)
- lab/mono/p317/p317_424.lab (alignment failed)
- lab/mono/p345/p345_387.lab (alignment failed)
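The missing-file list can be cross-checked against a local checkout. This sketch reports utterance IDs that have a full-context label but no mono label, assuming (as the list implies) that full-context labels were still produced for the four alignment failures. `missing_mono_labels` is a hypothetical helper name. Speaker p315 is absent from both trees, so it does not appear in this comparison.

```python
from __future__ import annotations

from pathlib import Path

# Mono labels reported missing because alignment failed (taken from the list above).
KNOWN_ALIGNMENT_FAILURES = {"p295_047", "p305_423", "p317_424", "p345_387"}


def missing_mono_labels(root: Path) -> list[str]:
    """Hypothetical helper: sorted utterance IDs with a file under
    lab/full/<speaker>/ but no matching file under lab/mono/<speaker>/."""
    full_ids = {p.stem for p in (root / "lab" / "full").glob("*/*.lab")}
    mono_ids = {p.stem for p in (root / "lab" / "mono").glob("*/*.lab")}
    return sorted(full_ids - mono_ids)
```

`set(missing_mono_labels(root)) - KNOWN_ALIGNMENT_FAILURES` should then be empty if a checkout matches the documented gaps.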
Label Format
Mono label
0 850000 pau
850000 2850000 pau
2850000 3600000 p
3600000 3900000 l
3900000 6000000 iy
6000000 8450000 z
8450000 8600000 k
8600000 11300000 ao
11300000 11450000 l
11450000 12800000 s
12800000 13099999 t
13099999 15800000 eh
15800000 16050000 l
16050000 17600000 ax
17600000 20400000 pau
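Each mono label line holds a start time, an end time, and a phoneme. The times follow the HTK convention of integer units of 100 ns, so 850000 corresponds to 0.085 s and the final pau above ends at 2.04 s. A minimal parsing sketch; `Segment` and `parse_mono_label` are illustrative names, not part of the repository.

```python
from __future__ import annotations

from typing import NamedTuple

# HTK-style label times are integers in units of 100 ns (10,000,000 per second).
HTK_UNITS_PER_SECOND = 10_000_000


class Segment(NamedTuple):
    """One line of a mono label file (hypothetical helper type)."""
    start: int  # HTK units (100 ns)
    end: int    # HTK units (100 ns)
    phoneme: str

    @property
    def duration_sec(self) -> float:
        # Subtract in integer units first to avoid float rounding drift.
        return (self.end - self.start) / HTK_UNITS_PER_SECOND


def parse_mono_label(text: str) -> list[Segment]:
    """Parse 'start end phoneme' lines; blank lines are ignored."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        start, end, phoneme = fields
        segments.append(Segment(int(start), int(end), phoneme))
    return segments
```

In the sample, each segment's end equals the next segment's start, so `all(a.end == b.start for a, b in zip(segs, segs[1:]))` is one quick way to spot gaps in an alignment.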
Full context label
0 850000 x^x-pau+pau=p@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:0+0+0/D:0_0/E:x+x@x+x&x+x#x+x/F:0_0/G:0_0/H:x=x@1=1|0/I:0=0/J:4+3-1
850000 2850000 x^pau-pau+p=l@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+4/D:0_0/E:x+x@x+x&x+x#x+x/F:content_1/G:0_0/H:x=x@1=1|0/I:4=3/J:4+3-1
2850000 3600000 pau^pau-p+l=iy@1_4/A:0_0_0/B:1-1-4@1-1&1-4#1-3$1-4!0-1;0-1|iy/C:1+1+3/D:0_0/E:content+1@1+3&1+2#0+1/F:content_1/G:0_0/H:4=3@1=1|L-L%/I:0=0/J:4+3-1
...
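In the sample, the full-context lines carry the same start and end times as the mono lines; the third field is an HTS-style context string. Assuming the usual HTS prefix `p1^p2-p3+p4=p5@`, where p3 is the current phoneme and p1, p2, p4, p5 are its neighbors, the sketch below extracts that quinphone and leaves the /A: through /J: fields undecoded. The current phonemes of the three sample lines (pau, pau, p) match the first three mono labels, which makes this a convenient consistency check between the two label types. All names here are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple

# Assumed HTS-style prefix: p1^p2-p3+p4=p5@ ... (p3 is the current phoneme).
QUINPHONE_RE = re.compile(
    r"^(?P<p1>[^^]+)\^(?P<p2>[^-]+)-(?P<p3>[^+]+)\+(?P<p4>[^=]+)=(?P<p5>[^@]+)@"
)


class FullContextEntry(NamedTuple):
    """One line of a full-context label file (hypothetical helper type)."""
    start: int  # HTK units (100 ns)
    end: int    # HTK units (100 ns)
    quinphone: Tuple[str, str, str, str, str]
    label: str  # complete context string, /A: ... /J: fields left undecoded


def parse_full_context_line(line: str) -> FullContextEntry:
    start, end, label = line.split(maxsplit=2)
    match = QUINPHONE_RE.match(label)
    if match is None:
        raise ValueError(f"not an HTS-style quinphone label: {label!r}")
    quinphone = match.group("p1", "p2", "p3", "p4", "p5")
    return FullContextEntry(int(start), int(end), quinphone, label)


def current_phonemes(text: str) -> List[str]:
    """Current (p3) phoneme of every non-blank line, comparable to mono labels."""
    return [parse_full_context_line(line).quinphone[2]
            for line in text.splitlines() if line.strip()]
```

Comparing `current_phonemes(full_text)` with the phoneme column of the matching mono file is a cheap way to confirm the two files describe the same segmentation.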