High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.


edinburghcstr/edacc

The Edinburgh International Accents of English Corpus (EdAcc) is a new automatic speech recognition (ASR) dataset containing 40 hours of English dialogue spanning a wide range of accents. It covers many first-language and second-language varieties of English and includes detailed speaker background information. Recent evaluations with public and commercial models show that EdAcc exposes shortcomings of current English ASR models: they perform well on existing benchmarks, but their performance degrades significantly on the diverse accents represented in EdAcc.
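As a rough illustration of how a per-accent performance gap might be measured, the sketch below computes word error rate (WER) grouped by accent label. It is not EdAcc's evaluation protocol: the accent labels, transcripts, and function names are hypothetical, and the text normalization (lowercasing and whitespace splitting) is a simplifying assumption.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer_by_accent(samples):
    """Aggregate WER per accent.

    samples: iterable of (accent, reference, hypothesis) string triples.
    Returns a dict mapping each accent to total errors / total reference words.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for accent, ref, hyp in samples:
        ref_tokens = ref.lower().split()
        hyp_tokens = hyp.lower().split()
        errors[accent] += edit_distance(ref_tokens, hyp_tokens)
        words[accent] += len(ref_tokens)
    return {a: errors[a] / words[a] for a in words if words[a] > 0}


# Made-up transcripts for illustration only; not taken from EdAcc.
samples = [
    ("accent_a", "the cat sat on the mat", "the cat sat on the mat"),
    ("accent_b", "the cat sat on the mat", "the cat sat in a mat"),
]
print(wer_by_accent(samples))
```

Pooling errors and reference words per group before dividing, rather than averaging per-utterance WER, keeps short utterances from dominating the score.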

Source: Hugging Face
