Dataset asset, Open Source Community, Speech Recognition, Russian
speech-recognition-dataset
This dataset consists of video recordings of people speaking different phrases. It originates from the State University of Nizhny Novgorod in Russia and is distinctive in that it includes a library of Russian phrases. Most of the phrases come from classic Russian literature and other publicly available texts. Participants sat in front of a phone or laptop screen and spoke the phrases from various distances. Each video shows one person speaking one phrase from the full phrase list. Videos are recorded in mp4 format.
Source
GitHub
Created
May 13, 2020
Updated
May 13, 2023
Dataset Overview
- Name: Speech recognition dataset
- Content: Contains video recordings of people reading different sentences, mainly from Russian literary works and other public texts.
- Distinguishing feature: The dataset includes a library of Russian sentences.
- Video format: mp4
Current Status
- Number of speakers: 46
- Number of video recordings: 1194
- Number of sentences: 221
Organization
- File naming format: {speakerID}.{sentenceID}.mp4, e.g., 43.168.mp4
- Sentence text: Included in a file named “Фразы” ("Phrases")
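The file naming format above can be parsed mechanically. The following is a minimal sketch, assuming both IDs are plain decimal integers as in the example 43.168.mp4; the names `Recording`, `parse_recording_name`, and `summarize` are hypothetical helpers introduced for illustration and are not part of the dataset.

```python
import re
from pathlib import Path
from typing import Iterable, NamedTuple, Optional

# Assumption: speakerID and sentenceID are decimal integers, as in "43.168.mp4".
FILENAME_PATTERN = re.compile(r"^(\d+)\.(\d+)\.mp4$")


class Recording(NamedTuple):
    speaker_id: int
    sentence_id: int


def parse_recording_name(filename: str) -> Optional[Recording]:
    """Split '{speakerID}.{sentenceID}.mp4' into its two IDs.

    Returns None for names that do not follow the pattern.
    """
    match = FILENAME_PATTERN.match(Path(filename).name)
    if match is None:
        return None
    return Recording(int(match.group(1)), int(match.group(2)))


def summarize(filenames: Iterable[str]) -> dict:
    """Count recordings, distinct speakers, and distinct sentences."""
    parsed = [r for r in map(parse_recording_name, filenames) if r is not None]
    return {
        "recordings": len(parsed),
        "speakers": len({r.speaker_id for r in parsed}),
        "sentences": len({r.sentence_id for r in parsed}),
    }
```

Running `summarize` over the names of the downloaded files gives a quick way to compare a local copy against the counts listed under Current Status.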
Access
- Download link: Yandex.Disk
License
- Type: Creative Commons Attribution 4.0 International License
- Link: Creative Commons License