Dataset asset, Open Source Community, Speech Recognition, Word Classification

speech-commands

This dataset is used to train speech recognition models and contains 35 words divided into numeric, directional, command, animal, and other categories.

Source: github
Created: Oct 25, 2024
Updated: Oct 25, 2024
Overview

Dataset description and usage context

Simple Speech Recognition System

Dataset

  • Dataset Name: speech-commands
  • Number of Recognizable Words: 35
  • Word Categories:
    1. Numeric: zero, one, two, three, four, five, six, seven, eight, nine
    2. Directional: left, right, forward, backward, up, down
    3. Command: go, stop, yes, no, on, off, follow
    4. Animal: bird, cat, dog
    5. Other: bed, house, happy, tree, wow, learn, visual, sheila, marvin
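
The category list above can be written down as a lookup table, which is handy when turning class indices into words or grouping results by category. This is a minimal sketch: the names `WORD_CATEGORIES`, `LABELS`, `LABEL_TO_INDEX`, and `CATEGORY_OF` are hypothetical, and the alphabetical index order is an assumption, since the label order used by the trained model is not documented here.

```python
# Word categories as listed in the dataset description.
WORD_CATEGORIES = {
    "numeric": ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"],
    "directional": ["left", "right", "forward", "backward", "up", "down"],
    "command": ["go", "stop", "yes", "no", "on", "off", "follow"],
    "animal": ["bird", "cat", "dog"],
    "other": ["bed", "house", "happy", "tree", "wow",
              "learn", "visual", "sheila", "marvin"],
}

# ASSUMPTION: class indices follow alphabetical order of the words.
# Check the ordering used in train.py before relying on this mapping.
LABELS = sorted(w for words in WORD_CATEGORIES.values() for w in words)
LABEL_TO_INDEX = {word: i for i, word in enumerate(LABELS)}

# Reverse lookup: which category a recognized word belongs to.
CATEGORY_OF = {
    word: category
    for category, words in WORD_CATEGORIES.items()
    for word in words
}
```

A quick sanity check is that `len(LABELS)` equals the 35 recognizable words stated above.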

Model Files

  • Model File: speech_commands_model_epoch_20_9621--64mel.pth
  • Test Set Accuracy: 96.05%

Training Code

  • Training Code File: train.py
  • Function: Train a speech recognition model using the speech-commands dataset

Inference Code

  • Inference Code File: Inference.ipynb
  • Functions:
    1. Recognize the word corresponding to a single .wav audio file
    2. Recognize the words corresponding to all .wav files in a folder
    3. Record for 2 seconds and recognize the spoken word
    4. Continuously record and recognize a series of spoken words, providing the (start time, end time) for each word
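
Function 4 has to decide where each spoken word starts and ends before it can classify it. How Inference.ipynb does this is not described here, so the following is only an illustrative sketch of one common approach, energy-based segmentation, and not the notebook's actual method. The function name `segment_words` and all parameter defaults (`frame_ms`, `threshold`, `min_silence_ms`) are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def segment_words(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,          # ASSUMPTION: analysis frame length
    threshold: float = 0.02,     # ASSUMPTION: RMS level that counts as speech
    min_silence_ms: int = 200,   # ASSUMPTION: silence that ends a word
) -> List[Tuple[float, float]]:
    """Return (start_time, end_time) in seconds for each loud region."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    # Number of consecutive quiet frames needed to close a segment.
    max_gap = max(1, min_silence_ms // frame_ms)
    n_frames = len(samples) // frame_len

    segments: List[Tuple[float, float]] = []
    start = None   # index of the first loud frame of the open segment
    silent = 0     # consecutive quiet frames seen inside the open segment

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms >= threshold:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= max_gap:
                end = i - silent + 1  # first quiet frame after the word
                segments.append((start * frame_len / sample_rate,
                                 end * frame_len / sample_rate))
                start, silent = None, 0

    # Close a segment that is still open when the audio ends.
    if start is not None:
        end = n_frames - silent
        segments.append((start * frame_len / sample_rate,
                         end * frame_len / sample_rate))
    return segments
```

Each returned (start time, end time) pair can then be used to slice the recording and pass the slice to the single-file recognizer from function 1. The threshold usually needs tuning for the microphone and background noise in use.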