speech-commands

This dataset is used to train speech recognition models. It covers 35 spoken words in five categories: numeric, directional, command, animal, and other.

Updated 10/25/2024
github

Description

Simple Speech Recognition System

Dataset

  • Dataset Name: speech-commands
  • Number of Recognizable Words: 35
  • Word Categories:
    1. Numeric: zero, one, two, three, four, five, six, seven, eight, nine
    2. Directional: left, right, forward, backward, up, down
    3. Command: go, stop, yes, no, on, off, follow
    4. Animal: bird, cat, dog
    5. Other: bed, house, happy, tree, wow, learn, visual, sheila, marvin
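The word list above maps naturally onto a label table for a classifier. The following is a minimal sketch: the names `WORD_CATEGORIES`, `LABELS`, `LABEL_TO_INDEX`, and `WORD_TO_CATEGORY` are illustrative, and the alphabetical index order is an assumption, not necessarily the order used by train.py.

```python
# Word categories exactly as listed in the dataset description.
WORD_CATEGORIES = {
    "numeric": ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"],
    "directional": ["left", "right", "forward", "backward", "up", "down"],
    "command": ["go", "stop", "yes", "no", "on", "off", "follow"],
    "animal": ["bird", "cat", "dog"],
    "other": ["bed", "house", "happy", "tree", "wow",
              "learn", "visual", "sheila", "marvin"],
}

# Flatten into one label list. Alphabetical ordering is an assumption;
# the real training code may assign class indices differently.
LABELS = sorted(word for words in WORD_CATEGORIES.values() for word in words)

# Lookup tables: word -> class index, and word -> category name.
LABEL_TO_INDEX = {word: index for index, word in enumerate(LABELS)}
WORD_TO_CATEGORY = {
    word: category
    for category, words in WORD_CATEGORIES.items()
    for word in words
}
```

Building the tables from one dictionary keeps the category listing and the class count in one place, so the total of 35 words can be checked with `len(LABELS)`.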

Model Files

  • Model File: speech_commands_model_epoch_20_9621--64mel.pth
  • Test Set Accuracy: 96.05%
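Test set accuracy is normally the share of test clips whose predicted word matches the reference word. The helper below is a small sketch of that calculation. The function name `accuracy_percent` is hypothetical, and the listing does not say how the repository computed its figure.

```python
def accuracy_percent(predicted, expected):
    """Return the percentage of predictions equal to the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty test set")
    # Count positions where the predicted word equals the reference word.
    correct = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * correct / len(expected)
```

For example, three correct predictions out of four test clips give 75.0.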

Training Code

  • Training Code File: train.py
  • Function: Train a speech recognition model using the speech-commands dataset

Inference Code

  • Inference Code File: Inference.ipynb
  • Functions:
    1. Recognize the word corresponding to a single .wav audio file
    2. Recognize the words corresponding to all .wav files in a folder
    3. Record for 2 seconds and recognize the spoken word
    4. Continuously record and recognize a series of spoken words, providing the (start time, end time) for each word
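The listing does not describe how Inference.ipynb finds the (start time, end time) of each word in a continuous recording. One common approach is an energy threshold: split the signal into short frames, mark frames whose RMS level exceeds a threshold, and close a segment after a run of quiet frames. The sketch below assumes that approach. The function name `segment_words` and all parameter values are illustrative, not taken from the notebook.

```python
import math


def segment_words(samples, sample_rate, frame_ms=20,
                  threshold=0.02, min_silence_frames=5):
    """Return (start_time, end_time) pairs in seconds for loud regions.

    samples: sequence of floats, assumed to be normalized to [-1, 1].
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    segments = []
    start = None   # frame index where the current segment began
    silence = 0    # consecutive quiet frames seen inside a segment

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                # The segment ends at the first quiet frame of this run.
                end = i - silence + 1
                segments.append((start * frame_len / sample_rate,
                                 end * frame_len / sample_rate))
                start = None
                silence = 0

    if start is not None:
        # Recording ended mid-segment; drop any trailing quiet frames.
        end = n_frames - silence
        segments.append((start * frame_len / sample_rate,
                         end * frame_len / sample_rate))
    return segments
```

Each returned span could then be cut out of the recording and passed to the classifier, which would give one recognized word per (start time, end time) pair.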

Topics

Speech Recognition
Word Classification

Source

Organization: github

Created: 10/25/2024
