Parkinson's Disease Speech Dataset

This dataset originates from the University of Oxford and contains 195 instances, of which 147 are Parkinson's disease patients and 48 are non‑patients. It includes 22 features such as frequency, pitch, amplitude/period of the waveform, etc., and a label where 1 indicates Parkinson's disease and 0 indicates non‑Parkinson's.

Updated 12/17/2023

github

Dataset Overview

Dataset Name

A Machine Learning Approach for the Diagnosis of Parkinson's Disease via Speech Analysis

Research Date

March 2022

Dataset Source

University of Oxford

Dataset Composition

Number of Instances: 195
- 147 Parkinson's subjects
- 48 without Parkinson's
Number of Features: 22
- Includes features such as frequency, pitch, amplitude/period of the waveform, etc.
Label: 1 represents Parkinson’s, 0 represents non‑Parkinson’s

Algorithms Used

Logistic Regression (LR)
Linear Discriminant Analysis (LDA)
k-Nearest Neighbors (KNN)
Decision Tree (DT)
Neural Network (NN)
Naive Bayes (NB)
Gradient Boosting (GB)
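Of the algorithms listed, k-nearest neighbors is the easiest to illustrate from scratch. The following is a minimal sketch only, not the project's implementation: the function name `knn_predict`, the choice of Euclidean distance, `k=3`, and the two-feature toy rows are all assumptions made for illustration and are not taken from the dataset.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Euclidean distance from the query to every training row.
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_X, train_y)
    ]
    # Keep the k closest neighbours and return the most common label among them.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Synthetic two-feature rows (hypothetical, NOT real dataset values).
# Labels follow the dataset convention: 1 = Parkinson's, 0 = non-Parkinson's.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
train_y = [0, 0, 0, 1, 1, 1]

pred_low = knn_predict(train_X, train_y, (0.12, 0.18), k=3)
pred_high = knn_predict(train_X, train_y, (0.88, 0.82), k=3)
print(pred_low, pred_high)
```

Because KNN relies on distances, features measured on very different scales would normally be standardized before a classifier like this is applied to the real 22-feature rows.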

Engineering Goal

Develop a machine-learning model for Parkinson’s diagnosis that achieves at least 90% accuracy and/or a Matthews correlation coefficient (MCC) of at least 0.9.
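The two target metrics behave differently on an imbalanced dataset, which a short calculation can show. The sketch below uses the standard confusion-matrix definitions of accuracy and MCC; the helper names `accuracy` and `mcc` are illustrative, and the "predict everyone as Parkinson's" baseline is a hypothetical classifier, not a result from the project.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical baseline: label all 195 instances as Parkinson's
# (147 true positives, 48 false positives, no negatives predicted).
baseline_acc = accuracy(tp=147, tn=0, fp=48, fn=0)
baseline_mcc = mcc(tp=147, tn=0, fp=48, fn=0)

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"baseline MCC:      {baseline_mcc:.3f}")
```

With the class counts given above, this trivial baseline already scores about 75% accuracy (147 of 195) while its MCC is 0, which is one reason to track MCC alongside accuracy.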

Data Analysis Results

After the dataset was rebalanced, a 75/25 train-test split yielded the best performance. K-Nearest Neighbors and Neural Network achieved the highest accuracy, 98%.
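The rebalancing method is not stated here, so the following sketch assumes random oversampling of the minority class, followed by a stratified 75/25 split. The function names, the fixed seeds, and the placeholder one-value rows (standing in for the real 22-feature vectors) are all hypothetical.

```python
import random

def oversample_minority(rows, labels, seed=0):
    """Duplicate random minority-class rows until every class is the same size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend((row, label) for row in group + extra)
    return balanced

def stratified_split(pairs, test_fraction=0.25, seed=0):
    """Split (row, label) pairs so each class keeps the same test fraction."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in pairs:
        by_class.setdefault(label, []).append((row, label))
    train, test = [], []
    for group in by_class.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Placeholder rows (NOT real data): 147 Parkinson's (1) and 48 non-Parkinson's (0).
rows = [[float(i)] for i in range(195)]
labels = [1] * 147 + [0] * 48

balanced = oversample_minority(rows, labels)
train, test = stratified_split(balanced, test_fraction=0.25)
print(len(balanced), len(train), len(test))  # prints: 294 220 74
```

One design point to keep in mind: when oversampling happens before the split, copies of the same row can land in both the training and test sets, so a common alternative is to split first and oversample only the training portion.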

Conclusion

The project concludes that machine learning can significantly improve Parkinson’s diagnosis over existing methods, reaching 98% accuracy on this dataset. Accurate diagnosis is crucial for effective treatment.
