Parkinson's Disease Speech Dataset
This dataset originates from the University of Oxford and contains 195 instances, of which 147 are Parkinson's disease patients and 48 are non‑patients. It includes 22 features such as frequency, pitch, amplitude/period of the waveform, etc., and a label where 1 indicates Parkinson's disease and 0 indicates non‑Parkinson's.
Dataset Overview
Dataset Name
A Machine Learning Approach for the Diagnosis of Parkinson's Disease via Speech Analysis
Research Date
March 2022
Dataset Source
University of Oxford
Dataset Composition
- Number of Instances: 195
  - 147 Parkinson's subjects
  - 48 without Parkinson's
- Number of Features: 22
  - Includes features such as frequency, pitch, amplitude/period of the waveform, etc.
- Label: 1 represents Parkinson’s, 0 represents non‑Parkinson’s
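The class counts above imply a noticeable imbalance. As a small illustration (the variable names are hypothetical, only the counts come from the dataset description), the arithmetic below shows what accuracy a trivial "always predict Parkinson's" rule would reach, which is a useful reference point when reading any accuracy figure on this data.

```python
# Class counts taken from the dataset composition above.
n_parkinsons = 147
n_healthy = 48

n_total = n_parkinsons + n_healthy

# Accuracy of a rule that always predicts the majority class (label 1).
majority_baseline = max(n_parkinsons, n_healthy) / n_total

print(f"total instances: {n_total}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")  # about 0.754
```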
Algorithms Used
- Logistic Regression (LR)
- Linear Discriminant Analysis (LDA)
- k‑Nearest Neighbors (KNN)
- Decision Tree (DT)
- Neural Network (NN)
- Naive Bayes (NB)
- Gradient Boosting (GB)
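A minimal sketch of how the seven algorithm families listed above might be set up with scikit-learn. The choice of library, the hyperparameters, the feature scaling, and the synthetic stand-in data are all assumptions made for illustration; the source does not state how the original models were built or tuned.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same shape as the dataset (195 rows, 22 features).
# This is NOT the real speech data; it only makes the sketch runnable.
X, y = make_classification(
    n_samples=195, n_features=22, n_informative=8,
    weights=[48 / 195, 147 / 195], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One estimator per listed algorithm; all settings are illustrative assumptions.
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "DT": DecisionTreeClassifier(random_state=0),
    "NN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
    "NB": GaussianNB(),
    "GB": GradientBoostingClassifier(random_state=0),
}

# Fit each model and record its test-set accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

The scores printed here describe the synthetic data only and say nothing about the results reported for the real dataset.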
Engineering Goal
Develop a machine‑learning model for Parkinson’s diagnosis, achieving at least 90 % accuracy and/or a Matthews Correlation Coefficient of at least 0.9.
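The goal names two metrics. As a short sketch of how they differ, the helper functions below (hypothetical names, not taken from the source) compute accuracy and the Matthews Correlation Coefficient from confusion-matrix counts, using the standard definition MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

```python
import math


def accuracy_from_counts(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention for
    degenerate cases such as a model that predicts only one class.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# A rule that labels all 195 instances as Parkinson's:
# 147 true positives, 48 false positives, no negatives predicted.
always_positive = dict(tp=147, tn=0, fp=48, fn=0)
print(accuracy_from_counts(**always_positive))  # about 0.754
print(mcc_from_counts(**always_positive))       # 0.0
```

On an imbalanced dataset such as this one, accuracy alone can look respectable for a model that never identifies a healthy subject, while MCC stays at zero, which may be why the goal lists both.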
Data Analysis Results
After the dataset was rebalanced, a 75‑25 train‑test split yielded the best performance. With this split, the k‑Nearest Neighbors and Neural Network models reached the highest accuracy, 98 %.
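The source does not say which rebalancing method was used. The sketch below assumes simple random oversampling of the minority class, followed by a shuffled 75-25 split, in the order the results describe. The function names and the placeholder feature rows are hypothetical.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def split_75_25(rows, labels, seed=0):
    """Shuffle the data and return (train, test) with a 75-25 split."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = len(indices) * 3 // 4
    train_idx, test_idx = indices[:cut], indices[cut:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


# Placeholder rows standing in for the 22-feature vectors (not real data).
labels = [1] * 147 + [0] * 48
rows = [[float(i)] * 22 for i in range(195)]

balanced_rows, balanced_labels = oversample_minority(rows, labels)
(train_X, train_y), (test_X, test_y) = split_75_25(balanced_rows, balanced_labels)

print(len(balanced_labels), len(train_y), len(test_y))  # 294 220 74
```

One design caveat: oversampling before splitting can place copies of the same row in both the training and test sets, which tends to inflate test accuracy. Splitting first and oversampling only the training portion is the more conservative ordering.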
Conclusion
The project reports that machine learning can improve Parkinson’s diagnosis compared with existing methods, reaching 98 % accuracy on this dataset. Accurate diagnosis is crucial for effective treatment.