Cereberal-Stroke-Analysis
This dataset supports stroke analysis using machine learning models and resampling techniques such as SMOTEENN, which address class imbalance and improve prediction accuracy.
Updated 12/12/2023
github
Description
Overview of Dataset Processing Workflow
Data Loading and Import
- Import pandas, numpy, seaborn, matplotlib.pyplot, and other libraries, then read the CSV file into a DataFrame (df).
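As a minimal sketch of the loading step, the snippet below reads CSV content into a DataFrame. The listing does not name the file or its columns, so the in-memory CSV text and the column names (age, gender, bmi, stroke) are hypothetical stand-ins.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; the column names are assumptions.
csv_text = """age,gender,bmi,stroke
67,Male,36.6,1
61,Female,,0
80,Male,32.5,1
49,Female,34.4,0
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
```

With the real dataset, the `io.StringIO(...)` argument would be replaced by the path to the CSV file.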
Exploratory Data Analysis (EDA)
- Perform basic data exploration with head() and describe().
- Check and count missing values using isnull().sum().
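The exploration calls named above might look like the following on a small frame. The data and column names here are invented for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy data with one missing bmi value; columns are hypothetical.
df = pd.DataFrame({
    "age": [67, 61, 80, 49],
    "bmi": [36.6, np.nan, 32.5, 34.4],
    "stroke": [1, 0, 1, 0],
})

print(df.head())       # first rows of the frame
print(df.describe())   # summary statistics for numeric columns

# Count of missing values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```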
Handling Categorical Variables
- Apply pd.get_dummies() for one‑hot encoding of categorical variables.
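A short sketch of one-hot encoding with pd.get_dummies() follows. The gender column is an assumed example of a categorical variable, and `dtype=int` is passed only so the indicator columns hold 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical frame with one categorical column.
df = pd.DataFrame({
    "age": [67, 61, 80],
    "gender": ["Male", "Female", "Male"],
})

# Each category of "gender" becomes its own indicator column.
encoded = pd.get_dummies(df, columns=["gender"], dtype=int)
print(encoded)
```

Passing `drop_first=True` would drop one indicator per variable, which some models prefer in order to avoid redundant columns.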
Handling Missing Values
- Fill missing values using the KNNImputer algorithm.
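To illustrate how KNNImputer fills a gap, the sketch below imputes one missing bmi value from its two nearest rows. The data, column names, and `n_neighbors=2` are assumptions chosen to keep the example small; the listing does not state the settings actually used.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy numeric data; row 1 is missing its bmi value.
df = pd.DataFrame({
    "age": [60.0, 62.0, 64.0, 30.0],
    "bmi": [30.0, np.nan, 34.0, 22.0],
})

# Each missing entry is replaced by the mean of that feature
# over the n_neighbors closest rows (distance ignores missing values).
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(imputed)
```

Here the two rows closest in age (60 and 64) supply the bmi values that are averaged for the missing entry.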
Feature Scaling and Train‑Test Split
- Perform feature scaling with MinMaxScaler.
- Split the dataset into training and testing sets.
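The scaling and splitting steps can be sketched as below on synthetic data. Note one deliberate difference: this sketch splits first and fits MinMaxScaler on the training portion only, a common way to keep test-set information out of the scaler, whereas the listing names scaling before the split. The 80/20 split, stratification, and seeds are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic features and an imbalanced binary target (10% positives).
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# Stratify so both splits keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on training data, then apply it to both splits.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
print(X_train_scaled.shape, X_test_scaled.shape)
```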
Model Selection and Evaluation
- Conduct preliminary testing with models such as KNeighborsClassifier, GaussianNB, DecisionTreeClassifier, and RandomForestClassifier.
- Generate a classification report to evaluate model performance on the imbalanced dataset.
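A rough sketch of the preliminary model comparison follows. It uses a synthetic imbalanced dataset from make_classification in place of the stroke data, default hyperparameters, and arbitrary seeds, all of which are assumptions rather than the original configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: roughly 95% negatives, 5% positives.
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "KNeighborsClassifier": KNeighborsClassifier(),
    "GaussianNB": GaussianNB(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
}

# Fit each model and keep its per-class precision/recall/F1 report.
reports = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    reports[name] = classification_report(y_test, y_pred, zero_division=0)
    print(name)
    print(reports[name])
```

On imbalanced data, the recall and F1 rows for the positive class are usually more informative than overall accuracy.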
Data Resampling
- Apply SMOTE for oversampling.
- Perform random undersampling to balance class distribution.
- Use SMOTEENN to combine oversampling and undersampling.
Post‑Resampling Model Evaluation
- Retrain and evaluate models on the oversampled, undersampled, and combined sampled datasets.
Conclusion
- Various resampling techniques, especially SMOTEENN, substantially improve the model’s ability to identify positive stroke cases.
Topics
Stroke
Machine Learning
Source
Organization: github
Created: 12/12/2023