Cereberal-Stroke-Analysis

This project analyzes a cerebral stroke dataset, using machine learning models and resampling techniques such as SMOTEENN to address class imbalance and improve prediction accuracy.

Source

GitHub

Created

Dec 12, 2023

Updated

Dec 12, 2023

Overview of Dataset Processing Workflow

Data Loading and Import

Import pandas, numpy, seaborn, matplotlib.pyplot, and other libraries, then read the CSV file into a DataFrame (df) with pandas.
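
A minimal sketch of the loading step. The listing does not name the CSV file or its columns, so the in-memory text and the column names below are hypothetical stand-ins.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; the column names are
# assumptions for illustration, not the dataset's actual schema.
csv_text = """age,gender,bmi,stroke
67,Male,36.6,1
61,Female,,0
80,Male,32.5,1
49,Female,34.4,0
"""

# With a real file this would be pd.read_csv("<path to the CSV>").
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
```

The empty bmi field in the second row is read as NaN, which is what the later missing-value step operates on.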

Exploratory Data Analysis (EDA)

Perform basic data exploration with head() and describe().
Check and count missing values using isnull().sum().
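
The exploration calls above might look like the following sketch. The toy DataFrame and its column names are assumptions; the class-count line is an extra check added here because it motivates the resampling step, not something the listing mentions.

```python
import numpy as np
import pandas as pd

# Toy frame with assumed column names; NaN / None mark missing values.
df = pd.DataFrame({
    "age": [67, 61, 80, 49, 79],
    "bmi": [36.6, np.nan, 32.5, 34.4, np.nan],
    "smoking_status": ["never", None, "smokes", "never", "formerly"],
    "stroke": [1, 0, 1, 0, 0],
})

print(df.head())       # first rows of the frame
print(df.describe())   # summary statistics for the numeric columns

# Number of missing values in each column.
missing_counts = df.isnull().sum()
print(missing_counts)

# Class balance of the target column (added for illustration).
class_counts = df["stroke"].value_counts()
print(class_counts)
```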

Handling Categorical Variables

Apply pd.get_dummies() for one‑hot encoding of categorical variables.
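
As an illustration of one-hot encoding with pd.get_dummies(), here is a short sketch. The column names and category values are hypothetical.

```python
import pandas as pd

# Assumed example columns: one numeric, two categorical.
df = pd.DataFrame({
    "age": [67, 61, 80],
    "gender": ["Male", "Female", "Male"],
    "work_type": ["Private", "Self-employed", "Private"],
})

# Each listed categorical column is replaced by one indicator
# column per category, named "<column>_<category>".
encoded = pd.get_dummies(df, columns=["gender", "work_type"])
print(encoded.columns.tolist())
```

Numeric columns such as age pass through unchanged, so the encoded frame can go straight to imputation and scaling.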

Handling Missing Values

Fill missing values using the KNNImputer algorithm.
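
A small sketch of how KNNImputer fills a gap. The array and the choice of n_neighbors=2 are assumptions for illustration; the listing does not state which settings the project uses.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy feature matrix with one missing entry (row 1, column 1).
X = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [8.0, 8.0],
])

# Each missing value is replaced by the mean of that feature over the
# n_neighbors nearest rows, with distance measured on observed features.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled)
```

Here the two rows nearest to row 1 on the first feature are rows 0 and 2, so the gap is filled with the mean of 2.0 and 6.0.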

Feature Scaling and Train‑Test Split

Perform feature scaling with MinMaxScaler.
Split the dataset into training and testing sets.
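
The scaling and splitting steps can be sketched as follows, on synthetic data. Note one deliberate difference: the listing names scaling before the split, while this sketch splits first and fits the scaler on the training portion only, a common way to keep test-set statistics out of training. The test size, stratification, and random seeds are likewise assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in: 100 rows, 3 features, 10% positive class.
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# Stratify so the rare class keeps the same share in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training data, then reuse it on the test data.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
```

Scaled test values can fall slightly outside [0, 1], because the minimum and maximum come from the training data alone.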

Model Selection and Evaluation

Conduct preliminary testing with models such as KNeighborsClassifier, GaussianNB, DecisionTreeClassifier, and RandomForestClassifier.
Generate a classification report to evaluate model performance on the imbalanced dataset.
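
A hedged sketch of the preliminary comparison, using a synthetic imbalanced dataset from make_classification in place of the stroke data. The class weights, split, and default hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with roughly 5% positives (assumed imbalance ratio).
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "KNeighborsClassifier": KNeighborsClassifier(),
    "GaussianNB": GaussianNB(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
}

reports = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # zero_division=0 avoids warnings when a model predicts no positives.
    reports[name] = classification_report(y_test, y_pred, zero_division=0)
    print(name)
    print(reports[name])
```

On imbalanced data the per-class recall in the report is usually more informative than overall accuracy, since a model can score high accuracy while missing most positive cases.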

Data Resampling

Apply SMOTE for oversampling.
Perform random undersampling to balance class distribution.
Use SMOTEENN to combine oversampling and undersampling.

Post‑Resampling Model Evaluation

Retrain and evaluate models on the oversampled, undersampled, and combined sampled datasets.

Conclusion

Resampling, and SMOTEENN in particular, substantially improves the models' ability to identify positive stroke cases compared with training on the original imbalanced data.
