Dataset tags: Open Source Community, Machine Learning, Stroke

Cereberal-Stroke-Analysis

This dataset supports stroke prediction analysis. The workflow uses machine learning models and applies resampling techniques such as SMOTEENN to address class imbalance and improve prediction accuracy.

Source: GitHub
Created: Dec 12, 2023
Updated: Dec 12, 2023

Overview of Dataset Processing Workflow

Data Loading and Import

  • Import pandas, numpy, seaborn, matplotlib.pyplot, and other libraries, then read the CSV file into a DataFrame (df).
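A minimal sketch of the loading step, assuming a local CSV file. The page does not give the file name or schema, so the column names and the four sample rows below are hypothetical, and an in-memory buffer stands in for the real file. With the actual data you would pass a file path to load_stroke_data, itself a hypothetical helper.

```python
import io

import pandas as pd

# Hypothetical sample rows: the real file name and columns are not given
# on this page, so these column names are assumptions for illustration.
SAMPLE_CSV = """gender,age,hypertension,avg_glucose_level,bmi,smoking_status,stroke
Male,67,0,228.69,36.6,formerly smoked,1
Female,61,0,202.21,,never smoked,1
Female,49,0,171.23,34.4,smokes,0
Male,30,0,85.50,25.1,Unknown,0
"""


def load_stroke_data(source) -> pd.DataFrame:
    """Read a stroke CSV (a file path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


# An empty field (the second row's bmi) is read as NaN.
df = load_stroke_data(io.StringIO(SAMPLE_CSV))
print(df.shape)
```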

Exploratory Data Analysis (EDA)

  • Perform basic data exploration with head() and describe().
  • Check and count missing values using isnull().sum().
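The exploration calls above can be sketched on a small hypothetical frame; the columns and values are illustrative and not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame standing in for the loaded dataset.
df = pd.DataFrame({
    "age": [67, 61, 49, 30],
    "bmi": [36.6, np.nan, 34.4, 25.1],
    "smoking_status": ["formerly smoked", "never smoked", "smokes", np.nan],
    "stroke": [1, 1, 0, 0],
})

print(df.head())      # first rows (up to 5 by default)
print(df.describe())  # count, mean, std, min, quartiles, max for numeric columns

# Number of missing values in each column.
missing = df.isnull().sum()
print(missing)
```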

Handling Categorical Variables

  • Apply pd.get_dummies() for one‑hot encoding of categorical variables.
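A sketch of the one-hot encoding step. The columns gender and work_type are assumed examples of categorical fields; passing dtype=int makes pandas emit 0/1 integers instead of booleans, and numeric columns such as age pass through unchanged.

```python
import pandas as pd

# Hypothetical categorical and numeric columns.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female"],
    "age": [67, 61, 49],
    "work_type": ["Private", "Self-employed", "Private"],
})

# One-hot encode only the listed categorical columns; each category value
# becomes its own 0/1 column named "<column>_<value>".
encoded = pd.get_dummies(df, columns=["gender", "work_type"], dtype=int)
print(encoded.columns.tolist())
print(encoded)
```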

Handling Missing Values

  • Fill missing values with scikit-learn's KNNImputer, which replaces each missing entry with the mean of that feature over the k nearest rows.
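A minimal KNNImputer sketch on hypothetical numeric columns. The value n_neighbors=2 is illustrative only and does not come from the source; scikit-learn's default is 5.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical numeric features with one missing bmi value.
df = pd.DataFrame({
    "age": [67.0, 61.0, 49.0, 30.0],
    "avg_glucose_level": [228.7, 202.2, 171.2, 85.5],
    "bmi": [36.6, np.nan, 34.4, 25.1],
})

# Each missing entry becomes the mean of that column over the 2 nearest rows,
# with distances computed on the features both rows have present.
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(imputed)
```

Here the missing bmi in the second row is filled from its two nearest rows (the first and third). Because distances are computed on unscaled columns, wide-range features such as avg_glucose_level dominate the neighbor search.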

Feature Scaling and Train‑Test Split

  • Perform feature scaling with MinMaxScaler.
  • Split the dataset into training and testing sets.
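A sketch of scaling and splitting on synthetic, imbalanced data from make_classification, used here as a stand-in for the real features. This version splits first and fits MinMaxScaler on the training set only, a common precaution against test-set leakage; the list above names the two steps in the opposite order. The 80/20 stratified split and random_state values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic, imbalanced stand-in (about 5% positives) for the stroke features.
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.95, 0.05], random_state=42
)

# Stratify on y so both splits keep roughly the same minority-class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Learn min/max from the training set only, then apply them to both splits.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
print(X_train_scaled.min(), X_train_scaled.max())
```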

Model Selection and Evaluation

  • Conduct preliminary testing with models such as KNeighborsClassifier, GaussianNB, DecisionTreeClassifier, and RandomForestClassifier.
  • Generate a classification report to evaluate model performance on the imbalanced dataset.
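The preliminary comparison of the four classifiers might look like the sketch below. It reuses a synthetic imbalanced setup and default hyperparameters, both assumptions rather than details from the source. On imbalanced data, the class-1 recall in each report is usually more informative than overall accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data, split and scaled as in the previous step.
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "KNeighborsClassifier": KNeighborsClassifier(),
    "GaussianNB": GaussianNB(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=42),
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
}

reports = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # zero_division=0 avoids warnings if a model never predicts class 1.
    reports[name] = classification_report(y_test, y_pred, output_dict=True, zero_division=0)
    print(name)
    print(classification_report(y_test, y_pred, zero_division=0))
```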

Data Resampling

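  • Apply SMOTE to oversample the minority (stroke) class with synthetic samples.
  • Apply random undersampling to the majority class to balance the class distribution.
  • Use SMOTEENN, which combines SMOTE oversampling with Edited Nearest Neighbours (ENN) cleaning, an undersampling step.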

Post‑Resampling Model Evaluation

  • Retrain and evaluate models on the oversampled, undersampled, and combined sampled datasets.

Conclusion

  • Resampling, especially SMOTEENN, substantially improves the models' ability to identify positive stroke cases (recall on the minority class).