ECG-5000 Dataset
The ECG-5000 dataset is an electrocardiogram (ECG) dataset for anomaly detection that contains both normal and abnormal heartbeat recordings. Here it is used to train a Temporal Convolutional Network (TCN) model to identify anomalous patterns in ECG data.
Description
Overview of the ECG Anomaly Detection Dataset
Overview
- Goal: Detect anomalies in electrocardiogram (ECG) data using machine‑learning techniques.
- Key Features: Data preprocessing, TCN model training, and ECG signal anomaly detection.
Prerequisites
- Libraries used:
  - NumPy, Pandas, Matplotlib – for data manipulation and visualisation.
  - Scikit-learn – for data splitting and standardisation.
  - Darts – for time-series modelling and anomaly detection.
  - TensorFlow, PyTorch Lightning – for early stopping during model training.
Project Workflow
Load and Merge Datasets:
- Data source: load train_data and test_data from text files.
- Merge: combine them into a single dataset, merged_data, to simplify processing.
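A minimal sketch of the loading step. It assumes whitespace-delimited text files with no header row and the class label in the first column (the text layout of the UCR time-series archive); the real file names are not given on this page, so the helper load_and_merge and the two tiny synthetic files below are hypothetical stand-ins.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def load_and_merge(train_path: str, test_path: str) -> pd.DataFrame:
    """Hypothetical helper: read two label-first text files and stack their rows."""
    # Assumed layout: whitespace-separated, no header,
    # column 0 = class label, remaining columns = ECG samples.
    train_data = pd.read_csv(train_path, sep=r"\s+", header=None)
    test_data = pd.read_csv(test_path, sep=r"\s+", header=None)
    # ignore_index gives the merged frame a clean 0..n-1 index.
    return pd.concat([train_data, test_data], ignore_index=True)


# Demo with two tiny synthetic files standing in for the real dataset files.
with tempfile.TemporaryDirectory() as tmp:
    train_path = os.path.join(tmp, "train.txt")
    test_path = os.path.join(tmp, "test.txt")
    np.savetxt(train_path, [[1, 0.1, 0.2, 0.3], [2, 0.5, 0.4, 0.3]])
    np.savetxt(test_path, [[1, 0.0, 0.1, 0.2]])
    merged_data = load_and_merge(train_path, test_path)

print(merged_data.shape)  # (3, 4)
```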
Define Normal and Anomalous Classes:
- Class definition:
  - Normal: label 1 (normal ECG signal).
  - Anomalous: any label other than 1.
- Dataset split: create normal_data and anomalous_data subsets.
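The class split can be sketched with boolean indexing on the label column. This assumes, as in the loading sketch, that the label sits in column 0 of merged_data; the four-row frame below is a synthetic stand-in.

```python
import pandas as pd

# Synthetic stand-in for merged_data: column 0 holds the class label.
merged_data = pd.DataFrame(
    [[1, 0.1, 0.2], [2, 0.5, 0.4], [1, 0.0, 0.1], [4, 0.9, 0.8]]
)

label = merged_data[0]
normal_data = merged_data[label == 1]     # label 1 = normal heartbeat
anomalous_data = merged_data[label != 1]  # any other label = anomalous

print(len(normal_data), len(anomalous_data))  # 2 2
```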
Data Splitting and Standardisation:
- Feature selection: extract the features into X_normal, excluding the label.
- Split: use train_test_split to divide the data into training, validation, and test sets.
- Standardisation: apply StandardScaler to normalise the feature values.
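A sketch of the split and standardisation using scikit-learn. The 70/15/15 ratios, random_state, and the 200-row synthetic normal_data are assumptions for illustration, not values from the original project. The scaler is fitted on training rows only, so validation and test statistics do not leak into preprocessing.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for normal_data: label in column 0, 140 samples after it.
normal_data = np.column_stack([np.ones(200), rng.normal(size=(200, 140))])

X_normal = normal_data[:, 1:]  # drop the label column

# Assumed ratios: 70% train, remaining 30% halved into validation and test.
X_train, X_temp = train_test_split(X_normal, test_size=0.3, random_state=42)
X_val, X_test = train_test_split(X_temp, test_size=0.5, random_state=42)

# Fit on training rows only, then apply the same transform everywhere.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

print(X_train.shape, X_val.shape, X_test.shape)  # (140, 140) (30, 140) (30, 140)
```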
Convert Data to Time-Series Format:
- Time-series conversion: reshape the data into time-series objects, the input format the model expects.
- Ensure variability: check, and adjust if necessary, that each time series contains multiple samples.
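One way to sketch the conversion, assuming each row is one heartbeat and becomes its own univariate series of shape (time_steps, 1). The length check is an assumed interpretation of "ensure variability": every series must be long enough to provide at least one window of 30 input plus 10 output steps (the block lengths listed for the TCN model). rows_to_series is a hypothetical helper; with Darts, each array could then be wrapped via TimeSeries.from_values.

```python
import numpy as np

INPUT_LEN = 30   # input block length from the TCN configuration
OUTPUT_LEN = 10  # output block length from the TCN configuration


def rows_to_series(X):
    """Hypothetical helper: turn each row (one heartbeat) into a (time_steps, 1) array."""
    series = [row.reshape(-1, 1) for row in np.asarray(X, dtype=np.float32)]
    # Assumed check: each series must yield at least one (input, output) window.
    min_len = INPUT_LEN + OUTPUT_LEN
    too_short = [i for i, s in enumerate(series) if len(s) < min_len]
    if too_short:
        raise ValueError(f"series {too_short} are shorter than {min_len} steps")
    return series


X_train = np.random.default_rng(0).normal(size=(5, 140))  # synthetic stand-in
series_train = rows_to_series(X_train)
print(len(series_train), series_train[0].shape)  # 5 (140, 1)
```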
Implement Early Stopping:
- Purpose: prevent over-fitting by halting training based on validation performance.
- Configuration: monitor val_loss and stop training if it fails to improve by at least 0.05 for 5 consecutive epochs.
Train the TCN Model:
- Model configuration: input block length = 30, output block length = 10.
- Training: train on series_train (normal ECG data) with early stopping.
- Model saving: persist the trained model for later use.
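To show what an input block length of 30 and an output block length of 10 mean, the sketch below slices one series into (input, target) training pairs with stride 1. It assumes these settings correspond to Darts' input_chunk_length and output_chunk_length on TCNModel, which builds similar windows internally. make_windows is a hypothetical helper, and the sine wave is a synthetic 140-step stand-in for a heartbeat.

```python
import numpy as np


def make_windows(series, input_len=30, output_len=10):
    """Hypothetical helper: slice a 1-D series into (input, target) pairs, stride 1."""
    series = np.asarray(series)
    n = len(series) - input_len - output_len + 1
    if n < 1:
        raise ValueError("series too short for one input/output window")
    inputs = np.stack([series[i:i + input_len] for i in range(n)])
    targets = np.stack(
        [series[i + input_len:i + input_len + output_len] for i in range(n)]
    )
    return inputs, targets


beat = np.sin(np.linspace(0, 4 * np.pi, 140))  # synthetic 140-step series
X_in, y_out = make_windows(beat)
print(X_in.shape, y_out.shape)  # (101, 30) (101, 10)
```

Each model call sees 30 past steps and predicts the next 10, so a 140-step series yields 140 - 30 - 10 + 1 = 101 windows. In recent Darts versions, a trained model can be persisted with model.save(path) and restored with TCNModel.load(path).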
Anomaly Detection Model:
- Use the trained TCN to predict ECG values and compare the predictions with the actual values.
- Compute an anomaly score from the deviation between predicted and actual values.
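The page does not specify how the deviation is turned into a score. The sketch below assumes the mean absolute error between forecast and observed values, which is one common choice. anomaly_score is a hypothetical helper.

```python
import numpy as np


def anomaly_score(actual, predicted):
    """Hypothetical helper: mean absolute deviation between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


score_same = anomaly_score([1, 2, 3, 4], [1, 2, 3, 4])  # perfect forecast
score_off = anomaly_score([1, 2, 3, 4], [1, 2, 5, 4])   # one step off by 2
print(score_same, score_off)  # 0.0 0.5
```

A larger score means the signal deviates more from what a model trained only on normal beats expects, which is the basis for flagging it.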
Compute Anomaly Scores and Threshold:
- Threshold determination: set the threshold to the mean validation anomaly score plus three standard deviations.
- Purpose: classify each ECG signal as normal or anomalous by comparing its score with this threshold.
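The threshold rule is threshold = mean(validation scores) + 3 × std(validation scores). A short sketch follows. The validation scores are made-up values, and np.std here uses the population standard deviation (ddof=0), which is an assumption since the page does not say which estimator was used.

```python
import numpy as np


def fit_threshold(val_scores, k=3.0):
    """Mean validation score plus k population standard deviations."""
    scores = np.asarray(val_scores, dtype=float)
    return float(scores.mean() + k * scores.std())


def is_anomalous(score, threshold):
    """Flag a signal whose anomaly score exceeds the threshold."""
    return score > threshold


val_scores = [0.1, 0.2, 0.3, 0.2, 0.2]  # made-up validation anomaly scores
threshold = fit_threshold(val_scores)
print(round(threshold, 4))  # 0.3897
```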
Evaluation on Test Data:
- Anomaly score calculation: apply the model to the test data and compute scores for both normal and anomalous signals.
- Result: print the anomaly scores for comparison.
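Beyond printing raw scores, the comparison can be summarised as the fraction of each group that crosses the threshold. This summary is an added illustration rather than the project's own evaluation. detection_summary is a hypothetical helper, and all score values are made up.

```python
import numpy as np


def detection_summary(normal_scores, anomalous_scores, threshold):
    """Hypothetical helper: fraction of normal and anomalous signals flagged."""
    normal_scores = np.asarray(normal_scores, dtype=float)
    anomalous_scores = np.asarray(anomalous_scores, dtype=float)
    false_positive_rate = float(np.mean(normal_scores > threshold))
    detection_rate = float(np.mean(anomalous_scores > threshold))
    return false_positive_rate, detection_rate


fpr, recall = detection_summary(
    [0.1, 0.2, 0.5, 0.15],  # made-up scores for normal test signals
    [0.6, 0.9, 0.3, 0.7],   # made-up scores for anomalous test signals
    threshold=0.4,
)
print(f"normal flagged: {fpr:.2f}, anomalous flagged: {recall:.2f}")
# normal flagged: 0.25, anomalous flagged: 0.75
```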
Visualise Results:
- ECG signal plot: display ECG signals together with the anomaly threshold, highlighting anomalous regions.
- Anomaly score plot: show the anomaly scores of normal and anomalous data with the threshold line.
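A minimal Matplotlib sketch of the anomaly score plot. It draws both groups' scores against a dashed horizontal threshold line. plot_scores and the input values are hypothetical, and the Agg backend is selected only so the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit in a notebook
import matplotlib.pyplot as plt


def plot_scores(normal_scores, anomalous_scores, threshold):
    """Hypothetical helper: scatter scores for both groups against the threshold."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(normal_scores, "o", label="normal")
    ax.plot(anomalous_scores, "x", label="anomalous")
    ax.axhline(threshold, color="red", linestyle="--", label="threshold")
    ax.set_xlabel("sample index")
    ax.set_ylabel("anomaly score")
    ax.legend()
    return fig, ax


fig, ax = plot_scores([0.1, 0.2, 0.5, 0.15], [0.6, 0.9, 0.3, 0.7], threshold=0.4)
```

Points above the dashed line are the signals the threshold rule would classify as anomalous.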
Key Concepts
- Time‑Series Modelling: Using a TCN to predict future points of an ECG time‑series.
- Anomaly Detection: Identifying abnormal patterns by comparing predictions with actual ECG signals.
- Early Stopping: Technique to avoid over‑fitting during model training.
Source
Organization: github
Created: 7/30/2024