Dataset asset | Open Source Community | Anomaly Detection | ECG Analysis
ECG-5000 Dataset
The ECG‑5000 dataset is an electrocardiogram (ECG) dataset for anomaly detection that contains both normal and abnormal heart signal recordings. It is used here to train a temporal convolutional network (TCN) that identifies anomalous patterns in ECG data.
Source: GitHub
Created: Jul 30, 2024
Updated: Aug 12, 2024
Overview of the ECG Anomaly Detection Dataset
Overview
- Goal: Detect anomalies in electrocardiogram (ECG) data using machine‑learning techniques.
- Key Features: Data preprocessing, TCN model training, and ECG signal anomaly detection.
Prerequisites
- Libraries Used:
  - NumPy, Pandas, Matplotlib – for data manipulation and visualization.
  - Scikit‑learn – for data splitting and standardisation.
  - Darts – for time‑series modelling and anomaly detection.
  - TensorFlow, PyTorch Lightning – for early stopping during model training.
Project Workflow
- Load and Merge Datasets:
  - Data source: load train_data and test_data from text files.
  - Merge: combine them into a single dataset, merged_data, to simplify processing.
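The load-and-merge step might look roughly like the sketch below. It assumes whitespace-separated text files with the class label in the first column; the in-memory strings stand in for the real files, and the helper name load_ecg is hypothetical.

```python
import io

import pandas as pd

# Stand-ins for the two text files; a real run would pass file paths instead.
# Assumed layout: whitespace-separated values, class label in the first column.
TRAIN_TXT = "1 0.10 0.20 0.30\n2 0.40 0.50 0.60\n"
TEST_TXT = "1 0.15 0.25 0.35\n3 0.70 0.80 0.90\n"


def load_ecg(source) -> pd.DataFrame:
    """Read one whitespace-delimited ECG file into a DataFrame."""
    return pd.read_csv(source, sep=r"\s+", header=None)


train_data = load_ecg(io.StringIO(TRAIN_TXT))
test_data = load_ecg(io.StringIO(TEST_TXT))

# Stack both parts into one frame with a fresh 0..n-1 index.
merged_data = pd.concat([train_data, test_data], ignore_index=True)
print(merged_data.shape)
```

Merging first means the later normal/anomalous split is made over all available recordings rather than over the original train/test partition.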
- Define Normal and Anomalous Classes:
  - Class definition:
    - Normal: label 1 (normal ECG signal).
    - Anomalous: any label other than 1.
  - Dataset split: create normal_data and anomalous_data subsets.
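As an illustration of the class split, here is a minimal sketch on a toy frame. It assumes the label sits in column 0; the values are made up.

```python
import pandas as pd

# Toy frame: column 0 is the class label (assumed), the rest are signal samples.
merged_data = pd.DataFrame(
    [[1, 0.1, 0.2], [2, 0.4, 0.5], [1, 0.3, 0.1], [5, 0.9, 0.8]]
)

NORMAL_LABEL = 1
is_normal = merged_data[0] == NORMAL_LABEL

normal_data = merged_data[is_normal]      # rows labelled 1
anomalous_data = merged_data[~is_normal]  # rows with any other label

print(len(normal_data), len(anomalous_data))
```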
- Data Splitting and Standardisation:
  - Feature selection: extract features into X_normal, excluding the label.
  - Split: use train_test_split to divide data into training, validation, and test sets.
  - Standardisation: apply StandardScaler to normalise feature values.
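A sketch of the split-and-standardise step on synthetic data follows. The 60/20/20 ratio, the random seeds, and fitting the scaler on the training split only are assumptions; the source does not state them.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the normal subset: a label column plus 140 samples per row.
normal_data = np.column_stack([np.ones(100), rng.normal(size=(100, 140))])

X_normal = normal_data[:, 1:]  # drop the label column

# Two-stage split into train / validation / test (assumed 60/20/20).
X_train, X_rest = train_test_split(X_normal, test_size=0.4, random_state=42)
X_val, X_test = train_test_split(X_rest, test_size=0.5, random_state=42)

# Fit the scaler on the training split only, then reuse it for the others.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

print(X_train.shape, X_val.shape, X_test.shape)
```

Fitting the scaler on the training split alone keeps validation and test statistics out of the preprocessing, which is the usual way to avoid leakage.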
- Convert Data to Time‑Series Format:
  - Time‑series conversion: reshape data into time‑series objects, essential for modelling.
  - Ensure variability: check and modify data so that multiple samples exist in each time‑series.
- Implement Early Stopping:
  - Purpose: prevent over‑fitting by halting training based on validation performance.
  - Configuration: monitor val_loss; stop if it fails to improve by at least 0.05 for 5 consecutive epochs.
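The stopping rule can be illustrated without any framework. The sketch below is a plain-Python rendering of the usual minimum-improvement and patience logic, not the project's actual callback; the function name early_stop_epoch and the loss history are made up.

```python
from typing import Iterable, Optional


def early_stop_epoch(
    val_losses: Iterable[float], min_delta: float = 0.05, patience: int = 5
) -> Optional[int]:
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    stale_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement of more than min_delta: record it and reset the counter.
            best = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                return epoch
    return None


history = [1.0, 0.8, 0.79, 0.78, 0.77, 0.77, 0.77, 0.5]
print(early_stop_epoch(history))  # stops at epoch index 6
```

In this history the loss keeps falling slightly after epoch 1, but never by more than 0.05, so five stale epochs accumulate and the late drop to 0.5 is never reached.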
- Train the TCN Model:
  - Model configuration: input block length = 30, output block length = 10.
  - Training: train on series_train (normal ECG data) with early stopping.
  - Model saving: persist the trained model for later use.
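To make the two block lengths concrete, the sketch below slices a toy series into pairs of 30 input points and the 10 points that follow them. It is a generic windowing illustration, not how Darts builds its training samples internally; make_windows is a hypothetical helper.

```python
import numpy as np

INPUT_LEN = 30   # input block length from the text
OUTPUT_LEN = 10  # output block length from the text


def make_windows(series: np.ndarray):
    """Slice a 1-D series into (input, target) pairs with stride 1."""
    n = len(series) - INPUT_LEN - OUTPUT_LEN + 1
    if n <= 0:
        raise ValueError("series is shorter than one input+output window")
    inputs = np.stack([series[i : i + INPUT_LEN] for i in range(n)])
    targets = np.stack(
        [series[i + INPUT_LEN : i + INPUT_LEN + OUTPUT_LEN] for i in range(n)]
    )
    return inputs, targets


series = np.arange(140, dtype=float)  # toy series of 140 points
inputs, targets = make_windows(series)
print(inputs.shape, targets.shape)
```

A series needs at least 40 points to yield a single pair under these settings.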
- Anomaly Detection Model:
  - Use the trained TCN to compare predicted values with actual values to detect anomalies.
  - Compute anomaly score based on the deviation.
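One simple way to turn the deviation into a score is the mean absolute error between the forecast and the observed signal. The source does not say which deviation measure the project uses, so the metric below is an assumption.

```python
import numpy as np


def anomaly_score(actual, predicted) -> float:
    """Mean absolute deviation between a signal and its forecast (assumed metric)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")
    return float(np.mean(np.abs(actual - predicted)))


forecast = [0.0, 0.5, 1.0, 0.5]
print(anomaly_score([0.0, 0.5, 1.0, 0.5], forecast))  # signal matches the forecast
print(anomaly_score([0.0, 1.5, 0.0, 1.5], forecast))  # signal departs from it
```

Because the model is trained on normal signals only, it is expected to forecast normal beats well and abnormal beats poorly, so larger scores point towards anomalies.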
- Compute Anomaly Scores and Threshold:
  - Threshold determination: set threshold as mean validation anomaly score plus three standard deviations.
  - Purpose: classify ECG signals as anomalous or normal based on this threshold.
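The threshold rule is threshold = mean(validation scores) + 3 × std(validation scores). A minimal sketch with made-up scores follows; it uses NumPy's default population standard deviation, which is an assumption.

```python
import numpy as np


def fit_threshold(val_scores, k: float = 3.0) -> float:
    """Mean of the validation scores plus k standard deviations."""
    val_scores = np.asarray(val_scores, dtype=float)
    return float(val_scores.mean() + k * val_scores.std())


def is_anomalous(scores, threshold: float) -> np.ndarray:
    """Flag every score that lies above the threshold."""
    return np.asarray(scores, dtype=float) > threshold


val_scores = [0.10, 0.12, 0.08, 0.11, 0.09]  # illustrative validation scores
threshold = fit_threshold(val_scores)
flags = is_anomalous([0.10, 0.45], threshold)
print(round(threshold, 4), flags.tolist())
```

With these numbers the threshold is about 0.142, so the score 0.10 is treated as normal and 0.45 as anomalous.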
- Evaluation on Test Data:
  - Anomaly score calculation: apply the model to test data and compute scores for normal and anomalous signals.
  - Result: print anomaly scores for comparison.
- Visualise Results:
  - ECG signal plot: display ECG signals with the anomaly threshold, highlighting anomalous regions.
  - Anomaly score plot: show anomaly scores for normal and anomalous data with the threshold line.
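The two plots could be produced along the lines of the sketch below. The data are synthetic, the per-point error array is an assumption about how anomalous regions are located, and plot_results is a hypothetical helper.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_results(signal, point_errors, normal_scores, anomalous_scores, threshold):
    """Draw the signal with flagged regions and the per-signal scores."""
    signal = np.asarray(signal, dtype=float)
    point_errors = np.asarray(point_errors, dtype=float)
    fig, (ax_signal, ax_scores) = plt.subplots(2, 1, figsize=(8, 6))

    t = np.arange(len(signal))
    ax_signal.plot(t, signal, label="ECG signal")
    # Shade the time steps whose prediction error exceeds the threshold.
    ax_signal.fill_between(
        t, signal.min(), signal.max(),
        where=point_errors > threshold, color="red", alpha=0.3,
        label="above threshold",
    )
    ax_signal.legend()

    ax_scores.plot(normal_scores, "o", label="normal")
    ax_scores.plot(anomalous_scores, "x", label="anomalous")
    ax_scores.axhline(threshold, color="red", linestyle="--", label="threshold")
    ax_scores.set_ylabel("anomaly score")
    ax_scores.legend()
    return fig


rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 6 * np.pi, 140))
point_errors = np.abs(rng.normal(scale=0.05, size=140))
point_errors[60:80] += 0.5  # pretend this stretch was poorly predicted
fig = plot_results(signal, point_errors, [0.08, 0.10, 0.12], [0.40, 0.55], threshold=0.3)
```

Calling fig.savefig with a file name writes the figure to disk; in a notebook the returned figure can be displayed directly.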
Key Concepts
- Time‑Series Modelling: Using a TCN to predict future points of an ECG time‑series.
- Anomaly Detection: Identifying abnormal patterns by comparing predictions with actual ECG signals.
- Early Stopping: Technique to avoid over‑fitting during model training.