Tags: Dataset asset, Open Source Community, Anomaly Detection, ECG Analysis

ECG-5000 Dataset

The ECG-5000 dataset is an electrocardiogram (ECG) dataset for anomaly detection that contains both normal and abnormal heartbeat recordings. In this project it is used to train a temporal convolutional network (TCN) to identify anomalous patterns in ECG data.

Source: GitHub
Created: Jul 30, 2024
Updated: Aug 12, 2024

Overview of the ECG Anomaly Detection Dataset

Overview

  • Goal: Detect anomalies in electrocardiogram (ECG) data using machine‑learning techniques.
  • Key Features: Data preprocessing, TCN model training, and ECG signal anomaly detection.

Prerequisites

  • Libraries Used:
    • NumPy, Pandas, Matplotlib – for data manipulation and visualization.
    • Scikit‑learn – for data splitting and standardisation.
    • Darts – for time‑series modelling and anomaly detection.
    • TensorFlow, PyTorch Lightning – deep-learning libraries; Darts models train through PyTorch Lightning, whose EarlyStopping callback provides early stopping.

Project Workflow

  1. Load and Merge Datasets:

    • Data source: Load train_data and test_data from text files.
    • Merge: Combine into a single dataset merged_data to simplify processing.
  2. Define Normal and Anomalous Classes:

    • Class definition:
      • Normal: label 1 (normal ECG signal).
      • Anomalous: any label other than 1.
    • Dataset split: create normal_data and anomalous_data subsets.
  3. Data Splitting and Standardisation:

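    • Feature selection: extract the signal values of the normal samples into X_normal, dropping the label column.
    • Split: use train_test_split to divide the data into training, validation, and test sets (each call returns two subsets, so three sets need two calls).
    • Standardisation: apply StandardScaler to scale each feature to zero mean and unit variance.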
  4. Convert Data to Time‑Series Format:

    • Time-series conversion: convert the standardised arrays into Darts TimeSeries objects, the input format Darts models expect.
    • Ensure variability: check that each time series contains multiple samples, and reshape the data where it does not.
  5. Implement Early Stopping:

    • Purpose: prevent over‑fitting by halting training based on validation performance.
    • Configuration: monitor val_loss and stop training if it fails to improve by at least 0.05 for 5 consecutive epochs.
  6. Train the TCN Model:

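    • Model configuration: input length = 30, output length = 10 (input_chunk_length and output_chunk_length in Darts), i.e. the model reads 30 time steps and forecasts the next 10.
    • Training: train on series_train (normal ECG data only) with early stopping driven by the loss on the validation series.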
    • Model saving: persist the trained model for later use.
  7. Anomaly Detection Model:

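    • Use the trained TCN to forecast each ECG series and compare the predicted values with the actual ones.
    • Compute an anomaly score from the deviation between them: because the model is trained only on normal ECGs, a large deviation suggests an abnormal signal.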
  8. Compute Anomaly Scores and Threshold:

    • Threshold determination: set threshold as mean validation anomaly score plus three standard deviations.
    • Purpose: classify each ECG signal as anomalous if its score exceeds this threshold, and as normal otherwise.
  9. Evaluation on Test Data:

    • Anomaly score calculation: apply the model to test data and compute scores for normal and anomalous signals.
    • Result: print anomaly scores for comparison.
  10. Visualise Results:

    • ECG signal plot: display ECG signals with the anomaly threshold, highlighting anomalous regions.
    • Anomaly score plot: show anomaly scores for normal and anomalous data with the threshold line.
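Steps 1 and 2 can be sketched in NumPy as follows. This is a minimal illustration, not the project's code: it assumes each text file is whitespace-delimited with the class label in the first column and the ECG samples in the remaining columns, and the helper names (`load_split`, `merge_and_partition`) are hypothetical.

```python
import io

import numpy as np

NORMAL_LABEL = 1  # label 1 = normal beat; any other label counts as anomalous


def load_split(source) -> np.ndarray:
    """Load a whitespace-delimited text file (path or file-like object) as a 2-D array.

    Hypothetical helper. Assumed layout: column 0 is the class label,
    the remaining columns are ECG samples.
    """
    return np.loadtxt(source, ndmin=2)


def merge_and_partition(train_data: np.ndarray, test_data: np.ndarray):
    """Stack both splits into merged_data, then separate normal and anomalous rows."""
    merged_data = np.vstack([train_data, test_data])
    # Labels may be stored as floats (e.g. 1.0000000e+00), so round before comparing.
    labels = np.rint(merged_data[:, 0]).astype(int)
    is_normal = labels == NORMAL_LABEL
    return merged_data, merged_data[is_normal], merged_data[~is_normal]


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real train/test files.
    train_data = load_split(io.StringIO("1 0.1 0.2 0.3\n3 0.9 0.1 0.4\n"))
    test_data = load_split(io.StringIO("1.0000000e+00 0.0 0.1 0.2\n"))
    merged_data, normal_data, anomalous_data = merge_and_partition(train_data, test_data)
    print(merged_data.shape, normal_data.shape, anomalous_data.shape)
```

With the real data you would pass file paths to `load_split` instead of in-memory buffers; the file names are not given here, so any path used is an assumption.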
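Step 3 could look roughly like this with scikit-learn. Because `train_test_split` returns two subsets per call, it is called twice to obtain training, validation, and test sets. The roughly 60/20/20 proportions, the random seed, and the function name `split_and_standardise` are illustrative assumptions, as is the choice to fit the scaler on the training rows only (a common way to keep validation and test statistics out of training).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_and_standardise(normal_data: np.ndarray, seed: int = 42):
    """Hypothetical helper: split normal rows ~60/20/20 into train/val/test and standardise.

    Column 0 is assumed to hold the label; only the signal columns are kept.
    """
    X_normal = normal_data[:, 1:]  # drop the label column

    # First call: hold out 20% of the rows as the test set.
    X_rest, X_test = train_test_split(X_normal, test_size=0.2, random_state=seed)
    # Second call: 25% of the remaining 80% (i.e. 20% overall) becomes validation.
    X_train, X_val = train_test_split(X_rest, test_size=0.25, random_state=seed)

    # StandardScaler rescales each column (time step) to zero mean and unit variance,
    # using statistics computed from the training rows only.
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_val), scaler.transform(X_test), scaler
```

The same fitted `scaler` would then be applied to the anomalous rows, e.g. `scaler.transform(anomalous_data[:, 1:])`, so that normal and anomalous signals are on the same scale at evaluation time.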
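Steps 7 to 9 reduce to a few array operations once forecasts are available. The sketch below assumes the anomaly score is the mean absolute error between the forecast and the actual signal (the workflow only says the score is based on the deviation, so this metric is an assumption) and uses NumPy's default population standard deviation for the threshold. In the real pipeline `predicted` would come from the trained TCN; the function names are hypothetical and the sample scores in the demo are made up for illustration.

```python
import numpy as np


def anomaly_scores(actual, predicted) -> np.ndarray:
    """Assumed metric: mean absolute deviation between actual and forecast, one score per row."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs(actual - predicted), axis=-1)


def fit_threshold(val_scores, k: float = 3.0) -> float:
    """Threshold = mean + k * std of the validation (normal-only) anomaly scores."""
    val_scores = np.asarray(val_scores, dtype=float)
    return float(val_scores.mean() + k * val_scores.std())


def is_anomalous(scores, threshold: float) -> np.ndarray:
    """Flag every series whose score exceeds the threshold."""
    return np.asarray(scores, dtype=float) > threshold


if __name__ == "__main__":
    val_scores = np.array([0.10, 0.12, 0.09, 0.11, 0.10])  # illustrative validation scores
    threshold = fit_threshold(val_scores)
    test_scores = np.array([0.11, 0.45])  # illustrative test scores
    print(threshold, is_anomalous(test_scores, threshold))
```

Using k = 3 mirrors the "three standard deviations" rule in step 8; changing `k` trades false alarms against missed anomalies.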

Key Concepts

  • Time‑Series Modelling: Using a TCN to predict future points of an ECG time‑series.
  • Anomaly Detection: Identifying abnormal patterns by comparing predictions with actual ECG signals.
  • Early Stopping: Technique to avoid over‑fitting during model training.
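To make the time-series modelling idea concrete, the sketch below shows how a series can be cut into training pairs for a forecaster with an input window of 30 and an output window of 10, the lengths given in step 6 (in Darts these correspond to the `input_chunk_length` and `output_chunk_length` arguments of `TCNModel`). It is a plain-NumPy illustration of sliding-window sampling, not Darts' internal implementation, and `make_windows` is a hypothetical helper. It also shows why each series needs enough time steps: a series shorter than 30 + 10 = 40 points yields no training pair at all.

```python
import numpy as np


def make_windows(series, input_chunk_length: int = 30, output_chunk_length: int = 10):
    """Hypothetical helper: slide over a 1-D series and return (inputs, targets).

    inputs[i]  = series[i : i + input_chunk_length]   (what the model sees)
    targets[i] = the next output_chunk_length points  (what it must forecast)
    """
    series = np.asarray(series, dtype=float)
    span = input_chunk_length + output_chunk_length
    n_pairs = len(series) - span + 1
    if n_pairs <= 0:
        raise ValueError(f"need at least {span} points, got {len(series)}")
    inputs = np.stack([series[i:i + input_chunk_length] for i in range(n_pairs)])
    targets = np.stack([series[i + input_chunk_length:i + span] for i in range(n_pairs)])
    return inputs, targets
```

For example, a 50-point series yields 50 - 40 + 1 = 11 input/target pairs, while a 39-point series raises an error.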
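The early-stopping rule from step 5 can be expressed as a small, framework-free class. This is an illustrative re-implementation under the stated settings (min_delta = 0.05, patience = 5); in a Darts project the same behaviour would more likely come from PyTorch Lightning's `EarlyStopping` callback, e.g. `EarlyStopping(monitor="val_loss", min_delta=0.05, patience=5, mode="min")` passed through the model's `pl_trainer_kwargs`. The class name `EarlyStopper` is hypothetical.

```python
class EarlyStopper:
    """Hypothetical helper: signal a stop once val_loss fails to improve by at least
    min_delta for `patience` consecutive epochs."""

    def __init__(self, min_delta: float = 0.05, patience: int = 5):
        self.min_delta = min_delta
        self.patience = patience
        self.best = float("inf")  # best (lowest) validation loss seen so far
        self.wait = 0  # epochs since the last sufficiently large improvement

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss  # improvement large enough: reset the counter
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience
```

In a training loop you would call `stopper.step(val_loss)` after each validation pass and stop when it returns True. Note that a small gain (say from 0.90 to 0.88) does not reset the counter, because it falls short of min_delta.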