
ECG-5000 Dataset

The ECG‑5000 dataset is an electrocardiogram (ECG) dataset for anomaly detection, containing both normal and abnormal heart signal recordings. In this project it is used to train a Temporal Convolutional Network (TCN) model to identify anomalous patterns in ECG data.

Updated 8/12/2024
github

Description

Overview of the ECG Anomaly Detection Dataset

Overview

  • Goal: Detect anomalies in electrocardiogram (ECG) data using machine‑learning techniques.
  • Key Features: Data preprocessing, TCN model training, and ECG signal anomaly detection.

Prerequisites

  • Libraries Used:
    • NumPy, Pandas, Matplotlib – for data manipulation and visualization.
    • Scikit‑learn – for data splitting and standardisation.
    • Darts – for time‑series modelling and anomaly detection.
    • TensorFlow, PyTorch Lightning – for early stopping during model training (Darts neural models accept PyTorch Lightning callbacks).

Project Workflow

  1. Load and Merge Datasets:

    • Data source: Load train_data and test_data from text files.
    • Merge: Combine into a single dataset merged_data to simplify processing.
  2. Define Normal and Anomalous Classes:

    • Class definition:
      • Normal: label 1 (normal ECG signal).
      • Anomalous: any label other than 1.
    • Dataset split: create normal_data and anomalous_data subsets.
  3. Data Splitting and Standardisation:

    • Feature selection: extract features into X_normal, excluding the label.
    • Split: use train_test_split to divide data into training, validation, and test sets.
    • Standardisation: apply StandardScaler to rescale each feature to zero mean and unit variance.
  4. Convert Data to Time‑Series Format:

    • Time‑series conversion: reshape data into time‑series objects, essential for modelling.
    • Ensure variability: check that each time series contains more than one sample, and adjust the data where it does not.
  5. Implement Early Stopping:

    • Purpose: prevent over‑fitting by halting training based on validation performance.
    • Configuration: monitor val_loss; stop if it fails to improve by at least 0.05 for 5 consecutive epochs.
  6. Train the TCN Model:

    • Model configuration: input chunk length = 30 time steps, output chunk length = 10 time steps (input_chunk_length and output_chunk_length in Darts).
    • Training: train on series_train (normal ECG data) with early stopping.
    • Model saving: persist the trained model for later use.
  7. Anomaly Detection Model:

    • Use the trained TCN to forecast the signal, then compare the predicted values with the actual values.
    • Compute an anomaly score from the deviation between prediction and observation; larger deviations indicate likelier anomalies.
  8. Compute Anomaly Scores and Threshold:

    • Threshold determination: set threshold as mean validation anomaly score plus three standard deviations.
    • Purpose: classify ECG signals as anomalous or normal based on this threshold.
  9. Evaluation on Test Data:

    • Anomaly score calculation: apply the model to test data and compute scores for normal and anomalous signals.
    • Result: print anomaly scores for comparison.
  10. Visualise Results:

    • ECG signal plot: display ECG signals with the anomaly threshold, highlighting anomalous regions.
    • Anomaly score plot: show anomaly scores for normal and anomalous data with the threshold line.
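
Steps 7 and 8 can be illustrated without any modelling library. The following is a minimal sketch in plain Python, not the project's actual code. It assumes the anomaly score is the mean absolute deviation between predicted and actual values (the description only says the score is "based on the deviation"), and it uses the population standard deviation for the threshold. The helper names anomaly_score, fit_threshold and is_anomalous are hypothetical, and the toy numbers are invented for illustration; they are not ECG-5000 data.

```python
from statistics import mean, pstdev


def anomaly_score(actual, predicted):
    """Mean absolute deviation between actual and predicted values.

    Assumption: the source does not specify the scoring rule.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def fit_threshold(validation_scores, n_sigma=3.0):
    """Threshold = mean validation score + n_sigma standard deviations.

    Assumption: population standard deviation (pstdev); the source does
    not say whether the sample or population form is used.
    """
    return mean(validation_scores) + n_sigma * pstdev(validation_scores)


def is_anomalous(score, threshold):
    """A signal is flagged when its score exceeds the threshold."""
    return score > threshold


# Invented scores standing in for validation scores on normal signals.
validation_scores = [0.10, 0.12, 0.09, 0.11, 0.13]
threshold = fit_threshold(validation_scores)

# Invented signals: one forecast tracks the signal closely, one does not.
normal_score = anomaly_score([1.0, 2.0, 3.0], [1.1, 1.9, 3.1])
anomalous_score = anomaly_score([1.0, 2.0, 3.0], [2.0, 0.5, 4.5])

print(f"threshold={threshold:.4f}")
print(f"normal:    score={normal_score:.4f} flagged={is_anomalous(normal_score, threshold)}")
print(f"anomalous: score={anomalous_score:.4f} flagged={is_anomalous(anomalous_score, threshold)}")
```

Fitting the threshold only on scores from normal validation data means the rule encodes "how badly does the model usually forecast a healthy signal"; anything well outside that range is treated as anomalous.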

Key Concepts

  • Time‑Series Modelling: Using a TCN to predict future points of an ECG time‑series.
  • Anomaly Detection: Identifying abnormal patterns by comparing predictions with actual ECG signals.
  • Early Stopping: Technique to avoid over‑fitting during model training.
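
The early-stopping rule from the workflow (monitor val_loss, require an improvement of at least 0.05, tolerate 5 epochs without one) can be sketched in a few lines of plain Python. This is an illustrative re-implementation of the rule only; the project itself presumably relies on a library callback rather than code like this. The class EarlyStopper, the function epochs_run and the loss values are hypothetical.

```python
class EarlyStopper:
    """Signal a stop once the monitored loss has failed to improve by at
    least `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience=5, min_delta=0.05):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            # Improvement is large enough: record it and reset the counter.
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses, patience=5, min_delta=0.05):
    """Return the number of epochs completed before training stops."""
    stopper = EarlyStopper(patience, min_delta)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.should_stop(loss):
            return epoch
    return len(val_losses)


# Invented validation losses: progress stalls after the third epoch.
losses = [1.00, 0.80, 0.70, 0.69, 0.68, 0.67, 0.66, 0.66, 0.50]
stopped_at = epochs_run(losses)
print(f"training stopped after epoch {stopped_at}")
```

Note that the small gains from epoch 4 onward (0.01 each) do not count as improvements under a 0.05 minimum, so training halts before the final, lower loss value is ever reached. That is the trade-off a large minimum improvement makes.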

Topics

ECG Analysis
Anomaly Detection

Source

Organization: github

Created: 7/30/2024
