BMD-HS Dataset

Updated 9/4/2024
github

Description

BMD‑HS Dataset: Heart Sound Recordings for Automated Cardiovascular Disease Diagnosis

Overview

The BMD‑HS dataset is a curated collection of heart sound recordings intended to support automated cardiovascular disease (CVD) diagnosis. It contains over 800 recordings divided into six categories: four single valve diseases, namely Aortic Stenosis (AS), Aortic Regurgitation (AR), Mitral Regurgitation (MR), and Mitral Stenosis (MS); Multi‑Disease (MD), for patients with more than one valve disease; and healthy (Normal) samples.

Key Features

  • Multi‑Label Annotations: Enables fine‑grained classification capturing single‑valve and multi‑valve disease states.
  • Echocardiogram Data: Includes ECHO data providing additional diagnostic context for cardiovascular research.
  • Diverse Representation: Recorded at the National Cardiovascular Disease Research Institute of Bangladesh with balanced gender representation, making the dataset relevant to Bangladesh and similar regions.
  • Balanced Class Representation: Recordings from 20 healthy participants and 20 participants for each valve disease category address class imbalance.
  • Rich Metadata: Annotations include disease presence, severity, and demographics supporting in‑depth studies and potential new correlation discoveries.
  • Multi‑Disease Data: Includes patients with multiple valve diseases, reflecting the co‑occurring conditions seen in clinical practice.

Dataset Structure

1. Train Folder

  • Files: Contains 872 .wav audio files.
  • Details: Recordings from 59 patients captured at 8 different auscultation locations. Each recording is 20 seconds long and sampled at 4 kHz.
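
The stated recording format (20 seconds at 4 kHz, so about 80,000 samples per file) can be checked before training. The following is a minimal sketch using only Python's standard `wave` module. It assumes the files are PCM-encoded WAV; the helper names and the half-second tolerance are illustrative choices, not part of the dataset.

```python
import io
import wave

EXPECTED_RATE_HZ = 4000    # 4 kHz, per the dataset description
EXPECTED_DURATION_S = 20   # 20-second recordings, per the dataset description


def describe_wav(source):
    """Return (sample_rate, n_frames, duration_s) for a path string or file object."""
    with wave.open(source, "rb") as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
    return rate, frames, frames / rate


def matches_spec(source, tolerance_s=0.5):
    """True if the file has the expected sample rate and roughly the expected length."""
    rate, _, duration = describe_wav(source)
    return rate == EXPECTED_RATE_HZ and abs(duration - EXPECTED_DURATION_S) <= tolerance_s


def silent_wav(rate, seconds):
    """Build an in-memory 16-bit mono WAV of silence (a stand-in for a real recording)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * (rate * seconds))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Replace the synthetic buffer with the path to a real file from the train folder.
    print(describe_wav(silent_wav(EXPECTED_RATE_HZ, EXPECTED_DURATION_S)))
```

Files that fail `matches_spec` may need resampling or trimming before they are batched with the rest.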

2. Train.csv

  • Purpose: Provides training labels and corresponding recording filenames for each patient.
  • Columns:
    • patient_id: Filename in the training folder.
    • AS: Aortic Stenosis label (0 = absent, 1 = present).
    • AR: Aortic Regurgitation label (0 = absent, 1 = present).
    • MR: Mitral Regurgitation label (0 = absent, 1 = present).
    • MS: Mitral Stenosis label (0 = absent, 1 = present).
    • MD: Multi‑Disease patient label (0 = absent, 1 = present).
    • N: Normal patient label (0 = diseased, 1 = normal).
    • recording_1 to recording_8: Filenames of the eight recordings for each patient.
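
Because each row describes one patient with up to eight recordings, a common first step is to expand the table into one entry per recording, each carrying the patient's labels. The sketch below assumes the column names listed above and a comma-separated file with a header row. The function name and the handling of empty cells are assumptions for illustration.

```python
import csv

LABEL_COLUMNS = ["AS", "AR", "MR", "MS", "MD", "N"]
RECORDING_COLUMNS = [f"recording_{i}" for i in range(1, 9)]


def expand_recordings(csv_file):
    """Yield (patient_id, filename, labels) for every recording listed in the file."""
    for row in csv.DictReader(csv_file):
        labels = {name: int(row[name]) for name in LABEL_COLUMNS}
        for column in RECORDING_COLUMNS:
            filename = (row.get(column) or "").strip()
            if filename:  # skip empty cells in case a patient has fewer recordings
                yield row["patient_id"], filename, dict(labels)


def load_recordings(path):
    """Read a train.csv-style file from disk into a list of per-recording entries."""
    with open(path, newline="") as handle:
        return list(expand_recordings(handle))
```

Keeping `patient_id` on every entry makes it straightforward to split training and validation data by patient, so that recordings from the same person do not end up on both sides.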

3. Additional_metadata.csv

  • Purpose: Provides supplementary patient information for enhanced prediction or inference.
  • Columns:
    • patient_id: Filename in the training folder.
    • Age: Patient age.
    • Gender: Patient gender (M = male, F = female).
    • Smoker: Smoking status (0 = non‑smoker, 1 = smoker).
    • Lives: Residence area (U = urban, R = rural).
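
The metadata file shares the patient_id column with Train.csv, so the two tables can be joined per patient. The sketch below is one possible way to do that with the standard `csv` module. It assumes both files are comma-separated with the column names listed above and that Age is numeric. The feature names (`is_male`, `is_urban`, and so on) are hypothetical.

```python
import csv


def load_by_patient(csv_file):
    """Index the rows of a CSV file by their patient_id column."""
    return {row["patient_id"]: row for row in csv.DictReader(csv_file)}


def metadata_features(meta_row):
    """Turn one Additional_metadata.csv row into numeric features."""
    return {
        "age": float(meta_row["Age"]),
        "is_male": int(meta_row["Gender"].strip().upper() == "M"),
        "is_smoker": int(meta_row["Smoker"]),
        "is_urban": int(meta_row["Lives"].strip().upper() == "U"),
    }


def join_labels_and_metadata(train_file, metadata_file):
    """Join the two tables on patient_id, keeping only patients present in both."""
    labels = load_by_patient(train_file)
    metadata = load_by_patient(metadata_file)
    joined = {}
    for patient_id, label_row in labels.items():
        if patient_id in metadata:
            joined[patient_id] = {**label_row, **metadata_features(metadata[patient_id])}
    return joined
```

Patients missing from either file are dropped here. Depending on the use case, it may be preferable to keep them and impute the missing fields instead.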

Key Points

  • Pre‑processing & Augmentation: Effective preprocessing and augmentation are crucial due to the limited training set size.
  • Transfer Learning: Using external public datasets for transfer learning is encouraged.
  • Metadata Utilization: Exploring correlations between valve disease categories and provided metadata (age, gender, smoking status, residence) may improve model performance.
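
As an illustration of the augmentation point above, the sketch below applies three generic waveform augmentations: a random crop, a random gain change, and additive Gaussian noise. This is not the dataset authors' pipeline. The function names, the 5-second crop length, and the noise and gain ranges are assumptions chosen for the example, and the samples are assumed to be floats normalized to roughly [-1, 1].

```python
import random

SAMPLE_RATE_HZ = 4000  # per the dataset description


def random_crop(samples, crop_len, rng):
    """Take a random contiguous window of crop_len samples (whole signal if shorter)."""
    if len(samples) <= crop_len:
        return list(samples)
    start = rng.randrange(len(samples) - crop_len + 1)
    return list(samples[start:start + crop_len])


def scale_gain(samples, low, high, rng):
    """Multiply the whole signal by one gain factor drawn uniformly from [low, high]."""
    gain = rng.uniform(low, high)
    return [s * gain for s in samples]


def add_noise(samples, noise_std, rng):
    """Add independent zero-mean Gaussian noise to every sample."""
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def augment(samples, rng, crop_len=SAMPLE_RATE_HZ * 5, noise_std=0.005,
            gain_range=(0.8, 1.2)):
    """Apply crop, gain, and noise in sequence to produce one augmented example."""
    out = random_crop(samples, crop_len, rng)
    out = scale_gain(out, gain_range[0], gain_range[1], rng)
    return add_noise(out, noise_std, rng)
```

Passing an explicit `random.Random(seed)` instance as `rng` keeps augmented batches reproducible across runs.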

Weaknesses

  • Dataset Imbalance: Although class representation was balanced by design, disease severity levels and demographic groups may still be unevenly represented, which can affect model training and performance.
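
One way to inspect such residual imbalance is to tally how often each value of a column occurs and derive inverse-frequency weights from the counts. The sketch below works on rows loaded as dictionaries (for example via `csv.DictReader`). The weighting formula, total / (number of distinct values × count), is the same heuristic scikit-learn documents for `class_weight="balanced"`. Whether weighting is appropriate for this dataset is an assumption to validate, not a recommendation from the source.

```python
from collections import Counter


def value_counts(rows, column):
    """Count how often each value of the given column occurs."""
    return Counter(row[column] for row in rows)


def inverse_frequency_weights(rows, column):
    """Weight each observed value by total / (n_values * count), so rarer values weigh more."""
    counts = value_counts(rows, column)
    total = sum(counts.values())
    return {value: total / (len(counts) * count) for value, count in counts.items()}
```

The same two helpers apply to any column, such as a label column from Train.csv or Gender from Additional_metadata.csv.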

Potential Impact

The BMD‑HS dataset is particularly suited to research and healthcare development in regions like Bangladesh. Its multi‑label annotations, echocardiogram data, and coverage of both single and multiple valve diseases make it valuable for developing AI‑driven cardiovascular disease diagnostic tools, especially in underserved areas.

Topics

Cardiovascular Diseases
Medical Diagnosis

Source

Organization: github

Created: 8/18/2024
