Tags: Dataset asset, Open Source Community, Anomaly Detection, Medical Imaging Analysis

ChestX-Det10

This dataset contains 3,543 chest X‑ray images annotated with bounding boxes for ten chest abnormalities, including nodules, masses, and pneumothorax. ChestX‑Det10 is a subset of the NIH ChestX‑ray14 dataset.

Source: GitHub
Created: Nov 27, 2024
Updated: Dec 5, 2024

Dataset Overview

Dataset Information

  • Dataset Name: ChestX-Det10
  • Data Source: This dataset is a subset of the NIH ChestX-ray14 dataset.
  • Data Type: Chest X‑ray images
  • Number of Images:
    • Training images: 3,001
    • Test images: 542
  • Number of Classes: 10
  • Class List:
    • Consolidation
    • Pneumothorax
    • Emphysema
    • Calcification
    • Nodule
    • Mass
    • Fracture
    • Effusion
    • Atelectasis
    • Fibrosis
  • Background: 22.69% of the images are treated as background, i.e. they carry no annotated abnormality.

Dataset Characteristics

  • Class Distribution: The dataset exhibits class imbalance, with Consolidation and Effusion being the most common, while Mass and Pneumothorax are relatively rare.
  • Bounding Box Distribution: Most bounding boxes are small, with width and height primarily below 200 pixels, and the area distribution skewed toward smaller regions.
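
The class and bounding box statistics above can be reproduced from the annotation file with a few lines of Python. The sketch below is illustrative only: the record layout (`file_name`, `syms`, `boxes` as `[x1, y1, x2, y2]`) is an assumption about the annotation format, the three records are made up, and treating an image with no boxes as background is likewise an assumption.

```python
from collections import Counter

# Hypothetical annotation records; the "syms"/"boxes" layout is an assumption,
# and these values are invented for illustration.
annotations = [
    {"file_name": "a.png", "syms": ["Effusion", "Consolidation"],
     "boxes": [[100, 400, 260, 520], [300, 250, 420, 380]]},
    {"file_name": "b.png", "syms": ["Nodule"], "boxes": [[510, 300, 540, 335]]},
    {"file_name": "c.png", "syms": [], "boxes": []},
]

def class_counts(records):
    """Count annotated boxes per class label."""
    return Counter(label for rec in records for label in rec["syms"])

def box_sizes(records):
    """Return (width, height, area) for every [x1, y1, x2, y2] box."""
    sizes = []
    for rec in records:
        for x1, y1, x2, y2 in rec["boxes"]:
            w, h = x2 - x1, y2 - y1
            sizes.append((w, h, w * h))
    return sizes

def background_fraction(records):
    """Fraction of images that carry no annotation at all."""
    return sum(1 for rec in records if not rec["boxes"]) / len(records)

counts = class_counts(annotations)
sizes = box_sizes(annotations)
# Boxes whose width and height are both below 200 pixels.
small = [s for s in sizes if s[0] < 200 and s[1] < 200]

print(counts.most_common())
print(f"{len(small)}/{len(sizes)} boxes under 200 px in both dimensions")
print(f"background fraction: {background_fraction(annotations):.2%}")
```

Run against the real annotation files, the same three functions would show which classes dominate, how strongly box sizes skew small, and what share of images is background.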

Data Processing

  • Data Augmentation: Applied random horizontal flips, random rotations, color jitter, and normalization.
  • Class Weight Adjustment: Used weighted loss functions to address class imbalance.
  • Attention Mechanism: Introduced CBAM (Convolutional Block Attention Module) to enhance feature selection.
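
Two of the processing steps above have details that are easy to get wrong in a detection setting, so a minimal sketch may help. A horizontal flip must also mirror the bounding boxes, and a weighted loss needs per-class weights. The source does not state which weighting scheme was used; the inverse-frequency formula below (the same one scikit-learn calls "balanced") is an assumption, and the function names and label counts are hypothetical.

```python
from collections import Counter

def hflip_boxes(boxes, image_width):
    """Mirror [x1, y1, x2, y2] boxes to match a horizontally flipped image."""
    return [[image_width - x2, y1, image_width - x1, y2]
            for x1, y1, x2, y2 in boxes]

def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * count).

    Rare classes receive larger weights; a perfectly balanced label
    list yields a weight of 1.0 for every class.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}

# Hypothetical label list, not real ChestX-Det10 counts.
labels = ["Effusion"] * 6 + ["Consolidation"] * 3 + ["Mass"] * 1
weights = inverse_frequency_weights(labels)

# One box on an image assumed to be 1024 pixels wide.
flipped = hflip_boxes([[100, 50, 300, 200]], image_width=1024)

print(weights)
print(flipped)
```

The resulting weights could be passed to a weighted classification loss; how they plug into YOLOv8 or Faster R-CNN depends on the training framework and is not covered here.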

Dataset Applications

  • Object Detection: Train and evaluate YOLOv8 and Faster R-CNN models that detect abnormalities in chest X‑ray images.
  • Model Comparison: Compare the speed, accuracy, and precision of YOLOv8 and Faster R-CNN.
  • Performance Optimization: Improve model performance through class weighting, data augmentation, and learning rate adjustments.
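
Comparing two detectors on accuracy and precision usually comes down to matching predicted boxes to ground-truth boxes by intersection over union (IoU). The sketch below shows one simple way to do that for a single class; the greedy matching rule, the 0.5 threshold, and the example boxes are assumptions for illustration, not the evaluation protocol used by the project.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(preds, truths, threshold=0.5):
    """Greedily match predictions to ground truth boxes.

    Each ground-truth box can be matched at most once. Predictions are
    processed in the order given, so sort them by confidence beforehand.
    """
    unmatched = list(truths)
    true_positives = 0
    for p in preds:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall

# Made-up boxes: one prediction overlaps a ground-truth box, one misses.
truths = [[100, 100, 200, 200], [300, 300, 400, 400]]
preds = [[110, 110, 210, 210], [500, 500, 550, 550]]

print(precision_recall(preds, truths))
```

Running the same function on the outputs of both models, per class and at a fixed threshold, gives directly comparable precision and recall figures. Because most boxes in this dataset are small, a shift of a few pixels changes IoU noticeably, so the choice of threshold matters.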
