ISIC Archive Skin Lesion Dataset

The ISIC Archive skin‑lesion dataset was assembled by researchers at Fontys University of Applied Sciences and Eindhoven University of Technology and contains 71,035 skin‑lesion images. It is intended for studying gender bias and fairness in skin‑lesion classification models. Linear programming was used to construct subsets balanced by gender, age, and lesion type, with the aim of mitigating gender bias in medical image diagnosis. Primary applications are skin‑lesion classification and fairness research.

Source: arXiv
Created: Jul 24, 2024
Updated: Jul 24, 2024

Dataset Overview

Research Objective

This study systematically evaluates the diagnostic accuracy of several convolutional neural network (CNN) architectures on skin‑lesion images, with particular attention to how demographic attributes such as gender influence performance.

Dataset Construction

  • One balanced test set was used.
  • Five training sets of equal size were built with female‑to‑male ratios of 100:0 (all‑female), 75:25, 50:50, 25:75, and 0:100 (all‑male).
  • All six datasets (the five training sets and the test set) maintain a 50:50 benign‑to‑malignant ratio.
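
The construction above can be illustrated with a minimal Python sketch. This is not the authors' pipeline (the source states that the real distributions come from a linear‑programming model in MATLAB); it uses plain stratified random sampling on synthetic metadata. The field names `sex` and `label`, the helper names, and the set size of 400 are all assumptions made for illustration.

```python
import random
from collections import Counter


def make_synthetic_metadata(per_group=250):
    """Create hypothetical metadata: one dict per image with assumed 'sex' and 'label' fields."""
    records = []
    for sex in ("female", "male"):
        for label in ("benign", "malignant"):
            for i in range(per_group):
                records.append({"id": f"{sex}-{label}-{i}", "sex": sex, "label": label})
    return records


def build_training_set(records, female_fraction, size, seed=0):
    """Sample `size` records with the requested female fraction and a 50:50 benign/malignant split."""
    rng = random.Random(seed)
    n_female = round(size * female_fraction)
    quotas = {"female": n_female, "male": size - n_female}
    selected = []
    for sex, quota in quotas.items():
        if quota % 2:
            # An odd quota cannot be split evenly between the two classes.
            raise ValueError(f"quota for {sex} ({quota}) must be even for a 50:50 split")
        for label in ("benign", "malignant"):
            pool = [r for r in records if r["sex"] == sex and r["label"] == label]
            if len(pool) < quota // 2:
                raise ValueError(f"not enough {sex}/{label} images")
            selected.extend(rng.sample(pool, quota // 2))
    rng.shuffle(selected)
    return selected


# Female fraction for each of the five training-set configurations.
FEMALE_FRACTIONS = {
    "all-female": 1.0,
    "75F:25M": 0.75,
    "50F:50M": 0.5,
    "25F:75M": 0.25,
    "all-male": 0.0,
}

metadata = make_synthetic_metadata()
training_sets = {
    name: build_training_set(metadata, fraction, size=400)
    for name, fraction in FEMALE_FRACTIONS.items()
}

for name, subset in training_sets.items():
    print(name, Counter(r["sex"] for r in subset), Counter(r["label"] for r in subset))
```

Choosing a set size divisible by 8 keeps every per‑gender quota even for the five ratios, so each gender group can be split exactly in half between benign and malignant lesions.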

Data Source

The dataset comprises metadata from the ISIC Archive, as described in the following references:

  1. Codella, N., et al. (2019)
  2. Codella, N.C.F., et al. (2018)
  3. Combalia, M., et al. (2019)
  4. Gutman, D., et al. (2016)
  5. Tschandl, P., et al. (2018)
  6. Rotemberg, V., et al. (2021)

Code Structure

  • 0_data: Collected skin‑lesion metadata.
  • 1_code: Baseline and multi‑task models, experiment definitions, and MATLAB code.
    • single task: 0_baseline.py (Keras/TensorFlow)
    • reinforcing: 1_mtl_strengthen.py (Keras/TensorFlow)
    • adversarial: br‑net.py (PyTorch)
    • MATLAB folder: Linear‑programming model for creating dataset distributions.
    • Experiments folder: Runs various model‑dataset combinations.
      • e1: 50F:50M (run‑e1: base, run‑e1m: reinforcing, run‑e1br: adversarial)
      • e5: all‑female
      • e7: all‑male
      • e8: 25F:75M
      • e9: 75F:25M
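
The linear‑programming model in the MATLAB folder is not reproduced here. As a rough sketch of what such a formulation could look like, assume the images are grouped into strata by gender g ∈ {F, M}, age group a, and lesion class c ∈ {ben, mal}, with n_{g,a,c} images available per stratum. The decision variables x_{g,a,c} (number of images selected per stratum), the total size N, and the target female fraction r are notation introduced for this illustration only, not taken from the source.

```latex
\begin{align*}
\max_{x,\,N}\quad & N \\
\text{s.t.}\quad
& \sum_{a,c} x_{F,a,c} = r\,N, \qquad \sum_{a,c} x_{M,a,c} = (1-r)\,N, \\
& \sum_{g,a} x_{g,a,\mathrm{ben}} = \sum_{g,a} x_{g,a,\mathrm{mal}}, \\
& (1-r)\sum_{c} x_{F,a,c} = r \sum_{c} x_{M,a,c} \quad \forall a, \\
& 0 \le x_{g,a,c} \le n_{g,a,c} \quad \forall g,a,c.
\end{align*}
```

Under these assumptions, the first line of constraints fixes the female‑to‑male ratio, the second enforces the 50:50 benign‑to‑malignant ratio, and the third gives both genders the same age distribution. Because r is a constant, every constraint is linear in x and N. The solution of this relaxation is generally fractional, so the counts would still need rounding to integers.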