PatchCamelyon (PCam) benchmark dataset

The dataset contains small pathology image patches, each paired with a label indicating whether tumor tissue is present. Images are 96 × 96 pixels and were provided as part of a Kaggle competition.

Updated 7/30/2024
github

Description

Histopathologic Cancer Detection Dataset Overview

Dataset Description

  • Number of Training Images: 220,000
  • Number of Validation Images: 57,000
  • Image Size: 96 × 96 pixels
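
As a rough sanity check on the figures above, the sketch below computes the validation share of the data and the raw in-memory size of the training pixels. It assumes 3-channel, 8-bit images; the channel count matches shape_in in the example configuration, but the bit depth is an assumption, and all names are illustrative rather than taken from the repository.

```python
# Dataset figures quoted in the list above.
NUM_TRAIN = 220_000
NUM_VAL = 57_000
IMAGE_SIDE = 96

# Assumptions, not stated in the source: RGB images stored as uint8.
CHANNELS = 3
BYTES_PER_VALUE = 1


def validation_fraction(num_train, num_val):
    """Share of all labelled images that is held out for validation."""
    return num_val / (num_train + num_val)


def raw_size_bytes(num_images, side, channels, bytes_per_value):
    """Uncompressed size of the pixel data if loaded fully into memory."""
    return num_images * side * side * channels * bytes_per_value


val_share = validation_fraction(NUM_TRAIN, NUM_VAL)
train_bytes = raw_size_bytes(NUM_TRAIN, IMAGE_SIDE, CHANNELS, BYTES_PER_VALUE)

print(f"Validation share: {val_share:.1%}")
print(f"Raw training pixels: {train_bytes / 1024**3:.2f} GiB")
```

Under these assumptions the training pixels alone take several gigabytes, which is one reason to load images lazily in batches rather than all at once.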

Project Structure

  • train.py: Script for training a CNN model.
  • infer.py: Script for inference using a trained model.
  • HCDNetwork.py: Definition of the CNN architecture.
  • utils.py: Utility functions for data processing and visualization.
  • data/: Directory containing the dataset.
  • model/: Directory for saved model weights and results.

Model Architecture

The HCDNetwork CNN can be configured with different numbers of convolutional layers and different dropout rates. The architecture includes:

  • Convolutional layers followed by ReLU activation and max‑pooling
  • Fully‑connected layers with dropout for regularization
  • Softmax output layer for classification
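
To illustrate the last item, here is a minimal pure-Python sketch of what a softmax output layer does with the two class scores. The logit values are hypothetical, and the repository presumably relies on its deep-learning framework's own softmax rather than code like this.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the two classes (0 = no tumor, 1 = tumor).
logits = [0.3, 1.7]
probs = softmax(logits)
pred_label = probs.index(max(probs))  # index of the most probable class

print(f"Class probabilities: {probs}")
print(f"Predicted label: {pred_label}")
```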

Example Model Configuration

params_model = {
    "shape_in": (3, 96, 96),
    "initial_filters": 8,
    "num_fc1": 100,
    "num_classes": 2,
    "dropout_rate": 0.75,  # Dropout rate
    "num_conv_layers": 4   # Number of convolutional layers
}
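
The configuration above determines how many features reach the first fully-connected layer. The sketch below works that out under assumptions the source does not confirm: each convolution uses a 3 × 3 kernel with no padding, is followed by 2 × 2 max-pooling, and the filter count doubles at every layer after the first. The helper names are hypothetical.

```python
def conv_pool_output_side(side, kernel=3, pool=2):
    """Side length after an unpadded conv (stride 1) and a max-pool."""
    return (side - kernel + 1) // pool


def flattened_features(shape_in, initial_filters, num_conv_layers):
    """Number of values fed to the first fully-connected layer."""
    _, height, width = shape_in
    filters = initial_filters
    for layer in range(num_conv_layers):
        if layer > 0:
            filters *= 2  # assumption: filters double at each later layer
        height = conv_pool_output_side(height)
        width = conv_pool_output_side(width)
    return filters * height * width


params_model = {
    "shape_in": (3, 96, 96),
    "initial_filters": 8,
    "num_conv_layers": 4,
}

num_flatten = flattened_features(
    params_model["shape_in"],
    params_model["initial_filters"],
    params_model["num_conv_layers"],
)
print(f"Flattened features: {num_flatten}")
```

Under these assumptions, adding convolutional layers shrinks the spatial size faster than the filter count grows, so deeper configurations pass fewer features to the fully-connected layers.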

Training and Evaluation

Training involves hyperparameter tuning and comparing architectures; the runs reported below vary the dropout rate and the number of convolutional layers. Model performance is evaluated using the Area Under the ROC Curve (AUC).
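
For readers unfamiliar with the metric, AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition with made-up labels and scores. It is quadratic in the number of samples and meant only as an illustration; the training script most likely uses a library routine instead.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = tumor, 0 = no tumor; scores are P(tumor).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC: {auc}")
```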

Training Results

Model  Dropout Rate  Conv Layers  Train Loss  Train Accuracy  Train AUC  Val Loss  Val Accuracy  Val AUC
A      0.10          4            0.2042      0.9300          0.9759     0.4512    0.8087        0.8842
B      0.50          4            0.2447      0.9097          0.9638     0.4784    0.8000        0.8736
C      0.90          4            0.4314      0.8034          0.8833     0.4483    0.8125        0.8780
D      0.75          3            0.3515      0.8478          0.9238     0.3888    0.8400        0.9003
E      0.75          4            0.3862      0.8356          0.9077     0.3794    0.8450        0.9064
F      0.75          5            0.0881      0.9794          0.9958     0.6120    0.8113        0.8746
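
One way to read these results is to compare validation AUC across runs and to look at the gap between training and validation AUC, since a large gap usually points to overfitting. The sketch below does both using the AUC values from the table; the dictionary layout is illustrative and not part of the repository.

```python
# Train and validation AUC per run, copied from the results table.
results = {
    "A": {"dropout": 0.10, "conv_layers": 4, "train_auc": 0.9759, "val_auc": 0.8842},
    "B": {"dropout": 0.50, "conv_layers": 4, "train_auc": 0.9638, "val_auc": 0.8736},
    "C": {"dropout": 0.90, "conv_layers": 4, "train_auc": 0.8833, "val_auc": 0.8780},
    "D": {"dropout": 0.75, "conv_layers": 3, "train_auc": 0.9238, "val_auc": 0.9003},
    "E": {"dropout": 0.75, "conv_layers": 4, "train_auc": 0.9077, "val_auc": 0.9064},
    "F": {"dropout": 0.75, "conv_layers": 5, "train_auc": 0.9958, "val_auc": 0.8746},
}

# Run with the highest validation AUC.
best = max(results, key=lambda name: results[name]["val_auc"])

# Train-minus-validation AUC gap per run (larger gap suggests overfitting).
gaps = {
    name: round(r["train_auc"] - r["val_auc"], 4)
    for name, r in results.items()
}
most_overfit = max(gaps, key=gaps.get)

print(f"Best validation AUC: model {best}")
print(f"Largest train/val gap: model {most_overfit}")
for name, gap in gaps.items():
    print(f"  {name}: gap = {gap:.4f}")
```

By these measures, model E (dropout 0.75, four convolutional layers) generalizes best, while model F fits the training data almost perfectly but shows the widest gap on validation.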

Inference

The infer.py script runs inference on new images using a trained model. It loads the trained model, preprocesses the input image, and returns the predicted label and the class probabilities.

Example Usage

from infer import infer

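model_path = "model/trained_hcd_model.pth"  # saved model weights
image_path = "test/sample_image.tif"        # image to classify

# Assumes infer() takes the path to the saved weights; "cuda" requires a GPU.
pred_label, pred_probs = infer(model_path, image_path, device="cuda")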

print(f"Predicted Label: {pred_label}")
print(f"Class Probabilities: {pred_probs}")

Topics

Medical Image Analysis
Machine Learning Competition

Source

Organization: github

Created: 7/30/2024