Histopathologic Cancer Detection Dataset Overview

Dataset Description

The dataset contains small pathology image patches, each with a label indicating whether tumor tissue is present. Every image is 96 × 96 pixels, and the data was provided as part of a Kaggle competition.

Number of Training Images: 220,000
Number of Validation Images: 57,000
Image Size: 96 × 96 pixels

Project Structure

train.py: Script for training a CNN model.
infer.py: Script for inference using a trained model.
HCDNetwork.py: Definition of the CNN architecture.
utils.py: Utility functions for data processing and visualization.
data/: Directory containing the dataset.
model/: Directory for saved model weights and results.

Model Architecture

The CNN model, HCDNetwork, can be configured with different numbers of convolutional layers and different dropout rates. The architecture includes:

Convolutional layers followed by ReLU activation and max‑pooling
Fully‑connected layers with dropout for regularization
Softmax output layer for classification

Example Model Configuration

params_model = {
    "shape_in": (3, 96, 96),
    "initial_filters": 8,
    "num_fc1": 100,
    "num_classes": 2,
    "dropout_rate": 0.75,  # Dropout rate
    "num_conv_layers": 4   # Number of convolutional layers
}
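To see how such a configuration determines the input size of the first fully-connected layer, the sketch below tracks the feature-map shape through the convolutional stack. It relies on details this overview does not state, so treat them as assumptions: 3 × 3 convolutions without padding, a 2 × 2 max-pool after every convolution, and a filter count that doubles from one layer to the next. The helper name conv_output_shape is hypothetical and is not part of HCDNetwork.py.

```python
def conv_output_shape(shape_in, initial_filters, num_conv_layers,
                      kernel_size=3, pool_size=2):
    """Return (channels, height, width) after the convolutional stack.

    Assumptions (not confirmed by the project): unpadded stride-1
    convolutions, each followed by non-overlapping max-pooling, with the
    number of filters doubling at every layer after the first.
    """
    _, height, width = shape_in
    channels = initial_filters
    for layer in range(num_conv_layers):
        if layer > 0:
            channels *= 2
        # An unpadded convolution shrinks each side by kernel_size - 1.
        height -= kernel_size - 1
        width -= kernel_size - 1
        # Max-pooling divides each side by pool_size, rounding down.
        height //= pool_size
        width //= pool_size
    return channels, height, width


params_model = {
    "shape_in": (3, 96, 96),
    "initial_filters": 8,
    "num_conv_layers": 4,
}

channels, height, width = conv_output_shape(
    params_model["shape_in"],
    params_model["initial_filters"],
    params_model["num_conv_layers"],
)
num_flatten = channels * height * width
print(f"Feature map: {channels} x {height} x {width}, flattened: {num_flatten}")
```

Under these assumptions, each extra convolutional layer roughly halves the spatial size, so a 96 × 96 input leaves very little spatial resolution after five layers.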

Training and Evaluation

Training involves tuning hyper‑parameters such as the dropout rate and the number of convolutional layers, and comparing the resulting architectures. Model performance is evaluated using the area under the ROC curve (AUC), reported alongside loss and accuracy for both the training and validation sets.
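As a reminder of what AUC measures, the following minimal sketch computes it as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as one half. The function name roc_auc and the toy labels and scores are illustrative only and are not taken from train.py.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example: label 1 = tumor, score = predicted tumor probability.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

The pairwise loop is quadratic in the number of examples, so it is only suitable for illustration. For a validation set of this size, a library routine such as scikit-learn's roc_auc_score is the practical choice.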

Training Results

Model	Dropout Rate	Conv Layers	Train Loss	Train Accuracy	Train AUC	Val Loss	Val Accuracy	Val AUC
A	0.10	4	0.2042	0.9300	0.9759	0.4512	0.8087	0.8842
B	0.50	4	0.2447	0.9097	0.9638	0.4784	0.8000	0.8736
C	0.90	4	0.4314	0.8034	0.8833	0.4483	0.8125	0.8780
D	0.75	3	0.3515	0.8478	0.9238	0.3888	0.8400	0.9003
E	0.75	4	0.3862	0.8356	0.9077	0.3794	0.8450	0.9064
F	0.75	5	0.0881	0.9794	0.9958	0.6120	0.8113	0.8746
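One way to read the table is to rank the models by validation AUC and to compare training and validation AUC for each. The sketch below does this with the values copied from the table above. Treating the train/validation AUC gap as a rough indicator of overfitting is an interpretation added here, not a claim made by the project.

```python
# AUC values copied from the results table (model name -> metrics).
results = {
    "A": {"dropout": 0.10, "conv_layers": 4, "train_auc": 0.9759, "val_auc": 0.8842},
    "B": {"dropout": 0.50, "conv_layers": 4, "train_auc": 0.9638, "val_auc": 0.8736},
    "C": {"dropout": 0.90, "conv_layers": 4, "train_auc": 0.8833, "val_auc": 0.8780},
    "D": {"dropout": 0.75, "conv_layers": 3, "train_auc": 0.9238, "val_auc": 0.9003},
    "E": {"dropout": 0.75, "conv_layers": 4, "train_auc": 0.9077, "val_auc": 0.9064},
    "F": {"dropout": 0.75, "conv_layers": 5, "train_auc": 0.9958, "val_auc": 0.8746},
}

# Model with the highest validation AUC.
best = max(results, key=lambda name: results[name]["val_auc"])

# Gap between training and validation AUC for each model.
gaps = {name: r["train_auc"] - r["val_auc"] for name, r in results.items()}
largest_gap = max(gaps, key=gaps.get)

print(f"Best validation AUC: model {best} ({results[best]['val_auc']:.4f})")
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"Model {name}: train/val AUC gap = {gap:.4f}")
```

On these numbers, model E (dropout 0.75, four convolutional layers) has the highest validation AUC and the smallest gap, while model F (five layers) fits the training set best but shows the largest gap.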

Inference

The infer.py script allows inference on new images using a trained model. The script loads the trained model, preprocesses the input image, and outputs predicted labels and class probabilities.

Example Usage

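from infer import infer

# Paths to the saved weights and to the image to classify
model_path = "model/trained_hcd_model.pth"
image_path = "test/sample_image.tif"

# infer is assumed to load the trained weights from model_path itself
pred_label, pred_probs = infer(model_path, image_path, device="cuda")

print(f"Predicted Label: {pred_label}")
print(f"Class Probabilities: {pred_probs}")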
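The step from raw network outputs to a predicted label and class probabilities can be sketched without any framework. The example below is a simplified illustration, not the code in infer.py: the logits are made up, and the mapping of class index 1 to "tumor" is an assumption.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() from overflowing.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (predicted class index, class probabilities)."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Hypothetical logits for one image: [no tumor, tumor].
pred_label, pred_probs = predict([-1.2, 0.7])
print(f"Predicted Label: {pred_label}")
print(f"Class Probabilities: {[round(p, 4) for p in pred_probs]}")
```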

PatchCamelyon (PCam) benchmark dataset
