PatchCamelyon (PCam) benchmark dataset
The dataset contains small pathology image patches, each labeled to indicate whether tumor tissue is present. Images are 96 × 96 pixels and were provided as part of a Kaggle competition.
Description
Histopathologic Cancer Detection Dataset Overview
Dataset Description
- Number of Training Images: 220,000
- Number of Validation Images: 57,000
- Image Size: 96 × 96 pixels
Project Structure
- train.py: Script for training a CNN model.
- infer.py: Script for inference using a trained model.
- HCDNetwork.py: Definition of the CNN architecture.
- utils.py: Utility functions for data processing and visualization.
- data/: Directory containing the dataset.
- model/: Directory for saved model weights and results.
Model Architecture
The CNN model, HCDNetwork, can be configured with a varying number of convolutional layers and different dropout rates. The architecture includes:
- Convolutional layers followed by ReLU activation and max‑pooling
- Fully‑connected layers with dropout for regularization
- Softmax output layer for classification
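To make the dropout setting concrete, here is a minimal pure-Python sketch of inverted dropout, the variant in which surviving activations are rescaled by 1 / (1 - rate) during training. The `dropout` helper is hypothetical and is not taken from HCDNetwork.py; whether the project applies dropout in exactly this way is an assumption.

```python
import random


def dropout(values, rate, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest.

    Hypothetical helper for illustration only, not the project's implementation.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time dropout is a no-op.
        return list(values)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
activations = [1.0] * 10_000
dropped = dropout(activations, rate=0.75, rng=rng)

# With rate=0.75 roughly a quarter of the activations survive,
# and each survivor is scaled by 1 / (1 - 0.75) = 4.
kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
print(f"kept fraction: {kept_fraction:.3f}")
```

A high rate such as 0.75 or 0.90 therefore removes most of the signal reaching the fully-connected layers on every training step, which is why it acts as a strong regularizer.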
Example Model Configuration
params_model = {
    "shape_in": (3, 96, 96),
    "initial_filters": 8,
    "num_fc1": 100,
    "num_classes": 2,
    "dropout_rate": 0.75,  # Dropout rate
    "num_conv_layers": 4,  # Number of convolutional layers
}
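As a rough illustration of how this configuration determines the size of the first fully-connected layer's input, the sketch below tracks the feature-map shape through the convolutional stack. It assumes 3 × 3 convolutions with stride 1 and no padding, 2 × 2 max-pooling after each convolution, and a filter count that doubles at every layer; none of these details are confirmed by the source, and `flattened_features` is a hypothetical helper.

```python
def flattened_features(shape_in, initial_filters, num_conv_layers,
                       kernel_size=3, pool_size=2):
    """Number of features entering the first fully-connected layer.

    Assumptions (not confirmed by the project): unpadded stride-1 convolutions,
    max-pooling after every convolution, filters doubling per layer.
    """
    channels, height, width = shape_in
    for layer in range(num_conv_layers):
        # Unpadded convolution shrinks each side by kernel_size - 1,
        # then pooling divides it by pool_size (rounding down).
        height = (height - kernel_size + 1) // pool_size
        width = (width - kernel_size + 1) // pool_size
        channels = initial_filters * 2 ** layer
    return channels * height * width


params_model = {
    "shape_in": (3, 96, 96),
    "initial_filters": 8,
    "num_conv_layers": 4,
}

num_flatten = flattened_features(
    params_model["shape_in"],
    params_model["initial_filters"],
    params_model["num_conv_layers"],
)
print(num_flatten)  # 96 -> 47 -> 22 -> 10 -> 4 per side, 64 channels: 1024
```

Under these assumptions, going from 3 to 5 convolutional layers shrinks the flattened input from 3200 to 128 features, so the layer count changes the size of the fully-connected part as well as the depth.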
Training and Evaluation
Training involves hyperparameter tuning and comparing architectures; the runs reported below vary the dropout rate and the number of convolutional layers. Model performance is evaluated using the area under the ROC curve (AUC).
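For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen tumor patch receives a higher score than a randomly chosen non-tumor patch. The sketch below computes it directly from that definition; it is an illustrative pure-Python version, not the evaluation code used by train.py, and `roc_auc` is a hypothetical name.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is O(n_pos * n_neg), which is
    fine for illustration but slow for a full validation set.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]             # 1 = tumor present (toy data)
scores = [0.1, 0.4, 0.35, 0.8]    # predicted probability of the tumor class
auc = roc_auc(labels, scores)
print(auc)  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```

Because AUC depends only on the ranking of scores, it is unaffected by the decision threshold, unlike accuracy.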
Training Results
| Model | Dropout Rate | Conv Layers | Train Loss | Train Accuracy | Train AUC | Val Loss | Val Accuracy | Val AUC |
|---|---|---|---|---|---|---|---|---|
| A | 0.10 | 4 | 0.2042 | 0.9300 | 0.9759 | 0.4512 | 0.8087 | 0.8842 |
| B | 0.50 | 4 | 0.2447 | 0.9097 | 0.9638 | 0.4784 | 0.8000 | 0.8736 |
| C | 0.90 | 4 | 0.4314 | 0.8034 | 0.8833 | 0.4483 | 0.8125 | 0.8780 |
| D | 0.75 | 3 | 0.3515 | 0.8478 | 0.9238 | 0.3888 | 0.8400 | 0.9003 |
| E | 0.75 | 4 | 0.3862 | 0.8356 | 0.9077 | 0.3794 | 0.8450 | 0.9064 |
| F | 0.75 | 5 | 0.0881 | 0.9794 | 0.9958 | 0.6120 | 0.8113 | 0.8746 |
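One way to read the table is to compare each model's train and validation AUC. The short sketch below copies the AUC columns from the table, picks the model with the highest validation AUC, and reports the train/validation gap as a simple overfitting indicator. The dictionary layout and variable names are illustrative only.

```python
# AUC values copied from the results table above.
results = {
    "A": {"dropout": 0.10, "conv_layers": 4, "train_auc": 0.9759, "val_auc": 0.8842},
    "B": {"dropout": 0.50, "conv_layers": 4, "train_auc": 0.9638, "val_auc": 0.8736},
    "C": {"dropout": 0.90, "conv_layers": 4, "train_auc": 0.8833, "val_auc": 0.8780},
    "D": {"dropout": 0.75, "conv_layers": 3, "train_auc": 0.9238, "val_auc": 0.9003},
    "E": {"dropout": 0.75, "conv_layers": 4, "train_auc": 0.9077, "val_auc": 0.9064},
    "F": {"dropout": 0.75, "conv_layers": 5, "train_auc": 0.9958, "val_auc": 0.8746},
}

# Model with the best validation AUC.
best = max(results, key=lambda name: results[name]["val_auc"])

# Gap between train and validation AUC; a large gap suggests overfitting.
gaps = {name: round(r["train_auc"] - r["val_auc"], 4) for name, r in results.items()}
most_overfit = max(gaps, key=gaps.get)

print(f"best validation AUC: model {best}")
print(f"largest train/val gap: model {most_overfit} ({gaps[most_overfit]})")
```

On these numbers, model E (dropout 0.75, four convolutional layers) has the highest validation AUC and the smallest gap, while model F fits the training set almost perfectly but generalizes worst among the 0.75-dropout runs.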
Inference
The infer.py script allows inference on new images using a trained model. The script loads the trained model, preprocesses the input image, and outputs predicted labels and class probabilities.
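The last step, turning the network's raw outputs into a label and class probabilities, can be sketched as follows. This is a pure-Python illustration of a softmax followed by an argmax, not the code in infer.py; the `predict` helper and the convention that class index 1 means "tumor" are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (predicted class index, class probabilities) for one image.

    Hypothetical helper; assumes one logit per class, with index 1 = tumor.
    """
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Toy logits standing in for the model output on a single 96 x 96 patch.
pred_label, pred_probs = predict([-1.2, 2.3])
print(f"Predicted Label: {pred_label}")
print(f"Class Probabilities: {[round(p, 4) for p in pred_probs]}")
```

Subtracting the maximum logit leaves the probabilities unchanged but avoids overflow when logits are large.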
Example Usage
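from infer import infer

model_path = "model/trained_hcd_model.pth"
image_path = "test/sample_image.tif"

# `model` must be a trained HCDNetwork instance with the weights from
# model_path already loaded; the loading step is not shown here.
pred_label, pred_probs = infer(model, image_path, device="cuda")
print(f"Predicted Label: {pred_label}")
print(f"Class Probabilities: {pred_probs}")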
Source
Organization: github
Created: 7/30/2024