PatchCamelyon (PCam) benchmark dataset
The dataset contains small‑size pathology images with corresponding labels indicating the presence of tumor tissue. Images are 96 × 96 pixels and were provided as part of a Kaggle competition.
Histopathologic Cancer Detection Dataset Overview
Dataset Description
- Number of Training Images: 220,000
- Number of Validation Images: 57,000
- Image Size: 96 × 96 pixels
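Since every image is a 96 × 96 RGB tile, each one has to reach the network as a 3 × 96 × 96 array (the `shape_in` used in the model configuration). The following is a minimal NumPy sketch of that conversion. It assumes images are already decoded into 96 × 96 × 3 `uint8` arrays; the function name `to_chw_float` and the [0, 1] scaling are hypothetical choices, not necessarily what `utils.py` does.

```python
import numpy as np

IMAGE_SIZE = 96  # from the dataset description: 96 x 96 pixels


def to_chw_float(image: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 uint8 RGB image into a 3 x H x W float32 array in [0, 1].

    Hypothetical helper: the scaling and layout are assumptions, not taken from utils.py.
    """
    if image.shape != (IMAGE_SIZE, IMAGE_SIZE, 3):
        raise ValueError(f"expected ({IMAGE_SIZE}, {IMAGE_SIZE}, 3), got {image.shape}")
    if image.dtype != np.uint8:
        raise TypeError(f"expected uint8 pixels, got {image.dtype}")
    scaled = image.astype(np.float32) / 255.0  # map 0..255 to 0.0..1.0
    return np.transpose(scaled, (2, 0, 1))     # HWC -> CHW, matching shape_in=(3, 96, 96)


# Usage with a synthetic image standing in for a decoded .tif tile
dummy = np.full((96, 96, 3), 255, dtype=np.uint8)
chw = to_chw_float(dummy)
print(chw.shape, chw.dtype, chw.max())
```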
Project Structure
- train.py: Script for training a CNN model.
- infer.py: Script for inference using a trained model.
- HCDNetwork.py: Definition of the CNN architecture.
- utils.py: Utility functions for data processing and visualization.
- data/: Directory containing the dataset.
- model/: Directory for saved model weights and results.
Model Architecture
The CNN model, HCDNetwork, can be configured with a varying number of convolutional layers and different dropout rates. The architecture includes:
- Convolutional layers followed by ReLU activation and max‑pooling
- Fully‑connected layers with dropout for regularization
- Softmax output layer for classification
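How the configuration parameters shape the network can be made concrete by tracking the feature-map size through the convolution and pooling blocks. This sketch computes the number of features that reach the first fully connected layer. It assumes (hypothetically, not confirmed by HCDNetwork.py) 3 × 3 convolutions with stride 1 and no padding, 2 × 2 max-pooling, and a filter count that doubles after each block; the size formulas are the standard ones documented for PyTorch's `Conv2d` and `MaxPool2d`.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a convolution (standard formula, floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Spatial size after max-pooling without padding."""
    return (size - kernel) // stride + 1


def flattened_features(params: dict) -> int:
    """Number of features entering the first fully connected layer.

    Assumptions (hypothetical, not confirmed by HCDNetwork.py): each block is a
    3x3 conv (stride 1, no padding) + ReLU + 2x2 max-pool, and the filter count
    doubles after every block, starting from initial_filters.
    """
    channels, height, width = params["shape_in"]
    filters = params["initial_filters"]
    for i in range(params["num_conv_layers"]):
        height = pool_out(conv_out(height))
        width = pool_out(conv_out(width))
        if height < 1 or width < 1:
            raise ValueError(f"feature map vanished after block {i + 1}")
        channels = filters  # this block outputs `filters` channels
        filters *= 2        # assumed doubling for the next block
    return channels * height * width


params = {"shape_in": (3, 96, 96), "initial_filters": 8, "num_conv_layers": 4}
print(flattened_features(params))
```

Under these assumptions, four blocks shrink the 96 × 96 input to 4 × 4 with 64 channels (1,024 features), five blocks leave a 1 × 1 map, and a sixth block no longer fits, so the usable depth is limited for images this small.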
Example Model Configuration
```python
params_model = {
    "shape_in": (3, 96, 96),
    "initial_filters": 8,
    "num_fc1": 100,
    "num_classes": 2,
    "dropout_rate": 0.75,  # Dropout rate
    "num_conv_layers": 4   # Number of convolutional layers
}
```
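`dropout_rate` is most likely used as the drop probability of a dropout layer (in PyTorch's `nn.Dropout`, `p` is the probability that an element is zeroed); that mapping is an assumption about `HCDNetwork.py`. Under it, a rate of 0.75 zeroes about three quarters of the activations during training. A NumPy sketch of standard inverted dropout shows the effect:

```python
import numpy as np


def inverted_dropout(x: np.ndarray, rate: float, rng: np.random.Generator) -> np.ndarray:
    """Zero each element with probability `rate` and rescale survivors by 1 / (1 - rate).

    This is the training-time behaviour of standard (inverted) dropout, so the
    expected value of each activation is unchanged; at inference dropout is a no-op.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
activations = np.ones(100_000)
out = inverted_dropout(activations, rate=0.75, rng=rng)
print((out > 0).mean())  # roughly 0.25 of units survive
print(out.mean())        # roughly 1.0, since survivors are scaled by 4
```

The rescaling by 1 / (1 - rate) keeps the expected activation unchanged, so no correction is needed at inference time.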
Training and Evaluation
Training involves hyperparameter tuning (here, the dropout rate and the number of convolutional layers), exploring different architectures, and applying techniques to improve performance. Model performance is evaluated using the area under the ROC curve (AUC).
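AUC can be read as the probability that a randomly chosen tumor patch receives a higher score than a randomly chosen non-tumor patch. A small pure-Python sketch of that pairwise (Mann-Whitney) formulation follows, assuming the score is the predicted probability of the tumor class and that label 1 means tumor; in practice a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half. This O(P * N) pairwise form is fine for illustration;
    for large evaluation sets a rank-based or library implementation is faster.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: hypothetical tumor-probability scores for six patches
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(roc_auc(labels, scores))
```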
Training Results
| Model | Dropout Rate | Conv Layers | Train Loss | Train Accuracy | Train AUC | Val Loss | Val Accuracy | Val AUC |
|---|---|---|---|---|---|---|---|---|
| A | 0.10 | 4 | 0.2042 | 0.9300 | 0.9759 | 0.4512 | 0.8087 | 0.8842 |
| B | 0.50 | 4 | 0.2447 | 0.9097 | 0.9638 | 0.4784 | 0.8000 | 0.8736 |
| C | 0.90 | 4 | 0.4314 | 0.8034 | 0.8833 | 0.4483 | 0.8125 | 0.8780 |
| D | 0.75 | 3 | 0.3515 | 0.8478 | 0.9238 | 0.3888 | 0.8400 | 0.9003 |
| E | 0.75 | 4 | 0.3862 | 0.8356 | 0.9077 | 0.3794 | 0.8450 | 0.9064 |
| F | 0.75 | 5 | 0.0881 | 0.9794 | 0.9958 | 0.6120 | 0.8113 | 0.8746 |
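One way to read the table is to compare validation AUC with the gap between training and validation AUC. The snippet below copies those columns (plus the configuration) from the table and picks the model with the highest validation AUC; the selection criterion is an illustrative choice, not necessarily how the models were compared originally.

```python
# Values copied from the Training Results table
results = {
    "A": {"dropout": 0.10, "conv_layers": 4, "train_auc": 0.9759, "val_auc": 0.8842},
    "B": {"dropout": 0.50, "conv_layers": 4, "train_auc": 0.9638, "val_auc": 0.8736},
    "C": {"dropout": 0.90, "conv_layers": 4, "train_auc": 0.8833, "val_auc": 0.8780},
    "D": {"dropout": 0.75, "conv_layers": 3, "train_auc": 0.9238, "val_auc": 0.9003},
    "E": {"dropout": 0.75, "conv_layers": 4, "train_auc": 0.9077, "val_auc": 0.9064},
    "F": {"dropout": 0.75, "conv_layers": 5, "train_auc": 0.9958, "val_auc": 0.8746},
}

# Train minus validation AUC: a large positive gap hints at overfitting
gaps = {m: r["train_auc"] - r["val_auc"] for m, r in results.items()}
best = max(results, key=lambda m: results[m]["val_auc"])
largest_gap = max(gaps, key=gaps.get)

for name, r in sorted(results.items()):
    print(f"{name}: dropout={r['dropout']:.2f}, conv layers={r['conv_layers']}, "
          f"val AUC={r['val_auc']:.4f}, gap={gaps[name]:+.4f}")
print("best by validation AUC:", best)
print("largest train/val gap:", largest_gap)
```

On these numbers, model E (dropout 0.75, 4 conv layers) has the highest validation AUC (0.9064) and the smallest train/validation gap (0.0013), while model F (5 conv layers) has the largest gap (0.1212), which suggests overfitting.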
Inference
The infer.py script runs inference on new images with a trained model: it loads the trained model, preprocesses the input image, and outputs the predicted label and class probabilities.
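The class probabilities come from applying softmax to the network's two output scores, and the predicted label is the class with the highest probability. A minimal pure-Python sketch of that final step follows, assuming index 0 means no tumor and index 1 means tumor; the function names and the logits are hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: list[float]) -> tuple[int, list[float]]:
    """Return (predicted class index, class probabilities) for one image."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return label, probs


# Hypothetical raw scores for one 96 x 96 patch: [no tumor, tumor]
label, probs = predict([0.3, 1.5])
print(label, [round(p, 3) for p in probs])
```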
Example Usage
```python
from infer import infer

model_path = "model/trained_hcd_model.pth"
image_path = "test/sample_image.tif"

# Assumption: infer() accepts the path to the saved weights and loads the model itself
pred_label, pred_probs = infer(model_path, image_path, device="cuda")
print(f"Predicted Label: {pred_label}")
print(f"Class Probabilities: {pred_probs}")
```