FungiTastic
The FungiTastic dataset was created jointly by the University of West Bohemia, INRIA and the Czech Technical University in Prague. It comprises approximately 350,000 records and over 650,000 fungal photographs with detailed metadata. The dataset stems from twenty years of continuous data collection and supports machine learning tasks such as closed‑set and open‑set classification. Its metadata includes timestamps, camera settings, geographic coordinates, satellite imagery, and biological taxonomy. It targets image classification problems in biology, particularly fungal identification.
Description
Dataset Overview
Dataset Name
The FungiTastic Dataset
Dataset Description
FungiTastic is a comprehensive multimodal machine‑learning dataset for classifying fungi from images and metadata. It includes photographs of fungal observations, satellite imagery, weather observations, segmentation masks, and textual metadata. Metadata enriches each observation with timestamps, camera settings, GPS location, substrate, habitat, and biological taxonomy. By combining multiple modalities, the dataset provides a robust benchmark for multimodal classification, enabling the development and evaluation of complex machine‑learning models under realistic and dynamic conditions.
Dataset Content
- Image Data: Photographs of fungal observations, satellite images, and segmentation masks.
- Metadata: Timestamps, camera configuration, GPS coordinates, substrate, habitat, and biological taxonomy.
Dataset Subsets
- FungiTastic Closed Set: Training (246,884), validation (45,616), test (48,379) observations.
- FungiTastic‑M Closed Set: Small prototype subset, training (25,786), validation (4,687), test (5,531).
- FungiTastic‑FS Closed Set: Few‑shot subset, training (4,293), validation (1,099), test (998).
- FungiTastic Open Set: Training (246,884), validation (47,453), test (50,085) observations.
- FungiTastic‑M Open Set: Small open‑set subset, training (25,786), validation (4,703), test (5,587).
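As a quick sanity check on the split sizes listed above, the sketch below totals each subset and reports the train/validation/test shares. The counts are copied from the list; the dictionary layout and the helper name `split_summary` are illustrative assumptions, not part of the dataset's tooling.

```python
# Observation counts (train, validation, test) copied from the subset list above.
SPLITS = {
    "FungiTastic Closed Set": (246_884, 45_616, 48_379),
    "FungiTastic-M Closed Set": (25_786, 4_687, 5_531),
    "FungiTastic-FS Closed Set": (4_293, 1_099, 998),
    "FungiTastic Open Set": (246_884, 47_453, 50_085),
    "FungiTastic-M Open Set": (25_786, 4_703, 5_587),
}


def split_summary(counts):
    """Return the total count and each split's share in percent (hypothetical helper)."""
    total = sum(counts)
    shares = tuple(round(100 * c / total, 1) for c in counts)
    return total, shares


for name, counts in SPLITS.items():
    total, shares = split_summary(counts)
    print(f"{name}: {total:,} observations, train/val/test share in % = {shares}")
```

The closed-set and open-set variants share the same training counts and differ only in their validation and test splits.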
Dataset Statistics
- Total Images: Over 650,000.
- Total Observations: Over 350,000.
- Class Distribution: Long‑tail distribution (see Figure 2 in the paper).
Evaluation and Metrics
The dataset defines five problem settings with corresponding evaluation metrics:
- Fine‑grained closed‑set classification with a heavy long‑tail distribution.
- Standard closed‑set classification with out‑of‑distribution (OOD) detection.
- Classification with non‑standard cost functions.
- Classification on a time‑sorted dataset for benchmarking adaptation methods.
- Few‑shot classification for species with limited training observations.
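Two of the settings above can be illustrated with a minimal sketch. The first function rejects low-confidence predictions as "unknown" using a maximum-probability threshold, a common open-set baseline; the second picks the class with the lowest expected cost under a cost matrix. Neither is claimed to be the method or metric used by the dataset's authors. The names `UNKNOWN`, `predict_open_set`, and `min_expected_cost_class`, the threshold value, and the example cost matrix are all assumptions made for illustration.

```python
UNKNOWN = -1  # hypothetical label meaning "not one of the known classes"


def predict_open_set(probs, threshold=0.5):
    """Return the most probable class, or UNKNOWN if its probability is below threshold."""
    best = max(range(len(probs)), key=lambda c: probs[c])
    return best if probs[best] >= threshold else UNKNOWN


def min_expected_cost_class(probs, cost):
    """Return the prediction with the lowest expected cost.

    cost[t][p] is the cost of predicting class p when the true class is t.
    """
    num_classes = len(probs)
    expected = [
        sum(probs[t] * cost[t][p] for t in range(num_classes))
        for p in range(num_classes)
    ]
    return min(range(num_classes), key=lambda p: expected[p])


# Hypothetical two-class example: class 0 = edible, class 1 = poisonous.
# Calling a poisonous specimen edible is made ten times more costly than the reverse.
asymmetric_cost = [[0, 1], [10, 0]]
zero_one_cost = [[0, 1], [1, 0]]
probs = [0.6, 0.4]

print(min_expected_cost_class(probs, zero_one_cost))    # plain argmax behaviour
print(min_expected_cost_class(probs, asymmetric_cost))  # shifts toward the safer class
print(predict_open_set([0.40, 0.35, 0.25]))             # no class is confident enough
```

With a symmetric 0/1 cost, the expected-cost rule reduces to choosing the most probable class; an asymmetric matrix moves the decision toward whichever error is cheaper.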
Baseline Results
Performance metrics for various architectures on each subset are provided, including Top‑1, Top‑3, and macro‑averaged F1 scores.
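For readers unfamiliar with these metrics, the following is a minimal pure-Python sketch of Top‑k accuracy and macro‑averaged F1. It is not the benchmark's official evaluation code. The function names and toy data are assumptions, and macro F1 is averaged here over the classes that appear in either the labels or the predictions, which is one of several conventions.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


def macro_f1(predictions, labels):
    """Unweighted mean of per-class F1 over classes seen in labels or predictions."""
    classes = sorted(set(labels) | set(predictions))
    f1_scores = []
    for c in classes:
        tp = sum(p == c and y == c for p, y in zip(predictions, labels))
        fp = sum(p == c and y != c for p, y in zip(predictions, labels))
        fn = sum(p != c and y == c for p, y in zip(predictions, labels))
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is non-zero for every seen class.
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(f1_scores) / len(classes)


# Toy scores for four samples over three classes (illustrative values only).
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.3, 0.3, 0.4],
]
labels = [0, 1, 0, 2]
predictions = [max(range(len(row)), key=lambda c: row[c]) for row in scores]

print("Top-1:", top_k_accuracy(scores, labels, 1))
print("Top-3:", top_k_accuracy(scores, labels, 3))
print("Macro F1:", macro_f1(predictions, labels))
```

Because macro F1 weights every class equally, rare species influence the score as much as common ones, which is why it is often reported alongside accuracy on long‑tailed data.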
Dataset Download
- Kaggle: FungiTastic
- GitHub: FungiTastic Repository
Source
Organization: arXiv
Created: 8/25/2024