Wild Bee Dataset
The Wild Bee Dataset was created by Berlin University of Applied Sciences and contains approximately 30 000 images of wild bees sourced from the iNaturalist database. It is primarily intended to support insect monitoring and species classification research. The dataset covers 25 common German wild bee species; four visually similar species were merged into a single class. A subset of the images was additionally annotated by hand with segmentation masks for body parts. The goal is to help biologists annotate rare species using deep‑learning techniques, thereby improving the understanding and protection of biodiversity.
Description
Dataset Overview
Dataset Introduction
The dataset is intended to support the development of automatic insect‑monitoring systems capable of identifying insect species without capturing or killing the insects. Because of the great diversity and rarity of insect species, building a high‑quality insect‑image dataset is challenging. The construction involved downloading insect images from iNaturalist via the script webscraper_inat.py and manually annotating them.
Data Acquisition
Images were downloaded from iNaturalist with the script webscraper_inat.py. Users must specify the target folder, the maximum number of images, and the species URL index. For example, the index for Anthidium manicatum can be obtained by searching for the species name on iNaturalist and copying the number at the end of the resulting URL.
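The step of copying the number at the end of the URL can also be automated. The following is a minimal sketch, not part of webscraper_inat.py: the function name taxon_index_from_url and the example.org URL are hypothetical, and it assumes the index is the final run of digits in the URL.

```python
import re


def taxon_index_from_url(url: str) -> int:
    """Return the integer found at the very end of a species URL.

    Hypothetical helper: it assumes the species index is the last
    run of digits in the URL, optionally followed by a slash.
    """
    match = re.search(r"(\d+)/?$", url.strip())
    if match is None:
        raise ValueError(f"no trailing number in URL: {url!r}")
    return int(match.group(1))


# Placeholder URL for illustration only, not a real species page.
index = taxon_index_from_url("https://example.org/taxa/12345")
print(index)
```

The returned integer is what would be passed to the download script as the species URL index.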
Data Annotation
From the downloaded images, about 30 samples per species were selected to form the mini dataset and were further annotated in Label Studio. The final mini dataset contains 726 images covering 25 bee species. Annotations include segmentation of major body parts such as head, thorax, and abdomen.
Data Pre‑processing
Scripts create_metafiles_mini.py and create_metafiles_all.py were used to generate CUB200‑style metadata files from the JSON exports of Label Studio. These files map class names, image files, class labels, body parts, and their locations.
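For orientation, CUB200‑style metadata consists of plain text files in which each line pairs a numeric ID with a value. The sketch below shows how three of these files (classes.txt, images.txt, image_class_labels.txt) could be built from a list of image paths and class names. It is not the code of create_metafiles_mini.py or create_metafiles_all.py: the function name, the input format, and the example file names are assumptions, and the body‑part files are left out.

```python
def build_cub_metadata(samples):
    """Build CUB200-style metadata lines.

    samples: list of (relative_image_path, class_name) pairs.
    Returns a dict mapping file name to its list of text lines.
    IDs are 1-based, as in CUB200.
    """
    class_names = sorted({name for _, name in samples})
    class_ids = {name: i for i, name in enumerate(class_names, start=1)}

    classes_txt = [f"{class_ids[name]} {name}" for name in class_names]
    images_txt = []
    labels_txt = []
    for image_id, (path, name) in enumerate(samples, start=1):
        images_txt.append(f"{image_id} {path}")
        labels_txt.append(f"{image_id} {class_ids[name]}")

    return {
        "classes.txt": classes_txt,
        "images.txt": images_txt,
        "image_class_labels.txt": labels_txt,
    }


# Placeholder samples; the paths and "Species_b" are invented.
example = build_cub_metadata([
    ("Species_b/img_001.jpg", "Species_b"),
    ("Anthidium_manicatum/img_002.jpg", "Anthidium_manicatum"),
])
for file_name, lines in example.items():
    print(file_name, lines)
```

Class names are written with underscores because each line is split on spaces when the files are read back.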
Training and Validation
A pretrained ResNet50 model was fine‑tuned and cross‑validated on the full dataset, using the mini dataset as a test set. The reported test accuracies were 0.78 (top‑1) and 0.95 (top‑3), which is described as competitive with state‑of‑the‑art fine‑grained classification models.
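Top‑k accuracy counts a prediction as correct when the true class is among the k highest‑scoring classes. The sketch below illustrates the metric on a toy score matrix; the function name and the numbers are made up and have nothing to do with the reported results.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k top scores.

    scores: list of per-class score lists, one list per sample.
    labels: list of true class indices, one per sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data: three samples, three classes.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
print(top_k_accuracy(toy_scores, toy_labels, k=1))
print(top_k_accuracy(toy_scores, toy_labels, k=3))
```

With 25 classes, top‑3 accuracy is a much more lenient criterion than top‑1, which is consistent with the gap between the two reported values.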
Preliminary XAI Experiments
In initial experiments without human involvement, several XAI methods (e.g., saliency maps) were used to assess model interpretability. Experiments employed segmentation masks as a reference for explanations and evaluated fidelity via pixel‑flipping and Monte Carlo dropout.
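Pixel‑flipping measures fidelity by removing pixels in order of decreasing attributed relevance and tracking how quickly the model score drops; a faithful explanation should produce a steep early decline. The following is a minimal sketch on a flattened toy image with a toy linear model. The function name, the baseline value, and the model are assumptions for illustration, not the setup used in the experiments.

```python
def pixel_flipping_curve(image, relevance, model, baseline=0.0):
    """Return model scores as pixels are removed, most relevant first.

    image, relevance: flat lists of equal length.
    model: callable mapping a flat pixel list to a scalar score.
    baseline: value that replaces a removed pixel (assumed to be 0).
    """
    order = sorted(range(len(image)), key=lambda i: relevance[i], reverse=True)
    perturbed = list(image)
    scores = [model(perturbed)]  # score of the unmodified image
    for i in order:
        perturbed[i] = baseline
        scores.append(model(perturbed))
    return scores


# Toy linear model: the score is a weighted sum of the pixels.
weights = [3.0, 1.0, 0.0, 2.0]


def toy_model(pixels):
    return sum(w * p for w, p in zip(weights, pixels))


# Using the weights as relevance gives the ideal ordering for this model.
curve = pixel_flipping_curve([1.0, 1.0, 1.0, 1.0], weights, toy_model)
print(curve)
```

Comparing such curves across XAI methods, for example by the area under each curve, gives a score that does not depend on human judgment.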
Concept‑Based Prototype Nearest Neighbor (CoProNN)
A new concept‑based post‑hoc XAI method was developed. It uses text‑to‑image models (e.g., Stable Diffusion) to generate images of high‑level concepts, which are then used with k‑NN to explain model predictions. User studies confirmed that the method helped users classify bees more accurately and spot erroneous model predictions more easily.
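The k‑NN step can be pictured as follows: each generated concept image is represented by a feature vector, and a prediction is explained by the concepts whose vectors lie closest to the query image's vector. The sketch below is an illustration under that assumption and is not the CoProNN implementation; the function name, the two‑dimensional features, and the concept names are invented.

```python
from collections import Counter
import math


def concept_knn_explanation(query, prototypes, k=3):
    """Rank concepts by their vote count among the k nearest prototypes.

    query: feature vector of the image to explain.
    prototypes: list of (feature_vector, concept_name) pairs.
    Returns (concept_name, votes) pairs, most frequent first.
    """
    nearest = sorted(prototypes, key=lambda p: math.dist(query, p[0]))[:k]
    votes = Counter(name for _, name in nearest)
    return votes.most_common()


# Invented 2-D features; real features would come from a trained network.
toy_prototypes = [
    ([0.0, 0.0], "striped abdomen"),
    ([0.1, 0.0], "striped abdomen"),
    ([1.0, 1.0], "hairy thorax"),
    ([5.0, 5.0], "flower background"),
]
explanation = concept_knn_explanation([0.0, 0.1], toy_prototypes, k=3)
print(explanation)
```

The concept with the most votes would then be shown to the user as the main reason for the prediction.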
Source
Organization: arXiv
Created: 6/15/2022