Wild Bee Dataset

The Wild Bee Dataset was created by Berlin University of Applied Sciences and contains approximately 30 000 images of wild bees sourced from the iNaturalist database. It is primarily intended to support insect monitoring and species classification research. The dataset covers 25 common German wild bee species; four visually similar species were merged into a single class. During creation, the dataset underwent rigorous labeling, including segmentation masks for body parts. The goal is to assist biologists in annotating rare species using deep‑learning techniques, thereby improving understanding and protection of biodiversity.

Updated 6/15/2022
arXiv

Description

Dataset Overview

Dataset Introduction

The dataset is intended to support the development of automatic insect‑monitoring systems that identify insect species without capturing or killing the insects. Because insect species are highly diverse and many of them are rare, building a high‑quality insect‑image dataset is challenging. The dataset was built by downloading images from iNaturalist with the script webscraper_inat.py and annotating them manually.

Data Acquisition

Images were downloaded from iNaturalist using the script webscraper_inat.py. Users must specify the target folder, the maximum number of images, and the species index used in iNaturalist URLs. For example, the index for Anthidium manicatum can be obtained by searching for the species name on iNaturalist and copying the number at the end of the resulting URL.
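The lookup of the species index can be sketched in a few lines of Python. This is a minimal illustration, not part of webscraper_inat.py: the helper name taxon_index_from_url is hypothetical, and the example URL and the number 12345 are placeholders rather than the real index of any species.

```python
import re


def taxon_index_from_url(url: str) -> int:
    """Return the number at the end of a URL as an integer.

    Hypothetical helper: it only mirrors the manual step of copying
    the trailing number from the browser address bar.
    """
    match = re.search(r"(\d+)/?$", url.strip())
    if match is None:
        raise ValueError(f"no trailing number in URL: {url!r}")
    return int(match.group(1))


# Placeholder URL; 12345 is NOT the real index of Anthidium manicatum.
example_url = "https://www.inaturalist.org/observations?taxon_id=12345"
index = taxon_index_from_url(example_url)
print(index)
```

The resulting integer would then be passed to the download script together with the target folder and the image limit; how the script expects these arguments is not specified here.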

Data Annotation

From the downloaded images, about 30 samples per species (the mini dataset) were selected and further annotated using Label Studio. The final mini dataset contains 726 images covering 25 bee species. Annotations include segmentation of major body parts such as head, thorax, and abdomen.

Data Pre‑processing

Scripts create_metafiles_mini.py and create_metafiles_all.py were used to generate CUB200‑style metadata files from the JSON exports of Label Studio. These files map class names, image files, class labels, body parts, and their locations.
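As a rough sketch of what CUB200‑style metadata looks like, the following Python function builds the contents of the usual files (classes.txt, images.txt, image_class_labels.txt, parts/parts.txt, parts/part_locs.txt) from a simplified list of records. The record layout is an assumption made for illustration and is not the Label Studio export format; each body part is reduced to a single (x, y) point, whereas the dataset itself stores segmentation masks. The species name Species_B is a placeholder.

```python
def build_cub_metadata(records):
    """Build CUB-200-2011-style metadata lines.

    records: list of dicts with the (assumed) keys
      'file'    - relative image path
      'species' - class name
      'parts'   - dict mapping part name -> (x, y) location
    Returns a dict mapping metadata file name -> list of text lines.
    """
    class_names = sorted({r["species"] for r in records})
    class_ids = {name: i for i, name in enumerate(class_names, start=1)}
    part_names = sorted({p for r in records for p in r["parts"]})
    part_ids = {name: i for i, name in enumerate(part_names, start=1)}

    meta = {
        "classes.txt": [f"{i} {n}" for n, i in class_ids.items()],
        "parts/parts.txt": [f"{i} {n}" for n, i in part_ids.items()],
        "images.txt": [],
        "image_class_labels.txt": [],
        "parts/part_locs.txt": [],
    }
    for image_id, r in enumerate(records, start=1):
        meta["images.txt"].append(f"{image_id} {r['file']}")
        meta["image_class_labels.txt"].append(
            f"{image_id} {class_ids[r['species']]}"
        )
        for name, part_id in part_ids.items():
            if name in r["parts"]:
                x, y = r["parts"][name]
                line = f"{image_id} {part_id} {x:.1f} {y:.1f} 1"
            else:
                # Part not annotated: zero location, visibility flag 0.
                line = f"{image_id} {part_id} 0.0 0.0 0"
            meta["parts/part_locs.txt"].append(line)
    return meta


# Illustrative records; file names and coordinates are made up.
records = [
    {"file": "Anthidium_manicatum/img_001.jpg",
     "species": "Anthidium_manicatum",
     "parts": {"head": (120.0, 80.0), "thorax": (150.0, 95.0)}},
    {"file": "Species_B/img_002.jpg",
     "species": "Species_B",
     "parts": {"head": (60.0, 40.0), "abdomen": (110.0, 70.0)}},
]
meta = build_cub_metadata(records)
for line in meta["parts/part_locs.txt"]:
    print(line)
```

Keeping every file keyed by integer IDs, as CUB‑200‑2011 does, lets existing fine‑grained classification code load the data without changes.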

Training and Validation

A pretrained ResNet50 model was trained and cross‑validated on the full dataset, using the mini dataset as a test set. Reported test accuracies were 0.78 (top‑1) and 0.95 (top‑3), competitive with state‑of‑the‑art fine‑grained models.
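For readers unfamiliar with the two metrics, top‑k accuracy counts a prediction as correct when the true class is among the k highest‑scoring classes. The sketch below computes it in plain Python on made‑up scores for four images and four classes; the numbers are illustrative only and are unrelated to the reported results.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k best-scored classes.

    scores: list of per-class score lists, one list per sample
    labels: list of true class indices
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy scores, not model output from the dataset.
scores = [
    [0.7, 0.2, 0.1, 0.0],  # true class 0: ranked first
    [0.1, 0.5, 0.3, 0.1],  # true class 2: ranked second
    [0.4, 0.3, 0.2, 0.1],  # true class 3: ranked last
    [0.2, 0.1, 0.6, 0.1],  # true class 2: ranked first
]
labels = [0, 2, 3, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
print(top1, top3)  # 0.5 0.75
```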

Preliminary XAI Experiments

In initial experiments without human involvement, several XAI methods (e.g., saliency maps) were used to assess model interpretability. Experiments employed segmentation masks as a reference for explanations and evaluated fidelity via pixel‑flipping and Monte Carlo dropout.
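The idea behind pixel‑flipping can be illustrated with a toy example: pixels are replaced by a baseline value in order of decreasing relevance, and the model output is recorded after each step. A faithful explanation should make the output drop quickly. The code below is a simplified sketch under stated assumptions, not the evaluation code used for the dataset: the "image" is a flat list of four values, the "model" is a hypothetical linear function, and the relevance scores are its exact per‑pixel contributions.

```python
def pixel_flipping_curve(model, pixels, relevance, baseline=0.0):
    """Model output after successively flipping the most relevant pixels.

    model:     callable taking a list of pixel values, returning a score
    pixels:    flat list of pixel values
    relevance: per-pixel relevance assigned by the explanation method
    Returns a list with len(pixels) + 1 scores (index 0 = unperturbed input).
    """
    order = sorted(range(len(pixels)), key=lambda i: relevance[i], reverse=True)
    perturbed = list(pixels)
    curve = [model(perturbed)]
    for i in order:
        perturbed[i] = baseline  # "flip" the pixel to the baseline value
        curve.append(model(perturbed))
    return curve


# Hypothetical linear toy model standing in for a classifier logit.
weights = [4.0, 1.0, 2.0, 1.0]


def toy_model(x):
    return sum(w * v for w, v in zip(weights, x))


pixels = [1.0, 1.0, 1.0, 1.0]
relevance = [w * v for w, v in zip(weights, pixels)]  # exact contributions

curve = pixel_flipping_curve(toy_model, pixels, relevance)
print(curve)  # [8.0, 4.0, 2.0, 1.0, 0.0]
```

Comparing the area under this curve across explanation methods gives one number per method; a smaller area indicates that the method ranked the truly influential pixels first.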

Concept‑Based Prototype Nearest Neighbor (CoProNN)

A new concept‑based post‑hoc XAI method was developed. It uses text‑to‑image models (e.g., Stable Diffusion) to generate images of high‑level concepts, which are then used with k‑NN to explain model predictions. User studies confirmed that the method helped users classify bees more accurately and spot erroneous model predictions more easily.
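The k‑NN step can be sketched as follows. This is a heavily simplified illustration and not the authors' implementation: it assumes that the query image and the generated concept images have already been mapped to feature vectors (here two‑dimensional toy vectors), and the concept names are invented for the example. The explanation is the share of each concept among the k nearest concept images.

```python
import math
from collections import Counter


def concept_explanation(query, prototypes, k=3):
    """Share of each concept among the k nearest concept prototypes.

    query:      feature vector of the image to explain
    prototypes: list of (concept_name, feature_vector) pairs
    """
    nearest = sorted(prototypes, key=lambda p: math.dist(query, p[1]))[:k]
    counts = Counter(name for name, _ in nearest)
    return {name: n / len(nearest) for name, n in counts.items()}


# Toy feature vectors; in practice these would come from a trained network.
prototypes = [
    ("striped abdomen", [1.0, 0.0]),
    ("striped abdomen", [0.9, 0.1]),
    ("fuzzy thorax", [0.0, 1.0]),
    ("fuzzy thorax", [0.1, 0.9]),
    ("flower background", [5.0, 5.0]),
]
query = [0.8, 0.2]

explanation = concept_explanation(query, prototypes, k=3)
for concept, share in explanation.items():
    print(f"{concept}: {share:.2f}")
```

With these toy vectors, two of the three nearest prototypes belong to "striped abdomen" and one to "fuzzy thorax", so the explanation attributes the prediction mostly to the first concept.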

Topics

Entomology
Image Recognition

Source

Organization: arXiv

Created: 6/15/2022