MikeTrizna/bees
The USNM Bumblebee dataset is a natural‑history collection containing single‑view images and occurrence data for 73,497 bumblebee specimens (family Apidae). The data conform to the Darwin Core standard, including taxonomy, collection date, location, and other metadata; most specimen locations are georeferenced. The dataset is global in scope but limited to specimens held by the Smithsonian Institution’s USNM collection. Image metadata follow the Audiovisual Core standard. Collection and digitization involved specimen gathering, imaging, data transcription, and quality control. The dataset can be used for evolutionary biology, ecology, climate change studies, and related research fields.
Description
Dataset Card – Bee Dataset
Dataset Overview
The United States National Museum of Natural History (USNM) bumblebee dataset is a natural‑history collection comprising single‑side or dorsal images of 73,497 bumblebee specimens in the family Apidae, along with a tab‑separated values file containing occurrence data. Occurrence data include taxonomic classification, collection date, location information, and other metadata compliant with the Darwin Core standard (https://dwc.tdwg.org). A total of 11,421 specimens are not identified to species and are listed as Bombus sp. or Xylocopa sp. Most specimens (55,301 of 73,497) have georeferenced locations. The dataset is global in scope but limited to specimens housed in the Smithsonian USNM collection.
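To put the counts above in proportion, here is a minimal Python sketch. The three totals come from the overview; the helper names and the rule that genus-only records end in " sp." are assumptions based on the "Bombus sp." and "Xylocopa sp." wording, not a documented convention of the dataset.

```python
# Counts quoted in the overview above.
TOTAL_SPECIMENS = 73_497
NOT_IDENTIFIED_TO_SPECIES = 11_421
GEOREFERENCED = 55_301


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def is_identified_to_species(scientific_name: str) -> bool:
    """Assumption: genus-only records carry a trailing ' sp.' in scientificName."""
    return not scientific_name.strip().endswith(" sp.")


print(f"Not identified to species: {share(NOT_IDENTIFIED_TO_SPECIES, TOTAL_SPECIMENS):.1f}%")
print(f"Georeferenced: {share(GEOREFERENCED, TOTAL_SPECIMENS):.1f}%")
```

Roughly one specimen in six lacks a species-level identification, and about three in four carry coordinates.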
Language
English
Data Example
A typical data point includes specimen metadata and image information.
An example from the dataset:
{
"occurrenceID": "http://n2t.net/ark:/65665/30042e2d8-669d-4520-b456-e3c64203eff8",
"catalogNumber": "USNMENT01732649",
"recordedBy": "R. Craig",
"year": "1949",
"month": "4",
"day": "13",
"country": "United States",
"stateProvince": "California",
"county": "Fresno",
"locality": "Auberry",
"decimalLatitude": "37.0808",
"decimalLongitude": "-119.485",
"identifiedBy": "OBrien, L. R.",
"scientificName": "Xylocopa (Notoxylocopa) tabaniformis orpifex",
"genus": "Xylocopa",
"subgenus": "Notoxylocopa",
"specificEpithet": "tabaniformis",
"infraspecificEpithet": "orpifex",
"scientificNameAuthorship": "Smith",
"accessURI": "https://ids.si.edu/ids/deliveryService?id=NMNH-USNMENT01732649",
"PixelXDimension": 2000,
"PixelYDimension": 1212
}
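In the example record, the date parts and coordinates are stored as strings. A minimal Python sketch of converting them to typed values follows. The helper names are hypothetical, and the handling of missing values is an assumption, since not every specimen is georeferenced.

```python
from datetime import date
from typing import Optional, Tuple

# Subset of the example record shown above; values are strings, as in the source.
record = {
    "catalogNumber": "USNMENT01732649",
    "year": "1949",
    "month": "4",
    "day": "13",
    "decimalLatitude": "37.0808",
    "decimalLongitude": "-119.485",
}


def collection_date(rec: dict) -> Optional[date]:
    """Build a date from the Darwin Core year/month/day fields, or None if incomplete."""
    parts = [rec.get("year"), rec.get("month"), rec.get("day")]
    if not all(parts):
        return None
    year, month, day = (int(p) for p in parts)
    return date(year, month, day)


def coordinates(rec: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) as floats, or None for non-georeferenced records."""
    lat, lon = rec.get("decimalLatitude"), rec.get("decimalLongitude")
    if not lat or not lon:
        return None
    return float(lat), float(lon)


print(collection_date(record), coordinates(record))
```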
Data Fields
Specimen metadata fields follow the Darwin Core standard; see https://dwc.tdwg.org for details. Image metadata fields follow the Audiovisual Core standard; see https://ac.tdwg.org/.
Dataset Size
- Training set: 73,387 samples, 3,672,202,733.82 bytes
- Download size: 3,659,907,058 bytes
- Total dataset size: 3,672,202,733.82 bytes
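The byte counts above are easier to plan around in larger units. This short Python sketch converts them and estimates the average size per sample. The figures are copied from the list; the helper names are illustrative only.

```python
# Figures copied from the Dataset Size list above.
N_SAMPLES = 73_387
DATASET_BYTES = 3_672_202_733.82
DOWNLOAD_BYTES = 3_659_907_058


def to_gb(n_bytes: float) -> float:
    """Decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 1e9


def to_gib(n_bytes: float) -> float:
    """Binary gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


avg_bytes_per_sample = DATASET_BYTES / N_SAMPLES

print(f"Download: {to_gb(DOWNLOAD_BYTES):.2f} GB ({to_gib(DOWNLOAD_BYTES):.2f} GiB)")
print(f"Average per sample: {avg_bytes_per_sample / 1000:.1f} kB")
```

The download is a little under 3.7 GB, which works out to about 50 kB per sample on average.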
Configuration
- Configuration name: default
- Data files:
  - Split: training
  - Path: data/train-*
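Given the configuration above, the dataset could presumably be loaded with the Hugging Face `datasets` library, as sketched below. Several things here are assumptions: that `MikeTrizna/bees` is the Hub repository ID, that the split is registered as `train` (inferred from the `data/train-*` path, although the card labels it "training"), and that `datasets` is installed. The function name `load_bees` is hypothetical.

```python
REPO_ID = "MikeTrizna/bees"   # assumption: the Hub repository ID matches the card title
CONFIG_NAME = "default"       # configuration name listed above
SPLIT = "train"               # assumption: inferred from the data/train-* path


def load_bees(streaming: bool = True):
    """Load the split; streaming avoids downloading the full dataset up front."""
    # Imported lazily so the constants above can be used without the dependency.
    from datasets import load_dataset

    return load_dataset(REPO_ID, CONFIG_NAME, split=SPLIT, streaming=streaming)


# Example usage (requires network access and the `datasets` package):
# ds = load_bees()
# first = next(iter(ds))
# print(first.keys())
```

Streaming is a reasonable default here because the full download is several gigabytes; pass `streaming=False` to cache the data locally instead.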
Dataset Curators
Smithsonian National Museum of Natural History, Department of Entomology. Jessica Bird (Entomology Data Manager) is the primary contact.
License
Public domain, Creative Commons CC0.
Citation
Orrell T, Informatics Office (2023). NMNH Extant Specimen Records (USNM, US). Version 1.72. National Museum of Natural History, Smithsonian Institution. Occurrence dataset. https://collections.nmnh.si.edu/ipt/resource?r=nmnh_extant_dwc-a&v=1.72
Source
Organization: hugging_face
Created: Unknown