MikeTrizna/bees

The USNM Bumblebee dataset is a natural‑history collection containing single‑view images and occurrence data for 73,497 bumblebee specimens (family Apidae). The data conform to the Darwin Core standard, including taxonomy, collection date, location, and other metadata; most specimen locations are georeferenced. The dataset is global in scope but limited to specimens held by the Smithsonian Institution’s USNM collection. Image metadata follow the Audiovisual Core standard. Collection and digitization involved specimen gathering, imaging, data transcription, and quality control. The dataset can be used for evolutionary biology, ecology, climate change studies, and related research fields.

Updated 9/22/2023
hugging_face

Description

Dataset Card – Bee Dataset

Dataset Overview

The USNM Bumblebee dataset is a natural‑history dataset comprising, for each of 73,497 bumblebee specimens in the family Apidae, a single image in lateral or dorsal view, along with a tab‑separated values file of occurrence data. Occurrence data include taxonomic classification, collection date, location information, and other metadata conforming to the Darwin Core standard (https://dwc.tdwg.org). Of these specimens, 11,421 are not identified to species and are listed as Bombus sp. or Xylocopa sp. Most specimens (55,301) have georeferenced locations. The dataset is global in scope but limited to specimens housed in the Smithsonian Institution’s USNM collection.
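To put the counts above in proportion, the following minimal sketch computes what share of the specimens are georeferenced and what share are not identified to species. The constant and function names are illustrative only and do not come from the dataset.

```python
# Counts quoted in the overview above.
TOTAL_SPECIMENS = 73_497
GEOREFERENCED = 55_301
NOT_IDENTIFIED_TO_SPECIES = 11_421


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


georeferenced_pct = share(GEOREFERENCED, TOTAL_SPECIMENS)
unidentified_pct = share(NOT_IDENTIFIED_TO_SPECIES, TOTAL_SPECIMENS)

print(f"georeferenced: {georeferenced_pct}%")
print(f"not identified to species: {unidentified_pct}%")
```

Roughly three quarters of the specimens carry coordinates, and about 15.5% are identified only to genus.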

Language

English

Data Example

A typical data point includes specimen metadata and image information.

An example from the dataset:

{
  "occurrenceID": "http://n2t.net/ark:/65665/30042e2d8-669d-4520-b456-e3c64203eff8",
  "catalogNumber": "USNMENT01732649",
  "recordedBy": "R. Craig",
  "year": "1949",
  "month": "4",
  "day": "13",
  "country": "United States",
  "stateProvince": "California",
  "county": "Fresno",
  "locality": "Auberry",
  "decimalLatitude": "37.0808",
  "decimalLongitude": "-119.485",
  "identifiedBy": "OBrien, L. R.",
  "scientificName": "Xylocopa (Notoxylocopa) tabaniformis orpifex",
  "genus": "Xylocopa",
  "subgenus": "Notoxylocopa",
  "specificEpithet": "tabaniformis",
  "infraspecificEpithet": "orpifex",
  "scientificNameAuthorship": "Smith",
  "accessURI": "https://ids.si.edu/ids/deliveryService?id=NMNH-USNMENT01732649",
  "PixelXDimension": 2000,
  "PixelYDimension": 1212
}
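Most values in the example record are strings, so a consumer will usually want to convert them before analysis. The sketch below shows one possible way to derive a collection date, a coordinate pair, and a short name from a record. The helper names are hypothetical, and the assumption that missing values appear as absent keys, empty strings, or None has not been checked against the full dataset.

```python
from datetime import date

# Subset of the example record above; these values arrive as strings.
record = {
    "catalogNumber": "USNMENT01732649",
    "year": "1949",
    "month": "4",
    "day": "13",
    "decimalLatitude": "37.0808",
    "decimalLongitude": "-119.485",
    "genus": "Xylocopa",
    "specificEpithet": "tabaniformis",
    "infraspecificEpithet": "orpifex",
}


def collection_date(rec):
    """Build a date from year/month/day, or None if a part is missing or invalid."""
    try:
        return date(int(rec["year"]), int(rec["month"]), int(rec["day"]))
    except (KeyError, TypeError, ValueError):
        return None


def coordinates(rec):
    """Return (latitude, longitude) as floats, or None if not georeferenced."""
    try:
        return float(rec["decimalLatitude"]), float(rec["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        return None


def short_name(rec):
    """Join genus and epithets, skipping any that are empty or absent."""
    parts = (rec.get("genus"), rec.get("specificEpithet"), rec.get("infraspecificEpithet"))
    return " ".join(p for p in parts if p)


print(collection_date(record), coordinates(record), short_name(record))
```

Returning None instead of raising keeps partially transcribed records usable, which matters here because not every specimen is georeferenced.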

Data Fields

Specimen metadata fields follow the Darwin Core standard; see https://dwc.tdwg.org for details. Image metadata fields follow the Audiovisual Core standard; see https://ac.tdwg.org/.
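As an illustration of the two standards, the sketch below separates a record into specimen fields and image fields. In the example record, accessURI is an Audiovisual Core term and the two pixel dimensions are Exif terms that Audiovisual Core reuses; treating every other key as a Darwin Core term is an assumption that holds for the example record but may not hold for every column in the full dataset. The function name is hypothetical.

```python
# Image-related field names taken from the example record.
IMAGE_FIELDS = {"accessURI", "PixelXDimension", "PixelYDimension"}


def split_by_standard(rec):
    """Return (specimen_fields, image_fields) as two separate dicts."""
    specimen = {k: v for k, v in rec.items() if k not in IMAGE_FIELDS}
    image = {k: v for k, v in rec.items() if k in IMAGE_FIELDS}
    return specimen, image


example = {
    "catalogNumber": "USNMENT01732649",
    "scientificName": "Xylocopa (Notoxylocopa) tabaniformis orpifex",
    "accessURI": "https://ids.si.edu/ids/deliveryService?id=NMNH-USNMENT01732649",
    "PixelXDimension": 2000,
    "PixelYDimension": 1212,
}

specimen, image = split_by_standard(example)
print(sorted(specimen))
print(sorted(image))
```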

Dataset Size

  • Training set: 73,387 samples, 3,672,202,733.82 bytes
  • Download size: 3,659,907,058 bytes
  • Total dataset size: 3,672,202,733.82 bytes
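For planning storage and bandwidth, the byte counts above can be converted into more familiar units. This is a small sketch using only the figures listed; the variable names are illustrative.

```python
# Figures copied from the size listing above.
NUM_SAMPLES = 73_387
DATASET_BYTES = 3_672_202_733.82
DOWNLOAD_BYTES = 3_659_907_058


def to_gib(n_bytes: float) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


dataset_gib = round(to_gib(DATASET_BYTES), 2)
download_gib = round(to_gib(DOWNLOAD_BYTES), 2)
# Average size of one sample (image plus metadata), in kilobytes.
avg_kb_per_sample = round(DATASET_BYTES / NUM_SAMPLES / 1000, 1)

print(f"dataset: {dataset_gib} GiB, download: {download_gib} GiB")
print(f"average per sample: {avg_kb_per_sample} kB")
```

The download is about 3.4 GiB, and each sample averages about 50 kB.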

Configuration

  • Configuration name: default
  • Data files:
    • Split: train
    • Path: data/train-*
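The configuration above maps a split to a glob pattern over data files. The sketch below shows how such a pattern selects files, and how the dataset might be loaded with the Hugging Face `datasets` library. The split name `train` is an assumption inferred from the `data/train-*` path, the shard file names are hypothetical, and `load_bees` is an illustrative wrapper that needs `datasets` installed and network access.

```python
from fnmatch import fnmatch

# Mirrors the configuration listed above. The split name "train" is an
# assumption inferred from the data/train-* path.
CONFIG = {
    "config_name": "default",
    "data_files": [{"split": "train", "path": "data/train-*"}],
}


def files_for_split(all_files, split, config=CONFIG):
    """Return the files whose path matches a pattern declared for the split."""
    patterns = [e["path"] for e in config["data_files"] if e["split"] == split]
    return [f for f in all_files if any(fnmatch(f, p) for p in patterns)]


# Hypothetical repository listing; the real shard names may differ.
repo_files = [
    "README.md",
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
]

print(files_for_split(repo_files, "train"))


def load_bees():
    """Load the split with Hugging Face datasets (not called here; downloads data)."""
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset("MikeTrizna/bees", split="train")
```

Calling `load_bees()` fetches the full download listed under Dataset Size, so it is left uncalled in the sketch.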

Dataset Curators

Smithsonian National Museum of Natural History, Department of Entomology. Jessica Bird (Entomology Data Manager) is the primary contact.

License

Public domain, Creative Commons CC0.

Citation

Orrell T, Informatics Office (2023). NMNH Extant Specimen Records (USNM, US). Version 1.72. National Museum of Natural History, Smithsonian Institution. Occurrence dataset. https://collections.nmnh.si.edu/ipt/resource?r=nmnh_extant_dwc-a&v=1.72

Topics

Bee Research
Natural History Dataset

Source

Organization: hugging_face

Created: Unknown
