Gharaee/BIOSCAN_1M_Insect_Dataset

The BIOSCAN‑1M Insect Dataset describes insect specimens. Each record has four primary attributes: a DNA barcode sequence, a Barcode Index Number (BIN), a taxonomic rank annotation, and an RGB image. The DNA barcode sequence gives the order of nucleotides; the BIN is an alternative to Linnaean names that provides a gene‑centered taxonomy; the taxonomic rank annotation classifies organisms hierarchically according to evolutionary relationships; and the RGB image is a raw image of the specimen. The illustrated images come from the 16 most densely sampled insect orders. The dataset also shows a pronounced class imbalance, which is an inherent characteristic of insect communities.

Updated 6/20/2024
hugging_face

Description

BIOSCAN_1M Insect Dataset

Dataset Overview

The BIOSCAN‑1M Insect Dataset provides information about insects. Each record contains the following four main attributes:

  1. DNA Barcode Sequence
  2. Barcode Index Number (BIN)
  3. Taxonomic Rank Annotation
  4. RGB Image
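
To make the four attributes concrete, a record could be held in a structure like the one below. This is a minimal sketch: the class name InsectRecord, its field names, and the image path are hypothetical and do not reflect the dataset's actual file layout or column names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InsectRecord:
    """One specimen record with the four main attributes (hypothetical field names)."""
    dna_barcode: str  # nucleotide string, e.g. "TTTATATTTT..."
    bin_uri: str  # Barcode Index Number, e.g. "BOLD:AER5166"
    taxonomy: Dict[str, Optional[str]] = field(default_factory=dict)  # rank -> name; None if not annotated
    image_path: str = ""  # location of the RGB image (hypothetical path scheme)


# Illustrative values only: the barcode prefix and the BIN are the separate examples
# from this description and are not claimed to belong to the same real specimen.
# The taxonomy is left empty in this illustration.
record = InsectRecord(
    dna_barcode="TTTATATTTTATTTTTGGAGCATGATCAGG",
    bin_uri="BOLD:AER5166",
    image_path="images/example.jpg",
)
print(record.bin_uri, len(record.dna_barcode))
```

A plain dataclass keeps the sketch dependency-free; a real loader would fill these fields from whatever metadata and image files the dataset actually ships.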

I. DNA Barcode Sequence

The DNA barcode sequence records the order of nucleotides. In the dataset's illustration, each nucleotide is color-coded:

  • Adenine (A): Red
  • Thymine (T): Blue
  • Cytosine (C): Green
  • Guanine (G): Yellow

Example sequence:

TTTATATTTTATTTTTGGAGCATGATCAGGAATAGTTGGAACTTCAATAAGTTTATTAATTCGAACAGAATTAAGCCAACCAGGAATTTTTA …
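
The color legend above can be applied programmatically. The sketch below assumes the barcode is a plain string; it maps each nucleotide to its legend color and tallies the base composition. How symbols other than A, T, C and G (for example N or gap characters, if present) should be handled is an assumption: here they are colored gray and counted under "?".

```python
from collections import Counter
from typing import List

# Color legend from the dataset description.
NUCLEOTIDE_COLORS = {"A": "red", "T": "blue", "C": "green", "G": "yellow"}
OTHER_COLOR = "gray"  # assumption: color for any symbol outside A/T/C/G


def colorize(seq: str) -> List[str]:
    """Return the legend color for each position of a barcode sequence."""
    return [NUCLEOTIDE_COLORS.get(base, OTHER_COLOR) for base in seq.upper()]


def composition(seq: str) -> Counter:
    """Count A, T, C and G; any other symbol is counted under '?'."""
    return Counter(base if base in NUCLEOTIDE_COLORS else "?" for base in seq.upper())


# Visible prefix of the example sequence (the rest is elided in the description).
prefix = "TTTATATTTTATTTTTGGAGCATGATCAGG"
print(composition(prefix))
print(colorize(prefix[:4]))  # ['blue', 'blue', 'blue', 'red']
```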

II. Barcode Index Number (BIN)

BIN serves as an alternative to Linnaean names, offering a gene‑centered classification.

Example BIN:

BOLD:AER5166
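
Because BINs act as gene-centered labels, a natural task is grouping records by BIN. The sketch below assumes BINs follow the pattern of the example above: the prefix "BOLD:" followed by three uppercase letters and four digits. That pattern, the helper names, and the record identifiers in the usage comment are assumptions for illustration, not part of the dataset description.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed BIN format, inferred from the example BOLD:AER5166.
BIN_PATTERN = re.compile(r"^BOLD:([A-Z]{3}[0-9]{4})$")


def parse_bin(bin_uri: str) -> str:
    """Return the cluster code (e.g. 'AER5166'); raise ValueError if the format differs."""
    match = BIN_PATTERN.match(bin_uri.strip())
    if match is None:
        raise ValueError(f"not a BIN in the assumed format: {bin_uri!r}")
    return match.group(1)


def group_by_bin(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (record_id, bin_uri) pairs by BIN code, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record_id, bin_uri in records:
        groups[parse_bin(bin_uri)].append(record_id)
    return dict(groups)


# Hypothetical record IDs:
# group_by_bin([("r1", "BOLD:AER5166"), ("r2", "BOLD:AER5166")]) returns {"AER5166": ["r1", "r2"]}
```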

III. Taxonomic Rank Annotation

Taxonomic annotations are organized hierarchically according to evolutionary relationships; each rank groups organisms that share common features and genetic similarity.
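
In a hierarchical scheme, a record can be annotated down to some ranks but not others. The sketch below finds the most specific annotated rank and the lineage from broad to specific. The rank list (order, family, subfamily, tribe, genus, species) is an assumption for illustration; the dataset's metadata defines the actual set of ranks.

```python
from typing import Dict, List, Optional, Tuple

# Assumed rank order, broadest first; the dataset's exact rank set may differ.
RANKS = ("order", "family", "subfamily", "tribe", "genus", "species")


def finest_rank(taxonomy: Dict[str, Optional[str]]) -> Optional[Tuple[str, str]]:
    """Return (rank, name) for the most specific rank that has a non-empty annotation."""
    for rank in reversed(RANKS):
        name = taxonomy.get(rank)
        if name:
            return rank, name
    return None


def lineage(taxonomy: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """List annotated ranks from broad to specific, stopping at the first missing rank."""
    path: List[Tuple[str, str]] = []
    for rank in RANKS:
        name = taxonomy.get(rank)
        if not name:
            break
        path.append((rank, name))
    return path


# Illustrative annotation: Cecidomyiidae (gall midges) is a family within Diptera.
example = {"order": "Diptera", "family": "Cecidomyiidae", "genus": None}
print(finest_rank(example))  # ('family', 'Cecidomyiidae')
print(lineage(example))      # [('order', 'Diptera'), ('family', 'Cecidomyiidae')]
```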

IV. RGB Image

The example images come from the 16 most densely sampled orders in the BIOSCAN‑1M Insect Dataset. The number listed for each order is its image count, which shows the class imbalance within the dataset.

  • Diptera: 896,234
  • Hymenoptera: 89,311
  • Coleoptera: 47,328
  • Hemiptera: 46,970
  • Lepidoptera: 32,538
  • Psocodea: 9,635
  • Thysanoptera: 2,088
  • Trichoptera: 1,296
  • Orthoptera: 1,057
  • Blattodea: 824
  • Neuroptera: 676
  • Ephemeroptera: 96
  • Dermaptera: 66
  • Archaeognatha: 63
  • Plecoptera: 30
  • Embioptera: 6
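
The per-order counts can be turned into class shares and a simple imbalance ratio (largest count divided by smallest). The sketch below copies the 16 counts from the list; because they cover only these orders, the shares are relative to their sum rather than to the full dataset.

```python
from typing import Dict

# Image counts per order, copied from the list above.
ORDER_COUNTS: Dict[str, int] = {
    "Diptera": 896_234, "Hymenoptera": 89_311, "Coleoptera": 47_328, "Hemiptera": 46_970,
    "Lepidoptera": 32_538, "Psocodea": 9_635, "Thysanoptera": 2_088, "Trichoptera": 1_296,
    "Orthoptera": 1_057, "Blattodea": 824, "Neuroptera": 676, "Ephemeroptera": 96,
    "Dermaptera": 66, "Archaeognatha": 63, "Plecoptera": 30, "Embioptera": 6,
}


def class_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of images per class, ordered from most to least frequent."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {name: n / total for name, n in ranked}


def imbalance_ratio(counts: Dict[str, int]) -> float:
    """Ratio of the largest class count to the smallest."""
    return max(counts.values()) / min(counts.values())


shares = class_shares(ORDER_COUNTS)
for name, share in shares.items():
    print(f"{name:<14} {share:8.4%}")
print(f"imbalance ratio: {imbalance_ratio(ORDER_COUNTS):,.0f}")  # 149,372
```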

Class Distribution

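These counts show a highly imbalanced, long-tailed class distribution: Diptera alone accounts for about 79% of the images in the 16 orders listed, while Embioptera has only 6 images. Such imbalance is an inherent characteristic of insect communities.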
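
To summarize the imbalance in a single number, one option borrowed from community ecology is Pielou's evenness J = H / ln(S), where H is the Shannon entropy of the class proportions and S is the number of classes. J equals 1 when every order has the same number of images and approaches 0 as one order dominates. This index is an illustrative choice here, not a metric reported in the dataset documentation.

```python
import math
from typing import Sequence

# Image counts of the 16 orders, copied from the list above.
ORDER_IMAGE_COUNTS = [896_234, 89_311, 47_328, 46_970, 32_538, 9_635, 2_088, 1_296,
                      1_057, 824, 676, 96, 66, 63, 30, 6]


def shannon_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (natural log) of the class proportions, ignoring empty classes."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def pielou_evenness(counts: Sequence[int]) -> float:
    """Pielou's J = H / ln(S) over non-empty classes; undefined for fewer than two classes."""
    nonzero = [n for n in counts if n > 0]
    if len(nonzero) < 2:
        raise ValueError("evenness needs at least two non-empty classes")
    return shannon_entropy(nonzero) / math.log(len(nonzero))


print(f"Pielou evenness of the 16 orders: {pielou_evenness(ORDER_IMAGE_COUNTS):.3f}")
```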

Topics

Biodiversity
Insect Genetic Classification

Source

Organization: hugging_face

Created: Unknown