Gharaee/BIOSCAN_1M_Insect_Dataset
The BIOSCAN_1M insect dataset provides information about insects. Each record includes four primary attributes: a DNA barcode sequence, a Barcode Index Number (BIN), a taxonomic rank annotation, and an RGB image. The DNA barcode sequence gives the nucleotide arrangement. The BIN serves as an alternative to Linnaean names and provides a gene‑centered taxonomy. The taxonomic rank annotation classifies organisms hierarchically based on evolutionary relationships. The RGB image is the raw image of the specimen, with examples shown for the 16 most densely sampled insect orders. The class distribution is imbalanced, which is an inherent characteristic of insect communities.
Description
BIOSCAN_1M Insect Dataset
Dataset Overview
The BIOSCAN‑1M Insect Dataset provides information about insects. Each record contains the following four main attributes:
- DNA Barcode Sequence
- Barcode Index Number (BIN)
- Taxonomic Rank Annotation
- RGB Image
I. DNA Barcode Sequence
The DNA barcode sequence gives the arrangement of nucleotides. When the sequence is displayed, each base is color-coded as follows:
- Adenine (A): Red
- Thymine (T): Blue
- Cytosine (C): Green
- Guanine (G): Yellow
Example sequence:
TTTATATTTTATTTTTGGAGCATGATCAGGAATAGTTGGAACTTCAATAAGTTTATTAATTCGAACAGAATTAAGCCAACCAGGAATTTTTA …
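As a rough illustration, the base composition of a barcode can be tallied in a few lines of Python. The sketch below uses only the truncated excerpt shown above, so the fractions describe that excerpt and not the full barcode. The names `BASE_COLORS`, `EXCERPT`, and `base_composition` are illustrative and are not part of the dataset's own tooling.

```python
from __future__ import annotations

from collections import Counter

# Color legend from the list above (used when rendering a sequence).
BASE_COLORS = {"A": "red", "T": "blue", "C": "green", "G": "yellow"}

# Truncated excerpt of the example barcode; the rest is elided in the text.
EXCERPT = (
    "TTTATATTTTATTTTTGGAGCATGATCAGGAATAGTTGGAACTTCAATAAGTTTATTAATT"
    "CGAACAGAATTAAGCCAACCAGGAATTTTTA"
)


def base_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each nucleotide (A, T, C, G) in `sequence`."""
    if not sequence:
        raise ValueError("sequence is empty")
    counts = Counter(sequence.upper())
    unknown = set(counts) - set(BASE_COLORS)
    if unknown:
        # This sketch only accepts the four bases listed above.
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    total = sum(counts.values())
    return {base: counts[base] / total for base in BASE_COLORS}


composition = base_composition(EXCERPT)
for base, fraction in composition.items():
    print(f"{base} ({BASE_COLORS[base]}): {fraction:.1%}")
```

The function deliberately rejects any symbol outside A, T, C, and G. If the full sequences contain other characters, that check would need to be relaxed.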
II. Barcode Index Number (BIN)
BIN serves as an alternative to Linnaean names, offering a gene‑centered classification.
Example BIN:
BOLD:AER5166
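A simple shape check can catch malformed BIN strings before they are used as class labels. The pattern below is an assumption inferred from the single example above (the prefix `BOLD:` followed by three uppercase letters and four digits); it is not an official specification, and `is_bin_like` is a hypothetical helper name.

```python
import re

# Assumed pattern, inferred from the example BIN above:
# "BOLD:" + three uppercase letters + four digits.
BIN_PATTERN = re.compile(r"BOLD:[A-Z]{3}[0-9]{4}")


def is_bin_like(value: str) -> bool:
    """Return True if `value` has the same shape as the example BIN."""
    return BIN_PATTERN.fullmatch(value) is not None


print(is_bin_like("BOLD:AER5166"))  # True
print(is_bin_like("AER5166"))       # False: the "BOLD:" prefix is missing
```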
III. Taxonomic Rank Annotation
Annotations are organized hierarchically based on evolutionary relationships, grouping species that share common features and genetic similarity.
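One way to work with a hierarchical annotation is to store one label per rank and look up the most specific rank that is filled in. The sketch below assumes standard Linnaean rank names; the dataset's actual field names and set of ranks may differ, and the record shown is a hypothetical example.

```python
from __future__ import annotations

# Ranks ordered from broadest to most specific. This list is an assumption
# based on standard Linnaean ranks, not the dataset's documented schema.
RANKS = ["phylum", "class", "order", "family", "genus", "species"]


def deepest_rank(annotation: dict[str, str | None]) -> tuple[str, str] | None:
    """Return (rank, label) for the most specific rank that has a label."""
    for rank in reversed(RANKS):
        label = annotation.get(rank)
        if label:
            return rank, label
    return None


# Hypothetical record that is labeled only down to the order level.
record = {
    "phylum": "Arthropoda",
    "class": "Insecta",
    "order": "Diptera",
    "family": None,
    "genus": None,
    "species": None,
}

print(deepest_rank(record))  # ('order', 'Diptera')
```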
IV. RGB Image
Example images are drawn from the 16 most densely sampled orders in the BIOSCAN‑1M Insect Dataset. The number given for each order is the count of images in that class, which makes the class imbalance within the dataset clear.
| Order | Number of images |
| --- | --- |
| Diptera | 896,234 |
| Hymenoptera | 89,311 |
| Coleoptera | 47,328 |
| Hemiptera | 46,970 |
| Lepidoptera | 32,538 |
| Psocodea | 9,635 |
| Thysanoptera | 2,088 |
| Trichoptera | 1,296 |
| Orthoptera | 1,057 |
| Blattodea | 824 |
| Neuroptera | 676 |
| Ephemeroptera | 96 |
| Dermaptera | 66 |
| Archaeognatha | 63 |
| Plecoptera | 30 |
| Embioptera | 6 |
Class Distribution
The class distribution of the dataset is strongly imbalanced, which reflects an inherent characteristic of insect communities.
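The degree of imbalance can be quantified directly from the per-order image counts listed in the table above. The following is a minimal sketch; the largest-to-smallest ratio is one simple measure chosen for illustration, and the variable names are not taken from the dataset's tooling.

```python
# Image counts per order, copied from the table above.
ORDER_COUNTS = {
    "Diptera": 896_234, "Hymenoptera": 89_311, "Coleoptera": 47_328,
    "Hemiptera": 46_970, "Lepidoptera": 32_538, "Psocodea": 9_635,
    "Thysanoptera": 2_088, "Trichoptera": 1_296, "Orthoptera": 1_057,
    "Blattodea": 824, "Neuroptera": 676, "Ephemeroptera": 96,
    "Dermaptera": 66, "Archaeognatha": 63, "Plecoptera": 30,
    "Embioptera": 6,
}

total = sum(ORDER_COUNTS.values())
largest = max(ORDER_COUNTS, key=ORDER_COUNTS.get)
smallest = min(ORDER_COUNTS, key=ORDER_COUNTS.get)

# Share of all listed images that belong to each order.
shares = {order: n / total for order, n in ORDER_COUNTS.items()}

# Ratio between the largest and smallest class: a simple imbalance measure.
imbalance_ratio = ORDER_COUNTS[largest] / ORDER_COUNTS[smallest]

print(f"total images across the 16 orders: {total:,}")
print(f"{largest}: {shares[largest]:.1%} of images")
print(f"imbalance ratio ({largest} / {smallest}): {imbalance_ratio:,.0f}")
```

With these counts, a single order accounts for most of the images. A classifier trained without reweighting or resampling would therefore be dominated by that order.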
Source
Organization: hugging_face