Gharaee/BIOSCAN_1M_Insect_Dataset
The BIOSCAN-1M Insect Dataset provides information about insects. Each record includes four primary attributes: a DNA barcode sequence, a barcode index number (BIN), a taxonomic rank annotation, and an RGB image. The DNA barcode sequence gives the order of nucleotides. The BIN serves as an alternative to Linnaean names and provides a gene-centered taxonomy. The taxonomic rank annotation classifies organisms hierarchically based on evolutionary relationships. The RGB images are raw images drawn from the 16 most densely sampled insect orders. The dataset also exhibits a strongly imbalanced class distribution, which is an inherent characteristic of insect communities.
BIOSCAN_1M Insect Dataset
Dataset Overview
The BIOSCAN‑1M Insect Dataset provides information about insects. Each record contains the following four main attributes:
- DNA Barcode Sequence
- Barcode Index Number (BIN)
- Taxonomic Rank Annotation
- RGB Image
I. DNA Barcode Sequence
The DNA barcode sequence records the order of nucleotides. When the sequence is displayed, each base is color-coded:
- Adenine (A): Red
- Thymine (T): Blue
- Cytosine (C): Green
- Guanine (G): Yellow
Example sequence:
TTTATATTTTATTTTTGGAGCATGATCAGGAATAGTTGGAACTTCAATAAGTTTATTAATTCGAACAGAATTAAGCCAACCAGGAATTTTTA …
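A minimal Python sketch of working with a barcode string: it checks that the sequence uses only the four bases listed above, counts each base, and computes GC content. It uses only the visible prefix of the example sequence; the truncated tail is not reconstructed. Real barcodes may also contain ambiguity codes or gap characters, and rejecting them here is an assumption of this sketch, not a property of the dataset.

```python
from collections import Counter

# The four bases listed in the dataset description.
VALID_BASES = "ATCG"


def base_composition(sequence: str) -> dict[str, int]:
    """Count A, T, C and G in a barcode string.

    Assumption: only the four canonical bases are accepted; ambiguity
    codes such as "N" or gap characters raise ValueError here.
    """
    seq = sequence.strip().upper()
    invalid = set(seq) - set(VALID_BASES)
    if invalid:
        raise ValueError(f"unexpected symbols: {sorted(invalid)}")
    counts = Counter(seq)
    return {base: counts[base] for base in VALID_BASES}


def gc_content(sequence: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    comp = base_composition(sequence)
    total = sum(comp.values())
    return (comp["G"] + comp["C"]) / total if total else 0.0


# Visible prefix of the example barcode above; the truncated tail is not reconstructed.
example = "TTTATATTTTATTTTTGGAGCATGATCAGGAATAGTTGGAACTTCAATAAGTTTATTAATTCGAACAGAATTAAGCCAACCAGGAATTTTTA"
print(base_composition(example))
print(f"GC content of prefix: {gc_content(example):.3f}")
```

Raising on unexpected symbols keeps silent miscounts out of downstream statistics; a loader for real records would likely need an explicit policy for ambiguous bases instead.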
II. Barcode Index Number (BIN)
BIN serves as an alternative to Linnaean names, offering a gene‑centered classification.
Example BIN:
BOLD:AER5166
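Because BINs stand in for species names, a common first step is to validate BIN strings and group records by them. This sketch assumes the format seen in the example, `BOLD:` followed by three uppercase letters and four digits. Treat that pattern as an assumption and check it against the records you actually load. The record layout (`id` and `bin` keys) and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed BIN format: "BOLD:" followed by three uppercase letters and four digits.
BIN_PATTERN = re.compile(r"BOLD:[A-Z]{3}[0-9]{4}")


def is_valid_bin(bin_id: str) -> bool:
    """Return True if bin_id matches the assumed BIN format exactly."""
    return BIN_PATTERN.fullmatch(bin_id.strip()) is not None


def group_by_bin(records: list[dict]) -> dict[str, list[str]]:
    """Group record ids by BIN, skipping records whose BIN is missing or malformed.

    The record layout ({"id": ..., "bin": ...}) is hypothetical.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        bin_id = rec.get("bin") or ""
        if is_valid_bin(bin_id):
            groups[bin_id.strip()].append(rec["id"])
    return dict(groups)


records = [
    {"id": "r1", "bin": "BOLD:AER5166"},
    {"id": "r2", "bin": "BOLD:AER5166"},
    {"id": "r3", "bin": None},  # no BIN assigned
]
print(group_by_bin(records))
```

Skipping records without a valid BIN is a design choice for this sketch. An analysis that needs every specimen would collect those records separately instead of dropping them.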
III. Taxonomic Rank Annotation
Annotations are organized hierarchically by taxonomic rank, based on evolutionary relationships; each rank groups organisms that share common features and genetic similarity.
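One way to represent this hierarchy is as an ordered path of taxon names from the broadest rank to the most specific. The rank names below (order, family, subfamily, tribe, genus, species) are an assumption for illustration, because this section does not list the ranks the dataset uses. Lower ranks may also be missing for many records. The sketch counts records per taxon at a chosen rank and skips records that are not annotated to that depth. The record layout and the example family names are hypothetical.

```python
from collections import Counter

# Hypothetical rank order, broadest first; check it against the actual metadata columns.
RANKS = ("order", "family", "subfamily", "tribe", "genus", "species")


def lineage(record: dict) -> tuple[str, ...]:
    """Taxon names from the broadest rank down to the deepest annotated one."""
    path = []
    for rank in RANKS:
        name = record.get(rank)
        if not name:
            break  # lower ranks are only meaningful below an annotated parent
        path.append(name)
    return tuple(path)


def count_at_rank(records: list[dict], rank: str) -> Counter:
    """Count records per taxon at `rank`, keyed by the full lineage prefix.

    Keying by the whole prefix keeps taxa that share a name in different
    branches apart; records not annotated down to `rank` are skipped.
    """
    depth = RANKS.index(rank) + 1
    counts: Counter = Counter()
    for rec in records:
        path = lineage(rec)
        if len(path) >= depth:
            counts[path[:depth]] += 1
    return counts


records = [
    {"order": "Diptera", "family": "Cecidomyiidae"},
    {"order": "Diptera", "family": "Cecidomyiidae"},
    {"order": "Diptera", "family": "Chironomidae"},
    {"order": "Hymenoptera"},  # annotated to order only
]
print(count_at_rank(records, "order"))
print(count_at_rank(records, "family"))
```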
IV. RGB Image
Images are drawn from the 16 most densely sampled insect orders in the BIOSCAN‑1M Insect Dataset. The number of images per order, listed below, shows the strong class imbalance within the dataset.
| Order | Images |
|---|---|
| Diptera | 896,234 |
| Hymenoptera | 89,311 |
| Coleoptera | 47,328 |
| Hemiptera | 46,970 |
| Lepidoptera | 32,538 |
| Psocodea | 9,635 |
| Thysanoptera | 2,088 |
| Trichoptera | 1,296 |
| Orthoptera | 1,057 |
| Blattodea | 824 |
| Neuroptera | 676 |
| Ephemeroptera | 96 |
| Dermaptera | 66 |
| Archaeognatha | 63 |
| Plecoptera | 30 |
| Embioptera | 6 |
Class Distribution
The per-order image counts show a highly imbalanced, long-tailed class distribution, which reflects an inherent characteristic of insect communities.
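The per-order counts listed above can be turned into summary statistics that make the imbalance concrete. This sketch hard-codes the 16 counts and computes each order's share of those images, along with the ratio between the largest and smallest order. It covers only the 16 listed orders, so shares are relative to that subtotal rather than to the full dataset.

```python
# Per-order image counts copied from the list above (the 16 orders only).
ORDER_COUNTS = {
    "Diptera": 896_234,
    "Hymenoptera": 89_311,
    "Coleoptera": 47_328,
    "Hemiptera": 46_970,
    "Lepidoptera": 32_538,
    "Psocodea": 9_635,
    "Thysanoptera": 2_088,
    "Trichoptera": 1_296,
    "Orthoptera": 1_057,
    "Blattodea": 824,
    "Neuroptera": 676,
    "Ephemeroptera": 96,
    "Dermaptera": 66,
    "Archaeognatha": 63,
    "Plecoptera": 30,
    "Embioptera": 6,
}


def class_shares(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of the listed images that belong to each order."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def imbalance_ratio(counts: dict[str, int]) -> float:
    """Size of the largest class divided by the size of the smallest."""
    return max(counts.values()) / min(counts.values())


total = sum(ORDER_COUNTS.values())
shares = class_shares(ORDER_COUNTS)
print(f"Images across the 16 orders: {total:,}")
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{name}: {share:.2%}")
print(f"Largest/smallest ratio: {imbalance_ratio(ORDER_COUNTS):,.0f}")
```

With these counts, the 16 orders total 1,128,218 images. Diptera alone accounts for roughly 79% of them, and the largest order is about 149,000 times the size of the smallest (896,234 vs. 6).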















