Dataset asset, Open Source Community, Image Segmentation, Circuit Diagram Recognition

lowercaseonly/cghd

The GTDB‑HD public ground‑truth dataset for hand‑drawn circuit diagrams contains images of hand‑drawn electrical schematics, bounding‑box annotations for object detection, and ground‑truth files for segmentation. It is intended for training models that extract electrical diagrams from raster graphics. Each drafter folder stores images, annotations, instance‑segmentation polygons and segmentation maps, and the README covers usage, contribution instructions, citation format and license information.

Source: Hugging Face
Created: Nov 28, 2025
Updated: Jul 11, 2024
Overview

Dataset description and usage context

Public Hand‑drawn Circuit Diagram Dataset (GTDB‑HD)

Dataset Overview

  • Name: Public Hand‑drawn Circuit Diagram Dataset (GTDB‑HD)
  • License: Creative Commons Attribution Share Alike 3.0
  • Size: 1K < n < 10K
  • Task Categories:
    • Object Detection
    • Image Segmentation
  • Languages:
    • English
    • German

Dataset Structure

    gtdh-hd
    │   README.md                   # This file
    │   classes.json                # Class list
    │   classes_color.json          # Class-to-color mapping
    │   classes_discontinuous.json  # Class shape information
    │   classes_ports.json          # Electrical port descriptions
    │   consistency.py              # Statistics and consistency checks
    │   loader.py                   # Simple dataset loading and storage utilities
    │   segmentation.py             # Multi-class segmentation generation
    │   utils.py                    # Helper functions
    │   requirements.txt            # Script dependencies
    │
    └───drafter_D
    │   └───annotations             # Bounding-box annotations
    │   │   │   CX_DY_PZ.xml
    │   │   │   ...
    │   │
    │   └───images                  # Raw images
    │   │   │   CX_DY_PZ.jpg
    │   │   │   ...
    │   │
    │   └───instances               # Instance-segmentation polygons
    │   │   │   CX_DY_PZ.json
    │   │   │   ...
    │   │
    │   └───segmentation            # Binary segmentation maps (stroke vs background)
    │   │   │   CX_DY_PZ.jpg
    │   │   │   ...
    ...

File Naming Rules

  • D is the global drafter identifier
  • X is the global circuit identifier (12 circuits per drafter)
  • Y is the local diagram identifier (2 diagrams per circuit)
  • Z is the local image identifier (4 images per diagram)
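
The naming rules above can be turned into a small parser. The following is a minimal sketch, not part of the dataset's own scripts: `SampleId` and `parse_sample_name` are hypothetical names introduced here for illustration. The drafter identifier D is not part of the file name; it comes from the enclosing `drafter_D` folder.

```python
import re
from pathlib import Path
from typing import NamedTuple


class SampleId(NamedTuple):
    """Identifiers encoded in a CX_DY_PZ file name (hypothetical helper type)."""
    circuit: int  # X: global circuit identifier
    diagram: int  # Y: local diagram identifier
    image: int    # Z: local image identifier


_NAME_RE = re.compile(r"C(\d+)_D(\d+)_P(\d+)")


def parse_sample_name(filename: str) -> SampleId:
    """Extract (circuit, diagram, image) from a name such as 'C25_D1_P4.jpg'."""
    stem = Path(filename).stem  # drop the extension, whatever its case
    match = _NAME_RE.fullmatch(stem)
    if match is None:
        raise ValueError(f"unexpected sample name: {filename!r}")
    return SampleId(*(int(group) for group in match.groups()))


print(parse_sample_name("C25_D1_P4.jpg"))
# SampleId(circuit=25, diagram=1, image=4)
```

Because the extension is stripped before matching, the same call works for annotation (`.xml`) and polygon (`.json`) files.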

Image Files

  • Each image is RGB and stored as jpg, jpeg or png; the file extensions appear in mixed case.
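
Because the extensions vary in spelling and case, a glob such as `*.jpg` can silently miss files. A minimal sketch of a case-insensitive listing follows; `list_images` and `IMAGE_EXTENSIONS` are hypothetical names, and the dataset's own `loader.py` may handle this differently.

```python
from pathlib import Path

# Extensions named in the dataset description, compared in lower case.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images(images_dir):
    """Return all image files in a folder, ignoring the case of the extension."""
    return sorted(
        path
        for path in Path(images_dir).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

A call such as `list_images("drafter_12/images")` would then pick up `.JPG` and `.jpeg` files alike, assuming the folder layout shown under Dataset Structure.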

Bounding‑Box Annotations

  • Category labels and mapping tables are in classes.json.
  • Annotations follow the PASCAL VOC format.
  • Every raw image has a corresponding annotation file.
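
For readers unfamiliar with PASCAL VOC, the sketch below reads class names and boxes from an annotation using only the standard library. It assumes the standard VOC layout (`object`, `name`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`); the dataset's files may carry additional fields that are ignored here. `read_voc_boxes` is a hypothetical helper, and the sample XML with its class names is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative annotation, not taken from the dataset.
SAMPLE_XML = """
<annotation>
  <filename>C1_D1_P1.jpg</filename>
  <object>
    <name>resistor</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>60</ymax></bndbox>
  </object>
  <object>
    <name>text</name>
    <bndbox><xmin>30</xmin><ymin>70</ymin><xmax>80</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>
"""


def read_voc_boxes(xml_text):
    """Return a list of (class_name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # Coordinates are parsed via float() in case a tool wrote '10.0'.
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"), coords))
    return boxes


for name, box in read_voc_boxes(SAMPLE_XML):
    print(name, box)
# resistor (10, 20, 110, 60)
# text (30, 70, 80, 90)
```

To read a file from disk instead, pass `Path(path).read_text()` or swap `ET.fromstring` for `ET.parse(path).getroot()`.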

Known Annotation Issues

  • C25_D1_P4 truncates a text label
  • C27 truncates some texts
  • C29_D1_P1 has an extra text
  • C31_D2_P4 is missing a text
  • C33_D1_P4 is missing a text
  • C46_D2_P2 truncates a text

Instance Segmentation

  • Each binary segmentation map has a matching instance‑segmentation polygon file in labelme format.
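
A labelme file is a JSON document whose `shapes` list holds one entry per annotated region, each with a `label` and a list of `[x, y]` points. The sketch below reads the polygons and derives an axis-aligned bounding box for each. `read_labelme_polygons` and `polygon_bounds` are hypothetical helpers, and the sample JSON is made up for illustration.

```python
import json

# Illustrative labelme content, not taken from the dataset.
SAMPLE_JSON = """
{
  "shapes": [
    {"label": "resistor", "shape_type": "polygon",
     "points": [[10.0, 20.0], [110.0, 22.0], [108.0, 60.0], [12.0, 58.0]]}
  ],
  "imagePath": "C1_D1_P1.jpg",
  "imageHeight": 1000,
  "imageWidth": 1500
}
"""


def read_labelme_polygons(json_text):
    """Return a list of (label, points) for every polygon shape."""
    data = json.loads(json_text)
    return [
        (shape["label"], [tuple(point) for point in shape["points"]])
        for shape in data.get("shapes", [])
        if shape.get("shape_type", "polygon") == "polygon"
    ]


def polygon_bounds(points):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


for label, points in read_labelme_polygons(SAMPLE_JSON):
    print(label, polygon_bounds(points))
# resistor (10.0, 20.0, 110.0, 60.0)
```

Comparing these derived boxes with the PASCAL VOC boxes of the same image is one simple way to cross-check the two annotation types.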

Segmentation Maps

  • Binary maps have the same resolution as their images and contain only black and white pixels, separating drawing strokes from background.
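
The directory listing shows the maps stored as `.jpg`, and lossy JPEG compression can leave a few near-black or near-white values after decoding. A minimal sketch of re-binarizing a decoded grayscale map follows. It is pure Python on nested lists for clarity; the threshold of 128 is an assumption, and `binarize` is a hypothetical helper, not part of the dataset's scripts.

```python
def binarize(gray_rows, threshold=128):
    """Map every grayscale value to 0 or 255 using a fixed threshold."""
    return [
        [255 if value >= threshold else 0 for value in row]
        for row in gray_rows
    ]


# Values such as 3 or 250 stand in for compression noise around 0 and 255.
noisy = [
    [0, 3, 250],
    [130, 127, 255],
]
print(binarize(noisy))
# [[0, 0, 255], [255, 0, 255]]
```

The same comparison applies unchanged to an array loaded with an imaging library; which of the two values denotes strokes should be checked against the actual files.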

Netlist Files

  • Some images are accompanied by netlist files in ASC format.

Consistency and Statistics

  • Scripts are provided for class distribution, bounding‑box size statistics and consistency checks.

  • Run scripts as:

    $ python3 consistency.py

    or for a specific drafter:

    $ python3 consistency.py 15

Multi‑Class (Instance) Segmentation Processing

  • Scripts are provided to handle new and existing instance segmentation files.

    $ python3 segmentation.py <command> <drafter_id>

    where <command> can be:

    • transform
    • wire
    • keypoint
    • create
    • refine
    • pipeline
    • assign

Dataset Loader

  • Loading and writing utilities are included for training.

    from loader import read_dataset, read_images, read_snippets

    db_bb = read_dataset()                    # Load all bounding-box annotations
    db_seg = read_dataset(segmentation=True)  # Load all polygon annotations
    db_bb_val = read_dataset(drafter=12)      # Load annotations for drafter 12

    len(db_bb)  # Number of samples
    db_bb[5]    # Retrieve any sample

    db = read_images(drafter=12)    # Returns list of (image, annotation) pairs
    db = read_snippets(drafter=12)  # Returns list of (image, annotation) pairs

Citation

    @inproceedings{thoma2021public,
      title={A Public Ground-Truth Dataset for Handwritten Circuit Diagram Images},
      author={Thoma, Felix and Bayer, Johannes and Li, Yakun and Dengel, Andreas},
      booktitle={International Conference on Document Analysis and Recognition},
      pages={20--27},
      year={2021},
      organization={Springer}
    }
