lowercaseonly/cghd

The GTDB‑HD public ground‑truth dataset for hand‑drawn circuit diagrams contains images of hand‑drawn electrical schematics together with bounding‑box annotations for object detection and segmentation ground‑truth files. It is intended for training models to extract electrical diagrams from raster graphics. The dataset is organised into folders storing images, annotations, instance‑segmentation polygons and segmentation maps, and includes a README with usage guide, contribution instructions, citation format and license information.

Updated 7/11/2024
hugging_face

Description

Public Hand‑drawn Circuit Diagram Dataset (GTDB‑HD)

Dataset Overview

  • Name: Public Hand‑drawn Circuit Diagram Dataset (GTDB‑HD)
  • License: Creative Commons Attribution Share Alike 3.0
  • Size: 1K < n < 10K
  • Task Categories:
    • Object Detection
    • Image Segmentation
  • Languages:
    • English
    • German

Dataset Structure

gtdh‑hd
│   README.md                   # This file
│   classes.json                # Class list
│   classes_color.json          # Class‑to‑color mapping
│   classes_discontinuous.json  # Class shape information
│   classes_ports.json          # Electrical port descriptions
│   consistency.py              # Statistics and consistency checks
│   loader.py                   # Simple dataset loading and storage utilities
│   segmentation.py             # Multi‑class segmentation generation
│   utils.py                    # Helper functions
│   requirements.txt            # Script dependencies
└───drafter_D
    └───annotations             # Bounding‑box annotations
    │   │   CX_DY_PZ.xml
    │   │   ...
    └───images                  # Raw images
    │   │   CX_DY_PZ.jpg
    │   │   ...
    └───instances               # Instance‑segmentation polygons
    │   │   CX_DY_PZ.json
    │   │   ...
    └───segmentation            # Binary segmentation maps (stroke vs background)
    │   │   CX_DY_PZ.jpg
    │   │   ...
...
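
Given this layout, one way to pair each raw image with its bounding‑box annotation is to match file stems inside a drafter folder. The following is a minimal sketch, not part of the dataset's own scripts (loader.py covers loading): the helper name pair_images_with_annotations is hypothetical, and it assumes the images/ and annotations/ subfolders shown above and image extensions of jpg, jpeg or png in any letter case.

```python
from pathlib import Path

# Image extensions accepted by this sketch, compared in lowercase.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def pair_images_with_annotations(drafter_dir):
    """Return sorted (image_path, annotation_path) pairs for one drafter folder.

    Hypothetical helper: matches images/<stem>.<ext> to annotations/<stem>.xml.
    Images without a matching annotation file are skipped silently.
    """
    drafter_dir = Path(drafter_dir)
    pairs = []
    for image_path in sorted((drafter_dir / "images").iterdir()):
        # Compare extensions case-insensitively (e.g. .JPG and .jpg).
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        annotation_path = drafter_dir / "annotations" / (image_path.stem + ".xml")
        if annotation_path.is_file():
            pairs.append((image_path, annotation_path))
    return pairs
```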

File Naming Rules

  • D is the global drafter identifier
  • X is the global circuit identifier (12 circuits per drafter)
  • Y is the local diagram identifier (2 diagrams per circuit)
  • Z is the local image identifier (4 images per diagram)
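
The naming scheme can be decoded with a short regular expression. This is an illustrative sketch only: parse_sample_name and SampleId are hypothetical names, and the code assumes that X, Y and Z are plain decimal integers, which the rules above suggest but do not state explicitly.

```python
import re
from typing import NamedTuple

# CX_DY_PZ: circuit X, diagram Y, image Z (assumed to be decimal integers).
SAMPLE_NAME = re.compile(r"^C(\d+)_D(\d+)_P(\d+)$")


class SampleId(NamedTuple):
    circuit: int
    diagram: int
    image: int


def parse_sample_name(filename):
    """Parse 'C25_D1_P4.xml' (or the bare stem 'C25_D1_P4') into a SampleId."""
    stem = filename.rsplit(".", 1)[0]  # drop a trailing extension, if any
    match = SAMPLE_NAME.match(stem)
    if match is None:
        raise ValueError(f"not a CX_DY_PZ name: {filename!r}")
    return SampleId(*(int(group) for group in match.groups()))


print(parse_sample_name("C25_D1_P4.xml"))
# SampleId(circuit=25, diagram=1, image=4)
```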

Image Files

  • Each image is RGB and stored as jpg, jpeg or png; the file extensions appear in both upper and lower case.

Bounding‑Box Annotations

  • Category labels and mapping tables are in classes.json.
  • Annotations follow the PASCAL VOC format.
  • Every raw image has a corresponding annotation file.
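
Because the annotations use the PASCAL VOC XML layout, they can be read with the Python standard library alone. The sketch below shows one way to do this; read_voc_boxes is a hypothetical helper, and the sample XML is invented for illustration rather than copied from the dataset.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) from PASCAL VOC XML."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Coordinates are read via float() first in case a tool wrote "10.0".
        coords = [int(float(bndbox.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((label, *coords))
    return boxes


# Invented sample in VOC layout (not taken from the dataset).
EXAMPLE = """<annotation>
  <filename>C1_D1_P1.jpg</filename>
  <object>
    <name>resistor</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>"""

print(read_voc_boxes(EXAMPLE))
# [('resistor', 10, 20, 110, 60)]
```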

Known Annotation Issues

  • C25_D1_P4 cuts off a text label
  • C27 cuts off some text labels
  • C29_D1_P1 contains an extra text label
  • C31_D2_P4 is missing a text label
  • C33_D1_P4 is missing a text label
  • C46_D2_P2 cuts off a text label

Instance Segmentation

  • Each binary segmentation map has a matching instance‑segmentation polygon file in labelme format.
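
A labelme file is plain JSON with a top‑level "shapes" list whose entries carry a "label", a list of "points" and a "shape_type". The sketch below extracts the polygons under that assumption; read_labelme_polygons is a hypothetical helper, the sample JSON is invented, and the dataset's files may contain additional fields that this code ignores.

```python
import json


def read_labelme_polygons(json_text):
    """Return a list of (label, [(x, y), ...]) for polygon shapes."""
    data = json.loads(json_text)
    polygons = []
    for shape in data.get("shapes", []):
        # labelme can also store rectangles, points, etc.; keep polygons only.
        if shape.get("shape_type", "polygon") != "polygon":
            continue
        points = [(float(x), float(y)) for x, y in shape["points"]]
        polygons.append((shape["label"], points))
    return polygons


# Invented sample in labelme layout (not taken from the dataset).
EXAMPLE_JSON = """{
  "shapes": [
    {"label": "resistor", "shape_type": "polygon",
     "points": [[10, 20], [110, 20], [110, 60], [10, 60]]},
    {"label": "junction", "shape_type": "point", "points": [[5, 5]]}
  ]
}"""

print(read_labelme_polygons(EXAMPLE_JSON))
# [('resistor', [(10.0, 20.0), (110.0, 20.0), (110.0, 60.0), (10.0, 60.0)])]
```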

Segmentation Maps

  • Binary maps have the same resolution as their images and contain only black‑white pixels representing drawing strokes vs background.

Netlist Files

  • Netlist files in ASC format are provided for some images.

Consistency and Statistics

  • Scripts are provided for class distribution, bounding‑box size statistics and consistency checks.

  • Run scripts as:

    $ python3 consistency.py

    or for a specific drafter:

    $ python3 consistency.py 15

Multi‑Class (Instance) Segmentation Processing

  • Scripts are provided to handle new and existing instance segmentation files.

    $ python3 segmentation.py <command> <drafter_id>

    where <command> is one of:

    • transform
    • wire
    • keypoint
    • create
    • refine
    • pipeline
    • assign

Dataset Loader

  • Loading and writing utilities are included for training.

    from loader import read_dataset, read_images, read_snippets

    db_bb = read_dataset()                    # Load all bounding-box annotations
    db_seg = read_dataset(segmentation=True)  # Load all polygon annotations
    db_bb_val = read_dataset(drafter=12)      # Load annotations for drafter 12

    len(db_bb)  # Number of samples
    db_bb[5]    # Retrieve any sample

    db = read_images(drafter=12)    # Returns list of (image, annotation) pairs
    db = read_snippets(drafter=12)  # Returns list of (image, annotation) pairs

Citation

@inproceedings{thoma2021public,
  title={A Public Ground-Truth Dataset for Handwritten Circuit Diagram Images},
  author={Thoma, Felix and Bayer, Johannes and Li, Yakun and Dengel, Andreas},
  booktitle={International Conference on Document Analysis and Recognition},
  pages={20--27},
  year={2021},
  organization={Springer}
}

Topics

Circuit Diagram Recognition
Image Segmentation

Source

Organization: hugging_face

Created: Unknown
