lowercaseonly/cghd

The GTDB‑HD public ground‑truth dataset for hand‑drawn circuit diagrams contains images of hand‑drawn electrical schematics, together with bounding‑box annotations for object detection and ground‑truth files for segmentation. It is intended for training models that extract electrical diagrams from raster graphics. The dataset is organised into folders that store images, bounding‑box annotations, instance‑segmentation polygons and segmentation maps, and its README covers usage, contribution instructions, citation format and license information.

Updated 7/11/2024

hugging_face

Description

Public Hand‑drawn Circuit Diagram Dataset (GTDB‑HD)

Dataset Overview

Name: Public Hand‑drawn Circuit Diagram Dataset (GTDB‑HD)
License: Creative Commons Attribution Share Alike 3.0
Size: 1K < n < 10K
Task Categories:
- Object Detection
- Image Segmentation
Languages:
- English
- German

Dataset Structure

gtdh-hd
│   README.md                  # This file
│   classes.json               # Class list
│   classes_color.json         # Class-to-color mapping
│   classes_discontinuous.json # Class shape information
│   classes_ports.json         # Electrical port descriptions
│   consistency.py             # Statistics and consistency checks
│   loader.py                  # Simple dataset loading and storage utilities
│   segmentation.py            # Multi-class segmentation generation
│   utils.py                   # Helper functions
│   requirements.txt           # Script dependencies
│
└───drafter_D
│   └───annotations            # Bounding-box annotations
│   │   │   CX_DY_PZ.xml
│   │   │   ...
│   │
│   └───images                 # Raw images
│   │   │   CX_DY_PZ.jpg
│   │   │   ...
│   │
│   └───instances              # Instance-segmentation polygons
│   │   │   CX_DY_PZ.json
│   │   │   ...
│   │
│   └───segmentation           # Binary segmentation maps (stroke vs background)
│   │   │   CX_DY_PZ.jpg
│   │   │   ...
...

File Naming Rules

In the folder name drafter_D and the file name CX_DY_PZ, the placeholders mean:

D is the global drafter identifier
X is the global circuit identifier (12 circuits per drafter)
Y is the local diagram identifier (2 diagrams per circuit)
Z is the local image identifier (4 images per diagram)
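
The naming rule above can be unpacked programmatically. The following is a minimal sketch, not part of the dataset's own scripts: the helper name parse_sample_name and the SampleId tuple are hypothetical. Note that the drafter identifier lives in the folder name (drafter_D), not in the file name.

```python
import re
from typing import NamedTuple

# Pattern for names such as "C25_D1_P4", with an optional file extension.
_NAME_RE = re.compile(r"^C(\d+)_D(\d+)_P(\d+)(?:\.\w+)?$")


class SampleId(NamedTuple):
    circuit: int  # X: global circuit identifier
    diagram: int  # Y: local diagram identifier
    image: int    # Z: local image identifier


def parse_sample_name(name: str) -> SampleId:
    """Split a CX_DY_PZ file name into its three numeric identifiers."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a CX_DY_PZ name: {name!r}")
    return SampleId(*(int(group) for group in match.groups()))


print(parse_sample_name("C25_D1_P4.xml"))
# SampleId(circuit=25, diagram=1, image=4)
```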

Image Files

Each image is RGB, stored as jpg, jpeg or png (mixed case extensions).
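
Because the extensions vary in both type and letter case, a file listing should compare suffixes case-insensitively. The sketch below shows one way to do that with the standard library; the function names are hypothetical and the directory path in the usage comment is only an example.

```python
from pathlib import Path
from typing import Iterable, List

# Extensions named in the dataset description, compared in lower case.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def is_image_file(path: Path) -> bool:
    """Return True for .jpg/.jpeg/.png in any letter case."""
    return path.suffix.lower() in IMAGE_SUFFIXES


def filter_images(paths: Iterable[Path]) -> List[Path]:
    """Keep only image paths, sorted for a reproducible order."""
    return sorted(p for p in paths if is_image_file(p))


def list_images(images_dir: Path) -> List[Path]:
    """List the image files directly inside one images folder."""
    return filter_images(p for p in images_dir.iterdir() if p.is_file())


# Example call (assumes the folder exists locally):
# list_images(Path("drafter_1/images"))
```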

Bounding‑Box Annotations

Category labels and mapping tables are in classes.json.
Annotations follow the PASCAL VOC format.
Every raw image has a corresponding annotation file.
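
Since the annotations are PASCAL VOC XML, they can be read with Python's standard library alone. This is a minimal sketch independent of the dataset's loader.py; the function name parse_voc is hypothetical, and the sample XML (including the label "resistor" and all coordinates) is made up for illustration. The real label set is defined in classes.json.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Illustrative VOC snippet; values are invented, not taken from the dataset.
SAMPLE_XML = """
<annotation>
  <filename>C1_D1_P1.jpg</filename>
  <size><width>1000</width><height>800</height><depth>3</depth></size>
  <object>
    <name>resistor</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text: str) -> List[Dict]:
    """Return one dict per <object> with its label and box corners."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        box = {"label": obj.findtext("name")}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            # Some tools write coordinates as floats, so parse via float first.
            box[key] = int(float(bndbox.findtext(key)))
        boxes.append(box)
    return boxes


print(parse_voc(SAMPLE_XML))
```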

Known Annotation Issues

C25_D1_P4 truncates a text label
C27 truncates some texts
C29_D1_P1 has an extra text
C31_D2_P4 is missing a text
C33_D1_P4 is missing a text
C46_D2_P2 truncates a text

Instance Segmentation

Each binary segmentation map has a matching instance‑segmentation polygon file in labelme format.
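
A labelme file is plain JSON whose "shapes" list holds one entry per annotated region, each with a "label" and a list of "points". The sketch below reads the polygons and derives an axis-aligned bounding box from each; the function names are hypothetical and the sample content is invented for illustration.

```python
import json
from typing import Dict, List, Tuple

# Illustrative labelme content; label and coordinates are made up.
SAMPLE_JSON = """
{
  "shapes": [
    {"label": "resistor", "shape_type": "polygon",
     "points": [[10.0, 20.0], [110.0, 20.0], [110.0, 60.0], [10.0, 60.0]]}
  ],
  "imageWidth": 1000,
  "imageHeight": 800
}
"""


def read_polygons(labelme_text: str) -> List[Dict]:
    """Return the label and point list of every polygon shape."""
    data = json.loads(labelme_text)
    polygons = []
    for shape in data.get("shapes", []):
        # Skip non-polygon shapes such as points or lines.
        if shape.get("shape_type", "polygon") != "polygon":
            continue
        points = [(float(x), float(y)) for x, y in shape["points"]]
        polygons.append({"label": shape["label"], "points": points})
    return polygons


def polygon_bbox(points: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) enclosing the polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


for polygon in read_polygons(SAMPLE_JSON):
    print(polygon["label"], polygon_bbox(polygon["points"]))
```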

Segmentation Maps

Binary maps have the same resolution as their images and contain only black‑white pixels representing drawing strokes vs background.

Netlist Files

Some images include netlist files in ASC format.

Consistency and Statistics

Scripts are provided for class distribution, bounding‑box size statistics and consistency checks.
Run scripts as:

$ python3 consistency.py

or for a specific drafter:

$ python3 consistency.py 15
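
To give an idea of what class-distribution and bounding-box size statistics look like, here is a small sketch. It is not the code of consistency.py: the function name summarize_boxes and the input layout (dicts with label, xmin, ymin, xmax, ymax) are assumptions, and the sample boxes are invented.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def summarize_boxes(boxes: Iterable[Dict]) -> Tuple[Counter, Dict[str, Tuple[float, float]]]:
    """Count boxes per class and compute the mean (width, height) per class."""
    counts = Counter()
    sizes = defaultdict(list)
    for box in boxes:
        counts[box["label"]] += 1
        width = box["xmax"] - box["xmin"]
        height = box["ymax"] - box["ymin"]
        sizes[box["label"]].append((width, height))
    mean_sizes = {
        label: (mean(w for w, _ in dims), mean(h for _, h in dims))
        for label, dims in sizes.items()
    }
    return counts, mean_sizes


# Invented sample boxes; real labels come from classes.json.
SAMPLE_BOXES = [
    {"label": "resistor", "xmin": 0, "ymin": 0, "xmax": 10, "ymax": 10},
    {"label": "resistor", "xmin": 5, "ymin": 5, "xmax": 25, "ymax": 15},
    {"label": "text", "xmin": 0, "ymin": 0, "xmax": 40, "ymax": 8},
]

counts, mean_sizes = summarize_boxes(SAMPLE_BOXES)
print(counts.most_common())
print(mean_sizes)
```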

Multi‑Class (Instance) Segmentation Processing

Scripts are provided to handle new and existing instance segmentation files.

$ python3 segmentation.py <command> <drafter_id>

where <command> is one of:
- transform
- wire
- keypoint
- create
- refine
- pipeline
- assign

Dataset Loader

Loading and writing utilities are included for training.

from loader import read_dataset, read_images, read_snippets

db_bb = read_dataset()                    # Load all bounding‑box annotations
db_seg = read_dataset(segmentation=True)  # Load all polygon annotations
db_bb_val = read_dataset(drafter=12)      # Load annotations for drafter 12

len(db_bb)  # Number of samples
db_bb[5]    # Retrieve any sample

db = read_images(drafter=12)    # Returns list of (image, annotation) pairs
db = read_snippets(drafter=12)  # Returns list of (image, annotation) pairs

Citation

@inproceedings{thoma2021public,
  title={A Public Ground-Truth Dataset for Handwritten Circuit Diagram Images},
  author={Thoma, Felix and Bayer, Johannes and Li, Yakun and Dengel, Andreas},
  booktitle={International Conference on Document Analysis and Recognition},
  pages={20--27},
  year={2021},
  organization={Springer}
}

Topics

Circuit Diagram Recognition

Image Segmentation

Source

Organization: hugging_face

Created: Unknown
