lowercaseonly/cghd
The GTDB‑HD public ground‑truth dataset for hand‑drawn circuit diagrams contains images of hand‑drawn electrical schematics together with bounding‑box annotations for object detection and segmentation ground‑truth files. It is intended for training models to extract electrical diagrams from raster graphics. The dataset is organised into folders storing images, annotations, instance‑segmentation polygons and segmentation maps, and includes a README with usage guide, contribution instructions, citation format and license information.
Description
Public Hand‑drawn Circuit Diagram Dataset (GTDB‑HD)
Dataset Overview
- Name: Public Hand‑drawn Circuit Diagram Dataset (GTDB‑HD)
- License: Creative Commons Attribution Share Alike 3.0
- Size: 1K < n < 10K
- Task Categories:
  - Object Detection
  - Image Segmentation
- Languages:
  - English
  - German
Dataset Structure
gtdh-hd
│   README.md                   # This file
│   classes.json                # Class list
│   classes_color.json          # Class-to-color mapping
│   classes_discontinuous.json  # Class shape information
│   classes_ports.json          # Electrical port descriptions
│   consistency.py              # Statistics and consistency checks
│   loader.py                   # Simple dataset loading and storage utilities
│   segmentation.py             # Multi-class segmentation generation
│   utils.py                    # Helper functions
│   requirements.txt            # Script dependencies
└───drafter_D
│   └───annotations             # Bounding-box annotations
│   │   │   CX_DY_PZ.xml
│   │   │   ...
│   └───images                  # Raw images
│   │   │   CX_DY_PZ.jpg
│   │   │   ...
│   └───instances               # Instance-segmentation polygons
│   │   │   CX_DY_PZ.json
│   │   │   ...
│   └───segmentation            # Binary segmentation maps (stroke vs background)
│   │   │   CX_DY_PZ.jpg
│   │   │   ...
...
File Naming Rules
- D is the global drafter identifier
- X is the global circuit identifier (12 circuits per drafter)
- Y is the local diagram identifier (2 diagrams per circuit)
- Z is the local image identifier (4 images per diagram)
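The naming scheme above can be parsed mechanically. The following is a minimal sketch and not part of the dataset's own scripts: the helper `parse_sample_name` and the `SampleId` tuple are hypothetical names, and the identifiers are assumed to be plain decimal integers. The drafter identifier D is not part of the file name, since it only appears in the folder name.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple


class SampleId(NamedTuple):
    circuit: int  # X: global circuit identifier
    diagram: int  # Y: local diagram identifier
    image: int    # Z: local image identifier


# Matches stems such as "C25_D1_P4"; identifiers are assumed to be decimal integers.
_NAME_RE = re.compile(r"^C(\d+)_D(\d+)_P(\d+)$")


def parse_sample_name(path: str) -> SampleId:
    """Split a CX_DY_PZ file name (any extension) into its three identifiers."""
    stem = PurePosixPath(path).stem
    match = _NAME_RE.match(stem)
    if match is None:
        raise ValueError(f"not a CX_DY_PZ file name: {path!r}")
    return SampleId(*(int(group) for group in match.groups()))


print(parse_sample_name("images/C25_D1_P4.jpg"))
```

Going by the counts stated above, a complete drafter folder would hold 12 × 2 × 4 = 96 images.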
Image Files
- Each image is RGB, stored as jpg, jpeg or png (file extensions appear in mixed case).
Bounding‑Box Annotations
- Category labels and mapping tables are in classes.json.
- Annotations follow the PASCAL VOC format.
- Every raw image has a corresponding annotation file.
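As an illustration of the PASCAL VOC layout, the sketch below reads bounding boxes with Python's standard XML parser. The sample annotation is invented for this example: the file name, the class name `resistor` and all coordinates are assumptions, and `read_voc_boxes` is a hypothetical helper rather than a function from loader.py.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation content; class names and coordinates are made up.
SAMPLE_XML = """
<annotation>
  <filename>C1_D1_P1.jpg</filename>
  <size><width>1000</width><height>800</height><depth>3</depth></size>
  <object>
    <name>resistor</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>180</xmax><ymax>240</ymax></bndbox>
  </object>
  <object>
    <name>text</name>
    <bndbox><xmin>110</xmin><ymin>150</ymin><xmax>160</xmax><ymax>190</ymax></bndbox>
  </object>
</annotation>
"""


def read_voc_boxes(xml_text):
    """Return one dict per <object> with its class name and pixel box."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        box = {"name": obj.findtext("name")}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            # float() first, so values written as "100.0" are accepted too
            box[key] = int(float(bndbox.findtext(key)))
        boxes.append(box)
    return boxes


for box in read_voc_boxes(SAMPLE_XML):
    print(box)
```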
Known Annotation Issues
- C25_D1_P4 truncates a text label
- C27 truncates some texts
- C29_D1_P1 has an extra text
- C31_D2_P4 is missing a text
- C33_D1_P4 is missing a text
- C46_D2_P2 truncates a text
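When training on text labels, it may be useful to flag or skip these samples. The sketch below is one possible way to do that. `KNOWN_ISSUES` and `known_issue` are hypothetical names, and the entry `C27`, which has no diagram or image part, is assumed to cover every image of circuit 27.

```python
# Identifiers copied from the list above; the notes are shortened paraphrases.
KNOWN_ISSUES = {
    "C25_D1_P4": "truncated text label",
    "C27": "some texts truncated",  # assumed to apply to the whole circuit
    "C29_D1_P1": "extra text",
    "C31_D2_P4": "missing text",
    "C33_D1_P4": "missing text",
    "C46_D2_P2": "truncated text",
}


def known_issue(sample_name):
    """Return the issue note for a CX_DY_PZ name, or None if none is listed."""
    if sample_name in KNOWN_ISSUES:
        return KNOWN_ISSUES[sample_name]
    circuit = sample_name.split("_")[0]  # e.g. "C27" from "C27_D1_P3"
    return KNOWN_ISSUES.get(circuit)


print(known_issue("C27_D2_P1"))
print(known_issue("C2_D1_P1"))
```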
Instance Segmentation
- Each binary segmentation map has a matching instance‑segmentation polygon file in labelme format.
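A labelme file is plain JSON with a `shapes` list whose entries carry a `label`, a `shape_type` and a list of `points`. The following sketch reads the polygons and computes their areas with the shoelace formula. The sample record is invented (the class name `resistor` and the coordinates are assumptions), and `read_polygons` is a hypothetical helper, not part of the dataset's scripts.

```python
import json

# Hypothetical labelme record, reduced to the keys used below.
SAMPLE_JSON = """
{
  "shapes": [
    {"label": "resistor", "shape_type": "polygon",
     "points": [[100, 200], [180, 200], [180, 240], [100, 240]]}
  ],
  "imagePath": "C1_D1_P1.jpg",
  "imageHeight": 800,
  "imageWidth": 1000
}
"""


def polygon_area(points):
    """Shoelace formula: absolute area enclosed by a closed polygon."""
    twice_area = 0.0
    # Pair every vertex with its successor, wrapping around to the first one.
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def read_polygons(json_text):
    """Return (label, points, area) for every polygon shape in the record."""
    record = json.loads(json_text)
    return [
        (shape["label"], shape["points"], polygon_area(shape["points"]))
        for shape in record["shapes"]
        if shape.get("shape_type", "polygon") == "polygon"
    ]


for label, points, area in read_polygons(SAMPLE_JSON):
    print(label, len(points), area)
```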
Segmentation Maps
- Binary maps have the same resolution as their images and contain only black‑white pixels representing drawing strokes vs background.
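Because the maps are listed with a jpg extension and JPEG compression is lossy, decoded pixel values may deviate slightly from pure black and white. A simple threshold restores a strictly binary map. The sketch below works on nested lists of 8-bit gray values to stay dependency-free. The threshold of 128 and the choice of which value marks a stroke are assumptions, and both function names are hypothetical.

```python
def binarize(gray_rows, threshold=128):
    """Map 8-bit gray values to 0 (dark) or 255 (bright)."""
    return [[255 if value >= threshold else 0 for value in row] for row in gray_rows]


def stroke_ratio(binary_rows, stroke_value=0):
    """Fraction of pixels equal to stroke_value (assumed here to be black)."""
    total = sum(len(row) for row in binary_rows)
    strokes = sum(row.count(stroke_value) for row in binary_rows)
    return strokes / total if total else 0.0


# Made-up 2x3 patch with near-black and near-white values.
patch = [[250, 3, 255], [12, 130, 247]]
clean = binarize(patch)
print(clean)
print(stroke_ratio(clean))
```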
Netlist Files
- Some images are accompanied by netlist files in ASC format.
Consistency and Statistics
- Scripts are provided for class distribution, bounding-box size statistics and consistency checks.
- Run the checks on the whole dataset:
  $ python3 consistency.py
  or restrict them to a specific drafter (here drafter 15):
  $ python3 consistency.py 15
Multi‑Class (Instance) Segmentation Processing
- Scripts are provided to handle new and existing instance segmentation files. Invoke the script with a command and a drafter identifier:
  $ python3 segmentation.py <command> <drafter_id>
  where <command> can be:
  - transform
  - wire
  - keypoint
  - create
  - refine
  - pipeline
  - assign
Dataset Loader
- Loading and writing utilities are included for training.

# read_images and read_snippets are assumed to be defined in loader.py as well
from loader import read_dataset, read_images, read_snippets

db_bb = read_dataset()                    # Load all bounding-box annotations
db_seg = read_dataset(segmentation=True)  # Load all polygon annotations
db_bb_val = read_dataset(drafter=12)      # Load annotations for drafter 12

len(db_bb)  # Number of samples
db_bb[5]    # Retrieve any sample

db = read_images(drafter=12)    # Returns list of (image, annotation) pairs
db = read_snippets(drafter=12)  # Returns list of (image, annotation) pairs
Citation
@inproceedings{thoma2021public,
  title={A Public Ground-Truth Dataset for Handwritten Circuit Diagram Images},
  author={Thoma, Felix and Bayer, Johannes and Li, Yakun and Dengel, Andreas},
  booktitle={International Conference on Document Analysis and Recognition},
  pages={20--27},
  year={2021},
  organization={Springer}
}
Source
Organization: hugging_face