geometric-shapes
Geometric Shapes Dataset
Dataset Description
Dataset Overview
The Geometric Shapes Dataset is a synthetic collection of images showing geometric shapes overlaid with random text. Each image displays a polygon (or only text) on a randomly colored background, with a short random string partially occluding the shape. The dataset is designed for shape classification, image recognition, and robustness testing of computer‑vision models.
Supported Tasks and Leaderboards
- Image Classification: The main task is multi‑class image classification, aiming to identify the shape type in each image.
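Because the task is multi‑class, overall accuracy can hide a class that a model handles poorly (for example, circles mistaken for pentagons at low resolution). The sketch below computes per‑class accuracy from two lists of labels. It assumes predictions use the same string labels that the card lists under Data Fields; the function name `per_class_accuracy` is illustrative and not part of any dataset tooling.

```python
from collections import Counter

# The five string labels listed under Data Fields.
LABELS = ["1", "2", "3", "4", "5"]


def per_class_accuracy(y_true, y_pred):
    """Return {label: accuracy} for every label that occurs in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Labels with no true samples are skipped to avoid dividing by zero.
    return {label: correct[label] / total[label] for label in LABELS if total[label]}
```

Called with the true and predicted labels of the test set, this returns one accuracy value per shape class present in that split.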
Data Instances
Each data instance includes:
- An image (50 × 50 px, RGB)
- A label indicating the shape type
Data Fields
- image: a 50 × 50 px RGB image stored as a NumPy array.
- label: a string indicating the shape type.
Labels correspond to the following shapes:
- "1": No shape (only random text on a colored background)
- "2": Circle‑like shape (approximated by a 100‑sided polygon)
- "3": Triangle
- "4": Square
- "5": Pentagon
Each image contains:
- A randomly colored background
- The designated geometric shape (except label "1"), filled with a different random color
- A short (4‑character) random alphanumeric text overlaid on top, partially occluding the shape
Note: The “circle” (label "2") is approximated by a 100‑sided polygon and appears circular at the given resolution.
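The composition described above can be sketched as a small specification generator. This is not the original generation script, which the card does not include. It only illustrates how a label maps to a polygon, two distinct random colors, and a 4‑character string. The centered placement, the radius of 20 px, and the names `SIDES_BY_LABEL`, `polygon_vertices`, and `sample_spec` are assumptions made for illustration.

```python
import math
import random
import string

# Label -> number of polygon sides, following the label list above.
# Label "1" has no shape; label "2" approximates a circle with 100 sides.
SIDES_BY_LABEL = {"1": 0, "2": 100, "3": 3, "4": 4, "5": 5}

IMAGE_SIZE = 50  # pixels per side, as stated in the card


def random_color(rng):
    """Return a random RGB triple with 8-bit channels."""
    return tuple(rng.randrange(256) for _ in range(3))


def polygon_vertices(n_sides, center, radius):
    """Vertices of a regular polygon, starting at the top vertex.

    Coordinates use the image convention (y grows downward).
    """
    cx, cy = center
    return [
        (cx + radius * math.sin(2 * math.pi * k / n_sides),
         cy - radius * math.cos(2 * math.pi * k / n_sides))
        for k in range(n_sides)
    ]


def sample_spec(label, rng=random):
    """Describe one image: colors, polygon vertices (if any), and overlay text."""
    background = random_color(rng)
    fill = random_color(rng)
    while fill == background:  # the fill color must differ from the background
        fill = random_color(rng)
    n_sides = SIDES_BY_LABEL[label]
    # Assumption: the shape is centered with a 20 px radius.
    center = (IMAGE_SIZE / 2, IMAGE_SIZE / 2)
    vertices = polygon_vertices(n_sides, center, radius=20) if n_sides else []
    text = "".join(rng.choices(string.ascii_letters + string.digits, k=4))
    return {"label": label, "background": background, "fill": fill,
            "vertices": vertices, "text": text}
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible. Turning a specification into pixels would additionally require an imaging library to fill the polygon and draw the text.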
Data Splits
The dataset is divided into training (70 %), validation (10 %), and test (20 %) sets.
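For readers who regenerate or re‑split the data, a 70/10/20 partition can be sketched as follows. The card does not state how the published splits were drawn, so the shuffling, the seed, and the name `split_indices` are assumptions. Integer arithmetic is used so the split sizes are exact and any rounding remainder falls into the test set.

```python
import random


def split_indices(n_samples, seed=0):
    """Shuffle sample indices and split them 70 % / 10 % / 20 %."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = n_samples * 70 // 100
    n_val = n_samples * 10 // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, about 20 %
    return train, val, test
```

With 1,000 samples this yields 700 training, 100 validation, and 200 test indices, with no index shared between splits.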
Dataset Creation
Rationale
The dataset provides a simple, controlled environment for testing image‑classification models, especially when geometric shapes are partially occluded by text.
Source Data
Data are generated synthetically using custom Python scripts; no external data sources are used.
Annotation
Labels are generated automatically during image creation.
Personal and Sensitive Information
The dataset contains no personal or sensitive information.
Known Limitations
- The dataset includes only a predefined set of shapes.
- Image resolution is fixed at 50 × 50 px.
- Text overlay is always present, which may not reflect all real‑world scenarios.
License
The dataset is released under the MIT License.