geometric-shapes
The Geometric Shapes dataset is a synthetic collection containing images of various geometric shapes overlaid with random text. Each image has a random‑colored background, a shape (or just text), and a short random string partially occluding the shape. It is designed for shape classification, image recognition, and robustness testing of computer‑vision models.
Description
Geometric Shapes Dataset
Dataset Description
Dataset Overview
The Geometric Shapes Dataset is a synthetic collection of images of geometric shapes overlaid with random text. Each image shows a polygon (or only text) on a randomly colored background, with a short random string partially occluding the shape. The dataset is designed for shape classification, image recognition, and robustness testing of computer‑vision models.
Supported Tasks and Leaderboards
- Image Classification: The main task is multi‑class image classification, aiming to identify the shape type in each image.
Data Instances
Each data instance includes:
- An image (50 × 50 px, RGB)
- A label indicating the shape type
Data Fields
image: a 50 × 50 px RGB image stored as a NumPy array.
label: a string indicating the shape type. Labels correspond to the following shapes:
- "1": No shape (only random text on a colored background)
- "2": Circle‑like shape (approximated by a 100‑sided polygon)
- "3": Triangle
- "4": Square
- "5": Pentagon
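Because the labels are strings rather than integers, a small lookup table can help when feeding them to a classifier. The following is a minimal sketch; the names `LABEL_NAMES`, `label_to_name`, and `label_to_index` are hypothetical helpers, not part of the dataset, and the zero-based index convention is an assumption.

```python
# Hypothetical helpers: the dataset itself only ships the string labels "1".."5".
LABEL_NAMES = {
    "1": "no_shape",   # only random text on a colored background
    "2": "circle",     # approximated by a 100-sided polygon
    "3": "triangle",
    "4": "square",
    "5": "pentagon",
}


def label_to_name(label: str) -> str:
    """Return a readable class name for a dataset label string."""
    return LABEL_NAMES[label]


def label_to_index(label: str) -> int:
    """Return a zero-based class index (assumed convention) for a label string."""
    if label not in LABEL_NAMES:
        raise KeyError(f"unknown label: {label!r}")
    return int(label) - 1
```

For example, `label_to_index("3")` gives the class index for triangles under this convention, and `label_to_name("3")` gives its readable name.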
Each image contains:
- A randomly colored background
- The designated geometric shape (except label "1"), filled with a different random color
- A short (4‑character) random alphanumeric text overlaid on top, partially occluding the shape
Note: The “circle” (label "2") is approximated by a 100‑sided polygon and appears circular at the given resolution.
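The composition described above can be sketched without any imaging library by computing only the parameters of one sample. This is not the original generation script: the function names, the centered placement, the radius of 40 % of the image size, and the mixed-case alphabet are all assumptions, and rendering to pixels is left out.

```python
import math
import random
import string

IMAGE_SIZE = 50  # pixels per side, as stated in the dataset card

# Number of polygon sides per label; "1" has no shape, "2" is the 100-gon "circle".
SIDES_BY_LABEL = {"1": 0, "2": 100, "3": 3, "4": 4, "5": 5}


def random_color(rng):
    """Pick a random RGB color as a tuple of three 0-255 integers."""
    return tuple(rng.randrange(256) for _ in range(3))


def random_text(rng, length=4):
    """Pick a short random alphanumeric string (letter case is an assumption)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def polygon_vertices(n_sides, center, radius):
    """Vertices of a regular polygon inscribed in a circle."""
    cx, cy = center
    return [
        (cx + radius * math.cos(2 * math.pi * k / n_sides),
         cy + radius * math.sin(2 * math.pi * k / n_sides))
        for k in range(n_sides)
    ]


def describe_sample(label, rng):
    """Return the parameters of one sample: colors, shape vertices, and text."""
    background = random_color(rng)
    n_sides = SIDES_BY_LABEL[label]
    fill = None
    vertices = []
    if n_sides:
        # The shape is filled with a color different from the background.
        fill = random_color(rng)
        while fill == background:
            fill = random_color(rng)
        # Assumed placement: centered, radius 40 % of the image size.
        center = (IMAGE_SIZE / 2, IMAGE_SIZE / 2)
        vertices = polygon_vertices(n_sides, center, IMAGE_SIZE * 0.4)
    return {
        "background": background,
        "fill": fill,
        "vertices": vertices,
        "text": random_text(rng),
    }
```

A renderer would fill the background, draw the polygon from `vertices`, and then draw `text` on top so that it partially occludes the shape. Passing a seeded `random.Random` keeps samples reproducible.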
Data Splits
The dataset is divided into training (70 %), validation (10 %), and test (20 %) sets.
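The card does not say how the split was produced, so the following is only an illustrative sketch of a shuffled 70/10/20 partition over sample indices. The function name `split_indices` and the fixed seed are assumptions, and the sketch does not stratify by label.

```python
import random


def split_indices(n_items, seed=0):
    """Shuffle indices and cut them into 70 % train, 10 % validation, 20 % test."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)

    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = n_items * 7 // 10
    n_val = n_items // 10

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, about 20 %
    return train, val, test
```

With 1,000 samples this yields 700 training, 100 validation, and 200 test indices, with no index appearing in more than one split.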
Dataset Creation
Rationale
The dataset provides a simple, controlled environment for testing image‑classification models, especially when geometric shapes are partially occluded by text.
Source Data
Data are generated synthetically using custom Python scripts; no external data sources are used.
Annotation
Labels are generated automatically during image creation.
Personal and Sensitive Information
The dataset contains no personal or sensitive information.
Known Limitations
- The dataset includes only a predefined set of shapes.
- Image resolution is fixed at 50 × 50 px.
- Text overlay is always present, which may not reflect all real‑world scenarios.
License
The dataset is released under the MIT License.
Source
Organization: huggingface
Created: 9/2/2024