
geometric-shapes

The Geometric Shapes dataset is a synthetic collection of images showing geometric shapes overlaid with random text. Each image has a randomly colored background, a shape (or no shape, only text), and a short random string that partially occludes the shape. It is designed for shape classification, image recognition, and robustness testing of computer‑vision models.

Source: Hugging Face
Created: Sep 2, 2024
Updated: Sep 11, 2024

Geometric Shapes Dataset

Dataset Description

Dataset Overview

The Geometric Shapes dataset is a synthetic collection of images showing geometric shapes overlaid with random text. Each image shows a polygon (or no shape at all) on a randomly colored background, with a short random string drawn on top that partially occludes the shape. The dataset is designed for shape classification, image recognition, and robustness testing of computer‑vision models.

Supported Tasks and Leaderboards

  • Image Classification: The main task is multi‑class image classification, that is, predicting which shape type (or no shape) appears in each image.

Data Instances

Each data instance includes:

  • An image (50 × 50 px, RGB)
  • A label indicating the shape type

Data Fields

  • image: a 50 × 50 px RGB image stored as a NumPy array.
  • label: a string indicating the shape type. Labels correspond to the following shapes:
    • "1": No shape (only random text on a colored background)
    • "2": Circle‑like shape (approximated by a 100‑sided polygon)
    • "3": Triangle
    • "4": Square
    • "5": Pentagon
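
Because the labels are strings rather than integers, most training code needs a small lookup step. The following is a minimal sketch of that step; the names `LABEL_NAMES` and `label_to_index` and the descriptive class names are hypothetical and are not part of the dataset itself.

```python
# Label strings as listed in the dataset card. The descriptive names are
# hypothetical and chosen here only for readability.
LABEL_NAMES = {
    "1": "no shape",
    "2": "circle",
    "3": "triangle",
    "4": "square",
    "5": "pentagon",
}


def label_to_index(label: str) -> int:
    """Map a label string ("1".."5") to a zero-based class index (0..4)."""
    if label not in LABEL_NAMES:
        raise ValueError(f"unknown label: {label!r}")
    return int(label) - 1
```

A zero-based index is what most loss functions expect, so `label_to_index("3")` would give `2` for a triangle.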

Each image contains:

  1. A randomly colored background
  2. The designated geometric shape (except label "1"), filled with a different random color
  3. A short (4‑character) random alphanumeric text overlaid on top, partially occluding the shape

Note: The “circle” (label "2") is approximated by a 100‑sided polygon and appears circular at the given resolution.
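
The composition described above can be sketched in plain Python. This is not the authors' generation script, which the card does not show. It is an illustration that assumes a centered regular polygon with a radius of 20 px and a fixed orientation, and it only generates the 4‑character string without rendering it, since drawing text requires an imaging library. All function names and parameters are hypothetical.

```python
import math
import random
import string

SIZE = 50  # image side in pixels, as stated in the dataset card

# Number of polygon sides per label; "2" uses 100 sides to approximate a
# circle. Label "1" has no shape and is therefore absent from this table.
SIDES = {"2": 100, "3": 3, "4": 4, "5": 5}


def polygon_vertices(n_sides, cx=25.0, cy=25.0, radius=20.0):
    """Vertices of a regular n-gon centered at (cx, cy). Assumed geometry."""
    step = 2 * math.pi / n_sides
    return [
        (cx + radius * math.cos(k * step), cy + radius * math.sin(k * step))
        for k in range(n_sides)
    ]


def point_in_polygon(x, y, verts):
    """Even-odd ray-casting test for a point against a polygon."""
    inside = False
    j = len(verts) - 1
    for i in range(len(verts)):
        xi, yi = verts[i]
        xj, yj = verts[j]
        # The edge straddles the horizontal line through y, so yi != yj here.
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside


def random_color(rng):
    """A random RGB triple with 8-bit channels."""
    return tuple(rng.randrange(256) for _ in range(3))


def make_sample(label, rng):
    """Return (pixels, text): a SIZE x SIZE grid of RGB tuples and the
    4-character string that would be drawn over it."""
    background = random_color(rng)
    fill = random_color(rng)
    while fill == background:  # the shape uses a different color
        fill = random_color(rng)
    pixels = [[background] * SIZE for _ in range(SIZE)]
    if label in SIDES:
        verts = polygon_vertices(SIDES[label])
        for row in range(SIZE):
            for col in range(SIZE):
                # Sample each pixel at its center.
                if point_in_polygon(col + 0.5, row + 0.5, verts):
                    pixels[row][col] = fill
    text = "".join(rng.choices(string.ascii_letters + string.digits, k=4))
    return pixels, text
```

Calling `make_sample("2", random.Random(0))` yields a filled 100‑sided polygon that is visually indistinguishable from a disc at this resolution, which is the point the note above makes.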

Data Splits

The dataset is divided into training (70 %), validation (10 %), and test (20 %) sets.
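
The card does not say how the split was produced (for example, whether it was random or stratified by label). As an illustration only, a seeded random 70/10/20 split over sample indices could look like the sketch below; the function name `split_indices` and its parameters are hypothetical.

```python
import random


def split_indices(n, seed=0, train_frac=0.7, val_frac=0.1):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts.

    The test set receives whatever remains after the first two cuts,
    which is 20% with the default fractions.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test
```

With `n = 1000`, this gives 700 training, 100 validation, and 200 test indices, with no index shared between the three sets.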

Dataset Creation

Rationale

The dataset provides a simple, controlled environment for testing image‑classification models, especially when geometric shapes are partially occluded by text.

Source Data

Data are generated synthetically using custom Python scripts; no external data sources are used.

Annotation

Labels are generated automatically during image creation.

Personal and Sensitive Information

The dataset contains no personal or sensitive information.

Known Limitations

  • The dataset covers only four shape classes (circle, triangle, square, pentagon) plus a no‑shape class.
  • Image resolution is fixed at 50 × 50 px.
  • Text overlay is always present, which may not reflect all real‑world scenarios.

License

The dataset is released under the MIT License.
