geometric-shapes

Updated 9/11/2024
huggingface

Geometric Shapes Dataset

Dataset Description

Dataset Overview

Geometric Shapes Dataset is a synthetic collection containing images of various geometric shapes overlaid with random text. Each image displays a polygon (or only text) on a randomly colored background, with a short random string partially occluding the shape. The dataset is designed for shape classification, image recognition, and robustness testing of computer‑vision models.

Supported Tasks and Leaderboards

  • Image Classification: The main task is five-class image classification: identifying which shape, if any, appears in each image.

Data Instances

Each data instance includes:

  • An image (50 × 50 px, RGB)
  • A label indicating the shape type

Data Fields

  • image: a 50 × 50 px RGB image stored as a NumPy array.
  • label: a string indicating the shape type. Labels correspond to the following shapes:
    • "1": No shape (only random text on a colored background)
    • "2": Circle‑like shape (approximated by a 100‑sided polygon)
    • "3": Triangle
    • "4": Square
    • "5": Pentagon
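Because labels are stored as the strings "1" through "5", a training pipeline usually maps them to zero-based class indices. The following is a minimal sketch; the names `LABEL_NAMES`, `label_to_index`, and `index_to_name` are hypothetical helpers, not part of the dataset, and only the label strings and their meanings come from the list above.

```python
# Hypothetical helper names; label strings and meanings are taken from the card.
LABEL_NAMES = {
    "1": "no shape (text only)",
    "2": "circle (100-sided polygon)",
    "3": "triangle",
    "4": "square",
    "5": "pentagon",
}

# Fixed ordering of the label strings, used to assign class indices 0..4.
LABEL_ORDER = sorted(LABEL_NAMES)


def label_to_index(label: str) -> int:
    """Map a label string such as "3" to a zero-based class index."""
    if label not in LABEL_NAMES:
        raise ValueError(f"unknown label: {label!r}")
    return LABEL_ORDER.index(label)


def index_to_name(index: int) -> str:
    """Map a zero-based class index back to a readable shape name."""
    return LABEL_NAMES[LABEL_ORDER[index]]


print(label_to_index("3"), index_to_name(1))  # 2 circle (100-sided polygon)
```

Sorting the keys makes the index assignment independent of dictionary insertion order, so "1" always maps to class 0 and "5" to class 4.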

Each image contains:

  1. A randomly colored background
  2. The designated geometric shape (except label "1"), filled with a different random color
  3. A short (4‑character) random alphanumeric text overlaid on top, partially occluding the shape
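The card does not include the generation scripts themselves. As an illustration only, the layers listed above can be sketched in pure Python. The `SIDES` table, the function names, the radius range, the random rotation, the centered placement, and the requirement that the fill color differ from the background are all hypothetical choices, not the dataset's actual parameters. The shape is rasterized with an even-odd (ray casting) point-in-polygon test instead of an imaging library, and the 4-character string is generated but not drawn.

```python
import math
import random
import string

SIZE = 50  # image side length in pixels (from the dataset card)
# Hypothetical label -> side-count table; 0 means "no shape" (label "1").
SIDES = {"1": 0, "2": 100, "3": 3, "4": 4, "5": 5}
ALPHANUMERIC = string.ascii_letters + string.digits


def random_color(rng):
    """Uniformly random RGB triple."""
    return tuple(rng.randrange(256) for _ in range(3))


def regular_polygon(n, cx, cy, r, rotation):
    """Vertices of a regular n-gon centered at (cx, cy) with circumradius r."""
    return [
        (cx + r * math.cos(rotation + 2 * math.pi * k / n),
         cy + r * math.sin(rotation + 2 * math.pi * k / n))
        for k in range(n)
    ]


def inside(x, y, poly):
    """Even-odd (ray casting) point-in-polygon test."""
    result = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        # Edge (j, i) straddles the horizontal line through y and crosses it
        # to the right of x; each such crossing toggles inside/outside.
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            result = not result
        j = i
    return result


def make_example(label, rng):
    """Return (pixels, text, label); pixels is a SIZE x SIZE grid of RGB tuples."""
    background = random_color(rng)
    fill = random_color(rng)
    while fill == background:  # assumption: fill must differ from background
        fill = random_color(rng)
    pixels = [[background] * SIZE for _ in range(SIZE)]
    n = SIDES[label]
    if n:
        cx = cy = SIZE / 2  # assumed: shape centered in the image
        radius = rng.uniform(10, 20)  # assumed size range
        rotation = rng.uniform(0, 2 * math.pi)  # assumed random orientation
        poly = regular_polygon(n, cx, cy, radius, rotation)
        for y in range(SIZE):
            for x in range(SIZE):
                px, py = x + 0.5, y + 0.5  # sample at the pixel center
                # Cheap rejection: nothing outside the circumcircle is inside.
                if (px - cx) ** 2 + (py - cy) ** 2 > radius ** 2:
                    continue
                if inside(px, py, poly):
                    pixels[y][x] = fill
    # The 4-character string is generated here, but drawing it over the shape
    # needs a font rasterizer (e.g. an imaging library) and is omitted.
    text = "".join(rng.choices(ALPHANUMERIC, k=4))
    return pixels, text, label


pixels, text, label = make_example("3", random.Random(0))
print(len(pixels), len(pixels[0]), text, label)
```

Passing an explicit `random.Random` instance keeps each example reproducible from its seed. A real pipeline would additionally render the text with a font and convert the grid into a 50 × 50 × 3 array.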

Note: The “circle” (label "2") is approximated by a 100‑sided polygon and appears circular at the given resolution.
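A short calculation supports this. For a regular n-gon with circumradius r, the largest gap between the polygon and its circumscribed circle occurs at each edge midpoint, where the distance from the center is the apothem r cos(π/n). Assuming the radius is at most 25 px (half the image width; the card does not state the actual radius range), the worst-case gap for n = 100 is:

```latex
\[
\delta = r\left(1 - \cos\frac{\pi}{n}\right)
\;\le\; 25\left(1 - \cos\frac{\pi}{100}\right)
\approx 25 \times 4.93 \times 10^{-4}
\approx 0.012\ \text{px}
\]
```

A deviation of about one hundredth of a pixel cannot be resolved on a 50 × 50 px grid, so the 100-gon renders as a circle.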

Data Splits

The dataset is divided into training (70 %), validation (10 %), and test (20 %) sets.
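The card gives the split proportions but not the procedure used to produce them. One common way to realize a 70/10/20 split is to shuffle the example indices with a fixed seed and cut the shuffled list; the function name `split_indices`, the seed, and the use of a plain (unstratified) shuffle are assumptions for illustration.

```python
import random


def split_indices(n, seed=0, fractions=(0.7, 0.1, 0.2)):
    """Shuffle range(n) and cut it into train/validation/test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # deterministic for a given seed
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder (about 20%) goes to test
    return train, val, test


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 700 100 200
```

Assigning the remainder to the test set guarantees every index lands in exactly one split even when n × 0.7 or n × 0.1 is not a whole number. A stratified variant would apply the same cut separately within each label.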

Dataset Creation

Rationale

The dataset provides a simple, controlled environment for testing image‑classification models, especially when geometric shapes are partially occluded by text.

Source Data

Data are generated synthetically using custom Python scripts; no external data sources are used.

Annotation

Labels are assigned automatically during image generation; no manual annotation is involved.

Personal and Sensitive Information

The dataset contains no personal or sensitive information.

Known Limitations

  • The dataset covers only a predefined set of classes: four shapes (circle, triangle, square, pentagon) and a text-only class.
  • Image resolution is fixed at 50 × 50 px.
  • A text overlay is always present, so the dataset contains no text-free images; this may not reflect real-world scenarios.

License

The dataset is released under the MIT License.

Topics

Geometric Shapes
Computer Vision

Source

Organization: huggingface

Created: 9/2/2024
