High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

IAM Handwriting dataset

Handwriting Recognition

Text Dataset

The IAM Handwriting dataset contains 115,320 isolated, labeled word images written by 657 different writers.

github

stas/openwebtext-10k

Natural Language Processing

Text Dataset

stas/openwebtext-10k is a subset of OpenWebText, an open-source replica of OpenAI's WebText dataset. It contains the first 10,000 records of the original dataset and is intended primarily for testing. It has a single split, `train`, with one feature, `text`, and 10,000 rows. The compressed size is approximately 15 MB and the uncompressed size is about 50 MB.

hugging_face

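
For orientation, loading the stas/openwebtext-10k subset might look like the sketch below. It assumes the Hugging Face `datasets` library is installed and that the Hub is reachable over the network. Depending on the library version, extra arguments may be required. `load_subset` and `summarize` are hypothetical helper names used for illustration, not part of any library.

```python
def load_subset(repo_id="stas/openwebtext-10k", split="train"):
    """Download the subset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so that summarize() works without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split)


def summarize(rows, feature="text"):
    """Return the row count and total character count of one feature."""
    texts = [row[feature] for row in rows]
    return {"rows": len(texts), "chars": sum(len(t) for t in texts)}
```

If the listing above is accurate, `summarize(load_subset())` should report 10,000 rows.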