Tags: Open Source Community, Natural Language Processing, Hieroglyph Recognition

HamdiJr/Egyptian_hieroglyphs

The dataset is built from 10 images of Egyptian hieroglyphs taken from the book "The Pyramid of Unas" and is accompanied by a language model. Each hieroglyph is manually annotated and labeled according to the Gardiner sign list. The dataset also includes automated detection results and the resources used to build the language model (for example, a vocabulary and n-gram grammars). It is distributed under a GPL license restricted to non-commercial use.

Source: Hugging Face
Created: Nov 28, 2025
Updated: Jul 22, 2022

Dataset Overview

Dataset Name

  • Egyptian hieroglyphs 𓂀
  • Hieroglyph image dataset with an accompanying language model

Dataset Features

  • Source: Built from 10 images in the book The Pyramid of Unas (Alexandre Piankoff, 1955).
  • Image Numbers: 3, 5, 7, 9, 20, 21, 22, 23, 39, 41.
  • Annotation: Every hieroglyph was manually annotated and labeled according to the Gardiner Sign List.
  • Image Naming: File names contain the label and image index.
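
The naming convention can be used to recover the label and image index without a separate annotation file. The sketch below is only illustrative: the exact file-name layout is not stated here, so the pattern `<2-digit image number><4-digit index>_<label>.png` and the names `parse_glyph_name` and `GlyphName` are assumptions to adapt to the actual files.

```python
import re
from typing import NamedTuple, Optional

# ASSUMPTION: file names look like "<image:2 digits><index:4 digits>_<label>.png",
# for example "030012_G43.png". This layout is hypothetical; check the real files.
_NAME_RE = re.compile(
    r"^(?P<image>\d{2})(?P<index>\d{4})_(?P<label>[A-Za-z0-9]+)\.png$"
)


class GlyphName(NamedTuple):
    image: int   # number of the source image the crop comes from
    index: int   # running index of the hieroglyph within that image
    label: str   # Gardiner code, or "UNKNOWN"


def parse_glyph_name(filename: str) -> Optional[GlyphName]:
    """Split a crop file name into its parts, or return None if it does not match."""
    match = _NAME_RE.match(filename)
    if match is None:
        return None
    return GlyphName(int(match["image"]), int(match["index"]), match["label"])
```

Returning `None` for non-matching names, rather than raising, makes it easy to skip stray files while walking a directory.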

Dataset Statistics

  • Total Images: 4,210 hieroglyph images (including 179 labeled as UNKNOWN).
  • Total Classes: 171 (excluding the UNKNOWN class).
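
As a quick sanity check on these figures, the following sketch derives the number of labeled images and the average number of images per class. It uses only the counts listed above; the variable names are illustrative.

```python
# Counts taken from the statistics listed above.
TOTAL_IMAGES = 4210     # all hieroglyph images, UNKNOWN included
UNKNOWN_IMAGES = 179    # images labeled UNKNOWN
NUM_CLASSES = 171       # Gardiner classes, UNKNOWN excluded

labeled_images = TOTAL_IMAGES - UNKNOWN_IMAGES    # 4031 images with a Gardiner label
mean_per_class = labeled_images / NUM_CLASSES      # about 23.6 images per class
unknown_share = UNKNOWN_IMAGES / TOTAL_IMAGES      # about 4.3% of all images

print(f"labeled images: {labeled_images}")
print(f"mean images per class: {mean_per_class:.1f}")
print(f"share labeled UNKNOWN: {unknown_share:.1%}")
```

The mean says nothing about how evenly the images are spread across classes, so it is worth counting the images per label before training a classifier.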

Annotation Accuracy

  • Note: Annotations may not be fully accurate; unidentified hieroglyphs are marked as “UNKNOWN”.

Data Processing

  • Manual Annotation: Hand‑annotated hieroglyphs.
  • Automated Detection: Automatic extraction of hieroglyphs using text detection methods, stored in Dataset/Automated/.
  • Location Information: The x/y coordinates of each hieroglyph are stored in the Location folder.

Dataset Structure

  • Images: The 10 source images from The Pyramid of Unas.
  • Manual Annotation: Hieroglyph image crops with location data.
  • Automated Detection: Automatically detected hieroglyph crops with location data.
  • Language Model: Egyptian text, dictionary, and n‑grams from the JSesh database.
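
To illustrate what an n-gram model over hieroglyph labels involves, here is a minimal bigram sketch. It is not the dataset's own tooling: the functions `bigram_counts` and `bigram_prob` are hypothetical, and the two short sequences of Gardiner codes are toy data made up for the example, not text from the JSesh database.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Sequence


def bigram_counts(sequences: Iterable[Sequence[str]]) -> Dict[str, Counter]:
    """Count how often each label is followed by each other label."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def bigram_prob(counts: Dict[str, Counter], prev: str, nxt: str) -> float:
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 if prev was never seen."""
    followers = counts.get(prev)
    if not followers:
        return 0.0
    return followers[nxt] / sum(followers.values())


# Toy input (made up for illustration): sequences of Gardiner codes.
toy_sequences = [
    ["G43", "N35", "M17"],
    ["G43", "N35", "N35"],
]
counts = bigram_counts(toy_sequences)
print(bigram_prob(counts, "G43", "N35"))  # G43 is always followed by N35 here
```

A model like this can be used to rerank a classifier's candidate labels by how plausible each one is after the preceding sign. Unsmoothed estimates assign zero probability to unseen pairs, so a practical model would add smoothing.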

License

  • GPL: Non‑commercial use only.