ATHAR

The ATHAR dataset is a collection of classical Arabic texts paired with their English translations, comprising approximately 66,000 parallel lines. It is split into training and test subsets. Each record contains a classical Arabic text field and its English translation field, which makes the dataset suitable for Arabic-to-English translation tasks.

Updated 7/30/2024
huggingface

Description

Dataset Overview

Dataset Information

Features

  • Name: arabic
    • Data Type: string
  • Name: english
    • Data Type: string

Splits

  • Name: train
    • Size (bytes): 27878710
    • Number of samples: 65043
  • Name: test
    • Size (bytes): 430500
    • Number of samples: 1000

Size

  • Download size (bytes): 14722818
  • Dataset size (bytes): 28309210
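
The split and size figures above can be cross-checked with a little arithmetic. The following is a minimal sketch that only restates the numbers listed on this page; the dictionary layout and key names are illustrative choices, not part of the dataset.

```python
# Figures copied from the Splits and Size listings on this page.
# The key names below are illustrative, chosen for this sketch only.
splits = {
    "train": {"num_bytes": 27_878_710, "num_examples": 65_043},
    "test": {"num_bytes": 430_500, "num_examples": 1_000},
}
dataset_size_bytes = 28_309_210

total_bytes = sum(s["num_bytes"] for s in splits.values())
total_examples = sum(s["num_examples"] for s in splits.values())
test_fraction = splits["test"]["num_examples"] / total_examples

print(total_bytes == dataset_size_bytes)  # True: the split sizes add up to the dataset size
print(total_examples)                     # 66043, the "approximately 66,000" lines
print(f"{test_fraction:.2%}")             # 1.51% of the records are held out for testing
```

The download size is smaller than the dataset size, presumably because the files are stored compressed; the page does not state this explicitly.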

Config

  • Config name: default
    • Data files:
      • Split: train
        • Path: data/train-*
      • Split: test
        • Path: data/test-*

Task type

  • translation

Languages

  • ar
  • en

Name

  • ATHAR

Size category

  • 10K<n<100K

Dataset Structure

Fields

  • Field: arabic (str)
    • Description: Classical Arabic text
  • Field: english (str)
    • Description: English translation of the classical Arabic text

Dataset Loading

Code example

from datasets import load_dataset

# Loads every split defined in the default config (train and test)
athar = load_dataset("mohamed-khalil/ATHAR")
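
Once loaded, each record can be read as a mapping with the `arabic` and `english` fields listed under Features. The sketch below shows one way to turn records into (source, target) pairs for a translation pipeline. The helper names `iter_pairs` and `load_athar` and the sample records are hypothetical, introduced only for illustration; skipping records with an empty side is likewise an assumption, not a documented property of the dataset.

```python
from typing import Dict, Iterable, Iterator, Tuple


def iter_pairs(records: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (arabic, english) tuples, skipping records where either side is blank."""
    for record in records:
        arabic = record["arabic"].strip()
        english = record["english"].strip()
        if arabic and english:
            yield arabic, english


def load_athar():
    """Load ATHAR from the Hugging Face Hub (needs `datasets` and network access)."""
    # Imported lazily so iter_pairs can be used without the library installed.
    from datasets import load_dataset

    return load_dataset("mohamed-khalil/ATHAR")


# Hypothetical in-memory records shaped like ATHAR rows, for illustration only.
sample = [
    {"arabic": "نص", "english": "text"},
    {"arabic": "  ", "english": "missing source"},
]
pairs = list(iter_pairs(sample))
print(pairs)
```

With the real data, the same helper would be applied to a split, for example `iter_pairs(load_athar()["train"])`, since iterating over a split yields one dictionary per record.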

Sample Examples

Example

  • Arabic: فَلم يزل ... (original Arabic text)
  • English: Al-Fals continued to be worshipped until the advent of the Prophet, ... (original English translation)

Topics

Arabic Translation
Language Translation

Source

Organization: huggingface

Created: 7/18/2024
