ATHAR
The ATHAR dataset is a collection of classical Arabic texts paired with their English translations, comprising approximately 66,000 parallel lines. It is split into training and test subsets. Each record has a classical Arabic text field and an English translation field, which makes the dataset suitable for Arabic-to-English translation tasks.
Description
Dataset Overview
Dataset Information
Features
- Name: arabic
- Data Type: string
- Name: english
- Data Type: string
Splits
- Name: train
- Size (bytes): 27878710
- Number of samples: 65043
- Name: test
- Size (bytes): 430500
- Number of samples: 1000
Size
- Download size (bytes): 14722818
- Dataset size (bytes): 28309210
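The split and size figures above can be cross-checked with a little arithmetic. The following is a minimal sketch that only restates the numbers listed in this card; the variable names are illustrative and not part of the dataset's metadata schema.

```python
# Figures copied from the Splits and Size sections of this card.
SPLITS = {
    "train": {"num_bytes": 27878710, "num_examples": 65043},
    "test": {"num_bytes": 430500, "num_examples": 1000},
}
DOWNLOAD_SIZE = 14722818  # bytes, as listed under Size
DATASET_SIZE = 28309210   # bytes, as listed under Size

# The per-split byte counts should add up to the reported dataset size.
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

# The per-split sample counts give the "approximately 66,000" total.
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# Share of records held out for testing, and download-to-dataset size ratio.
test_fraction = SPLITS["test"]["num_examples"] / total_examples
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(total_bytes == DATASET_SIZE, total_examples)
print(f"test share: {test_fraction:.2%}, download/dataset: {download_ratio:.2f}")
```

The split sizes sum exactly to the reported dataset size, the test split is a small held-out portion of the roughly 66,000 records, and the download is roughly half the size of the prepared dataset.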
Config
- Config name: default
- Data files:
- Split: train
- Path: data/train-*
- Split: test
- Path: data/test-*
Task type
- translation
Languages
- ar
- en
Name
- ATHAR
Size category
- 10K<n<100K
Dataset Structure
Fields
- Field: arabic (str)
- Description: Classical Arabic text
- Field: english (str)
- Description: English translation of the classical Arabic text
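Because every record is just a pair of strings, preparing it for a translation model mostly means mapping the two fields to a source and a target. The sketch below assumes the lowercase feature names `arabic` and `english` listed under Features; the helper names and the placeholder records are hypothetical and are not taken from the dataset.

```python
from typing import Iterable, Iterator


def to_translation_pair(record: dict) -> dict:
    """Map one ATHAR-style record to a source/target pair (Arabic -> English)."""
    return {
        "source": record["arabic"].strip(),
        "target": record["english"].strip(),
    }


def valid_pairs(records: Iterable[dict]) -> Iterator[dict]:
    """Yield pairs only for records where both fields are non-empty strings."""
    for record in records:
        arabic = record.get("arabic")
        english = record.get("english")
        if not isinstance(arabic, str) or not isinstance(english, str):
            continue  # skip records with missing or non-string fields
        pair = to_translation_pair(record)
        if pair["source"] and pair["target"]:
            yield pair


# Placeholder records for illustration only; not real dataset content.
placeholder_records = [
    {"arabic": " <arabic text 1> ", "english": " <english text 1> "},
    {"arabic": "", "english": "<english text 2>"},  # dropped: empty source
    {"arabic": "<arabic text 3>"},                  # dropped: missing field
]

pairs = list(valid_pairs(placeholder_records))
print(pairs)
```

Filtering out empty or malformed records is a defensive choice here; this card does not state whether such records actually occur in the data.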
Dataset Loading
Code example
```python
from datasets import load_dataset

# Download (or load from cache) all splits of the dataset
athar = load_dataset("mohamed-khalil/ATHAR")
```
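Once loaded, the result behaves like a mapping from split name to a sequence of records. The following sketch shows one way to inspect it; `load_athar` and `split_sizes` are hypothetical helper names, and the import is kept inside the function so the size helper can be used without the `datasets` package or network access.

```python
def load_athar():
    """Load all ATHAR splits from the Hugging Face Hub.

    Requires the `datasets` package and network access (or a local cache).
    """
    from datasets import load_dataset

    return load_dataset("mohamed-khalil/ATHAR")


def split_sizes(dataset_dict) -> dict:
    """Return the number of records in each split of a mapping of splits."""
    return {name: len(split) for name, split in dataset_dict.items()}


# Example usage (not executed here, since it needs network access):
#   athar = load_athar()
#   print(split_sizes(athar))
#   first = athar["train"][0]
#   print(first["arabic"], first["english"])
```

If the hosted copy matches the split metadata listed in this card, `split_sizes` would report 65043 training records and 1000 test records.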
Sample Examples
Example
- Arabic: فَلم يزل ... (original Arabic text)
- English: Al-Fals continued to be worshipped until the advent of the Prophet, ... (original English translation)
Source
Organization: huggingface
Created: 7/18/2024