AnanthZeke/oscar_tamil_clean
The oscar_tamil_clean dataset appears to be a cleaned or otherwise processed Tamil text corpus. Each record exposes two features: text and sent_token (sentence tokens).
Updated 4/5/2023
hugging_face
Description
Dataset Overview
Dataset Name
- Name: oscar_tamil_clean
Dataset Features
- Feature 1: text
  - Data Type: string
- Feature 2: sent_token
  - Data Type: string
  - Attribute: sequence
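The feature list above can be read as a simple record schema. The following is a minimal sketch, assuming that the "sequence" attribute means sent_token is a list of strings; the helper name matches_schema and the example record are hypothetical and are not taken from the dataset.

```python
def matches_schema(record):
    """Return True if a record has the two listed features with the expected types.

    Assumption: 'sequence' on sent_token is interpreted as a list of strings.
    """
    text = record.get("text")
    tokens = record.get("sent_token")
    return (
        isinstance(text, str)
        and isinstance(tokens, list)
        and all(isinstance(token, str) for token in tokens)
    )


# Hypothetical record for illustration only; not an actual row from the dataset.
example_record = {
    "text": "வணக்கம். நன்றி.",
    "sent_token": ["வணக்கம்.", "நன்றி."],
}

print(matches_schema(example_record))
```

A check like this is useful as a quick sanity test before passing records to downstream code that expects sentence-level input.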
Dataset Splits
- Training Set:
  - Number of Samples: 1,263,180
  - Data Size: 19,533,337,624 bytes
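Because the only split is a large training set, streaming a few records is a cheaper way to inspect it than downloading everything. The sketch below assumes the dataset is reachable on the Hugging Face Hub under the repository id shown in the title and that the `datasets` library is installed; the function name iter_first_samples is hypothetical.

```python
from itertools import islice

# Repository id taken from the page title; its availability is an assumption.
REPO_ID = "AnanthZeke/oscar_tamil_clean"


def iter_first_samples(limit=3, repo_id=REPO_ID):
    """Stream the train split and return the first `limit` records as a list."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # streaming=True yields records one at a time instead of downloading the full split.
    stream = load_dataset(repo_id, split="train", streaming=True)
    return list(islice(stream, limit))
```

Calling iter_first_samples() requires network access. Each returned record should be a dict with the text and sent_token keys listed under Dataset Features.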
Dataset Size
- Download Size: 6,504,957,774 bytes (about 6.5 GB)
- Total Data Size: 19,533,337,624 bytes (about 19.5 GB)
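The figures above can be turned into a few planning numbers. This is a small sketch using only the values listed on this page; the function name summarize_sizes is hypothetical, and the "compression ratio" is simply total data size divided by download size.

```python
# Values copied from the dataset listing.
NUM_SAMPLES = 1_263_180
DATASET_BYTES = 19_533_337_624
DOWNLOAD_BYTES = 6_504_957_774


def summarize_sizes(num_samples, dataset_bytes, download_bytes):
    """Derive human-readable size figures from the raw byte counts."""
    gib = 1024 ** 3
    return {
        "dataset_gib": dataset_bytes / gib,
        "download_gib": download_bytes / gib,
        "compression_ratio": dataset_bytes / download_bytes,
        "avg_bytes_per_sample": dataset_bytes / num_samples,
    }


summary = summarize_sizes(NUM_SAMPLES, DATASET_BYTES, DOWNLOAD_BYTES)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The prepared data is roughly three times the download size, and an average record is about 15 KB, so disk space should be budgeted against the total data size and not the download size.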
Topics
Tamil
Natural Language Processing
Source
Organization: hugging_face
Created: Unknown