DATASET
Open Source Community
abdiharyadi/eli5-id-preprocessed-tokenized-filtered
The dataset provides three training features: input_ids, attention_mask, and labels, each stored as an integer sequence. The train split contains 443,918 examples, with a reported dataset size of 1,004,301,613.693275 bytes (roughly 1 GB) and a download size of 235,069,151 bytes (roughly 235 MB).
Updated 5/23/2024
hugging_face
Description
Dataset Information
Features
- input_ids: sequence of type int32
- attention_mask: sequence of type int8
- labels: sequence of type int64
Data Splits
- train:
  - Bytes: 1,004,301,613.693275
  - Samples: 443,918
Data Size
- Download Size: 235,069,151 bytes
- Dataset Size: 1,004,301,613.693275 bytes
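The two size figures above imply a compression ratio and an average per-example footprint. The following is a minimal sketch of that arithmetic in Python, using only the numbers reported on this page. The variable names are illustrative, and the per-position estimate rests on two assumptions that the page does not confirm: that the three sequences in an example have equal length, and that storage overhead is negligible.

```python
# Figures copied from the dataset information above.
DATASET_SIZE_BYTES = 1_004_301_613.693275  # reported size of the train split
DOWNLOAD_SIZE_BYTES = 235_069_151          # reported download size
NUM_EXAMPLES = 443_918                     # reported number of train examples

# Bytes per token position, ASSUMING the three sequences have equal length:
# input_ids is int32 (4), attention_mask is int8 (1), labels is int64 (8).
BYTES_PER_POSITION = 4 + 1 + 8

compression_ratio = DATASET_SIZE_BYTES / DOWNLOAD_SIZE_BYTES
avg_bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

# Rough estimate only: ignores any container or offset overhead.
approx_positions_per_example = avg_bytes_per_example / BYTES_PER_POSITION

print(f"compression ratio: {compression_ratio:.2f}x")
print(f"average bytes per example: {avg_bytes_per_example:.0f}")
print(f"approx. token positions per example: {approx_positions_per_example:.0f}")
```

With the reported figures this works out to a download roughly 4.3 times smaller than the in-memory dataset and a little under 2.3 KB per example. The per-position figure is only an order-of-magnitude guide.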
Configuration
- config_name: default
- data_files:
  - split: train
  - path: data/train-*
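With a single default configuration and one train split, the dataset could be read as sketched below. This is a hedged example, not a documented procedure from the dataset author: it assumes the name shown above is a public Hugging Face Hub repository id and that the third-party `datasets` package is installed. The helper names `missing_columns` and `peek_first_example` are hypothetical. Streaming is used so that the full download is not needed just to inspect one record.

```python
# Repository id as listed on this page (assumed to be a public Hub repo).
REPO_ID = "abdiharyadi/eli5-id-preprocessed-tokenized-filtered"

# Feature names listed under "Features" above.
EXPECTED_COLUMNS = ("input_ids", "attention_mask", "labels")


def missing_columns(example: dict) -> list:
    """Return the expected feature names that are absent from one example."""
    return [name for name in EXPECTED_COLUMNS if name not in example]


def peek_first_example() -> dict:
    """Stream the train split and return its first example.

    Requires network access and the `datasets` package.
    """
    from datasets import load_dataset  # third-party: pip install datasets

    stream = load_dataset(REPO_ID, split="train", streaming=True)
    return next(iter(stream))
```

Calling `missing_columns(peek_first_example())` should return an empty list if the hosted data matches the features listed on this page.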
Topics
Natural Language Processing
Text Classification
Source
Organization: hugging_face
Created: Unknown