Dataset asset, Open Source Community, Natural Language Processing, Text Classification
abdiharyadi/eli5-id-preprocessed-tokenized-filtered
The dataset provides three features per example: input_ids, attention_mask, and labels, each stored as an integer sequence. The train split contains 443,918 examples with a reported size of 1,004,301,613.693275 bytes (about 1.0 GB). The download size is 235,069,151 bytes (about 235 MB).
Source
hugging_face
Created
Nov 28, 2025
Updated
May 23, 2024
Availability
Linked source ready
Overview
Dataset description and usage context
Dataset Information
Features
- input_ids: sequence of type int32
- attention_mask: sequence of type int8
- labels: sequence of type int64
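The declared integer types can be turned into a simple sanity check on a single example. The following is a minimal sketch, not part of the dataset card: the helper name `check_example` and the sample token values are hypothetical, and it only verifies that each feature is present and that its values fit the declared integer width.

```python
# Value ranges implied by the declared feature types.
INT_RANGES = {
    "input_ids": (-2**31, 2**31 - 1),     # int32
    "attention_mask": (-2**7, 2**7 - 1),  # int8
    "labels": (-2**63, 2**63 - 1),        # int64
}


def check_example(example):
    """Return a list of problems found in one example dict (empty if none)."""
    problems = []
    for name, (lo, hi) in INT_RANGES.items():
        if name not in example:
            problems.append(f"missing feature: {name}")
            continue
        bad = [v for v in example[name] if not isinstance(v, int) or not lo <= v <= hi]
        if bad:
            problems.append(f"{name}: {len(bad)} value(s) outside declared integer range")
    return problems


# Hypothetical values for illustration only; real rows come from the dataset.
example = {
    "input_ids": [101, 2023, 102],
    "attention_mask": [1, 1, 1],
    "labels": [-100, 2023, 102],
}
print(check_example(example))
```

A check like this is cheap to run on a few rows before training, for instance to catch a conversion step that silently widened or truncated a column.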
Data Splits
- train:
  - Bytes: 1,004,301,613.693275
  - Samples: 443,918
Data Size
- Download Size: 235,069,151 bytes
- Dataset Size: 1,004,301,613.693275 bytes
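The reported sizes allow a quick back-of-the-envelope estimate of storage and memory needs. The sketch below uses only the figures listed above; the variable names are illustrative, and the reading that the gap between the two sizes comes from compressed storage of the download files is an assumption, not something the listing states.

```python
# Figures copied from the listing above.
DATASET_BYTES = 1_004_301_613.693275  # reported dataset size (train split)
DOWNLOAD_BYTES = 235_069_151          # reported download size
NUM_EXAMPLES = 443_918                # examples in the train split

# How much larger the prepared dataset is than the files that are downloaded.
size_ratio = DATASET_BYTES / DOWNLOAD_BYTES

# Average footprint of one example across all three features.
bytes_per_example = DATASET_BYTES / NUM_EXAMPLES

print(f"dataset is about {size_ratio:.2f}x the download size")
print(f"about {bytes_per_example:.0f} bytes per example on average")
```

In practice this suggests budgeting roughly four times the download size on disk once the data is prepared locally.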
Configuration
- config_name: default
- data_files:
  - split: train
  - path: data/train-*
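Given the single default configuration and train split listed above, loading the data could look like the following. This is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is publicly reachable on the Hub; the helper name `load_train_split` is hypothetical, and streaming is chosen here only to avoid downloading the full split up front.

```python
REPO_ID = "abdiharyadi/eli5-id-preprocessed-tokenized-filtered"


def load_train_split(streaming=True):
    """Load the train split of the default config (needs network access)."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train", streaming=streaming)


if __name__ == "__main__":
    train = load_train_split()
    first = next(iter(train))
    # Expected keys per the feature list: input_ids, attention_mask, labels.
    print(sorted(first.keys()))
```

Passing `streaming=False` instead downloads and caches the whole split, which is usually preferable for repeated training runs.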