RMT-team/babilong-1k-samples
BABILong is a benchmark for evaluating NLP models' ability to handle facts distributed across long documents. It comprises nine configurations corresponding to different sequence lengths (0k, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k). The dataset combines the bAbI fact set with PG‑19 background text, simulating the task of locating crucial information among extensive irrelevant detail, and covers 20 bAbI tasks (splits qa1–qa20) that assess basic reasoning capabilities.
Description
Dataset Overview
Dataset Name
- BABILong
Configurations
- Configuration Names: 0k, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k
- Features:
- question: String
- target: String
- input: String
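To make the feature schema concrete, here is a minimal sketch of what one record looks like. The field names come from the card above; the values are invented for illustration, not taken from the dataset:

```python
# Illustrative BABILong record; values are hypothetical placeholders,
# only the three field names come from the dataset card.
example = {
    "question": "Where is Mary?",  # the bAbI question to answer
    "target": "office",            # the expected short answer
    "input": "<long PG-19 background text with bAbI facts interleaved>",
}

# All three features are plain strings.
assert all(isinstance(v, str) for v in example.values())
```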
Size
- Download Size: Varies per configuration, ranging from 8,143,277 bytes to 1,567,936,012 bytes
- Dataset Size: Varies per configuration, ranging from 13,838,997 bytes to 2,532,955,312 bytes
Splits
- Split Names: qa1, qa2, …, qa20
- Example Count: Between 999 and 1,000 per split
- Bytes per Split: Varies with configuration, ranging from 2,801,155 bytes to 507,056,606 bytes
File Paths
- Data files follow the pattern
<config_name>/qa<split_number>-*
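The pattern above can be resolved programmatically. A small helper (hypothetical, not part of any official API) that builds the data-file glob for a given configuration and split, using the configuration names and split range listed on this card:

```python
CONFIGS = {"0k", "1k", "2k", "4k", "8k", "16k", "32k", "64k", "128k"}

def babilong_glob(config: str, split_number: int) -> str:
    """Return the data-file glob for one configuration/split,
    following the card's pattern <config_name>/qa<split_number>-*."""
    if config not in CONFIGS:
        raise ValueError(f"unknown BABILong configuration: {config}")
    if not 1 <= split_number <= 20:
        raise ValueError("split number must be in 1..20 (qa1-qa20)")
    return f"{config}/qa{split_number}-*"
```

For example, `babilong_glob("4k", 1)` yields `"4k/qa1-*"`.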
Tasks
- Task Types: single supporting fact, two supporting facts, three supporting facts, and other bAbI reasoning categories
- Number of Facts: Ranges from 2 to 126 depending on the task
- Supporting Facts: 1 to 3 per task
Intended Use
- Evaluating NLP model performance on handling distributed facts in long documents
Sources
- Uses the bAbI dataset for facts and PG‑19 for background text
License
- Apache 2.0 License, BSD License
Source
Organization: Hugging Face