RMT-team/babilong-1k-samples
Long Text Processing, NLP Reasoning Evaluation
BABILong is a benchmark for evaluating how well NLP models reason over facts distributed across long documents. It comprises nine configurations, one per sequence length (0k, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k). The dataset uses facts from bAbI as the relevant content and text from PG‑19 as background, simulating the task of locating crucial information among extensive irrelevant detail. It includes ten tasks that assess basic reasoning capabilities.
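As a rough illustration of how the configurations and tasks described above might be accessed, the sketch below uses the Hugging Face `datasets` library. It assumes that each sequence length is exposed as a configuration name ("0k" through "128k") and that the ten tasks are exposed as splits named `qa1` through `qa10`; both naming conventions, and the helper `load_task` itself, are assumptions for illustration and should be checked against the dataset page.

```python
# Sequence-length configurations listed in the description above.
LENGTHS = ["0k", "1k", "2k", "4k", "8k", "16k", "32k", "64k", "128k"]

# Assumption: the ten tasks are exposed as splits named qa1 ... qa10.
TASKS = [f"qa{i}" for i in range(1, 11)]


def load_task(length="4k", task="qa1"):
    """Hypothetical helper: load one task at one context length.

    Downloads data from the Hugging Face Hub on first use.
    """
    if length not in LENGTHS:
        raise ValueError(f"unknown length {length!r}; expected one of {LENGTHS}")
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")

    # Imported lazily so the constants above are usable without the library.
    from datasets import load_dataset  # pip install datasets

    # Assumption: the configuration name selects the length, the split selects the task.
    return load_dataset("RMT-team/babilong-1k-samples", length)[task]
```

Under these assumptions, `load_task("4k", "qa1")` would return the first task at the 4k context length, and iterating over `LENGTHS` would allow the same task to be compared across context sizes.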
Source: hugging_face | Updated: Jun 17, 2024 | 110 views