BABILong is a benchmark for evaluating how well NLP models handle facts distributed across long documents. It comprises nine configurations, one per sequence length: 0k, 1k, 2k, 4k, 8k, 16k, 32k, 64k, and 128k. Each sample embeds facts from the bAbI dataset in background text drawn from PG-19, so the model must locate the crucial information among extensive irrelevant details. The dataset also includes ten tasks that assess basic reasoning capabilities.
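The core construction, a handful of relevant facts scattered through a long stretch of unrelated text, can be sketched in Python. This is a minimal illustration under stated assumptions, not BABILong's actual generation code. The function name `build_sample` is hypothetical. Length is counted in whitespace-separated words instead of model tokens. Background sentences are taken in order from a supplied list that stands in for PG-19. Fact positions are drawn uniformly at random, with the facts kept in their original story order.

```python
import random


def build_sample(facts, question, background, target_len, seed=0):
    """Hypothetical sketch: hide bAbI-style facts among background sentences.

    facts:      fact sentences, in story order
    question:   question whose answer depends on the facts
    background: distractor sentences (assumed stand-in for PG-19 text)
    target_len: total length budget in words (assumption: words, not tokens)
    """
    rng = random.Random(seed)
    used = sum(len(f.split()) for f in facts)

    # Take background sentences in order until the word budget is filled.
    filler = []
    for sent in background:
        n = len(sent.split())
        if used + n > target_len:
            break
        filler.append(sent)
        used += n

    # Sorted insertion points keep the facts in their original order.
    positions = sorted(rng.randint(0, len(filler)) for _ in facts)

    out, fi = [], 0
    for i in range(len(filler) + 1):
        # Insert every fact whose slot is "before filler sentence i".
        while fi < len(facts) and positions[fi] == i:
            out.append(facts[fi])
            fi += 1
        if i < len(filler):
            out.append(filler[i])

    return {"input": " ".join(out), "question": question}


facts = ["Mary moved to the bathroom.", "John went to the hallway."]
background = [f"Filler sentence number {i}." for i in range(100)]
sample = build_sample(facts, "Where is Mary?", background, target_len=60, seed=1)
print(sample["input"])
```

Raising `target_len` grows the haystack while the facts and question stay fixed, which roughly mirrors how the length configurations differ: the reasoning required is the same, but the relevant sentences become a smaller fraction of the input.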