Explore high-quality datasets for your AI and machine learning projects.
TriviaQA is a reading-comprehension dataset containing over 650,000 question-answer-evidence triples. It includes 95,000 question-answer pairs authored by trivia enthusiasts, together with independently gathered evidence documents (six per question on average) that provide high-quality distant supervision for answering the questions. The dataset is monolingual (English) and suits question-answering and text-generation tasks.
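DROP is a crowdsourced reading-comprehension benchmark of about 96,000 questions. It requires systems to resolve references within a passage and perform discrete operations over its content, such as addition, counting, or sorting. These operations demand a more comprehensive understanding of the passage than earlier datasets required. Each example includes fields for the passage, the question, and the answer spans, and the data is split into a training set of 77,400 examples and a validation set of 9,535 examples.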
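To illustrate why discrete operations go beyond span extraction, here is a minimal sketch over a toy passage. The passage, the questions in the comments, and the variable names are invented for illustration and are not taken from DROP; the hand-coded `len`, `sorted`, and `sum` calls stand in for the reasoning a model would have to perform on its own.

```python
import re

# Toy passage invented for illustration (not a real DROP example).
passage = ("The home team scored touchdowns of 12, 35 and 7 yards. "
           "The visiting team scored a single touchdown of 20 yards.")

# Reference resolution, hard-coded here: the first sentence is about the home team.
home_sentence = passage.split(". ")[0]
home_yards = [int(n) for n in re.findall(r"\d+", home_sentence)]

# Counting: "How many touchdowns did the home team score?"
count = len(home_yards)

# Sorting: "What was the home team's longest touchdown?"
ranked = sorted(home_yards, reverse=True)
longest = ranked[0]

# Addition: "How many total touchdown yards did the home team score?"
total = sum(home_yards)

print(count, longest, total)
```

Neither the count (3) nor the total (54) appears as a span in the passage, so a purely extractive reader cannot produce them; this is the kind of gap the benchmark is designed to expose.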