
ucinlp/drop

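DROP is a crowdsourced reading comprehension benchmark of about 96,000 questions. A system must resolve references in a passage and perform discrete operations over them, such as addition, counting, or sorting. These operations require a more comprehensive understanding of the passage content than earlier datasets did. Fields include the passage, the question, and the answer spans. The train and validation splits contain 77,400 and 9,535 examples, respectively.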
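To make "discrete operations" concrete, here is a toy sketch in Python. The facts and questions are invented for illustration and are not taken from DROP. A real system would also have to find these values in the passage text, which this sketch skips by starting from a hand-built list.

```python
# Toy illustration only: the facts below are invented, not DROP data.
# Each tuple is (player, yards) for a field goal in a made-up passage.
field_goals = [("Smith", 42), ("Jones", 27), ("Smith", 35)]

# Counting: "How many field goals did Smith kick?"
smith_count = sum(1 for player, _ in field_goals if player == "Smith")

# Addition: "How many total yards of field goals were kicked?"
total_yards = sum(yards for _, yards in field_goals)

# Sorting: "Who kicked the longest field goal?"
longest_kicker = sorted(field_goals, key=lambda fg: fg[1], reverse=True)[0][0]
```

Each question needs several facts from the passage combined, which is why a single extracted span is often not enough.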

Source

hugging_face

Created

Nov 28, 2025

Updated

Jan 17, 2024


Overview

Dataset description and usage context

Dataset Overview

Dataset Name

Name: DROP
Alias: drop

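Dataset Properties

Language: English (en)
License: CC-BY-SA-4.0
Multilinguality: monolingual
Size: 10K<n<100K
Source: original data
Task categories:
- Question answering
- Text-to-text generation
Task IDs:
- Extractive QA (extractive-qa)
- Abstractive QA (abstractive-qa)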

Dataset Structure

Features:
- section_id: string
- query_id: string
- passage: string
- question: string
- answers_spans: sequence containing:
  - spans: string
  - types: string
Data splits:
- Train: 77,400 examples, 105,572,506 bytes
- Validation: 9,535 examples, 11,737,755 bytes
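As a minimal sketch of working with this schema, the Python snippet below pairs each answer span with its type. It assumes that `answers_spans` arrives as a dictionary of two parallel lists, which is how the Hugging Face `datasets` library usually exposes a sequence of named fields. The helper names `pair_answers` and `load_drop_validation` are hypothetical, and the sample record is invented, including its `"number"` type label.

```python
from typing import Any, Dict, List, Tuple


def pair_answers(example: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Pair each answer span with its type, assuming two parallel lists."""
    answers = example["answers_spans"]
    return list(zip(answers["spans"], answers["types"]))


def load_drop_validation():
    """Load the validation split from the Hub.

    Assumes the `datasets` package is installed and the network is reachable.
    """
    from datasets import load_dataset

    return load_dataset("ucinlp/drop", split="validation")


# Hypothetical record shaped like the feature list above (values are invented).
sample = {
    "section_id": "example_section",
    "query_id": "example_query",
    "passage": "Smith kicked a 42-yard field goal and later a 35-yard one.",
    "question": "How many field goals did Smith kick?",
    "answers_spans": {"spans": ["2"], "types": ["number"]},
}

pairs = pair_answers(sample)
```

With the dataset downloaded, the same helper could be applied to a real record, for example `pair_answers(load_drop_validation()[0])`.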

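Dataset Creation

Annotation creators: crowdsourced
Language creators: crowdsourced

Dataset Download and Size

Download size: 11,538,387 bytes
Dataset size: 117,310,261 bytes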
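The figures on this page can be cross-checked with a little arithmetic. The Python sketch below uses only the numbers listed here. Reading the ratio as a compression factor is an assumption, since the page does not say how the download is packaged.

```python
# Byte counts and example counts copied from the listing above.
train_bytes = 105_572_506
validation_bytes = 11_737_755
dataset_bytes = 117_310_261
download_bytes = 11_538_387

train_examples = 77_400
validation_examples = 9_535

# The two listed splits account for the full dataset size.
splits_match_total = train_bytes + validation_bytes == dataset_bytes

# Examples covered by the two listed splits.
listed_examples = train_examples + validation_examples

# Dataset size relative to download size (roughly 10x).
size_ratio = dataset_bytes / download_bytes
```

The two splits hold 86,935 examples in total, so the remainder of the roughly 96,000 questions mentioned in the description is not part of the listed splits.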
