qmsum

This dataset targets the QMSum task (query-based meeting summarization) and has two features: text (string) and answer_length (int64). It is split into a training set of 1,257 samples and a test set of 200 samples. The test set is taken from the LongBench QMSum task, and the training set comes from the original QMSum repository. No validation split is provided, so holding out part of the training set for validation is recommended.

Source

Hugging Face

Created

Sep 25, 2024

Updated

Sep 25, 2024

QMSum Dataset Overview

Dataset Information

Features

text: string
answer_length: int64

Data Splits

train: 1,257 samples, 66,437,471 bytes
test: 200 samples, 11,622,102 bytes

Dataset Size

Download Size: 32,972,862 bytes
Total Dataset Size: 78,059,573 bytes
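
The figures above can be cross-checked with a little arithmetic. The following is a minimal sketch that uses only the numbers listed in this card; the variable names are illustrative and not part of the dataset.

```python
# Figures copied from the Data Splits and Dataset Size sections of this card.
SPLITS = {
    "train": {"samples": 1257, "bytes": 66_437_471},
    "test": {"samples": 200, "bytes": 11_622_102},
}
TOTAL_DATASET_BYTES = 78_059_573
DOWNLOAD_BYTES = 32_972_862

# The per-split sizes should add up to the reported total dataset size.
split_bytes_sum = sum(split["bytes"] for split in SPLITS.values())

# Average size of one sample in each split, in bytes.
avg_bytes_per_sample = {
    name: split["bytes"] / split["samples"] for name, split in SPLITS.items()
}

# Fraction of the total dataset size that has to be downloaded.
download_ratio = DOWNLOAD_BYTES / TOTAL_DATASET_BYTES

print(split_bytes_sum == TOTAL_DATASET_BYTES)
for name, avg in avg_bytes_per_sample.items():
    print(f"{name}: {avg:,.0f} bytes per sample on average")
print(f"download / total = {download_ratio:.2%}")
```

Both splits average tens of kilobytes per sample, so each text field is long, and the two split sizes sum exactly to the reported total.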

Configuration

config_name: default
data_files:
  - split: train, path: data/train-*
  - split: test, path: data/test-*
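
To illustrate how the two path patterns in this configuration assign files to splits, here is a small sketch using Python's standard fnmatch module. The file names in example_files are hypothetical placeholders, not a listing of the actual repository, and the helper assign_split is introduced only for this example.

```python
from fnmatch import fnmatchcase

# Patterns copied from the configuration above.
SPLIT_PATTERNS = {
    "train": "data/train-*",
    "test": "data/test-*",
}


def assign_split(path):
    """Return the split whose pattern matches the path, or None if none match."""
    for split, pattern in SPLIT_PATTERNS.items():
        if fnmatchcase(path, pattern):
            return split
    return None


# Hypothetical file names, used only to exercise the patterns.
example_files = [
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

assignments = {path: assign_split(path) for path in example_files}
for path, split in assignments.items():
    print(f"{path} -> {split}")
```

Any file that matches neither pattern, such as the README in this example, belongs to no split.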

Additional Information

test set: taken from the LongBench QMSum task
training set: taken from the original QMSum repository
validation set: none provided; it is recommended to hold out a portion of the training data for validation
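
One way to carve out a validation set is a seeded random hold-out. The sketch below is an assumption-laden example rather than a prescribed procedure: the 10% fraction, the seed, and the helper split_train_validation are all illustrative choices, and the records are synthetic stand-ins with the same two fields as the dataset.

```python
import random


def split_train_validation(records, validation_fraction=0.1, seed=42):
    """Hold out a reproducible random subset of records for validation."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    indices = list(range(len(records)))
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    n_validation = max(1, round(len(records) * validation_fraction))
    validation_indices = set(indices[:n_validation])
    train = [r for i, r in enumerate(records) if i not in validation_indices]
    validation = [r for i, r in enumerate(records) if i in validation_indices]
    return train, validation


# Synthetic stand-ins for the 1,257 training samples (not real data).
records = [
    {"text": f"example {i}", "answer_length": i % 50} for i in range(1257)
]

train_part, validation_part = split_train_validation(records)
print(len(train_part), len(validation_part))
```

Fixing the seed keeps the hold-out identical across runs, so validation scores stay comparable between experiments. When the data is loaded with the Hugging Face datasets library, the built-in Dataset.train_test_split method with test_size and seed arguments serves the same purpose.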