Dataset asset | Open Source Community | Question Answering Systems | Text Summarization
qmsum
The dataset is used for the QMSum task and contains two features: text content and answer length. It is split into a training set with 1,257 samples and a test set with 200 samples. The test set originates from the LongBench QMSum task, while the training set comes from the original QMSum repository. No built‑in validation set is provided; it is recommended to partition a portion of the training set for validation.
Source: huggingface
Created: Sep 25, 2024
Updated: Sep 25, 2024
QMSum Dataset Overview
Dataset Information
Features
- text: data type string
- answer_length: data type int64
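The two features map onto a small typed record. The following is a minimal sketch that assumes each row is available as a dict-like mapping (for example, one item from iterating a Hugging Face datasets.Dataset). The class name QMSumRecord and its from_row helper are hypothetical. The card also does not document what answer_length measures, so the sketch checks only its type.

```python
import numbers
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class QMSumRecord:
    """Hypothetical typed view of one row; field names come from the card."""

    text: str           # feature "text", dtype string
    answer_length: int  # feature "answer_length", dtype int64

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "QMSumRecord":
        """Build a record from a dict-like row, checking the declared types."""
        text = row["text"]
        answer_length = row["answer_length"]
        if not isinstance(text, str):
            raise TypeError(f"text must be str, got {type(text).__name__}")
        # numbers.Integral also covers NumPy integer types such as numpy.int64;
        # bool is an Integral subclass in Python, so it is rejected explicitly.
        if isinstance(answer_length, bool) or not isinstance(
            answer_length, numbers.Integral
        ):
            raise TypeError(
                "answer_length must be an integer, "
                f"got {type(answer_length).__name__}"
            )
        return cls(text=text, answer_length=int(answer_length))


if __name__ == "__main__":
    # Made-up row for illustration; not taken from the dataset.
    rec = QMSumRecord.from_row({"text": "Meeting transcript text", "answer_length": 200})
    print(rec.answer_length)
```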
Data Splits
- train: contains 1,257 samples, occupying 66,437,471 bytes
- test: contains 200 samples, occupying 11,622,102 bytes
Dataset Size
- Download Size: 32,972,862 bytes
- Total Dataset Size: 78,059,573 bytes
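The reported sizes are internally consistent: 66,437,471 + 11,622,102 = 78,059,573 bytes, which is exactly the total dataset size. The short check below reproduces that sum from the card's figures. It also derives the average size per sample (about 52,854 bytes for train and 58,111 bytes for test) and the ratio of download size to total size (about 0.42). This is only arithmetic on the published numbers. Reading the gap between download size and total size as compression of the stored files is an assumption, because the card does not state the file format.

```python
# Figures copied from the dataset card.
SPLITS = {
    "train": {"samples": 1_257, "bytes": 66_437_471},
    "test": {"samples": 200, "bytes": 11_622_102},
}
DOWNLOAD_BYTES = 32_972_862
REPORTED_TOTAL_BYTES = 78_059_573


def total_bytes(splits: dict) -> int:
    """Sum the per-split sizes in bytes."""
    return sum(s["bytes"] for s in splits.values())


def mean_bytes_per_sample(split: dict) -> float:
    """Average size of one sample in a split, in bytes."""
    return split["bytes"] / split["samples"]


if __name__ == "__main__":
    total = total_bytes(SPLITS)
    # The split sizes should add up to the reported total dataset size.
    assert total == REPORTED_TOTAL_BYTES
    for name, split in SPLITS.items():
        print(f"{name}: ~{mean_bytes_per_sample(split):,.0f} bytes/sample")
    # Download size relative to total dataset size.
    print(f"download/total: {DOWNLOAD_BYTES / total:.2f}")
```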
Configuration
- config_name: default
- data_files:
  - train: path data/train-*
  - test: path data/test-*
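The data_files entries are glob patterns that assign repository files to splits. The following is a minimal sketch of that mapping using the standard library's fnmatch.fnmatchcase. The file names in the example are hypothetical because the card lists only the patterns, and the helper assign_splits is illustrative, not part of any library. With the Hugging Face datasets library, the same dictionary would be passed as the data_files argument of load_dataset. Which builder to name there (for example "parquet") depends on the file format, which the card does not state.

```python
from fnmatch import fnmatchcase

# Split -> glob pattern, as listed under "Configuration" (config_name: default).
DATA_FILES = {
    "train": "data/train-*",
    "test": "data/test-*",
}


def assign_splits(filenames, data_files=DATA_FILES):
    """Group file paths by the first split whose glob pattern they match."""
    assigned = {split: [] for split in data_files}
    for name in sorted(filenames):
        for split, pattern in data_files.items():
            # Case-sensitive match, so results do not vary by operating system.
            if fnmatchcase(name, pattern):
                assigned[split].append(name)
                break
    return assigned


if __name__ == "__main__":
    # Hypothetical file names; the card only gives the glob patterns.
    files = [
        "data/train-00000-of-00001.parquet",
        "data/test-00000-of-00001.parquet",
        "README.md",
    ]
    print(assign_splits(files))
```

Files that match no pattern, such as the README in the example, are left out of every split.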
Additional Information
- test split: from the LongBench QMSum task
- train split: from the original QMSum repository
- validation set: none; it is recommended to carve out a portion of the training data for validation
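Because no validation split is shipped, one option is a seeded random holdout drawn from the 1,257 training samples. The sketch returns index lists, so it works with any row container. The 10% fraction, the seed, and the function name holdout_split are illustrative choices, not recommendations from the card. If the Hugging Face datasets library is in use, Dataset.train_test_split(test_size=..., seed=...) provides a built-in equivalent.

```python
import random
from typing import List, Tuple


def holdout_split(
    n_examples: int, val_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, val_indices) for a seeded random holdout.

    Working on indices keeps the sketch independent of how rows are stored
    (list, pandas DataFrame, datasets.Dataset, and so on).
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # local RNG; global state untouched
    n_val = max(1, round(n_examples * val_fraction))
    val = sorted(indices[:n_val])
    train = sorted(indices[n_val:])
    return train, val


if __name__ == "__main__":
    # 1,257 is the size of the published train split.
    train_idx, val_idx = holdout_split(1_257, val_fraction=0.1, seed=42)
    print(len(train_idx), len(val_idx))
```

With val_fraction=0.1, this holds out 126 rows and keeps 1,131 for training. The split is made by row because the card exposes no meeting identifier. If several rows share a meeting transcript, a split grouped by meeting would be stricter.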