NLPCC-KBQA


The NLPCC‑KBQA dataset contains the data used in the NLPCC open‑domain Chinese question answering (KBQA) evaluations held from 2016 to 2018. It includes 24,479 training instances and a separate test set for each of the three years. Each instance pairs a knowledge triple with a manually annotated natural‑language question derived from that triple.

Updated 12/7/2021

GitHub

Dataset Overview

Dataset Name

NLPCC‑KBQA

Dataset Content

Training set: nlpcc2016-2018.kbqa.train
- Contains 24,479 training instances
- Each instance includes a knowledge triple <subject entity, relation, object entity> and a human‑written natural‑language question based on the triple; the answer is the object entity.
Test sets:
- nlpcc2016.kbqa.test
- nlpcc2017.kbqa.test
- nlpcc2018.kbqa.test
- These correspond to the NLPCC KBQA evaluation test sets for 2016, 2017, and 2018, respectively.
- Note: the 2017 test set does not include a knowledge triple for its test instances.
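The instance structure described above (a <subject, relation, object> triple plus a question, with the object as the answer, and no triple in the 2017 test set) can be loaded with a small parser. This is a minimal sketch, not an official loader: the on‑disk layout it assumes (tagged `<question id=N>`, `<triple id=N>`, and `<answer id=N>` lines, with triple fields separated by `|||` and records separated by a line of `=` characters), the UTF‑8 encoding, and the names `KBQAInstance` and `parse_kbqa` are all assumptions for illustration. Verify the layout against the actual files before relying on it.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# ASSUMED line layout (hypothetical; check against the real files):
#   <question id=N>\t<question text>
#   <triple id=N>\t<subject> ||| <relation> ||| <object>
#   <answer id=N>\t<answer text>
# Records are assumed to be separated by a line of '=' characters.
TAG_RE = re.compile(r"^<(question|triple|answer) id=(\d+)>(.*)$")


@dataclass
class KBQAInstance:
    qid: int
    question: str
    triple: Optional[Tuple[str, str, str]]  # None if absent (e.g. 2017 test set)
    answer: Optional[str]


def parse_kbqa(lines: Iterable[str]) -> List[KBQAInstance]:
    """Group tagged lines by id and build one KBQAInstance per id."""
    records = {}
    order = []
    for raw in lines:
        match = TAG_RE.match(raw.rstrip("\n"))
        if not match:
            continue  # skip separator lines and blank lines
        tag, qid, value = match.group(1), int(match.group(2)), match.group(3).strip()
        if qid not in records:
            records[qid] = {}
            order.append(qid)
        records[qid][tag] = value

    instances = []
    for qid in order:
        rec = records[qid]
        triple = None
        if "triple" in rec:
            parts = [p.strip() for p in rec["triple"].split("|||")]
            if len(parts) == 3:
                triple = (parts[0], parts[1], parts[2])
        answer = rec.get("answer")
        if answer is None and triple is not None:
            answer = triple[2]  # the answer is the object entity
        instances.append(KBQAInstance(qid, rec.get("question", ""), triple, answer))
    return instances


# Example usage (file name from the list above; encoding is an assumption):
# with open("nlpcc2016-2018.kbqa.train", encoding="utf-8") as f:
#     train = parse_kbqa(f)
```

Keeping `triple` optional lets the same loader handle the 2017 test set, where no triple is provided. When an answer line is missing but a triple is present, the sketch falls back to the object entity, which is consistent with the description above.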

Citation

When using the dataset, please cite the following papers:
- Duan, Nan. "Overview of the NLPCC‑ICCPOL 2016 Shared Task: Open Domain Chinese Question Answering". 2016.
- Duan, Nan and Tang, Duyu. "Overview of the NLPCC 2017 Shared Task: Open Domain Chinese Question Answering". 2018.

