Tags: Open Source Community, Natural Language Processing, Knowledge Base QA

NLPCC-KBQA

The NLPCC‑KBQA dataset contains data used in the NLPCC open‑domain QA evaluations from 2016 to 2018. It includes 24,479 training instances and test sets for each of the three years. Each instance consists of a knowledge triple and a manually annotated natural‑language question derived from that triple.

Source: GitHub
Created: May 27, 2021
Updated: Dec 7, 2021

Dataset Overview

Dataset Name

  • NLPCC‑KBQA

Dataset Content

  • Training set: nlpcc2016-2018.kbqa.train

    • Contains 24,479 training instances
    • Each instance includes a knowledge triple <subject entity, relation, object entity> and a human‑written natural‑language question based on the triple; the answer is the object entity.
  • Test sets:

    • nlpcc2016.kbqa.test
    • nlpcc2017.kbqa.test
    • nlpcc2018.kbqa.test
    • These files are the NLPCC KBQA test sets for 2016, 2017, and 2018, respectively.
    • Note: the 2017 test set does not provide the knowledge triple for each test instance.
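
The instance structure described above (a question, an optional knowledge triple, and an answer equal to the object entity) can be sketched in Python as follows. This is a minimal sketch, not an official loader: the tagged line layout (`<question id=N>`, `<triple id=N>`, `<answer id=N>`), the `|||` field separator, and the `=` separator line between instances are assumptions about the file format that this page does not state, and the names `KBQAInstance` and `parse_instances` are hypothetical. The triple is optional so that instances without one, as in the 2017 test set, still parse.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Tuple

# ASSUMPTION: each field sits on its own line, tagged like
#   <question id=1>\t...   <triple id=1>\ts ||| r ||| o   <answer id=1>\t...
# and instances are separated by a line of "=" characters.
_TAGGED_LINE = re.compile(r"^<(question|triple|answer) id=(\d+)>\s*(.*)$")


@dataclass
class KBQAInstance:
    question: str
    # (subject entity, relation, object entity); None when no triple is given
    triple: Optional[Tuple[str, str, str]] = None
    answer: Optional[str] = None


def _build(fields: Dict[str, str]) -> KBQAInstance:
    triple = None
    if "triple" in fields:
        # Split into at most three parts so the object entity stays whole.
        parts = [p.strip() for p in fields["triple"].split("|||", 2)]
        if len(parts) == 3:
            triple = (parts[0], parts[1], parts[2])
    answer = fields.get("answer")
    if answer is None and triple is not None:
        # Per the dataset description, the answer is the object entity.
        answer = triple[2]
    return KBQAInstance(fields.get("question", ""), triple, answer)


def parse_instances(text: str) -> Iterator[KBQAInstance]:
    """Yield one KBQAInstance per block of tagged lines in `text`."""
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _TAGGED_LINE.match(line)
        if match:
            fields[match.group(1)] = match.group(3).strip()
        elif line.startswith("=") and fields:
            yield _build(fields)
            fields = {}
    if fields:  # final block without a trailing separator
        yield _build(fields)


# Made-up sample for illustration only; not taken from the dataset.
SAMPLE = (
    "<question id=1>\tWho wrote Book X?\n"
    "<triple id=1>\tBook X ||| author ||| Jane Doe\n"
    "<answer id=1>\tJane Doe\n"
    "==========\n"
    "<question id=2>\tWhere is City Y?\n"
    "<answer id=2>\tCountry Z\n"
    "==========\n"
)

if __name__ == "__main__":
    for instance in parse_instances(SAMPLE):
        print(instance)
```

To apply this to a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to `parse_instances`; check the tag and separator assumptions against the actual files first.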

Citation

  • When using the dataset, please cite the following papers:
    • Duan, Nan. "Overview of the NLPCC‑ICCPOL 2016 Shared Task: Open Domain Chinese Question Answering". 2016.
    • Duan, Nan and Tang, Duyu. "Overview of the NLPCC 2017 Shared Task: Open Domain Chinese Question Answering". 2018.