
NLPCC-KBQA

The NLPCC‑KBQA dataset contains data used in the NLPCC open‑domain QA evaluations from 2016 to 2018. It includes 24,479 training instances and test sets for each of the three years. Each instance consists of a knowledge triple and a manually annotated natural‑language question derived from that triple.

Updated 12/7/2021
github

Description

Dataset Overview

Dataset Name

  • NLPCC‑KBQA

Dataset Content

  • Training set: nlpcc2016-2018.kbqa.train

    • Contains 24,479 training instances
    • Each instance includes a knowledge triple <subject entity, relation, object entity> and a human‑written natural‑language question based on the triple; the answer is the object entity.
  • Test sets:

    • nlpcc2016.kbqa.test
    • nlpcc2017.kbqa.test
    • nlpcc2018.kbqa.test
    • These files are the NLPCC KBQA test sets for 2016, 2017, and 2018, respectively.
    • Note: the 2017 test set does not provide the knowledge triple for each test instance.
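
The instance structure described above can be sketched in Python. This is a minimal illustration, not the dataset's official loader: the class name `KBQAInstance`, the function `parse_instance`, and the `|||` separator between triple fields are all assumptions made for the example, since the listing does not specify the on-disk file layout. The triple is optional so that instances without one (as in the 2017 test set) can still be represented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class KBQAInstance:
    """One question, optionally paired with its knowledge triple (hypothetical type)."""
    question: str
    # (subject entity, relation, object entity); None when no triple is provided,
    # as is the case for the 2017 test set.
    triple: Optional[Tuple[str, str, str]] = None

    @property
    def answer(self) -> Optional[str]:
        # Per the dataset description, the answer is the object entity of the triple.
        return self.triple[2] if self.triple is not None else None


def parse_instance(question: str,
                   triple_text: Optional[str] = None,
                   sep: str = "|||") -> KBQAInstance:
    """Build an instance from raw strings.

    The separator `sep` is an assumption for illustration; adjust it to
    match the actual file format.
    """
    if triple_text is None:
        return KBQAInstance(question=question.strip())
    parts = [p.strip() for p in triple_text.split(sep)]
    if len(parts) != 3:
        raise ValueError(f"expected 3 triple fields, got {len(parts)}")
    return KBQAInstance(question.strip(), (parts[0], parts[1], parts[2]))
```

For example, `parse_instance("Who wrote Hamlet?", "Hamlet ||| author ||| William Shakespeare").answer` evaluates to `"William Shakespeare"`, while an instance built without a triple has `answer` equal to `None`.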

Citation

  • When using the dataset, please cite the following papers:
    • Duan, Nan. "Overview of the NLPCC‑ICCPOL 2016 Shared Task: Open Domain Chinese Question Answering". 2016.
    • Duan, Nan and Tang, Duyu. "Overview of the NLPCC 2017 Shared Task: Open Domain Chinese Question Answering". 2018.


Topics

Natural Language Processing
Knowledge Base QA

Source

Organization: github

Created: 5/27/2021
