NLPCC-KBQA
The NLPCC‑KBQA dataset contains data used in the NLPCC open‑domain QA evaluations from 2016 to 2018. It includes 24,479 training instances and test sets for each of the three years. Each instance consists of a knowledge triple and a manually annotated natural‑language question derived from that triple.
Updated 12/7/2021
Description
Dataset Overview
Dataset Name
- NLPCC‑KBQA
Dataset Content
- Training set: nlpcc2016-2018.kbqa.train
  - Contains 24,479 training instances
  - Each instance includes a knowledge triple <subject entity, relation, object entity> and a human‑written natural‑language question based on the triple; the answer is the object entity.
- Test sets: nlpcc2016.kbqa.test, nlpcc2017.kbqa.test, nlpcc2018.kbqa.test
  - Correspond to the NLPCC KBQA tests for 2016, 2017, and 2018 respectively
  - Note: the 2017 test set does not provide the knowledge triple for each test instance.
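The instance structure described above can be sketched in Python as follows. This is a minimal sketch, not the dataset's official loader. The on-disk layout is an assumption that this page does not confirm: tagged `<question id=N>`, `<triple id=N>`, and `<answer id=N>` lines, triple fields separated by `|||`, and records separated by a line of `=` characters. The names `KBQAInstance` and `parse_instances` and the sample records are hypothetical and only illustrative. Check the layout against the actual files before relying on it.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Tuple

# ASSUMPTION: each record is a group of tagged lines such as
#   <question id=1>\t...
#   <triple id=1>\tsubject ||| relation ||| object
#   <answer id=1>\t...
# followed by a separator line of '=' characters.
TAGGED_LINE = re.compile(r"^<(question|triple|answer) id=(\d+)>\s*(.*)$")


@dataclass
class KBQAInstance:
    question: str
    # The triple is optional because the 2017 test set does not provide it.
    triple: Optional[Tuple[str, str, str]] = None
    answer: Optional[str] = None


def parse_instances(text: str) -> Iterator[KBQAInstance]:
    """Yield one KBQAInstance per record in the assumed tagged layout."""
    fields: Dict[str, str] = {}
    # The trailing sentinel separator flushes the final record.
    for raw_line in text.splitlines() + ["====="]:
        line = raw_line.strip()
        match = TAGGED_LINE.match(line)
        if match:
            fields[match.group(1)] = match.group(3).strip()
        elif line.startswith("=") and "question" in fields:
            triple = None
            if "triple" in fields:
                parts = [p.strip() for p in fields["triple"].split("|||", 2)]
                if len(parts) == 3:
                    triple = (parts[0], parts[1], parts[2])
            yield KBQAInstance(fields["question"], triple, fields.get("answer"))
            fields = {}


# Made-up sample records (not taken from the dataset).
sample = """<question id=1>\tWho wrote Example Book?
<triple id=1>\tExample Book ||| author ||| Jane Doe
<answer id=1>\tJane Doe
==================================================
<question id=2>\tWhere is Example City?
<answer id=2>\tExample Country
"""

instances = list(parse_instances(sample))
```

In the first sample record the answer equals the object entity of the triple, matching the description above. The second record has no triple line, which is how a 2017 test instance would appear under this assumed layout, so its `triple` field stays `None`.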
Citation
- When using the dataset, please cite the following papers:
- Duan, Nan. "Overview of the NLPCC‑ICCPOL 2016 Shared Task: Open Domain Chinese Question Answering". 2016.
- Duan, Nan and Tang, Duyu. "Overview of the NLPCC 2017 Shared Task: Open Domain Chinese Question Answering". 2018.
Topics
Natural Language Processing
Knowledge Base QA
Source
Organization: github
Created: 5/27/2021