Chinese-Literature-NER-RE-Dataset

A discourse‑level Named Entity Recognition and Relation Extraction dataset for Chinese literary texts.

Updated 4/1/2020

Description

Dataset Overview

Dataset Name

Chinese-Literature-NER-RE-Dataset

Dataset Purpose

For Named Entity Recognition (NER) and Relation Extraction (RE) on Chinese literary texts.

Dataset Description

A detailed description of the dataset is provided in the arXiv paper.

Tag Set

Entity tags: 7 entity types are defined.
Relation tags: 9 relation types are defined.

Annotation Format

Entity Annotation

T tag: identifies an entity.
- Id: unique identifier of the entity in the document, starting from 0 and incremented for each new entity.
- Type: entity type, corresponding to one of the entity tags.
- Begin Index: character offset at which the entity starts; offsets are 0-based and counted per character.
- End Index: character offset at which the entity ends, counted the same way as the begin index.
- Value: the word or words that make up the entity mention.
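
The entity fields listed above can be sketched as a small record type with a parser. This is a minimal sketch, not the dataset's official loader: the tab-separated line layout `T<id><TAB><type> <begin> <end><TAB><value>` is an assumption (it is not specified on this page), and the type name `Person` in the usage line is a placeholder rather than one of the confirmed entity tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: int     # unique within the document, counted from 0
    type: str   # one of the entity tags
    begin: int  # 0-based character offset where the mention starts
    end: int    # 0-based character offset where the mention ends
    value: str  # text of the mention


def parse_entity_line(line: str) -> Entity:
    # ASSUMED layout: "T<id>", then "<type> <begin> <end>", then "<value>",
    # separated by tabs. Adjust if the actual files differ.
    tag, span, value = line.rstrip("\n").split("\t", 2)
    if not tag.startswith("T"):
        raise ValueError(f"not an entity line: {line!r}")
    etype, begin, end = span.split(" ")
    return Entity(int(tag[1:]), etype, int(begin), int(end), value)


# Hypothetical example line; "Person" is a placeholder type name.
entity = parse_entity_line("T0\tPerson 0 2\t老人")
```

Keeping the offsets as plain integers leaves the question of whether the end index is inclusive or exclusive to the caller, since this page does not state it.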

Relation Annotation

R tag: identifies a relation.
- Id: unique identifier of the relation in the document, starting from 0 and incremented for each new relation.
- Arg1 and Arg2: the two entities involved.
- Type: relation type, corresponding to one of the relation tags.
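
The relation fields can be sketched in the same way. Again, this is a minimal sketch under stated assumptions: the line layout `R<id><TAB><type> Arg1:<entity> Arg2:<entity>` is assumed rather than taken from this page, and the relation name `Located` and the entity references `T0` and `T1` are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    id: int    # unique within the document, counted from 0
    type: str  # one of the relation tags
    arg1: str  # reference to the first entity, e.g. "T0"
    arg2: str  # reference to the second entity, e.g. "T1"


def parse_relation_line(line: str) -> Relation:
    # ASSUMED layout: "R<id>", a tab, then "<type> Arg1:<ref> Arg2:<ref>".
    tag, body = line.rstrip("\n").split("\t", 1)
    if not tag.startswith("R"):
        raise ValueError(f"not a relation line: {line!r}")
    rtype, a1, a2 = body.split()
    if not (a1.startswith("Arg1:") and a2.startswith("Arg2:")):
        raise ValueError(f"unexpected argument fields: {line!r}")
    return Relation(int(tag[1:]), rtype, a1[len("Arg1:"):], a2[len("Arg2:"):])


# Hypothetical example line; "Located" is a placeholder relation name.
relation = parse_relation_line("R0\tLocated Arg1:T0 Arg2:T1")
```

Storing the arguments as entity references rather than resolved objects keeps the parser independent of the order in which entity and relation lines appear in a file.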

Citation Information

Authors: Jingjing Xu, Ji Wen, Xu Sun, Qi Su
Title: A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text
Year: 2017
Link: arXiv article link

Topics

Natural Language Processing

Literary Text Analysis

Source

Organization: GitHub

Created: 10/4/2019
