strombergnlp/nlpcc-stance
This is a Chinese stance‑prediction dataset for detecting stance in Chinese micro‑blogs. The data come from the NLPCC‑ICCPOL 2016 shared task, whose goal is to identify the stance expressed toward five target topics using annotated training data. Each instance contains a unique ID, a target, a text, and a stance label (against, favor, or none). The dataset was annotated by Chinese students: each item was labeled by two annotators, and disagreements were resolved by a third. It contains only Chinese data and is released under a CC‑BY‑4.0 license.
Description
Dataset Overview
Basic Information
- Name: NLPCC 2016: Stance Detection in Chinese Microblogs
- Language: Chinese (BCP‑47: zh)
- License: Creative Commons Attribution 4.0 (CC‑BY‑4.0)
- Multilinguality: Monolingual
- Size: 1K < n < 10K
- Source: Original data
- Task Category: Text classification
- Task ID: Sentiment analysis
- Tags: Stance detection
Description
- Overview: This dataset focuses on stance prediction in Chinese micro‑blogs. It originates from the mandatory supervised task of the NLPCC‑ICCPOL 2016 shared task, which detects stance toward five targets of interest.
- Supported Task: Stance detection in Chinese micro‑blogs
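The supervised task takes a (target, text) pair and predicts one of three stance classes. As a minimal sketch of that interface, the Python below implements a per-target majority-class baseline, a common sanity check and not a method from the shared task itself. The function names and any target strings used with it are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Integer labels as listed in this card: 0 = AGAINST, 1 = FAVOR, 2 = NONE.
AGAINST, FAVOR, NONE = 0, 1, 2


def fit_majority_baseline(pairs: List[Tuple[str, int]]) -> Dict[str, int]:
    """Return the most frequent stance label for each target.

    `pairs` holds (target, stance) tuples taken from the training split.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for target, stance in pairs:
        counts[target][stance] += 1
    return {target: c.most_common(1)[0][0] for target, c in counts.items()}


def predict(model: Dict[str, int], target: str, text: str, default: int = NONE) -> int:
    """Predict a stance; the text is ignored because this baseline only uses the target."""
    return model.get(target, default)


def accuracy(model: Dict[str, int], examples: List[Tuple[str, str, int]]) -> float:
    """Fraction of (target, text, gold_stance) examples predicted correctly."""
    if not examples:
        return 0.0
    correct = sum(predict(model, t, x) == y for t, x, y in examples)
    return correct / len(examples)
```

Any real model should clearly beat this baseline. Since the text is never read, its score mainly reflects how skewed the label distribution is for each target.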
Structure
- Instances: Each instance includes four fields: id, target, text, and stance.
- Fields:
  - id: String, unique identifier
  - target: String, stance target
  - text: String, text containing the stance
  - stance: Integer, stance class (0: AGAINST, 1: FAVOR, 2: NONE)
- Splits: Training set contains 2,986 instances
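The schema above can be mirrored in code. This is a minimal sketch that assumes rows arrive as plain dicts containing the four listed fields. Rows returned by the Hugging Face `datasets` library would fit this shape, provided the hub ID `strombergnlp/nlpcc-stance` loads in your environment. `StanceInstance`, `parse_instance`, and `label_distribution` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List

# Label order as given in this card: index 0 = AGAINST, 1 = FAVOR, 2 = NONE.
STANCE_NAMES: List[str] = ["AGAINST", "FAVOR", "NONE"]


@dataclass(frozen=True)
class StanceInstance:
    id: str      # unique identifier
    target: str  # stance target
    text: str    # micro-blog text containing the stance
    stance: int  # integer stance class

    @property
    def stance_name(self) -> str:
        return STANCE_NAMES[self.stance]


def parse_instance(row: Dict[str, Any]) -> StanceInstance:
    """Build a StanceInstance from a dict, rejecting labels outside 0..2."""
    inst = StanceInstance(
        id=str(row["id"]),
        target=str(row["target"]),
        text=str(row["text"]),
        stance=int(row["stance"]),
    )
    if not 0 <= inst.stance < len(STANCE_NAMES):
        raise ValueError(f"unknown stance label {inst.stance!r} in instance {inst.id!r}")
    return inst


def label_distribution(instances: Iterable[StanceInstance]) -> Dict[str, int]:
    """Count instances per stance name, including classes with zero examples."""
    counts = {name: 0 for name in STANCE_NAMES}
    for inst in instances:
        counts[inst.stance_name] += 1
    return counts
```

Checking the label range when the data is loaded catches a mismatched label mapping early, before it can silently skew training.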
Creation
- Motivation: To create a stance‑annotated dataset for micro‑blog texts. Six stance targets were selected and data were collected from Sina Weibo for annotation.
- Source Data: Content generated by Sina Weibo users
- Annotation:
- Each target‑post pair was independently labeled by two students. If they agreed, the label was accepted; otherwise, a third annotator voted to resolve the disagreement.
- Annotators: Chinese students
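The adjudication rule above can be written as a small function. This sketch rests on one stated assumption. The card does not say what happened when the third annotator disagreed with both students, so the hypothetical `adjudicate` function returns `None` in that case rather than guessing a label.

```python
from collections import Counter
from typing import Optional


def adjudicate(first: int, second: int, third: Optional[int] = None) -> Optional[int]:
    """Resolve one target-post label from two annotators plus an optional third.

    - If the first two annotators agree, their label is accepted.
    - Otherwise the third label acts as a vote; a label with at least two
      of the three votes wins.
    - ASSUMPTION: if all three labels differ, return None (unresolved),
      because the card does not describe how such cases were handled.
    """
    if first == second:
        return first
    if third is None:
        raise ValueError("annotators disagree; a third label is required")
    label, count = Counter([first, second, third]).most_common(1)[0]
    return label if count >= 2 else None
```

For example, `adjudicate(0, 1, 1)` yields `1`, because the third vote sides with the second annotator.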
Considerations
- Social Impact: The dataset retains the original social‑media statements, which may involve privacy concerns.
- Bias Discussion: The data exhibit temporal, geographic, and topical biases.
Additional Information
- Dataset Curator: Paper authors
- License Information: Distributed under CC‑BY‑4.0
- Citation:
@incollection{xu2016overview,
  title={Overview of {NLPCC} Shared Task 4: Stance Detection in {C}hinese Microblogs},
  author={Xu, Ruifeng and Zhou, Yu and Wu, Dongyin and Gui, Lin and Du, Jiachen and Xue, Yun},
  booktitle={Natural Language Understanding and Intelligent Applications},
  pages={907--916},
  year={2016},
  publisher={Springer}
}
Source
Organization: hugging_face
Created: Unknown