Dataset asset, Open Source Community, Natural Language Processing, Social Media Analysis
strombergnlp/nlpcc-stance
This is a Chinese stance‑prediction dataset for detecting stance in Chinese micro‑blogs. The data originate from the NLPCC‑ICCPOL 2016 shared task, which aims to identify stance toward five targets given annotated data. Each instance contains a unique ID, target, text, and stance label (against, favor, or none). Each target‑post pair was labeled independently by two Chinese students, with a third annotator resolving disagreements. The dataset contains only Chinese data and is released under a CC‑BY‑4.0 license.
Source
hugging_face
Created
Nov 28, 2025
Updated
Oct 25, 2022
Dataset Overview
Basic Information
- Name: NLPCC 2016: Stance Detection in Chinese Microblogs
- Language: Chinese (bcp47:zh)
- License: Creative Commons Attribution 4.0 (CC‑BY‑4.0)
- Multilinguality: Monolingual
- Size: 1K < n < 10K
- Source: Original data
- Task Category: Text classification
- Task ID: Sentiment analysis
- Tags: Stance detection
Description
- Overview: This dataset focuses on stance prediction in Chinese micro‑blogs. It originates from the NLPCC‑ICCPOL 2016 shared task and covers its mandatory supervised task, which detects stance toward five targets of interest using labeled data.
- Supported Task: Stance detection in Chinese micro‑blogs
Structure
- Instances: Each instance includes four fields: id, target, text, and stance.
- Fields:
  - id: String, unique identifier
  - target: String, stance target
  - text: String, text containing the stance
  - stance: Integer, stance class (0: AGAINST, 1: FAVOR, 2: NONE)
- Splits: Training set contains 2,986 instances
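The field layout above can be illustrated with a minimal Python sketch that decodes the integer stance class into its name and tallies labels per target. The helper names (`decode_instance`, `stance_counts_by_target`) and the sample records are hypothetical placeholders, not part of the dataset or any library. Only the four field names and the 0/1/2 mapping come from the description above.

```python
from collections import Counter

# Integer-to-name mapping as listed in the field description.
STANCE_NAMES = {0: "AGAINST", 1: "FAVOR", 2: "NONE"}
REQUIRED_FIELDS = {"id", "target", "text", "stance"}


def decode_instance(instance: dict) -> dict:
    """Return a copy of the instance with a readable 'stance_name' added."""
    missing = REQUIRED_FIELDS - instance.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    decoded = dict(instance)
    decoded["stance_name"] = STANCE_NAMES[instance["stance"]]
    return decoded


def stance_counts_by_target(instances) -> dict:
    """Count decoded stance labels for each target."""
    counts = {}
    for inst in instances:
        row = decode_instance(inst)
        counts.setdefault(row["target"], Counter())[row["stance_name"]] += 1
    return counts


# Hypothetical records for illustration only (not real rows from the dataset).
sample = [
    {"id": "0", "target": "target_a", "text": "placeholder text 1", "stance": 1},
    {"id": "1", "target": "target_a", "text": "placeholder text 2", "stance": 0},
    {"id": "2", "target": "target_b", "text": "placeholder text 3", "stance": 2},
]
counts = stance_counts_by_target(sample)
```

Keeping the mapping in a single dictionary means the class names stay consistent wherever labels are reported, and the field check fails early on malformed records.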
Creation
- Motivation: To create a stance‑annotated dataset for micro‑blog texts. Six stance targets were selected and data were collected from Sina Weibo for annotation.
- Source Data: Content generated by Sina Weibo users
- Annotation:
  - Each target‑post pair was independently labeled by two students. If they agreed, the label was accepted; otherwise a third annotator resolved the disagreement through voting.
  - Annotators: Chinese students
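The adjudication rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the annotators' actual tooling: the function name `adjudicate` is hypothetical, and the handling of the case where all three labels differ (falling back to the third annotator's label) is an assumption, since the description does not cover it.

```python
from collections import Counter
from typing import Optional


def adjudicate(first: str, second: str, third: Optional[str] = None) -> str:
    """Resolve a label from two annotators, using a third only on disagreement."""
    if first == second:
        # Both annotators agree, so the label is accepted as-is.
        return first
    if third is None:
        raise ValueError("annotators disagree; a third label is required")
    # Majority vote over the three labels.
    label, votes = Counter([first, second, third]).most_common(1)[0]
    # Assumption: if all three labels differ, use the third annotator's label.
    return label if votes >= 2 else third
```

For example, `adjudicate("FAVOR", "AGAINST", "AGAINST")` yields the majority label, while `adjudicate("FAVOR", "FAVOR")` never consults a third annotator.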
Considerations
- Social Impact: The dataset retains the original social‑media statements, which may involve privacy concerns.
- Bias Discussion: The data exhibit temporal, geographic, and topical biases.
Additional Information
- Dataset Curator: Paper authors
- License Information: Distributed under CC‑BY‑4.0
- Citation:
@incollection{xu2016overview,
  title={Overview of nlpcc shared task 4: Stance detection in chinese microblogs},
  author={Xu, Ruifeng and Zhou, Yu and Wu, Dongyin and Gui, Lin and Du, Jiachen and Xue, Yun},
  booktitle={Natural language understanding and intelligent applications},
  pages={907--916},
  year={2016},
  publisher={Springer}
}