
strombergnlp/nlpcc-stance

This is a Chinese stance‑prediction dataset for detecting stance in Chinese micro‑blogs. The data come from the NLPCC‑ICCPOL 2016 shared task, whose goal is to identify stance toward five targets given labeled data. Each instance contains a unique ID, a target, the post text, and a stance label (against, favor, or none). Labels were assigned by Chinese students: each target‑post pair was annotated independently by two students, and a third annotator resolved disagreements. The dataset contains only Chinese text and is released under a CC‑BY‑4.0 license.

Source

hugging_face

Created

Nov 28, 2025

Updated

Oct 25, 2022



Dataset Overview

Basic Information

Name: NLPCC 2016: Stance Detection in Chinese Microblogs
Language: Chinese (bcp47:zh)
License: Creative Commons Attribution 4.0 (CC‑BY‑4.0)
Multilinguality: Monolingual
Size: 1K < n < 10K
Source: Original data
Task Category: Text classification
Task ID: Sentiment analysis
Tags: Stance detection

Description

Overview: This dataset focuses on stance prediction in Chinese micro‑blogs. It originates from the NLPCC‑ICCPOL 2016 shared task and covers the mandatory supervised subtask, which detects stance toward five targets of interest using labeled data.
Supported Task: Stance detection in Chinese micro‑blogs

Structure

Instances: Each instance has four fields: id, target, text, and stance.
Fields:
- id: String, unique identifier
- target: String, stance target
- text: String, text containing the stance
- stance: Integer, stance class (0: AGAINST, 1: FAVOR, 2: NONE)
Splits: Training set contains 2,986 instances
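
The field layout above can be sketched in Python as follows. This is a minimal illustration, not the dataset's official loader: the names `StanceInstance`, `STANCE_NAMES`, and `decode_stance` are hypothetical, and the sample record is invented for the example rather than taken from the data. Only the four field names and the 0/1/2 label mapping come from the description above.

```python
from typing import TypedDict

# Label mapping as stated in the field description (0: AGAINST, 1: FAVOR, 2: NONE).
STANCE_NAMES = {0: "AGAINST", 1: "FAVOR", 2: "NONE"}


class StanceInstance(TypedDict):
    """One record with the four documented fields."""
    id: str
    target: str
    text: str
    stance: int


def decode_stance(instance: StanceInstance) -> str:
    """Return the readable stance name for one instance."""
    label = instance["stance"]
    if label not in STANCE_NAMES:
        raise ValueError(f"unexpected stance id: {label!r}")
    return STANCE_NAMES[label]


# Hypothetical record for illustration only; not an actual row of the dataset.
example: StanceInstance = {
    "id": "0",
    "target": "example target",
    "text": "example post text",
    "stance": 1,
}

print(decode_stance(example))  # FAVOR
```

With the Hugging Face `datasets` library installed, the records would presumably be fetched with `load_dataset("strombergnlp/nlpcc-stance")`, using the repository name shown at the top of this page. That call needs network access and is not exercised in the sketch.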

Creation

Motivation: To create a stance‑annotated dataset for micro‑blog texts. Six stance targets were selected and data were collected from Sina Weibo for annotation.
Source Data: Content generated by Sina Weibo users
Annotation:
- Each target‑post pair was labeled independently by two students. If the two labels agreed, that label was accepted; otherwise a third annotator labeled the pair and the final label was decided by voting.
- Annotators: Chinese students
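
The adjudication rule described above can be sketched as a small function. This is an interpretation of the procedure, not the authors' code: the name `adjudicate` is hypothetical, and returning `None` when all three annotators disagree is an assumption, since the description does not say how that case was handled.

```python
from collections import Counter
from typing import Optional


def adjudicate(first: int, second: int, third: Optional[int] = None) -> Optional[int]:
    """Combine annotator labels for one target-post pair.

    Returns the agreed label, or None if no decision can be made yet.
    """
    # Two independent annotators agree: accept the label directly.
    if first == second:
        return first
    # Disagreement and no third annotation yet: the pair is unresolved.
    if third is None:
        return None
    # Majority vote over the three labels.
    label, count = Counter([first, second, third]).most_common(1)[0]
    # Assumption: a three-way split stays unresolved (not specified in the source).
    return label if count >= 2 else None


print(adjudicate(1, 1))     # both annotators agree
print(adjudicate(0, 1, 1))  # third annotator breaks the tie
```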

Considerations

Social Impact: The dataset retains the original social‑media statements, which may involve privacy concerns.
Bias Discussion: The data exhibit temporal, geographic, and topical biases.

Additional Information

Dataset Curator: Paper authors
License Information: Distributed under CC‑BY‑4.0

Citation:

@incollection{xu2016overview,
  title={Overview of nlpcc shared task 4: Stance detection in chinese microblogs},
  author={Xu, Ruifeng and Zhou, Yu and Wu, Dongyin and Gui, Lin and Du, Jiachen and Xue, Yun},
  booktitle={Natural language understanding and intelligent applications},
  pages={907--916},
  year={2016},
  publisher={Springer}
}
