strombergnlp/nlpcc-stance

This is a Chinese stance‑prediction dataset for detecting stance in Chinese micro‑blogs. The data originate from the NLPCC‑ICCPOL 2016 shared task, whose goal is to identify stance toward five targets given annotated data. Each instance contains a unique ID, target, text, and stance label (against, favor, or none). Chinese students annotated the posts: two labeled each target‑post pair independently, and a third resolved any disagreement. The dataset contains only Chinese text and is released under a CC‑BY‑4.0 license.

Updated 10/25/2022

Description

Dataset Overview

Basic Information

  • Name: NLPCC 2016: Stance Detection in Chinese Microblogs
  • Language: Chinese (bcp47:zh)
  • License: Creative Commons Attribution 4.0 (CC‑BY‑4.0)
  • Multilinguality: Monolingual
  • Size: 1K < n < 10K
  • Source: Original data
  • Task Category: Text classification
  • Task ID: Sentiment analysis
  • Tags: Stance detection

Description

  • Overview: This dataset focuses on stance prediction in Chinese micro‑blogs. It originates from the NLPCC‑ICCPOL 2016 shared task and covers its mandatory supervised task: detecting stance toward five targets of interest using labeled data.
  • Supported Task: Stance detection in Chinese micro‑blogs

Structure

  • Instances: Each instance has four fields: id, target, text, and stance.
  • Fields:
    • id: String, unique identifier
    • target: String, stance target
    • text: String, text containing the stance
    • stance: Integer, stance class (0: AGAINST, 1: FAVOR, 2: NONE)
  • Splits: Training set contains 2,986 instances
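
The field layout above can be sketched in Python. This is a minimal illustration, not an official loader: the helper names (`decode_stance`, `stance_distribution`) and the sample records are hypothetical, and only the integer-to-label mapping comes from the field description.

```python
from collections import Counter
from typing import Dict, List

# Integer-to-name mapping as listed in the field description.
STANCE_NAMES: Dict[int, str] = {0: "AGAINST", 1: "FAVOR", 2: "NONE"}


def decode_stance(record: dict) -> dict:
    """Return a copy of the record with a readable 'stance_name' added."""
    label = record["stance"]
    if label not in STANCE_NAMES:
        raise ValueError(f"unexpected stance label: {label!r}")
    return {**record, "stance_name": STANCE_NAMES[label]}


def stance_distribution(records: List[dict]) -> Dict[str, int]:
    """Count how many records carry each stance name."""
    return dict(Counter(STANCE_NAMES[r["stance"]] for r in records))


# Hypothetical records for illustration only; they are not taken from the dataset.
sample = [
    {"id": "0", "target": "example target", "text": "example post A", "stance": 1},
    {"id": "1", "target": "example target", "text": "example post B", "stance": 0},
    {"id": "2", "target": "example target", "text": "example post C", "stance": 1},
]

if __name__ == "__main__":
    print(decode_stance(sample[0])["stance_name"])
    print(stance_distribution(sample))
```

Checking the label distribution per target before training is a cheap way to spot class imbalance in the 2,986 training instances.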

Creation

  • Motivation: To create a stance‑annotated dataset of micro‑blog texts. Six stance targets were selected, and posts were collected from Sina Weibo for annotation.
  • Source Data: Content generated by Sina Weibo users
  • Annotation:
    • Each target‑post pair was independently labeled by two students. If they agreed, the label was accepted; otherwise a third annotator labeled the pair and the final label was decided by vote.
    • Annotators: Chinese students
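
The annotation procedure described above can be sketched as a small resolution function. This is an illustrative reading of the description, not the organizers' code: the function name `resolve_label` is hypothetical, and returning `None` when all three annotators disagree is an assumption, since the description does not say how that case was handled.

```python
from collections import Counter
from typing import Optional


def resolve_label(first: int, second: int, third: Optional[int] = None) -> Optional[int]:
    """Resolve a stance label from two annotators plus an optional third.

    Labels use the dataset's integer coding (0: AGAINST, 1: FAVOR, 2: NONE).
    """
    # Two matching labels are accepted without a third annotator.
    if first == second:
        return first
    # The pair disagrees and no third label is available yet.
    if third is None:
        return None
    # Majority vote over the three labels.
    label, count = Counter([first, second, third]).most_common(1)[0]
    # Assumption: with three different labels there is no majority, so return None.
    return label if count >= 2 else None


if __name__ == "__main__":
    print(resolve_label(1, 1))     # agreement, label accepted
    print(resolve_label(0, 1, 1))  # third annotator sides with the second
    print(resolve_label(0, 1, 2))  # three-way disagreement, unresolved
```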

Considerations

  • Social Impact: The dataset retains the original social‑media statements, which may raise privacy concerns.
  • Bias Discussion: The data exhibit temporal, geographic, and topical biases.

Additional Information

  • Dataset Curator: Paper authors
  • License Information: Distributed under CC‑BY‑4.0
  • Citation:
    @incollection{xu2016overview,
      title={Overview of nlpcc shared task 4: Stance detection in chinese microblogs},
      author={Xu, Ruifeng and Zhou, Yu and Wu, Dongyin and Gui, Lin and Du, Jiachen and Xue, Yun},
      booktitle={Natural language understanding and intelligent applications},
      pages={907--916},
      year={2016},
      publisher={Springer}
    }
    

Topics

Natural Language Processing
Social Media Analysis

Source

Organization: hugging_face

Created: Unknown
