
fhamborg/news_sentiment_newsmtsc

NewsMTSC is a high‑quality dataset containing over 11k manually annotated sentences from English news articles. Each sentence is labeled by five human annotators and includes only examples where the annotators’ sentiment judgments are the same or similar. The dataset is split into two subsets (`rw` and `mt`), each containing training, validation, and test parts.

Updated 10/25/2022
hugging_face

Description

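Dataset Overview

Name: NewsMTSC

Language: English (en-US)

License: MIT

Multilinguality: monolingual

Size: 10K<n<100K

Source: original data

Task category: text classification

Specific task: sentiment classification

Dataset creators:

  • Annotation creators: crowdsourced, expert-generated
  • Language creators: expert-generated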

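Dataset Details

Description: NewsMTSC is a high-quality dataset of more than 11,000 manually labeled sentences from English news articles. Each sentence was labeled by five human coders, and the dataset includes only examples where the five coders' sentiment assessments are the same or similar.

Subsets and splits:

  • Two subsets (`rw` and `mt`), each with three splits (train, validation, test).
  • The `rw` subset is recommended; its validation and test sets reflect the real-world distribution of sentiment in news articles.
  • The validation and test sets of the `mt` subset contain only sentences with two or more different targets, and the sentiment toward each target is labeled separately.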

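Data format:

  • Each split is stored as a JSONL file, where each line is one JSON object.
  • Key attributes:
    1. polarity: sentiment of the sentence toward the target mention (-1 = negative, 0 = neutral, 1 = positive)
    2. from: start position of the target mention within the sentence (character-based, 0-indexed)
    3. to: end position of the target mention
    4. sentence: the sentence text
    5. id: an identifier that is unique within NewsMTSC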
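
The record layout above can be illustrated with a minimal Python sketch that parses JSONL text and recovers each target mention from its character offsets. The sample records are made up for illustration and are not taken from NewsMTSC. The helper names (`parse_jsonl`, `target_mention`) are hypothetical. The description does not say whether `to` is inclusive or exclusive, so the sketch assumes an exclusive end and exposes a flag for the other case; check this against the real files before relying on it.

```python
import json
from collections import Counter

# Hypothetical sample lines in the documented JSONL layout (NOT real NewsMTSC data).
SAMPLE_JSONL = """\
{"id": "ex-1", "sentence": "The senator praised the mayor.", "from": 4, "to": 11, "polarity": 0}
{"id": "ex-2", "sentence": "The senator praised the mayor.", "from": 24, "to": 29, "polarity": 1}
{"id": "ex-3", "sentence": "Critics slammed the company.", "from": 20, "to": 27, "polarity": -1}
"""

# Mapping of the documented polarity codes to readable labels.
POLARITY_LABELS = {-1: "negative", 0: "neutral", 1: "positive"}


def parse_jsonl(text):
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def target_mention(record, end_exclusive=True):
    """Slice the target mention out of the sentence.

    Assumption: `to` is an exclusive end offset. Pass end_exclusive=False
    if the real data turns out to use an inclusive end position.
    """
    end = record["to"] if end_exclusive else record["to"] + 1
    return record["sentence"][record["from"]:end]


records = parse_jsonl(SAMPLE_JSONL)
mentions = [target_mention(r) for r in records]
label_counts = Counter(POLARITY_LABELS[r["polarity"]] for r in records)

for r in records:
    print(r["id"], repr(target_mention(r)), POLARITY_LABELS[r["polarity"]])
```

The first two sample records share one sentence but point at different targets with different polarities, which is the multi-target situation the `mt` subset focuses on.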

Citation:

  • If you use this dataset, please cite the paper:

    @InProceedings{Hamborg2021b,
      author    = {Hamborg, Felix and Donnay, Karsten},
      title     = {NewsMTSC: (Multi-)Target-dependent Sentiment Classification in News Articles},
      booktitle = {Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)},
      year      = {2021},
      month     = {Apr.},
      location  = {Virtual Event},
    }


Topics

Sentiment Analysis
Natural Language Processing

Source

Organization: hugging_face

Created: Unknown
