fhamborg/news_sentiment_newsmtsc
NewsMTSC is a high‑quality dataset containing over 11k manually annotated sentences from English news articles. Each sentence is labeled by five human annotators and includes only examples where the annotators’ sentiment judgments are the same or similar. The dataset is split into two subsets (`rw` and `mt`), each containing training, validation, and test parts.
Description
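Dataset overview
Name: NewsMTSC
Language: English (en-US)
License: MIT
Multilinguality: monolingual
Size: 10K<n<100K
Source: original data
Task category: text classification
Specific task: sentiment classification
Dataset creators:
- Annotation creators: crowdsourced, expert-generated
- Language creators: expert-generated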
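Dataset details
Description: NewsMTSC is a high-quality dataset of more than 11,000 manually labeled sentences from English news articles. Each sentence was labeled by five human coders, and only examples on which the five coders' sentiment assessments were the same or similar are included.
Subsets and splits:
- The dataset contains two subsets (`rw` and `mt`), each with three splits (training, validation, test).
- The `rw` subset is recommended; its validation and test sets reflect the real-world distribution of sentiment in news articles.
- The validation and test sets of the `mt` subset contain only sentences with two or more different targets, and the sentiment toward each target is labeled separately.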
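To make the multi-target criterion of the `mt` validation and test sets concrete, here is a minimal sketch that groups records by sentence and keeps only sentences with at least two distinct target spans. The sample records are invented for illustration (they are not taken from NewsMTSC), the helper name `multi_target_sentences` is hypothetical, and the field names are the ones this card lists under the data format.

```python
from collections import defaultdict

# Invented sample records (NOT from NewsMTSC); field names follow this card.
records = [
    {"id": "a1", "sentence": "Alice praised Bob.", "from": 0, "to": 5, "polarity": 0},
    {"id": "a2", "sentence": "Alice praised Bob.", "from": 14, "to": 17, "polarity": 1},
    {"id": "b1", "sentence": "Carol resigned.", "from": 0, "to": 5, "polarity": -1},
]


def multi_target_sentences(rows):
    """Return {sentence: rows} for sentences with two or more distinct target spans."""
    by_sentence = defaultdict(list)
    for row in rows:
        by_sentence[row["sentence"]].append(row)
    return {
        sentence: group
        for sentence, group in by_sentence.items()
        # A target is identified here by its (from, to) character span.
        if len({(r["from"], r["to"]) for r in group}) >= 2
    }


multi = multi_target_sentences(records)
for sentence, group in multi.items():
    # Each target in the same sentence carries its own polarity label.
    print(sentence, [r["polarity"] for r in group])
```

In this toy input only the first sentence qualifies, because it has two targets with separately labeled sentiment.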
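Data format:
- Each split is stored as a JSONL file, where each line is one JSON object.
- Key attributes:
  - `polarity`: sentiment of the sentence toward the target mention (-1 = negative, 0 = neutral, 1 = positive)
  - `from`: start position of the target mention in the sentence (character-based, 0-indexed)
  - `to`: end position of the target mention
  - `sentence`: the sentence text
  - `id`: identifier unique within NewsMTSC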
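The format above can be read with the standard library alone. The following is a minimal sketch, not the dataset authors' loader: the two JSONL lines are invented for illustration, the helper names `parse_jsonl` and `describe` are hypothetical, and the code assumes that `to` is an exclusive end offset (so that slicing the sentence from `from` to `to` yields the target mention). This card does not state that convention, so it is worth checking against the real files.

```python
import json

# Mapping of the polarity codes described in this card to readable labels.
LABELS = {-1: "negative", 0: "neutral", 1: "positive"}

# Invented JSONL content (NOT from NewsMTSC): one JSON object per line.
sample_jsonl = (
    '{"id": "x1", "sentence": "The senator defended the bill.", '
    '"from": 4, "to": 11, "polarity": 0}\n'
    '{"id": "x2", "sentence": "Critics attacked the mayor.", '
    '"from": 21, "to": 26, "polarity": -1}\n'
)


def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def describe(record):
    """Return (id, target mention, sentiment label) for one record."""
    # Assumption: `to` is an exclusive, 0-indexed character offset.
    target = record["sentence"][record["from"]:record["to"]]
    # int() also tolerates polarity values stored as floats such as -1.0.
    label = LABELS[int(record["polarity"])]
    return record["id"], target, label


results = [describe(r) for r in parse_jsonl(sample_jsonl)]
for row in results:
    print(row)
```

To read an actual split, the same `parse_jsonl` helper could be applied to the text of the downloaded file.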
Citation:
- If you use this dataset, please cite the paper:
@InProceedings{Hamborg2021b,
  author = {Hamborg, Felix and Donnay, Karsten},
  title = {NewsMTSC: (Multi-)Target-dependent Sentiment Classification in News Articles},
  booktitle = {Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)},
  year = {2021},
  month = {Apr.},
  location = {Virtual Event},
}
Source
Organization: hugging_face