stanfordnlp/sentiment140
The Sentiment140 dataset contains Twitter messages with emoticons, which are used as noisy sentiment labels. It is primarily used for sentiment classification and contains 1,600,000 training instances and 498 test instances. Each record has five fields: text, date, user, sentiment, and query.
Description
Dataset Overview
Dataset Name
- Name: Sentiment140
- Configuration Name: sentiment140
Dataset Features
- Text: string
- Date: string
- User: string
- Sentiment: int32
- Query: string
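The feature schema above can be mirrored as a typed record, which helps catch malformed rows early. This is a minimal sketch, not part of the dataset's own tooling: the class name `Sentiment140Record`, the helper `record_from_dict`, and the sample values are hypothetical, and the lowercase field names are assumed from the field list in the summary.

```python
from dataclasses import dataclass

# int32 bounds, matching the declared type of the sentiment field
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


@dataclass(frozen=True)
class Sentiment140Record:
    """One row of the dataset, following the listed feature types."""
    text: str
    date: str
    user: str
    sentiment: int  # declared as int32 in the schema
    query: str


def record_from_dict(row: dict) -> Sentiment140Record:
    """Build a record from a plain dict and check the field types."""
    for name in ("text", "date", "user", "query"):
        if not isinstance(row[name], str):
            raise TypeError(f"{name} must be a string")
    value = row["sentiment"]
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("sentiment must be an integer")
    if not INT32_MIN <= value <= INT32_MAX:
        raise ValueError("sentiment does not fit in int32")
    return Sentiment140Record(
        text=row["text"],
        date=row["date"],
        user=row["user"],
        sentiment=value,
        query=row["query"],
    )


# Hypothetical row, for illustration only (not taken from the dataset)
example = record_from_dict({
    "text": "a made-up tweet :)",
    "date": "a date string",
    "user": "example_user",
    "sentiment": 4,
    "query": "example query",
})
```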
Dataset Splits
- Training Set: 1,600,000 records
- Test Set: 498 records
Dataset Size
- Download Size: 81.36 MB
- Dataset Size: 225.82 MB
Training & Evaluation Metrics
- Task: Text Classification
- Task ID: multi_class_classification
- Train Split: train
- Eval Split: test
- Column Mapping:
text: text
sentiment: target
- Evaluation Metrics:
- Accuracy
- F1 macro
- F1 micro
- F1 weighted
- Precision macro
- Precision micro
- Precision weighted
- Recall macro
- Recall micro
- Recall weighted
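The column mapping and the metrics listed above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the evaluation code used by any particular platform: the function names `apply_column_mapping` and `classification_metrics` are hypothetical, the mapping is read as text to text and sentiment to target, and the integer labels in the usage lines are made up.

```python
from collections import Counter

# Assumed reading of the column mapping: source column -> evaluation column
COLUMN_MAPPING = {"text": "text", "sentiment": "target"}


def apply_column_mapping(row, mapping=COLUMN_MAPPING):
    """Rename the mapped columns of one row; unmapped columns are dropped."""
    return {new: row[old] for old, new in mapping.items()}


def _safe_div(num, den):
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro, micro and weighted precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    support = Counter(y_true)
    n = len(y_true)

    precision = {c: _safe_div(tp[c], tp[c] + fp[c]) for c in labels}
    recall = {c: _safe_div(tp[c], tp[c] + fn[c]) for c in labels}
    f1 = {
        c: _safe_div(2 * precision[c] * recall[c], precision[c] + recall[c])
        for c in labels
    }

    def macro(scores):
        # Unweighted mean over classes
        return _safe_div(sum(scores.values()), len(labels))

    def weighted(scores):
        # Mean over classes, weighted by the number of true instances
        return _safe_div(sum(scores[c] * support[c] for c in labels), n)

    # Micro averaging pools the counts over all classes before dividing
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = _safe_div(tp_all, tp_all + fp_all)
    micro_r = _safe_div(tp_all, tp_all + fn_all)

    return {
        "accuracy": _safe_div(tp_all, n),
        "precision_macro": macro(precision),
        "precision_micro": micro_p,
        "precision_weighted": weighted(precision),
        "recall_macro": macro(recall),
        "recall_micro": micro_r,
        "recall_weighted": weighted(recall),
        "f1_macro": macro(f1),
        "f1_micro": _safe_div(2 * micro_p * micro_r, micro_p + micro_r),
        "f1_weighted": weighted(f1),
    }


# Made-up labels and predictions, for illustration only
mapped = apply_column_mapping({"text": "a made-up tweet", "sentiment": 4, "user": "u"})
metrics = classification_metrics([0, 0, 4, 4, 2, 2], [0, 4, 4, 4, 2, 0])
```

For single-label multi-class predictions such as these, the micro-averaged precision, recall and F1 all coincide with accuracy, so the macro and weighted variants are the ones that reveal per-class differences.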
Citation
@article{go2009twitter,
  title={Twitter sentiment classification using distant supervision},
  author={Go, Alec and Bhayani, Richa and Huang, Lei},
  journal={CS224N project report, Stanford},
  volume={1},
  number={12},
  pages={2009},
  year={2009}
}
Source
Organization: hugging_face
Created: Unknown