Dataset asset, Open Source Community, Sentiment Analysis, Text Classification
stanfordnlp/sentiment140
The Sentiment140 dataset contains Twitter messages whose emoticons serve as noisy sentiment labels. It is primarily used for sentiment classification and provides 1,600,000 training instances and 498 test instances. Each record has five fields: text, date, user, sentiment, and query.
Source
hugging_face
Created
Nov 28, 2025
Updated
Oct 20, 2023
Signals
322 views
Availability
Linked source ready
Dataset Overview
Dataset Name
- Name: Sentiment140
- Configuration Name: sentiment140
Dataset Features
- Text: string
- Date: string
- User: string
- Sentiment: int32
- Query: string
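The feature list above can be mirrored as a small typed record. This is a minimal sketch, not part of the dataset's tooling: the class name `Sentiment140Record`, the `polarity` helper, and the sample values are hypothetical, and the code-to-name mapping (0 = negative, 2 = neutral, 4 = positive) is assumed from the original Sentiment140 release rather than stated on this page.

```python
from dataclasses import dataclass

# Assumption: polarity codes as used by the original Sentiment140 release.
POLARITY_NAMES = {0: "negative", 2: "neutral", 4: "positive"}


@dataclass(frozen=True)
class Sentiment140Record:
    """One row with the five documented fields (hypothetical helper class)."""

    text: str
    date: str
    user: str
    sentiment: int  # stored as int32 in the dataset
    query: str

    def __post_init__(self):
        # The four string features must really be strings.
        for name in ("text", "date", "user", "query"):
            if not isinstance(getattr(self, name), str):
                raise TypeError(f"{name} must be a string")
        # The sentiment feature must fit in a signed 32-bit integer.
        if not isinstance(self.sentiment, int):
            raise TypeError("sentiment must be an integer")
        if not -2**31 <= self.sentiment < 2**31:
            raise ValueError("sentiment does not fit in int32")

    @property
    def polarity(self) -> str:
        """Readable label for the sentiment code, or 'unknown'."""
        return POLARITY_NAMES.get(self.sentiment, "unknown")


# Illustrative values only; not taken from the dataset.
example = Sentiment140Record(
    text="what a great day",
    date="Mon Apr 06 22:19:45 PDT 2009",
    user="example_user",
    sentiment=4,
    query="NO_QUERY",
)
print(example.polarity)
```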
Dataset Splits
- Training Set: 1,600,000 records
- Test Set: 498 records
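After downloading, it can be worth confirming that the split sizes match the documented counts. The sketch below assumes you already have a mapping from split name to row count (for example, built from whatever loader you use); the helper name `check_split_sizes` is hypothetical, and the split names `train` and `test` follow the evaluation settings listed on this page.

```python
# Documented row counts for each split.
EXPECTED_ROWS = {"train": 1_600_000, "test": 498}


def check_split_sizes(split_sizes):
    """Return a list of mismatch messages; an empty list means all sizes match."""
    problems = []
    for name, expected in EXPECTED_ROWS.items():
        actual = split_sizes.get(name)
        if actual is None:
            problems.append(f"missing split: {name}")
        elif actual != expected:
            problems.append(f"{name}: expected {expected}, got {actual}")
    return problems


# Usage: pass the observed sizes, e.g. {name: len(rows) for name, rows in splits.items()}.
print(check_split_sizes({"train": 1_600_000, "test": 498}))
```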
Dataset Size
- Download Size: 81.36 MB
- Dataset Size: 225.82 MB
Training & Evaluation Metrics
- Task: Text Classification
- Task ID: multi_class_classification
- Train Split: train
- Eval Split: test
- Column Mapping:
  - text: text
  - sentiment: target
- Evaluation Metrics:
  - Accuracy
  - F1 macro
  - F1 micro
  - F1 weighted
  - Precision macro
  - Precision micro
  - Precision weighted
  - Recall macro
  - Recall micro
  - Recall weighted
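To make the column mapping and the metric list concrete, here is a dependency-free sketch that renames the columns (`text` to `text`, `sentiment` to `target`) and computes accuracy plus macro, micro, and weighted precision, recall, and F1. The function names are hypothetical, and this is an illustration of the standard definitions rather than the evaluation code used by any particular platform; in practice a library such as scikit-learn would normally be used.

```python
from collections import Counter

# Column mapping from the dataset card: source column -> evaluation column.
COLUMN_MAPPING = {"text": "text", "sentiment": "target"}


def apply_column_mapping(row, mapping=COLUMN_MAPPING):
    """Keep only the mapped columns, renamed to their evaluation names."""
    return {new: row[old] for old, new in mapping.items()}


def _safe_div(num, den):
    # Define 0/0 as 0.0 so labels that never occur do not raise.
    return num / den if den else 0.0


def classification_report(y_true, y_pred):
    """Accuracy plus macro/micro/weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    support = Counter(y_true)
    total = len(y_true)

    # Per-label (precision, recall, f1).
    per_label = {}
    for label in labels:
        prec = _safe_div(tp[label], tp[label] + fp[label])
        rec = _safe_div(tp[label], tp[label] + fn[label])
        per_label[label] = (prec, rec, _safe_div(2 * prec * rec, prec + rec))

    report = {"accuracy": _safe_div(sum(tp.values()), total)}
    for i, name in enumerate(("precision", "recall", "f1")):
        # Macro: unweighted mean over labels.
        report[f"{name}_macro"] = _safe_div(
            sum(per_label[l][i] for l in labels), len(labels))
        # Weighted: mean over labels, weighted by true-label support.
        report[f"{name}_weighted"] = _safe_div(
            sum(per_label[l][i] * support[l] for l in labels), total)

    # Micro: pool the counts over all labels before dividing.
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = _safe_div(tp_all, tp_all + fp_all)
    micro_r = _safe_div(tp_all, tp_all + fn_all)
    report["precision_micro"] = micro_p
    report["recall_micro"] = micro_r
    report["f1_micro"] = _safe_div(2 * micro_p * micro_r, micro_p + micro_r)
    return report


# Illustrative labels only; not real model output.
report = classification_report([0, 0, 4, 4], [0, 4, 4, 4])
print(report["accuracy"], report["f1_macro"])
```

For single-label classification, micro-averaged precision, recall, and F1 all equal accuracy, so the micro entries mainly serve as a consistency check.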
Citation
@article{go2009twitter,
  title={Twitter sentiment classification using distant supervision},
  author={Go, Alec and Bhayani, Richa and Huang, Lei},
  journal={CS224N project report, Stanford},
  volume={1},
  number={12},
  pages={2009},
  year={2009}
}