Dataset asset | Open Source Community | Sentiment Analysis | Text Classification

stanfordnlp/sentiment140

The Sentiment140 dataset contains Twitter messages whose emoticons are used as noisy sentiment labels. It is primarily used for sentiment classification and provides 1,600,000 training instances and 498 test instances. Each record has five fields: text, date, user, sentiment, and query.
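
The noisy-labeling idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset authors' pipeline: the emoticon lists, the 0/4 label encoding, and the helper names `noisy_label` and `strip_emoticons` are all assumptions made for the example.

```python
from typing import Optional

# Illustrative emoticon lists (assumptions for this sketch, not from the dataset card).
POSITIVE_EMOTICONS = (":)", ":-)", ":D", "=)")
NEGATIVE_EMOTICONS = (":(", ":-(")

# Assumed label encoding: 0 = negative, 4 = positive.
NEGATIVE, POSITIVE = 0, 4


def noisy_label(text: str) -> Optional[int]:
    """Return a noisy sentiment label, or None when the tweet gives no clear signal."""
    has_pos = any(e in text for e in POSITIVE_EMOTICONS)
    has_neg = any(e in text for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:
        # Either no emoticon at all, or both kinds at once: skip the tweet.
        return None
    return POSITIVE if has_pos else NEGATIVE


def strip_emoticons(text: str) -> str:
    """Remove the labeling emoticons so a model cannot simply memorize them."""
    for emoticon in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        text = text.replace(emoticon, "")
    # Collapse any whitespace left behind by the removal.
    return " ".join(text.split())
```

Tweets for which `noisy_label` returns `None` would simply be left out of the training set in this sketch.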

Source: hugging_face
Created: Nov 28, 2025
Updated: Oct 20, 2023

Dataset Overview

Dataset Name

  • Name: Sentiment140
  • Configuration Name: sentiment140

Dataset Features

  • Text: string
  • Date: string
  • User: string
  • Sentiment: int32
  • Query: string
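
As one way to work with this schema in code, the sketch below models a record as a Python dataclass and checks incoming rows against it. The class name `Sentiment140Record`, the helper `parse_record`, and the lowercase field names are assumptions for illustration; the int32 range check mirrors the declared type of the sentiment field.

```python
from dataclasses import dataclass, fields

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


@dataclass(frozen=True)
class Sentiment140Record:
    """One row of the dataset, following the feature list above."""
    text: str
    date: str
    user: str
    sentiment: int  # declared as int32 in the feature list
    query: str


def parse_record(row: dict) -> Sentiment140Record:
    """Build a record from a plain dict, validating presence and type of each field."""
    values = {}
    for field in fields(Sentiment140Record):
        if field.name not in row:
            raise KeyError(f"missing field: {field.name}")
        value = row[field.name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, field.type):
            raise TypeError(f"field {field.name!r} must be {field.type.__name__}")
        values[field.name] = value
    if not INT32_MIN <= values["sentiment"] <= INT32_MAX:
        raise ValueError("sentiment does not fit in int32")
    return Sentiment140Record(**values)
```

Extra keys in the input dict are ignored, so the same helper can be applied to rows that carry additional columns.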

Dataset Splits

  • Training Set: 1,600,000 records
  • Test Set: 498 records

Dataset Size

  • Download Size: 81.36 MB
  • Dataset Size: 225.82 MB

Training & Evaluation Metrics

  • Task: Text Classification
  • Task ID: multi_class_classification
  • Train Split: train
  • Eval Split: test
  • Column Mapping:
    • text: text
    • sentiment: target
  • Evaluation Metrics:
    • Accuracy
    • F1 macro
    • F1 micro
    • F1 weighted
    • Precision macro
    • Precision micro
    • Precision weighted
    • Recall macro
    • Recall micro
    • Recall weighted
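
The column mapping and the metrics listed above can be illustrated with a small, dependency-free sketch. It is not the evaluation code used by any particular platform: `apply_column_mapping` and `classification_report` are hypothetical helpers, and per-label scores default to 0.0 when a denominator is zero, which is an assumption of this example.

```python
from collections import Counter

# Dataset column -> task column, as given in the column mapping above.
COLUMN_MAPPING = {"text": "text", "sentiment": "target"}


def apply_column_mapping(row, mapping=COLUMN_MAPPING):
    """Keep only the mapped columns and rename them to the task's names."""
    return {new: row[old] for old, new in mapping.items()}


def _safe_div(numerator, denominator):
    return numerator / denominator if denominator else 0.0


def classification_report(y_true, y_pred):
    """Compute accuracy plus macro, micro, and weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    support = Counter(y_true)
    n = len(y_true)

    per_label = {}
    for label in labels:
        precision = _safe_div(tp[label], tp[label] + fp[label])
        recall = _safe_div(tp[label], tp[label] + fn[label])
        f1 = _safe_div(2 * precision * recall, precision + recall)
        per_label[label] = {"precision": precision, "recall": recall, "f1": f1}

    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = _safe_div(total_tp, total_tp + total_fp)
    micro_r = _safe_div(total_tp, total_tp + total_fn)

    report = {
        "accuracy": _safe_div(total_tp, n),
        "precision_micro": micro_p,
        "recall_micro": micro_r,
        "f1_micro": _safe_div(2 * micro_p * micro_r, micro_p + micro_r),
    }
    for name in ("precision", "recall", "f1"):
        # Macro: unweighted mean over labels. Weighted: mean weighted by label support.
        report[f"{name}_macro"] = _safe_div(
            sum(per_label[l][name] for l in labels), len(labels))
        report[f"{name}_weighted"] = _safe_div(
            sum(per_label[l][name] * support[l] for l in labels), n)
    return report
```

In single-label classification like this, micro precision, micro recall, and micro F1 all coincide with accuracy, so the macro and weighted variants are the ones that reveal per-class imbalance.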

Citation

@article{go2009twitter,
  title={Twitter sentiment classification using distant supervision},
  author={Go, Alec and Bhayani, Richa and Huang, Lei},
  journal={CS224N project report, Stanford},
  volume={1},
  number={12},
  pages={2009},
  year={2009}
}
