stanfordnlp/sentiment140

The Sentiment140 dataset contains Twitter messages with emoticons, which are used as noisy sentiment labels. It is primarily used for sentiment classification and contains 1,600,000 training instances and 498 test instances. Each record has five fields: text, date, user, sentiment, and query.
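To make the idea of noisy labels concrete, here is a minimal sketch of emoticon-based labeling in the spirit of the distant-supervision paper cited below. The emoticon lists, the `noisy_label` helper, and the 0/4 label encoding are assumptions made for illustration; they are not taken from this card and may differ from what the dataset authors actually used.

```python
from typing import Optional, Tuple

# Hypothetical emoticon lists, chosen for illustration only.
POSITIVE_EMOTICONS = (":)", ":-)", ":D", "=)")
NEGATIVE_EMOTICONS = (":(", ":-(")

# Assumed label encoding: 0 = negative, 4 = positive.
NEGATIVE, POSITIVE = 0, 4


def noisy_label(tweet: str) -> Optional[Tuple[str, int]]:
    """Return (text without emoticons, label), or None if the tweet is unusable."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    # Skip tweets with no emoticon or with conflicting emoticons.
    if has_pos == has_neg:
        return None
    text = tweet
    for emoticon in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        text = text.replace(emoticon, "")
    # Collapse whitespace left behind by the removed emoticons.
    text = " ".join(text.split())
    return text, (POSITIVE if has_pos else NEGATIVE)
```

The labels are called noisy because an emoticon is only a weak signal of the tweet's actual sentiment. Removing the emoticon from the text keeps a classifier from simply memorizing it.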

Updated 10/20/2023

Description

Dataset Overview

Dataset Name

  • Name: Sentiment140
  • Configuration Name: sentiment140

Dataset Features

  • text: string
  • date: string
  • user: string
  • sentiment: int32
  • query: string
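The feature list above can be mirrored as a typed record, which is handy when validating rows before training. This is a sketch only: `Sentiment140Record` and `record_from_row` are hypothetical names, the lowercase field names are assumed from the column mapping on this card, and the example row is made up.

```python
from dataclasses import dataclass
from typing import Any, Mapping

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


@dataclass(frozen=True)
class Sentiment140Record:
    """One row, mirroring the five features listed on the card."""
    text: str
    date: str
    user: str
    sentiment: int  # stored as int32 in the dataset
    query: str


def record_from_row(row: Mapping[str, Any]) -> Sentiment140Record:
    """Build a record from a dict-like row, checking the declared types."""
    for name in ("text", "date", "user", "query"):
        if not isinstance(row[name], str):
            raise TypeError(f"{name} must be a string")
    sentiment = row["sentiment"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(sentiment, bool) or not isinstance(sentiment, int):
        raise TypeError("sentiment must be an integer")
    if not INT32_MIN <= sentiment <= INT32_MAX:
        raise ValueError("sentiment does not fit in int32")
    return Sentiment140Record(
        text=row["text"],
        date=row["date"],
        user=row["user"],
        sentiment=sentiment,
        query=row["query"],
    )


# Made-up row for illustration; not taken from the dataset.
example = record_from_row({
    "text": "example tweet text",
    "date": "example date string",
    "user": "example_user",
    "sentiment": 4,
    "query": "example query",
})
```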

Dataset Splits

  • Training Set: 1,600,000 records
  • Test Set: 498 records

Dataset Size

  • Download Size: 81.36 MB
  • Dataset Size: 225.82 MB

Training & Evaluation Metrics

  • Task: Text Classification
  • Task ID: multi_class_classification
  • Train Split: train
  • Eval Split: test
  • Column Mapping:
    • text: text
    • sentiment: target
  • Evaluation Metrics:
    • Accuracy
    • F1 macro
    • F1 micro
    • F1 weighted
    • Precision macro
    • Precision micro
    • Precision weighted
    • Recall macro
    • Recall micro
    • Recall weighted
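The macro, micro, and weighted variants listed above differ only in how per-class scores are averaged. The sketch below implements the common definitions in plain Python. It assumes the card follows those conventions, and `classification_scores` is a hypothetical helper, not part of any library.

```python
from collections import Counter
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    support = Counter(y_true)
    n = len(y_true)

    # Per-class precision, recall, and F1.
    prec = {c: _ratio(tp[c], tp[c] + fp[c]) for c in labels}
    rec = {c: _ratio(tp[c], tp[c] + fn[c]) for c in labels}
    f1 = {c: _ratio(2 * prec[c] * rec[c], prec[c] + rec[c]) for c in labels}

    # Micro: pool the counts over all classes, then compute one score.
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = _ratio(tp_all, tp_all + fp_all)
    micro_r = _ratio(tp_all, tp_all + fn_all)

    scores = {
        "accuracy": _ratio(tp_all, n),
        "precision_micro": micro_p,
        "recall_micro": micro_r,
        "f1_micro": _ratio(2 * micro_p * micro_r, micro_p + micro_r),
    }
    for name, per_class in (("precision", prec), ("recall", rec), ("f1", f1)):
        # Macro: unweighted mean over classes.
        scores[f"{name}_macro"] = _ratio(sum(per_class.values()), len(labels))
        # Weighted: mean over classes, weighted by the true count of each class.
        scores[f"{name}_weighted"] = _ratio(
            sum(per_class[c] * support[c] for c in labels), n
        )
    return scores
```

For single-label classification, the micro-averaged precision, recall, and F1 all equal accuracy, so the macro and weighted variants are the ones that reveal per-class differences.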

Citation

@article{go2009twitter,
  title={Twitter sentiment classification using distant supervision},
  author={Go, Alec and Bhayani, Richa and Huang, Lei},
  journal={CS224N project report, Stanford},
  volume={1},
  number={12},
  pages={2009},
  year={2009}
}

Topics

Sentiment Analysis
Text Classification

Source

Organization: hugging_face

Created: Unknown