Dataset Catalog

Browse trusted datasets for evaluation, enrichment, and production use.

Showing 18 of 18 datasets
Category: Sentiment Analysis

Ajitava/go_emotions_multi_label

Sentiment Analysis, Model Evaluation

This is a multi-label emotion classification dataset based on the GoEmotions label scheme. A team of 12 engineers annotated it with custom tags. The entry also reports evaluation results for three models (RoBERTa, BERT-cased, and BERT-uncased) on this dataset.

Source: hugging_face | Updated Aug 21, 2023 | 64 views | Linked

Yelp/yelp_review_full

Text Classification, Sentiment Analysis

The YelpReviewFull dataset contains review data collected from the Yelp website, mainly used for sentiment classification tasks. It includes 650,000 training samples and 50,000 test samples, each with a text field and a label field, where the label indicates the review rating (1 to 5 stars). The dataset was created via crowdsourcing and is in English.

Source: hugging_face | Updated Jan 4, 2024 | 368 views | Linked

DEAP dataset

Sentiment Analysis, EEG Signals

The DEAP dataset is a physiological‑signal database for emotion analysis, primarily using EEG signals to assess emotional states, including arousal and valence.

Source: github | Updated Dec 31, 2019 | 248 views | Linked

GEM/xsum

Sentiment Analysis, Text Summarization

XSum is an English news summarization dataset; the task is to predict the first sentence of an article from the rest of the article. The data come from BBC articles written in British English and are used primarily for abstractive summarization. Each record has document, summary, and ID fields, and the data are randomly split into training, validation, and test sets. The creators are from the University of Edinburgh, and the license is CC BY-SA 4.0.

Source: hugging_face | Updated Oct 24, 2022 | 173 views | Linked

jniimi/tripadvisor-review-rating

Hotel Review Analysis, Sentiment Analysis

This dataset contains hotel reviews and ratings collected from TripAdvisor. After processing, only the review text and multiple aspect scores are retained. The data were originally released by Jiwei Li et al.; the processed version is provided as a single pandas DataFrame. It is primarily intended for aspect-based sentiment analysis (ABSA). Columns include hotel ID, user ID, review title, review text, overall rating, cleanliness rating, and others.

Source: hugging_face | Updated Apr 24, 2024 | 393 views | Linked

ISEAR

Sentiment Analysis, Text Analysis

The ISEAR dataset, developed by the Swiss National Centre of Competence in Research (NCCR) in Affective Sciences, comes from the International Survey on Emotion Antecedents and Reactions and is suitable for text analysis and sentiment analysis.

Source: github | Updated May 15, 2024 | 514 views | Linked

fhamborg/news_sentiment_newsmtsc

Sentiment Analysis, Natural Language Processing

NewsMTSC is a high‑quality dataset containing over 11k manually annotated sentences from English news articles. Each sentence is labeled by five human annotators and includes only examples where the annotators’ sentiment judgments are the same or similar. The dataset is split into two subsets (`rw` and `mt`), each containing training, validation, and test parts.

Source: hugging_face | Updated Oct 25, 2022 | 161 views | Linked

stanfordnlp/sentiment140

Sentiment Analysis, Text Classification

The Sentiment140 dataset contains Twitter messages with emoticons, which are used as noisy sentiment labels. It is primarily used for sentiment classification tasks and contains 1,600,000 training instances and 498 test instances. Fields include text, date, user, sentiment, and query.

Source: hugging_face | Updated Oct 20, 2023 | 322 views | Linked
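
The noisy-labeling idea in the Sentiment140 description can be illustrated with a minimal sketch. The marker lists and the function name `noisy_label` below are hypothetical and chosen for illustration only; they are not the markers or the procedure the dataset creators used.

```python
from typing import Optional, Tuple

# Hypothetical marker lists, for illustration only.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(")


def noisy_label(tweet: str) -> Optional[Tuple[str, str]]:
    """Return (cleaned_text, label), or None if the tweet is ambiguous.

    A tweet is labeled only when it contains markers of exactly one polarity.
    The markers are stripped so a classifier cannot simply memorize them.
    """
    has_pos = any(marker in tweet for marker in POSITIVE)
    has_neg = any(marker in tweet for marker in NEGATIVE)
    if has_pos == has_neg:
        # Either no marker at all, or conflicting markers: skip the tweet.
        return None
    label = "positive" if has_pos else "negative"
    cleaned = tweet
    for marker in POSITIVE + NEGATIVE:
        cleaned = cleaned.replace(marker, "")
    # Collapse the whitespace left behind by the removed markers.
    return " ".join(cleaned.split()), label


if __name__ == "__main__":
    print(noisy_label("great game today :)"))
    print(noisy_label("just a tweet"))
```

Labels produced this way are noisy by construction, since a marker does not always match the sentiment of the text around it.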

IVLLab/MultiDialog

Multimodal Dialogue, Sentiment Analysis

The dataset contains manually annotated metadata linking audio files with transcriptions, emotions, and other attributes. It supports tasks such as multimodal dialogue generation, automatic speech recognition, and text‑to‑speech conversion. The language is English, and a gold‑standard emotional dialogue subset is provided for studying emotion dynamics in conversations.

Source: hugging_face | Updated Aug 29, 2024 | 660 views | Linked

mr

Text Classification, Sentiment Analysis

This dataset is intended for text-classification tasks and contains two features: the text content and a label. Labels are binary: 'neg' (negative) and 'pos' (positive). The data are split into training, validation, and test sets.

Source: huggingface | Updated Nov 28, 2024 | 384 views | Linked

Synthetic Lyrics Dataset

Lyric Analysis, Sentiment Analysis

A synthetic lyrics dataset obtained via the Genius API and web crawling, annotated with theme, emotion, style, tone, and narrative using the Mistral API.

Source: github | Updated Apr 2, 2024 | 197 views | Linked

sst2_combined

Sentiment Analysis, Text Classification

The dataset includes three primary features: 'sentence' (string), 'label' (categorical with two classes: 0 for negative sentiment, 1 for positive sentiment), and 'idx' (integer index). The training set has 68,221 samples, the validation set 872 samples, and the test set 1,821 samples. Total download size is 3,403,184 bytes; total dataset size is 5,110,747 bytes.

Source: huggingface | Updated Dec 14, 2024 | 102 views | Linked

matthewfranglen/aste-v2

Sentiment Analysis, Natural Language Processing

Aspect Sentiment Triplet Extraction v2 is designed for extracting triplets consisting of a target entity (the aspect), its associated sentiment, and the opinion span that explains the sentiment. It focuses on aspect-based sentiment analysis (ABSA), which identifies aspects of target entities and the polarity expressed for each aspect. The data are derived from the SemEval 2014, 2015, and 2016 datasets and pre-processed with spell correction and tokenization. The dataset includes training, validation, and test splits; each record contains an index, the text, start and end indices for the aspect and opinion spans, the aspect and opinion terms, and the sentiment class.

Source: hugging_face | Updated Oct 9, 2023 | 130 views | Linked
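
The record layout listed in the aste-v2 description could be modeled roughly as follows. This is a sketch under stated assumptions: the class name, the field names, and the use of character offsets with an exclusive end index are all hypothetical, and the actual dataset may use different names or token-level indices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsteRecord:
    """One aspect/opinion/sentiment triplet (hypothetical field names)."""
    index: int
    text: str
    aspect_start: int   # assumed: character offset, inclusive
    aspect_end: int     # assumed: character offset, exclusive
    aspect_term: str
    opinion_start: int
    opinion_end: int
    opinion_term: str
    sentiment: str

    def spans_are_consistent(self) -> bool:
        """Check that the stored terms match the text at the stored offsets."""
        return (
            self.text[self.aspect_start:self.aspect_end] == self.aspect_term
            and self.text[self.opinion_start:self.opinion_end] == self.opinion_term
        )


def make_record(index: int, text: str, aspect: str, opinion: str,
                sentiment: str) -> AsteRecord:
    """Build a record by locating each term's first occurrence in the text."""
    a_start = text.index(aspect)    # raises ValueError if the term is absent
    o_start = text.index(opinion)
    return AsteRecord(index, text,
                      a_start, a_start + len(aspect), aspect,
                      o_start, o_start + len(opinion), opinion,
                      sentiment)


if __name__ == "__main__":
    rec = make_record(0, "The battery life is great", "battery life", "great", "positive")
    print(rec.spans_are_consistent())
```

A consistency check of this kind is a cheap way to catch offset drift after tokenization or spell correction changes the text.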

asajjad/isaac_steam_reviews_en

Game Reviews, Sentiment Analysis

This dataset consists of user comments on various popular games, each paired with a sentiment label (negative or positive), the game name, and a rating. It is divided into training and test sets for potential sentiment analysis or game‑review research.

Source: hugging_face | Updated May 1, 2024 | 305 views | Linked

Yelp Reviews Dataset

Sentiment Analysis, Natural Language Processing

The dataset comprises Yelp review data for sentiment analysis and is used to compare the effectiveness of BERT and RoBERTa models on Yelp review sentiment classification.

Source: github | Updated Dec 2, 2023 | 386 views | Linked

fancyzhx/amazon_polarity

Sentiment Analysis, Text Classification

The Amazon Review Polarity dataset contains product reviews from Amazon, primarily for text‑classification tasks, especially sentiment classification. Reviews rated 1‑2 are labeled negative, 4‑5 positive, and rating 3 is omitted. The dataset includes 3.6 M training samples and 0.4 M test samples; each record comprises a review title, content, and a label (positive or negative). It was created by Xiang Zhang and is widely used as a benchmark for text‑classification research.

Source: hugging_face | Updated Jan 9, 2024 | 212 views | Linked
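
The labeling rule stated in the Amazon Review Polarity description (ratings 1-2 negative, 4-5 positive, rating 3 omitted) can be written as a small function. The function name and the string labels are hypothetical; the published dataset may encode labels differently, for example as integers.

```python
from typing import Optional


def polarity_label(stars: int) -> Optional[str]:
    """Map a 1-5 star rating to a polarity label.

    Returns None for 3-star reviews, which the description says are omitted.
    """
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected a rating from 1 to 5, got {stars!r}")
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return None  # rating 3: neutral, dropped from the dataset


if __name__ == "__main__":
    ratings = [1, 2, 3, 4, 5]
    # Keep only the reviews that receive a label.
    labeled = [(r, polarity_label(r)) for r in ratings if polarity_label(r) is not None]
    print(labeled)
```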

takala/financial_phrasebank

Finance, Sentiment Analysis

The FinancialPhrasebank is a dataset of financial news sentences for sentiment classification. It contains 4,840 English sentences, each classified according to the agreement rate of 5–8 annotators. The dataset is provided in four configurations based on annotator agreement levels (50%, 66%, 75%, and 100%). The purpose of creating the dataset is to address the lack of high‑quality training data for financial sentiment analysis. The dataset was annotated by 16 individuals with background knowledge of financial markets, including researchers and master's students. Use of the dataset is governed by the Creative Commons Attribution‑NonCommercial‑ShareAlike 3.0 Unported License.

Source: hugging_face | Updated Jan 18, 2024 | 1,189 views | Linked
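
The agreement-based configurations mentioned in the FinancialPhrasebank description can be illustrated with a short sketch. It assumes that a sentence belongs to a configuration when the share of annotators choosing the majority label is at least that configuration's threshold; the function names and the exact threshold handling are assumptions, not the dataset's documented procedure.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Thresholds taken from the description: 50%, 66%, 75%, and 100%.
AGREEMENT_LEVELS = (0.50, 0.66, 0.75, 1.00)


def majority_and_agreement(votes: Sequence[str]) -> Tuple[str, float]:
    """Return the most common label and the fraction of annotators who chose it."""
    if not votes:
        raise ValueError("at least one annotation is required")
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def qualifying_levels(votes: Sequence[str]) -> List[float]:
    """List the agreement thresholds this sentence would satisfy (assumed >= rule)."""
    _, rate = majority_and_agreement(votes)
    return [level for level in AGREEMENT_LEVELS if rate >= level]


if __name__ == "__main__":
    votes = ["positive", "positive", "positive", "positive", "neutral"]
    print(majority_and_agreement(votes))
    print(qualifying_levels(votes))
```

Under this rule the configurations are nested: every sentence in a stricter configuration also appears in the looser ones.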

SMILE Twitter Emotion dataset

Sentiment Analysis, Social Media

The SMILE Twitter Emotion dataset was created by Wang et al. in 2016 and contains tweets annotated with multiple emotions (e.g., happiness, anger, sadness), providing a rich resource for sentiment analysis tasks.

Source: github | Updated Mar 30, 2024 | 177 views | Linked