Yelp/yelp_review_full

The YelpReviewFull dataset contains review data collected from the Yelp website, mainly used for sentiment classification tasks. It includes 650,000 training samples and 50,000 test samples, each with a text field and a label field, where the label indicates the review rating (1 to 5 stars). The dataset was created via crowdsourcing and is in English.

Updated 1/4/2024
hugging_face

Description

Dataset Card for YelpReviewFull

Dataset Description

Overview

The Yelp reviews dataset consists of reviews from Yelp, extracted from the Yelp Dataset Challenge 2015 data.

Supported Tasks and Leaderboards

  • text-classification, sentiment-classification: given the text of a review, predict its sentiment (here, its star rating).

Language

Reviews are primarily written in English.

Dataset Structure

Data Example

A typical data point consists of a review text and its corresponding label.

An example from the YelpReviewFull test set:

{
    "label": 0,
    "text": "I got new tires from them and within two weeks got a flat. I took my car to a local mechanic to see if i could get the hole patched, but they said the reason I had a flat was because the previous patch had blown - WAIT, WHAT? I just got the tire and never needed to have it patched? This was supposed to be a new tire. I took the tire over to Flynns and they told me that someone punctured my tire, then tried to patch it. So there are resentful tire slashers? I find that very unlikely. After arguing with the guy and telling him that his logic was far fetched he said he'd give me a new tire \"this time\". I will never go back to Flynns b/c of the way this guy treated me and the simple fact that they gave me a used tire!"
}
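
As a minimal sketch of how a record with this shape could be parsed and checked in Python, the snippet below uses only the standard library. The `Review` class and the `parse_review` helper are hypothetical names introduced for illustration, and the sample text is a shortened version of the example; neither helper is part of the dataset or of any official tooling.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One record: an integer label and the review text (hypothetical helper type)."""
    label: int
    text: str


def parse_review(raw: str) -> Review:
    """Parse one JSON object with "label" and "text" keys into a Review."""
    obj = json.loads(raw)
    label, text = obj["label"], obj["text"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(label, int) or isinstance(label, bool):
        raise TypeError("label must be an integer")
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    return Review(label=label, text=text)


# Shortened version of the test example above.
sample = '{"label": 0, "text": "I got new tires from them and within two weeks got a flat."}'
review = parse_review(sample)
print(review)
```

Checking the field types up front makes malformed records fail early instead of surfacing later in a training loop.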

Data Fields

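  • text: the review text, enclosed in double quotes. Internal double quotes are escaped by doubling them (""), and newlines are escaped as a backslash followed by "n", that is "\n".
  • label: the star rating associated with the review (1 to 5). The example above stores it as 0, which suggests zero-based class indices (0 to 4) for 1 to 5 stars.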
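
The escaping rules above can be reversed with a few string operations. This is a sketch under stated assumptions: the function names `unescape_review` and `label_to_stars` are hypothetical, the field is assumed to still carry its enclosing quotes (as in a raw CSV-style file), and the zero-based label mapping is inferred from the example record, whose label is 0, rather than taken from documentation.

```python
def unescape_review(raw: str) -> str:
    """Undo the escaping described for the text field."""
    # Assumption: the raw field is still wrapped in double quotes.
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        raw = raw[1:-1]
    # A doubled double quote stands for one literal double quote.
    raw = raw.replace('""', '"')
    # A backslash followed by "n" stands for a newline.
    return raw.replace("\\n", "\n")


def label_to_stars(label: int) -> int:
    """Map an assumed zero-based class index (0 to 4) to a star rating (1 to 5)."""
    if not 0 <= label <= 4:
        raise ValueError(f"label out of range: {label}")
    return label + 1


raw_field = '"He said ""this time"".\\nNever again."'
print(unescape_review(raw_field))
print(label_to_stars(0))
```

If the data is loaded through a library that already decodes the fields, the unescaping step may be unnecessary.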

Data Splits

The Yelp reviews full-star dataset was built by randomly sampling 130,000 training samples and 10,000 test samples for each star rating from 1 to 5, for a total of 650,000 training samples and 50,000 test samples.
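
The per-rating sampling described above can be sketched as stratified random sampling. This is an illustration, not the authors' actual procedure: the function `stratified_split`, the `(text, stars)` record shape, and the fixed seed are all assumptions, and the demo uses tiny counts in place of 130,000 and 10,000.

```python
import random
from collections import defaultdict


def stratified_split(records, n_train, n_test, seed=0):
    """Draw n_train and n_test disjoint random samples per star rating.

    records: iterable of (text, stars) pairs (hypothetical shape).
    """
    by_stars = defaultdict(list)
    for text, stars in records:
        by_stars[stars].append((text, stars))

    rng = random.Random(seed)
    train, test = [], []
    for stars in sorted(by_stars):
        group = by_stars[stars]
        if len(group) < n_train + n_test:
            raise ValueError(f"not enough reviews with {stars} stars")
        # One draw without replacement keeps train and test disjoint.
        picked = rng.sample(group, n_train + n_test)
        train.extend(picked[:n_train])
        test.extend(picked[n_train:])
    return train, test


# Tiny synthetic pool: 10 reviews for each rating from 1 to 5.
pool = [(f"review {i} with {s} stars", s) for s in range(1, 6) for i in range(10)]
train, test = stratified_split(pool, n_train=6, n_test=2)
print(len(train), len(test))

# Totals implied by the figures in the text.
print(5 * 130_000, 5 * 10_000)
```

Sampling the same number of reviews per rating yields balanced classes, so plain accuracy is a meaningful metric on this benchmark.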

Dataset Creation

Motivation

The Yelp reviews full-star dataset was constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the Yelp Dataset Challenge 2015 data. It was first used as a text-classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

Usage Notes

Social Impact

[More information needed]

Bias Discussion

[More information needed]

Other Known Limitations

[More information needed]

Additional Information

Curators

[More information needed]

License

See the official Yelp Dataset Agreement for the license terms.

Citation

Xiang Zhang, Junbo Zhao, Yann LeCun. Character‑level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

Contributions

Thanks to @hfawaz for adding this dataset.

Topics

Text Classification
Sentiment Analysis

Source

Organization: hugging_face

Created: Unknown
