Yelp/yelp_review_full
The YelpReviewFull dataset contains reviews collected from the Yelp website and is mainly used for sentiment classification. It includes 650,000 training samples and 50,000 test samples, each with a text field and a label field, where the label indicates the review rating (1 to 5 stars). The dataset was created via crowdsourcing and is in English.
Description
Dataset Card for YelpReviewFull
Dataset Description
Overview
The Yelp reviews dataset consists of reviews from Yelp, extracted from the Yelp Dataset Challenge 2015 data.
Supported Tasks and Leaderboards
text-classification, sentiment-classification: the dataset is mainly used for text classification; given a review text, predict its sentiment.
Language
Reviews are primarily written in English.
Dataset Structure
Data Example
A typical data point contains text and its corresponding label.
YelpReviewFull test example:
{
"label": 0,
"text": "I got new tires from them and within two weeks got a flat. I took my car to a local mechanic to see if i could get the hole patched, but they said the reason I had a flat was because the previous patch had blown - WAIT, WHAT? I just got the tire and never needed to have it patched? This was supposed to be a new tire. I took the tire over to Flynns and they told me that someone punctured my tire, then tried to patch it. So there are resentful tire slashers? I find that very unlikely. After arguing with the guy and telling him that his logic was far fetched he said he'd give me a new tire \"this time\". I will never go back to Flynns b/c of the way this guy treated me and the simple fact that they gave me a used tire!"
}
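A record like the one above can be loaded and inspected with the Hugging Face `datasets` library. The following is a minimal sketch, not an official recipe: it assumes the repository id `Yelp/yelp_review_full` from the title of this card, and the helper names `load_yelp_split` and `summarize` are hypothetical. Loading requires the `datasets` package and network access, so the import is deferred until the loader is actually called.

```python
def load_yelp_split(split: str = "test"):
    """Load one split ("train" or "test"); needs `datasets` and network access."""
    # Imported lazily so that summarize() below works without the package.
    from datasets import load_dataset
    # Repository id taken from the card title (assumed to be current).
    return load_dataset("Yelp/yelp_review_full", split=split)


def summarize(example: dict, max_chars: int = 40) -> str:
    """Return a one-line preview of a record with 'label' and 'text' fields."""
    text = example["text"]
    preview = text if len(text) <= max_chars else text[:max_chars] + "..."
    return f"label={example['label']} text={preview!r}"


# Works on any dict shaped like the example above, without downloading anything.
record = {"label": 0, "text": "I got new tires from them and within two weeks got a flat."}
print(summarize(record))
```

With the library installed, `summarize(load_yelp_split("test")[0])` would print a preview of the first test record.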
Data Fields
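- text: the review text. In the raw data the text is enclosed in double quotes ("), each internal double quote is escaped by doubling it (""), and each newline is escaped as a backslash followed by the character n ("\n").
- label: the star rating associated with the review (1 to 5 stars). The example above has label 0, which suggests that labels are stored as 0-based class indices (0 to 4).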
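The escaping rules for the text field can be undone with Python's standard `csv` module, which already treats a doubled double quote as one literal quote. The sketch below is illustrative only: the column order (label, then text), the sample row, and the helper names `parse_raw_row` and `stars_from_index` are assumptions, and the 0-based label mapping is inferred from the example record, which carries label 0 although ratings run from 1 to 5.

```python
import csv
import io
from typing import Tuple


def parse_raw_row(line: str) -> Tuple[int, str]:
    """Parse one raw row of the assumed form "<label>","<escaped text>"."""
    label_str, text = next(csv.reader(io.StringIO(line)))
    # csv.reader has already turned each doubled quote ("") into one quote.
    # What remains is the two-character sequence backslash + n, which is
    # converted back into a real newline here.
    return int(label_str), text.replace("\\n", "\n")


def stars_from_index(index: int) -> int:
    """Map an assumed 0-based class index (0 to 4) to a star rating (1 to 5)."""
    if not 0 <= index <= 4:
        raise ValueError(f"expected an index from 0 to 4, got {index}")
    return index + 1


# Hypothetical raw row, not taken from the dataset.
raw = '"1","He said ""never again"".\\nI agree."'
label, text = parse_raw_row(raw)
print(label)
print(text)
```

Records loaded through a CSV-aware reader typically have the quotes undoubled already, as in the JSON example above where they appear as ordinary escaped quotes; in that case only the `\n` replacement may still be needed.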
Data Splits
The Yelp reviews full-star dataset was built by randomly sampling 130,000 training samples and 10,000 test samples for each of the five star ratings, giving 650,000 training samples and 50,000 test samples in total.
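The per-rating sample counts are consistent with the overall split sizes quoted in the summary (650,000 training and 50,000 test samples). The short sketch below redoes that arithmetic and adds a hypothetical helper, `is_balanced`, that could be used to confirm that a list of labels contains every class equally often.

```python
from collections import Counter
from typing import Iterable

NUM_RATINGS = 5              # star ratings 1 to 5
TRAIN_PER_RATING = 130_000   # training samples drawn per rating
TEST_PER_RATING = 10_000     # test samples drawn per rating

train_total = NUM_RATINGS * TRAIN_PER_RATING
test_total = NUM_RATINGS * TEST_PER_RATING


def is_balanced(labels: Iterable[int]) -> bool:
    """Return True when every label that occurs does so equally often."""
    counts = Counter(labels)
    return len(set(counts.values())) <= 1


print(train_total, test_total)
```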
Dataset Creation
Motivation
The Yelp reviews full-star dataset was constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the Yelp Dataset Challenge 2015 data. It was first used as a text-classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
Usage Notes
Social Impact
[More information needed]
Bias Discussion
[More information needed]
Other Known Limitations
[More information needed]
Additional Information
Curators
[More information needed]
License
See the official Yelp dataset agreement for the license terms.
Citation
Xiang Zhang, Junbo Zhao, Yann LeCun. Character‑level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
Contributions
Thanks to @hfawaz for adding this dataset.
Source
Organization: hugging_face
Created: Unknown