Dataset Card for YelpReviewFull

Dataset Description

Overview

The Yelp reviews dataset consists of reviews from Yelp. It was extracted from the Yelp Dataset Challenge 2015 data.

Supported Tasks and Leaderboards

text-classification, sentiment-classification: the dataset is mainly used for text classification. Given a review text, the task is to predict its sentiment.

Language

Reviews are primarily written in English.

Dataset Structure

Data Example

A typical data point contains text and its corresponding label.

An example from the YelpReviewFull test set:

{
    "label": 0,
    "text": "I got new tires from them and within two weeks got a flat. I took my car to a local mechanic to see if i could get the hole patched, but they said the reason I had a flat was because the previous patch had blown - WAIT, WHAT? I just got the tire and never needed to have it patched? This was supposed to be a new tire. I took the tire over to Flynns and they told me that someone punctured my tire, then tried to patch it. So there are resentful tire slashers? I find that very unlikely. After arguing with the guy and telling him that his logic was far fetched he said he'd give me a new tire \"this time\". I will never go back to Flynns b/c of the way this guy treated me and the simple fact that they gave me a used tire!"
}
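The record above is shown in JSON form, where the inner quotes around "this time" are backslash-escaped. As a minimal sketch, assuming records are available as JSON strings with exactly the two keys shown (the shortened text and the helper name `check_record` are hypothetical, for illustration only), a record can be parsed and sanity-checked like this:

```python
import json

# Shortened, hypothetical stand-in for the record shown above.
raw = r'''{"label": 0, "text": "He said he'd give me a new tire \"this time\"."}'''


def check_record(record: dict) -> dict:
    """Verify that a record has the two expected fields with the expected types."""
    if set(record) != {"label", "text"}:
        raise ValueError(f"unexpected keys: {sorted(record)}")
    if not isinstance(record["label"], int):
        raise TypeError("label must be an integer")
    if not isinstance(record["text"], str):
        raise TypeError("text must be a string")
    return record


# json.loads turns the JSON escape \" back into a plain double quote.
record = check_record(json.loads(raw))
print(record["label"], record["text"])
```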

Data Fields

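text: the review text. In the underlying data, each text is enclosed in double quotes, an internal double quote is escaped by doubling it (""), and a newline is escaped as a backslash followed by the character "n" (\n).
label: the star rating associated with the review (1 to 5 stars). The example above stores a strongly negative review with label 0, which suggests that labels are zero-indexed (0 to 4).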
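The escaping rules for the text field can be undone with Python's standard `csv` module. This is a minimal sketch, not the dataset's official loader. It assumes a raw line of the form `"<rating>","<text>"` with the rating first and in the range 1 to 5, and it assumes a zero-indexed label as in the example record. The column order, the sample line, and the function name `parse_review_line` are all illustrative assumptions.

```python
import csv
import io


def parse_review_line(line: str) -> dict:
    """Parse one raw CSV line, assumed to have the form "<rating>","<text>"."""
    rating, text = next(csv.reader(io.StringIO(line)))
    # The csv module already collapses doubled double quotes ("") into one.
    # Literal backslash-n sequences are turned back into real newlines here.
    text = text.replace("\\n", "\n")
    # Assumption: a 1-to-5 star rating maps to a zero-indexed label (0 to 4).
    return {"label": int(rating) - 1, "text": text}


# Hypothetical raw line: doubled quotes around "this time", escaped newline.
sample = '"1","He said ""this time"".\\nNever again."'
record = parse_review_line(sample)
print(record)
```

Note that a review which genuinely contained a backslash followed by "n" would also be converted, so the replacement step is only safe under the escaping convention described above.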

Data Splits

The Yelp reviews full-star dataset was built by randomly sampling 130,000 training samples and 10,000 test samples for each star rating from 1 to 5. In total there are 650,000 training samples and 50,000 test samples.
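Sampling a fixed number of reviews per star rating yields class-balanced splits. The exact procedure the dataset authors used is not documented here beyond that sentence, so the following is only a sketch of per-class random sampling on a toy pool. The function name `sample_per_rating`, the seed, and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def sample_per_rating(reviews, n_train, n_test, seed=0):
    """Draw n_train + n_test reviews per star rating, with no train/test overlap."""
    by_rating = defaultdict(list)
    for rating, text in reviews:
        by_rating[rating].append(text)
    rng = random.Random(seed)
    train, test = [], []
    for rating in sorted(by_rating):
        # random.sample draws without replacement, so the two slices are disjoint.
        picked = rng.sample(by_rating[rating], n_train + n_test)
        train += [(rating, t) for t in picked[:n_train]]
        test += [(rating, t) for t in picked[n_train:]]
    return train, test


# Toy pool: 20 hypothetical reviews per rating (the real per-rating sizes
# are 130,000 for training and 10,000 for testing).
pool = [(r, f"review {i} with {r} stars") for r in range(1, 6) for i in range(20)]
train, test = sample_per_rating(pool, n_train=13, n_test=1)

print(Counter(rating for rating, _ in train))  # equal count for every rating
print(5 * 130_000, 5 * 10_000)  # total sizes implied by the per-rating numbers
```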

Dataset Creation

Motivation

The Yelp reviews full-star dataset was constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the Yelp Dataset Challenge 2015 data. It was first used as a text-classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

Usage Notes

Social Impact

[More information needed]

Bias Discussion

[More information needed]

Other Known Limitations

[More information needed]

Additional Information

Curators

[More information needed]

License

See the official Yelp dataset agreement for the license terms.

Citation

Xiang Zhang, Junbo Zhao, Yann LeCun. Character‑level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

Contributions

Thanks to @hfawaz for adding this dataset.
