fancyzhx/amazon_polarity

The Amazon Review Polarity dataset is a collection of Amazon product reviews intended for text classification, in particular binary sentiment classification. Reviews rated 1 or 2 stars are labeled negative, reviews rated 4 or 5 stars are labeled positive, and 3‑star reviews are omitted. The dataset includes 3.6 M training samples and 0.4 M test samples; each record comprises a review title, the review content, and a polarity label (positive or negative). It was constructed by Xiang Zhang and is widely used as a benchmark in text‑classification research.
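The labeling rule described above can be sketched as a small function. This is an illustrative reconstruction, not the creator's preprocessing code: the function name `rating_to_label` is hypothetical, and the 0/1 encoding follows the label convention given in the Features list of this card.

```python
def rating_to_label(stars):
    """Map a 1-5 star rating to a polarity label (hypothetical helper).

    Returns 0 (negative) for 1-2 stars, 1 (positive) for 4-5 stars,
    and None for 3 stars, which the dataset omits.
    """
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected a star rating from 1 to 5, got {stars!r}")
    if stars <= 2:
        return 0  # negative
    if stars >= 4:
        return 1  # positive
    return None  # 3-star reviews are dropped


# Keep only the ratings that receive a label.
ratings = [1, 3, 5, 2, 4]
labels = [rating_to_label(r) for r in ratings if rating_to_label(r) is not None]
```
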

Updated 1/9/2024

Description

Dataset Overview

Dataset Name

Name: Amazon Review Polarity
Alias: AmazonPolarity

Basic Information

Language: English (en)
License: Apache‑2.0
Multilinguality: Monolingual
Size: 1 M < n < 10 M
Source Data: Original
Task Category: Text classification
Task ID: Sentiment classification

Dataset Structure

Features:
- label: Classification label, 0 = negative, 1 = positive
- title: String, review title
- content: String, review content
Splits:
- Training set: 3,600,000 samples, total size 1,604,364,432 bytes
- Test set: 400,000 samples, total size 178,176,193 bytes
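A minimal sketch of working with this structure, assuming the Hugging Face `datasets` library is installed and that the repository identifier is the one shown at the top of this page (`fancyzhx/amazon_polarity`). The helper names `check_record` and `load_split` are hypothetical, and the sample record in the usage note is made up for illustration.

```python
# Label encoding as documented in the Features list.
LABEL_NAMES = {0: "negative", 1: "positive"}


def check_record(record):
    """Return True if a record has the three documented fields:
    a string title, a string content, and a label of 0 or 1."""
    return (
        isinstance(record.get("title"), str)
        and isinstance(record.get("content"), str)
        and record.get("label") in LABEL_NAMES
    )


def load_split(split="train"):
    """Load one split ("train" or "test") from the Hugging Face Hub.

    The import is done lazily so the rest of this sketch runs without
    the `datasets` package. Calling this function downloads the data.
    """
    from datasets import load_dataset

    return load_dataset("fancyzhx/amazon_polarity", split=split)
```

For example, `check_record({"label": 1, "title": "Great", "content": "Works as described."})` returns `True`, and `load_split("test")` would be expected to return the 400,000-sample test split.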

Usage

Training / Evaluation Metrics:
- Accuracy
- F1 (macro, micro, weighted)
- Precision (macro, micro, weighted)
- Recall (macro, micro, weighted)
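The difference between the macro, micro, and weighted averages listed above can be illustrated with a dependency-free sketch for accuracy and F1; precision and recall follow the same averaging pattern. The function names are hypothetical, and the code assumes every true and predicted label is either 0 or 1. In practice a library such as scikit-learn would normally be used instead.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_scores(y_true, y_pred, labels=(0, 1)):
    """Return macro, micro, and weighted F1 for single-label predictions."""
    counts = {c: {"tp": 0, "fp": 0, "fn": 0} for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            counts[t]["tp"] += 1
        else:
            counts[p]["fp"] += 1  # predicted class gains a false positive
            counts[t]["fn"] += 1  # true class gains a false negative

    per_class = {c: f1_from_counts(**counts[c]) for c in labels}
    support = Counter(y_true)  # number of true samples per class

    # Macro: unweighted mean of the per-class scores.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: per-class scores weighted by class support.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    # Micro: pool the counts over all classes, then compute one score.
    micro = f1_from_counts(
        sum(c["tp"] for c in counts.values()),
        sum(c["fp"] for c in counts.values()),
        sum(c["fn"] for c in counts.values()),
    )
    return {"macro": macro, "micro": micro, "weighted": weighted}
```

For single-label classification such as this dataset, micro-averaged F1 coincides with accuracy, so the macro and weighted variants are the ones that add information when class frequencies or per-class error rates differ.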

Creation

Creator: Xiang Zhang (xiang.zhang@nyu.edu)
Purpose: Benchmark for text classification, used in the paper "Character‑level Convolutional Networks for Text Classification"

License Information

License: Apache License 2.0

Citation

McAuley, Julian, and Jure Leskovec. "Hidden factors and hidden topics: understanding rating dimensions with review text." In Proceedings of the 7th ACM conference on Recommender systems, pp. 165‑172. 2013.
Zhang, Xiang, Junbo Zhao, and Yann LeCun. "Character‑level Convolutional Networks for Text Classification." In Advances in Neural Information Processing Systems 28 (NIPS 2015).

Topics

Sentiment Analysis

Text Classification

Source

Organization: hugging_face

Created: Unknown
