fancyzhx/amazon_polarity
The Amazon Review Polarity dataset contains product reviews from Amazon and is intended primarily for text classification, especially sentiment classification. Reviews rated 1 or 2 stars are labeled negative, reviews rated 4 or 5 are labeled positive, and 3-star reviews are omitted. The dataset includes 3,600,000 training samples and 400,000 test samples; each record comprises a review title, the review content, and a label (positive or negative). It was constructed by Xiang Zhang and is widely used as a benchmark in text-classification research.
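The labeling rule can be expressed as a small function. This is a minimal sketch, assuming the raw reviews are available as (title, content, rating) tuples with integer star ratings; `polarity_label` and `build_polarity_records` are hypothetical helpers for illustration, not part of the dataset's own tooling.

```python
from typing import Iterable, List, Optional, Tuple


def polarity_label(rating: int) -> Optional[int]:
    """Map a 1-5 star rating to a polarity label (hypothetical helper).

    Returns 0 (negative) for 1-2 stars, 1 (positive) for 4-5 stars,
    and None for 3 stars, which the dataset omits.
    """
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    if rating <= 2:
        return 0
    if rating >= 4:
        return 1
    return None


def build_polarity_records(reviews: Iterable[Tuple[str, str, int]]) -> List[dict]:
    """Keep only reviews that receive a polarity label (assumed input: (title, content, rating))."""
    records = []
    for title, content, rating in reviews:
        label = polarity_label(rating)
        if label is not None:  # drop neutral 3-star reviews
            records.append({"label": label, "title": title, "content": content})
    return records
```

Filtering out neutral reviews, rather than assigning them to either class, keeps the task strictly binary.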
Description
Dataset Overview
Dataset Name
- Name: Amazon Review Polarity
- Alias: AmazonPolarity
Basic Information
- Language: English (en)
- License: Apache-2.0
- Multilinguality: Monolingual
- Size: 1M < n < 10M (4,000,000 samples in total)
- Source Data: Original
- Task Category: Text classification
- Task ID: Sentiment classification
Dataset Structure
- Features:
  - label: class label (0 = negative, 1 = positive)
  - title: string, the review title
  - content: string, the review text
- Splits:
  - Training set: 3,600,000 samples, total size 1,604,364,432 bytes
  - Test set: 400,000 samples, total size 178,176,193 bytes
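The structure above can be mirrored in plain Python. This is a sketch under stated assumptions: rows are taken to be mappings keyed by field name (which is how the Hugging Face `datasets` library typically exposes a row), and `PolarityRecord`, `from_row`, `SPLITS`, and `split_summary` are hypothetical names introduced for illustration. The split figures are copied from the list above.

```python
from dataclasses import dataclass

# Hypothetical label names in index order: 0 = negative, 1 = positive.
LABEL_NAMES = ("negative", "positive")


@dataclass(frozen=True)
class PolarityRecord:
    """One example with the three fields listed above (illustrative container)."""
    label: int
    title: str
    content: str

    def __post_init__(self) -> None:
        # Reject anything other than the two defined classes.
        if self.label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {self.label!r}")

    @property
    def label_name(self) -> str:
        return LABEL_NAMES[self.label]


def from_row(row: dict) -> PolarityRecord:
    """Build a record from a mapping with 'label', 'title', and 'content' keys."""
    return PolarityRecord(label=int(row["label"]), title=str(row["title"]), content=str(row["content"]))


# Split sizes as listed above (byte counts as reported on the card).
SPLITS = {
    "train": {"samples": 3_600_000, "bytes": 1_604_364_432},
    "test": {"samples": 400_000, "bytes": 178_176_193},
}


def split_summary(splits: dict) -> dict:
    """Share of all samples and average reported bytes per sample for each split."""
    total = sum(s["samples"] for s in splits.values())
    return {
        name: {"share": s["samples"] / total, "avg_bytes": s["bytes"] / s["samples"]}
        for name, s in splits.items()
    }


if __name__ == "__main__":
    print(from_row({"label": 1, "title": "Great", "content": "Works as described."}).label_name)
    for name, stats in split_summary(SPLITS).items():
        print(f"{name}: {stats['share']:.0%} of samples, ~{stats['avg_bytes']:.0f} bytes per sample")
```

By this arithmetic the training split holds 90% of the 4,000,000 samples, and both splits average roughly 445 bytes per sample.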
Usage
- Training / Evaluation Metrics:
  - Accuracy
  - F1 (macro, micro, weighted)
  - Precision (macro, micro, weighted)
  - Recall (macro, micro, weighted)
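The metrics listed above can be computed without extra dependencies. This is a minimal sketch in plain Python, assuming integer labels 0 and 1 and the convention that an undefined ratio (0/0) counts as 0.0; `classification_metrics` is a hypothetical helper, and in practice a library routine such as scikit-learn's `accuracy_score` and `precision_recall_fscore_support` would usually be used instead.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], labels: Tuple[int, ...] = (0, 1)
) -> Dict[str, float]:
    """Accuracy plus macro-, micro-, and weighted-averaged precision, recall, and F1.

    Assumes every label in y_true appears in `labels`.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    def safe_div(a: float, b: float) -> float:
        # Treat an undefined ratio (0/0) as 0.0.
        return a / b if b else 0.0

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    support = Counter(y_true)
    n = len(y_true)

    # (precision, recall, F1) for each class.
    per_class = {}
    for c in labels:
        prec = safe_div(tp[c], tp[c] + fp[c])
        rec = safe_div(tp[c], tp[c] + fn[c])
        per_class[c] = (prec, rec, safe_div(2 * prec * rec, prec + rec))

    results = {"accuracy": safe_div(sum(tp.values()), n)}
    for i, name in enumerate(("precision", "recall", "f1")):
        # Macro: unweighted mean over classes.
        results[f"{name}_macro"] = sum(per_class[c][i] for c in labels) / len(labels)
        # Weighted: mean over classes, weighted by each class's support in y_true.
        results[f"{name}_weighted"] = sum(per_class[c][i] * support[c] for c in labels) / n

    # Micro: pool TP/FP/FN across classes before computing the ratios.
    tp_all = sum(tp[c] for c in labels)
    micro_p = safe_div(tp_all, tp_all + sum(fp[c] for c in labels))
    micro_r = safe_div(tp_all, tp_all + sum(fn[c] for c in labels))
    results["precision_micro"] = micro_p
    results["recall_micro"] = micro_r
    results["f1_micro"] = safe_div(2 * micro_p * micro_r, micro_p + micro_r)
    return results


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 0, 0]
    for key, value in sorted(classification_metrics(y_true, y_pred).items()):
        print(f"{key}: {value:.4f}")
```

For single-label data such as this, micro-averaged precision, recall, and F1 all equal accuracy; macro and weighted averages can differ only when the per-class supports differ.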
Creation
- Creator: Xiang Zhang (xiang.zhang@nyu.edu)
- Purpose: Benchmark for text classification, used in the paper "Character-level Convolutional Networks for Text Classification"
License Information
- License: Apache License 2.0
Citation
- McAuley, Julian, and Jure Leskovec. "Hidden factors and hidden topics: understanding rating dimensions with review text." In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 165-172, 2013.
- Zhang, Xiang, Junbo Zhao, and Yann LeCun. "Character-level Convolutional Networks for Text Classification." In Advances in Neural Information Processing Systems 28 (NIPS 2015), 2015.
Source
Organization: hugging_face
Created: Unknown