Dataset asset · Open Source Community · Sentiment Analysis · Text Classification

fancyzhx/amazon_polarity

The Amazon Review Polarity dataset contains Amazon product reviews for text classification, primarily binary sentiment classification. Reviews rated 1 or 2 are labeled negative and reviews rated 4 or 5 are labeled positive; reviews rated 3 are omitted. The dataset has 3.6 M training samples and 0.4 M test samples, and each record consists of a review title, the review content, and a label (positive or negative). It was created by Xiang Zhang and is widely used as a benchmark in text‑classification research.
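The labeling rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the creator's preprocessing code: the function name `polarity_from_rating` and the choice to return `None` for the omitted rating are assumptions made here, while the label ids (0 = negative, 1 = positive) follow the feature description in this card.

```python
from typing import Optional

NEGATIVE, POSITIVE = 0, 1  # label ids as listed in this card's feature description


def polarity_from_rating(rating: int) -> Optional[int]:
    """Map a 1-5 rating to a polarity label, or None for the omitted rating 3.

    Hypothetical helper for illustration only.
    """
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    if rating <= 2:
        return NEGATIVE
    if rating >= 4:
        return POSITIVE
    return None  # rating 3 is left out of the dataset


ratings = [1, 2, 3, 4, 5]
labels = [polarity_from_rating(r) for r in ratings]
```

Reviews that map to `None` would simply be dropped before building the training and test splits.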

Source: hugging_face
Created: Nov 28, 2025
Updated: Jan 9, 2024

Dataset Overview

Dataset Name

  • Name: Amazon Review Polarity
  • Alias: AmazonPolarity

Basic Information

  • Language: English (en)
  • License: Apache‑2.0
  • Multilinguality: Monolingual
  • Size: 1 M < n < 10 M
  • Source Data: Raw data
  • Task Category: Text classification
  • Task ID: Sentiment classification

Dataset Structure

  • Features:
    • label: Classification label, 0 = negative, 1 = positive
    • title: String, review title
    • content: String, review content
  • Splits:
    • Training set: 3,600,000 samples, total size 1,604,364,432 bytes
    • Test set: 400,000 samples, total size 178,176,193 bytes
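As a rough sketch of how a record with these three features might be handled, the snippet below wraps the Hugging Face `datasets` loader and adds two small helpers. The dataset id comes from this page; `load_split`, `to_text`, `describe`, and the sample record are hypothetical names and data introduced only for illustration, and joining title and content with a newline is one possible convention, not something the dataset prescribes.

```python
from typing import Dict, Union

Record = Dict[str, Union[int, str]]

LABEL_NAMES = {0: "negative", 1: "positive"}


def load_split(split: str = "test"):
    """Load one split from the Hugging Face Hub.

    Assumes `pip install datasets` and network access; not called in this sketch.
    """
    from datasets import load_dataset  # lazy import so the helpers below run without it

    return load_dataset("fancyzhx/amazon_polarity", split=split)


def to_text(record: Record) -> str:
    """Join title and content into one classifier input (an assumed convention)."""
    return f"{record['title']}\n{record['content']}"


def describe(record: Record) -> str:
    """Return a short human-readable summary of a record."""
    return f"[{LABEL_NAMES[record['label']]}] {record['title']}"


# Made-up record in the documented shape, not taken from the dataset.
example = {"label": 1, "title": "Great value", "content": "Works as described."}
summary = describe(example)
text = to_text(example)
```

With the library installed, `load_split("train")` should return the 3,600,000-sample training split, and each item can be passed to `to_text` or `describe`.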

Usage

  • Training / Evaluation Metrics:
    • Accuracy
    • F1 (macro, micro, weighted)
    • Precision (macro, micro, weighted)
    • Recall (macro, micro, weighted)
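To make the difference between the macro, micro, and weighted variants concrete, here is a minimal pure-Python sketch for the two-label case. It is an illustration under stated assumptions, not a reference implementation: the function names are made up for this example, the toy predictions are invented, and undefined ratios (zero denominators) are reported as 0.0.

```python
from typing import Dict, Sequence, Tuple

LABELS = (0, 1)  # 0 = negative, 1 = positive


def safe_div(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def class_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, Tuple[int, int, int]]:
    """Return {label: (true positives, false positives, false negatives)}."""
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for c in LABELS:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        counts[c] = (tp, fp, fn)
    return counts


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return safe_div(correct, len(y_true))


def averaged_prf(y_true: Sequence[int], y_pred: Sequence[int], average: str) -> Tuple[float, ...]:
    """Return (precision, recall, F1) under 'macro', 'micro', or 'weighted' averaging."""
    counts = class_counts(y_true, y_pred)
    if average == "micro":
        # Pool the counts over classes, then compute the metrics once.
        tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
        return prf(tp, fp, fn)
    if average == "macro":
        # Every class contributes equally.
        weights = {c: 1 / len(LABELS) for c in LABELS}
    elif average == "weighted":
        # Each class is weighted by its support (number of true samples).
        total = sum(tp + fn for tp, _, fn in counts.values())
        weights = {c: safe_div(counts[c][0] + counts[c][2], total) for c in LABELS}
    else:
        raise ValueError(f"unknown average: {average!r}")
    scores = {c: prf(*counts[c]) for c in LABELS}
    return tuple(sum(weights[c] * scores[c][i] for c in LABELS) for i in range(3))


# Invented toy predictions: four negative and two positive reviews.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
acc = accuracy(y_true, y_pred)
macro = averaged_prf(y_true, y_pred, "macro")
micro = averaged_prf(y_true, y_pred, "micro")
weighted = averaged_prf(y_true, y_pred, "weighted")
```

For single-label data like this, micro-averaged precision, recall, and F1 all coincide with accuracy, so the macro and weighted variants are the ones that reveal differences between the two classes. The published splits are commonly described as class-balanced, in which case macro and weighted scores will be nearly identical.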

Creation

  • Creator: Xiang Zhang (xiang.zhang@nyu.edu)
  • Purpose: Benchmark for text classification, used in the paper "Character‑level Convolutional Networks for Text Classification"

License Information

  • License: Apache License 2.0

Citation

  • McAuley, Julian, and Jure Leskovec. "Hidden factors and hidden topics: understanding rating dimensions with review text." In Proceedings of the 7th ACM conference on Recommender systems, pp. 165‑172. 2013.
  • Zhang, Xiang, Junbo Zhao, and Yann LeCun. "Character‑level Convolutional Networks for Text Classification." In Advances in Neural Information Processing Systems 28 (NIPS 2015). 2015.