Dataset asset · Open Source Community · Sentiment Analysis · Hotel Review Analysis

jniimi/tripadvisor-review-rating

This dataset contains hotel reviews and ratings collected from TripAdvisor. The processed version centers on the review text and multiple aspect ratings, with supporting columns such as hotel ID, user ID, review title, overall rating, cleanliness rating, and others. The original data was released by Jiwei Li et al.; the processed data is provided as a single pandas DataFrame and is intended primarily for aspect‑based sentiment analysis (ABSA).

Source
hugging_face
Created
Nov 28, 2025
Updated
Apr 24, 2024

Dataset Overview

Dataset Name

  • Name: TripAdvisor Easy Dataset
  • Alias: sentiment

Dataset Features

  • Feature List:
    • hotel_id: Unique hotel identifier, type int64
    • user_id: Unique user identifier, type string
    • title: Review title submitted by the user, type string
    • text: Full review body, type string
    • overall: Overall rating given by the user, type float64
    • cleanliness: Cleanliness rating, type float64
    • value: Value rating, type float64
    • location: Location rating, type float64
    • rooms: Room rating, type float64
    • sleep_quality: Sleep quality rating, type float64
    • stay_year: Year of stay, type int64
    • post_date: Review publication date, type timestamp[ns]
    • freq: Frequency, type int64
    • review: Full review (title and body combined), type string
    • char: Number of characters, type int64
    • lang: Language, type string
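As a minimal sketch, the feature list can be mirrored in a typed record, assuming each row arrives as a plain Python dict keyed by the column names above (for example, when iterating a Hugging Face `Dataset`). `ReviewRecord`, `from_row`, and `aspect_ratings` are hypothetical helpers. Treating NaN aspect ratings as missing is also an assumption, based only on those columns being float64.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Any, Optional

# Aspect rating columns from the feature list (all float64 in the source).
ASPECTS = ("overall", "cleanliness", "value", "location", "rooms", "sleep_quality")


def _rating(raw: Any) -> Optional[float]:
    """Convert a float64 cell to float, mapping None/NaN to None (assumed 'missing')."""
    if raw is None:
        return None
    raw = float(raw)
    return None if math.isnan(raw) else raw


@dataclass(frozen=True)
class ReviewRecord:
    """Hypothetical typed view of one row; field names follow the feature list."""
    hotel_id: int
    user_id: str
    title: str
    text: str
    review: str
    overall: Optional[float]
    cleanliness: Optional[float]
    value: Optional[float]
    location: Optional[float]
    rooms: Optional[float]
    sleep_quality: Optional[float]
    stay_year: int
    post_date: datetime  # timestamp[ns] in the source; assumed datetime-like here
    freq: int
    char: int
    lang: str

    @classmethod
    def from_row(cls, row: dict[str, Any]) -> ReviewRecord:
        # Copy exactly the schema's columns; a KeyError flags a missing column.
        values = {f.name: row[f.name] for f in fields(cls)}
        # int64 columns: cast defensively (e.g. NumPy scalars become plain ints).
        for name in ("hotel_id", "stay_year", "freq", "char"):
            values[name] = int(values[name])
        # float64 aspect columns: normalise NaN/None to None.
        for name in ASPECTS:
            values[name] = _rating(values[name])
        return cls(**values)

    def aspect_ratings(self) -> dict[str, float]:
        """Return only the aspect ratings that are present."""
        ratings = {name: getattr(self, name) for name in ASPECTS}
        return {name: r for name, r in ratings.items() if r is not None}
```

Keeping `None` for missing ratings, instead of passing NaN through, lets downstream ABSA code skip absent aspects explicitly.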

Dataset Splits

  • Training Set:
    • num_examples: 201295
    • num_bytes: 368237342

Dataset Size

  • Download Size: 220909380 bytes (≈221 MB)
  • Dataset Size: 368237342 bytes (≈368 MB)

Dataset Configuration

  • Default Configuration:
    • config_name: default
    • data_files:
      • split: train
      • path: data/train-*
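The default configuration exposes a single `train` split backed by files matching `data/train-*`. Below is a sketch of loading it with the Hugging Face `datasets` library and converting it to pandas. It assumes `datasets` and pandas are installed and the Hub is reachable. `load_train_dataframe` and `is_train_shard` are hypothetical helpers, and `fnmatch` only approximates how the Hub resolves the glob.

```python
from fnmatch import fnmatch

REPO_ID = "jniimi/tripadvisor-review-rating"
TRAIN_PATTERN = "data/train-*"  # data_files path of the default config


def is_train_shard(path: str) -> bool:
    """Approximate check that a repository file path matches the train glob."""
    return fnmatch(path, TRAIN_PATTERN)


def load_train_dataframe():
    """Download the default config's only split ("train") as a pandas DataFrame.

    Requires the `datasets` package, pandas, and network access.
    """
    from datasets import load_dataset  # imported lazily so this module loads without it

    ds = load_dataset(REPO_ID, split="train")
    return ds.to_pandas()
```

For example, a shard named like the hypothetical `data/train-00000-of-00001.parquet` would match the pattern, and `df = load_train_dataframe()` would yield the 201,295-row training set.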

Task Categories

  • Task: text-classification

Language

  • Language: en

Size Category

  • Size: 10K<n<100K (as tagged in the source metadata; the train split's 201,295 examples fall in the 100K<n<1M range)
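The size tag can be checked against the split count with a small bucketing function. This is a sketch that assumes the usual Hugging Face `size_categories` labels and treats each upper bound as exclusive. `size_category` is a hypothetical helper, not part of any library.

```python
# (exclusive upper bound, label) pairs following Hugging Face size_categories names.
SIZE_BUCKETS = (
    (10**3, "n<1K"),
    (10**4, "1K<n<10K"),
    (10**5, "10K<n<100K"),
    (10**6, "100K<n<1M"),
    (10**7, "1M<n<10M"),
    (10**8, "10M<n<100M"),
    (10**9, "100M<n<1B"),
    (10**10, "1B<n<10B"),
    (10**11, "10B<n<100B"),
    (10**12, "100B<n<1T"),
)


def size_category(num_examples: int) -> str:
    """Map an example count to its size-category label (boundary handling assumed)."""
    for upper, label in SIZE_BUCKETS:
        if num_examples < upper:
            return label
    return "n>1T"


TRAIN_EXAMPLES = 201_295  # num_examples of the train split
print(size_category(TRAIN_EXAMPLES))  # 100K<n<1M
```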

License

  • License: Apache-2.0

Intended Uses

  • Direct Use: Suitable for Aspect‑based Sentiment Analysis (ABSA)
  • Out‑of‑Scope Use: Subject to the original data source's usage policy
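For ABSA, one common setup turns each aspect rating into a sentiment label. The sketch below assumes a 1–5 rating scale and uses illustrative thresholds: ratings of 4 or higher are positive, ratings of 2 or lower are negative, and anything in between is neutral. The thresholds and the helper names `rating_to_label` and `aspect_labels` are hypothetical. Missing or NaN ratings are skipped.

```python
from __future__ import annotations

import math
from typing import Mapping, Optional

# Aspect columns from the dataset; "overall" is left out as it is not an aspect.
ASPECTS = ("cleanliness", "value", "location", "rooms", "sleep_quality")


def rating_to_label(rating: Optional[float], low: float = 2.0, high: float = 4.0) -> Optional[str]:
    """Map a rating to a sentiment label; thresholds assume a 1-5 scale."""
    if rating is None or math.isnan(rating):
        return None
    if rating >= high:
        return "positive"
    if rating <= low:
        return "negative"
    return "neutral"


def aspect_labels(row: Mapping[str, Optional[float]]) -> dict[str, str]:
    """Return {aspect: label} for every aspect with a usable rating in the row."""
    labels = {}
    for aspect in ASPECTS:
        label = rating_to_label(row.get(aspect))
        if label is not None:
            labels[aspect] = label
    return labels
```

With a pandas DataFrame, the rows can be passed in via `df[list(ASPECTS)].to_dict("records")`.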

Dataset Structure

  • Column Information:
    • hotel_id: Unique hotel identifier
    • user_id: Unique user identifier
    • title: Review title
    • text: Review body
    • review: Combined title + body
    • overall: Overall rating
    • cleanliness: Cleanliness rating
    • value: Value rating
    • location: Location rating
    • rooms: Room rating
    • sleep_quality: Sleep quality rating
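    • date_stayed: Stay date (listed as stay_year, the year of stay, in the feature list)
    • date: Review publication date (listed as post_date in the feature list)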