jniimi/tripadvisor-review-rating

This dataset contains hotel reviews and ratings collected from TripAdvisor. It was originally released by Jiwei Li et al.; this processed version keeps the review text, the per-aspect rating scores, and a small set of identifier and date fields, and is provided as a single pandas DataFrame. It is primarily intended for aspect-based sentiment analysis (ABSA). Columns include hotel ID, user ID, review title, review text, overall rating, and cleanliness rating, among others.

Updated 4/24/2024
hugging_face

Description

Dataset Overview

Dataset Name

  • Name: TripAdvisor Easy Dataset
  • Alias: sentiment

Dataset Features

  • Feature List:
    • hotel_id: Unique hotel identifier, type int64
    • user_id: Unique user identifier, type string
    • title: Review title submitted by the user, type string
    • text: Full review body, type string
    • overall: Overall rating given by the user, type float64
    • cleanliness: Cleanliness rating, type float64
    • value: Value rating, type float64
    • location: Location rating, type float64
    • rooms: Room rating, type float64
    • sleep_quality: Sleep quality rating, type float64
    • stay_year: Year of stay, type int64
    • post_date: Review publication date, type timestamp[ns]
    • freq: Frequency, type int64
    • review: Full review, type string
    • char: Number of characters, type int64
    • lang: Language, type string

Dataset Splits

  • Training Set:
    • num_examples: 201295
    • num_bytes: 368237342

Dataset Size

  • Download Size: 220909380 bytes
  • Dataset Size: 368237342 bytes

Dataset Configuration

  • Default Configuration:
    • config_name: default
    • data_files:
      • split: train
      • path: data/train-*
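
The configuration above maps to a single `train` split, so loading it can be sketched as follows. This is a minimal sketch, assuming the Hugging Face `datasets` library and `pandas` are installed and that the repository is reachable from the Hugging Face Hub under the name shown at the top of this page; `load_reviews` is a hypothetical helper name, not part of the dataset.

```python
REPO_ID = "jniimi/tripadvisor-review-rating"  # repository name as shown on this page
SPLIT = "train"  # the only split listed in the configuration


def load_reviews(repo_id: str = REPO_ID, split: str = SPLIT):
    """Download one split and return it as a pandas DataFrame.

    Assumption: the repository is accessible from the Hugging Face Hub.
    """
    # Imported lazily so this module can be imported without the dependency.
    from datasets import load_dataset

    dataset = load_dataset(repo_id, split=split)
    return dataset.to_pandas()


if __name__ == "__main__":
    df = load_reviews()
    print(df.shape)   # number of rows and columns
    print(df.dtypes)  # compare against the feature list on this page
```

Printing `df.dtypes` is a quick way to check the loaded columns against the feature list, since the card itself names the date columns in two different ways.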

Task Categories

  • Task: text-classification

Language

  • Language: en

Size Category

  • Size: 100K<n<1M

License

  • License: Apache-2.0

Intended Uses

  • Direct Use: Suitable for Aspect‑based Sentiment Analysis (ABSA)
  • Out‑of‑Scope Use: Follow the original data source policy
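
For the ABSA use case, one common preparation step is to turn each numeric aspect rating into a sentiment label. The sketch below illustrates that idea in plain Python. The thresholds (4 and above positive, 2 and below negative, otherwise neutral) and the assumption of a 1 to 5 rating scale are illustrative choices that this card does not specify; `rating_to_label` and `aspect_labels` are hypothetical helper names.

```python
import math

# Rating columns taken from the feature list on this page.
ASPECTS = ("overall", "cleanliness", "value", "location", "rooms", "sleep_quality")


def rating_to_label(rating):
    """Map a numeric rating to a coarse sentiment label.

    Assumption: ratings lie on a 1-5 scale; the thresholds are illustrative.
    Missing values (None or NaN) map to None.
    """
    if rating is None or (isinstance(rating, float) and math.isnan(rating)):
        return None
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"


def aspect_labels(row):
    """Return a dict of aspect -> label for one review given as a mapping."""
    return {aspect: rating_to_label(row.get(aspect)) for aspect in ASPECTS}


# A made-up review row for illustration only, not taken from the dataset.
example = {
    "overall": 4.0,
    "cleanliness": 5.0,
    "value": 3.0,
    "location": 2.0,
    "rooms": float("nan"),
}
labels = aspect_labels(example)
```

Aspects that are missing from a row, or stored as NaN, come back as `None`, so they can be filtered out before training a per-aspect classifier.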

Dataset Structure

  • Column Information:
    • hotel_id: Unique hotel identifier
    • user_id: Unique user identifier
    • title: Review title
    • text: Review body
    • review: Combined title + body
    • overall: Overall rating
    • cleanliness: Cleanliness rating
    • value: Value rating
    • location: Location rating
    • rooms: Room rating
    • sleep_quality: Sleep quality rating
    • date_stayed: Stay date
    • date: Review publication date

Topics

Hotel Review Analysis
Sentiment Analysis

Source

Organization: hugging_face

Created: Unknown
