JUHE API Marketplace

Steam and Steam Spy raw datasets

This dataset contains raw game data extracted from the Steam Store API and the Steam Spy API, in two files: steam_app_data.csv and steamspy_data.csv. Fields include app type, name, age rating, free‑to‑play indicator, DLC availability, app description, supported languages, PC requirements, developer and publisher names, demo availability, platforms, reviews, categories and genres, release date, number of users owning the app, current and initial price, discount, and concurrent users (CCU), among others.

Updated 4/5/2024

Description

Dataset Overview

Data Source

  • Source: Kaggle website
  • Publisher: Vicente Arce
  • Release Date: February 2022

Dataset Content

  • Files: Includes two CSV files, "steam_app_data.csv" and "steamspy_data.csv"
  • Size: Total 124 MB
  • Features:
    • "steam_app_data.csv" contains 39 features, 66,414 unique values
    • "steamspy_data.csv" contains 20 features, 63,504 unique values
  • Information Types: Includes app type, name, unique ID, age rating, free‑to‑play indicator, DLC availability, app description, supported languages, PC requirements, developer and publisher names, demo availability, platforms, reviews, categories and genres, release date, number of users owning the app, current and initial price, discount, CCU, etc.
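The column and app counts listed above can be checked with a short script. The following is a minimal sketch using only the Python standard library. The helper name `summarize_csv` is hypothetical, and the ID column name `appid` (for steamspy_data.csv) is an assumption about the file layout; the store file may use a different ID column. The sample rows are an in-memory stand-in, not real data.

```python
import csv
import io

def summarize_csv(handle, id_column):
    """Return (number of columns, number of distinct IDs) for one CSV file."""
    reader = csv.DictReader(handle)
    ids = set()
    for row in reader:
        ids.add(row[id_column])
    return len(reader.fieldnames or []), len(ids)

# Tiny in-memory stand-in for steamspy_data.csv (made-up rows).
# The real file would be opened with:
#   open("steamspy_data.csv", newline="", encoding="utf-8")
sample = io.StringIO(
    "appid,name,price\n"
    "10,Game A,999\n"
    "20,Game B,499\n"
    "10,Game A,999\n"
)
n_columns, n_apps = summarize_csv(sample, "appid")
print(n_columns, n_apps)
```

Counting distinct IDs rather than rows makes repeated entries visible before the files are merged.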

Dataset Purpose

  • Primary Objective: Apply big data analysis techniques such as clustering to explore relationships between game type/category and parameters like price, initial price, discount, gameplay time, rating, number of users owning digital copies, and CCU.
  • Secondary Objective: Discover interesting findings during analysis
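A simple first look at the relationship between genre and price is a per-genre average. The sketch below is illustrative only: the helper `mean_by_genre` is hypothetical, and it assumes a `genre` column holding comma-separated genre names and a numeric `price` column stored as text, which may not match the actual files. The rows are made-up sample data.

```python
from collections import defaultdict

def mean_by_genre(rows, value_key):
    """Average a numeric column per genre; a multi-genre app counts once per genre."""
    totals = defaultdict(lambda: [0.0, 0])  # genre -> [running sum, count]
    for row in rows:
        for genre in row["genre"].split(","):
            genre = genre.strip()
            if genre:
                totals[genre][0] += float(row[value_key])
                totals[genre][1] += 1
    return {genre: total / count for genre, (total, count) in totals.items()}

# Made-up rows; column names are assumptions about the dataset layout.
rows = [
    {"genre": "Action, Indie", "price": "999"},
    {"genre": "Indie", "price": "499"},
    {"genre": "Action", "price": "1999"},
]
avg_price = mean_by_genre(rows, "price")
print(avg_price)
```

The same helper can be pointed at other numeric columns, such as discount or CCU, by changing `value_key`.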

Analytical Methods

  • Initial Analysis: Merge the two raw data files and remove duplicate columns and values, yielding a new dataset with 52 features and 66,902 applications.
  • Unsupervised Analysis: Use K‑Means clustering on data including Steam app IDs, types, price, gameplay time, rating, etc.
  • Supervised Analysis: Use a Naive Bayes classifier to evaluate the clustering performance.
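The merge and K‑Means steps above can be sketched without external dependencies. This is not the publisher's code: the join type (inner), the ID columns `steam_appid` and `appid`, the feature columns `price` and `average_forever`, and all function names are assumptions chosen for illustration, and the rows are made-up. The Naive Bayes step is not covered here. In practice a library such as pandas or scikit-learn would likely be used instead.

```python
import random

def merge_on_app_id(store_rows, spy_rows):
    """Inner-join two lists of dicts on app ID; store columns win on name clashes."""
    spy_by_id = {row["appid"]: row for row in spy_rows}
    merged = {}
    for row in store_rows:
        app_id = row["steam_appid"]
        if app_id in spy_by_id and app_id not in merged:  # skip duplicate apps
            merged[app_id] = {**spy_by_id[app_id], **row}
    return list(merged.values())

def standardize(points):
    """Scale each column to zero mean and unit variance so no feature dominates."""
    columns = list(zip(*points))
    means = [sum(c) / len(c) for c in columns]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(p, means, stds)] for p in points]

def k_means(points, k, iterations=50, seed=0):
    """Lloyd's algorithm: returns a cluster label for every point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Made-up rows standing in for the two CSV files.
store_rows = [
    {"steam_appid": "10", "name": "A"},
    {"steam_appid": "20", "name": "B"},
    {"steam_appid": "30", "name": "C"},
    {"steam_appid": "40", "name": "D"},
    {"steam_appid": "10", "name": "A"},  # duplicate entry, dropped by the merge
    {"steam_appid": "50", "name": "E"},  # no Steam Spy record, dropped by the join
]
spy_rows = [
    {"appid": "10", "price": "0", "average_forever": "10"},
    {"appid": "20", "price": "100", "average_forever": "20"},
    {"appid": "30", "price": "5999", "average_forever": "3000"},
    {"appid": "40", "price": "4999", "average_forever": "2500"},
]

merged = merge_on_app_id(store_rows, spy_rows)
features = [[float(r["price"]), float(r["average_forever"])] for r in merged]
labels = k_means(standardize(features), k=2)
print(labels)
```

Standardizing before clustering matters here because price and playtime are on very different scales; without it, the larger-valued column would dominate the distance computation.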

Analysis Results

  • Clustering Performance: Results are suboptimal, requiring further testing and adjustment of dataset values to achieve satisfactory outcomes.

Dataset Applications

This dataset is suitable for in‑depth analysis of the gaming industry, including market trends, game genre preferences, pricing strategies, and related studies.


Topics

Game Data Analysis
Game Market Research

Source

Organization: GitHub

Created: 4/4/2024
