JUHE API Marketplace
DATASET
Open Source Community

E-Commerce Dataset

The dataset consists of five interrelated tables, each containing key information about customers, transactions, branches, and merchants. These tables include the customers table, genders table, cities table, transactions table, branches table, and merchants table.

Updated 9/19/2024
github

Description

Dataset Overview

The dataset is composed of five interrelated tables, each containing key information about customers, transactions, branches, and merchants. The detailed description is as follows:

Table Structure

  1. Customers Table

    • customer_id: Unique identifier for the customer.
    • join_date: Date the customer joined.
    • city_id: ID of the city where the customer resides.
    • gender_id: ID of the customer's gender.
  2. Genders Table

    • gender_id: Unique identifier for the gender.
    • gender_name: Name of the gender (e.g., male, female).
  3. Cities Table

    • city_id: Unique identifier for the city.
    • city_name: Name of the city.
  4. Transactions Table

    • transaction_id: Unique identifier for the coupon transaction.
    • customer_id: ID of the customer who performed the transaction.
    • transaction_date: Date the coupon was claimed.
    • transaction_status: Status of the coupon (e.g., claimed, used).
    • coupon_name: Name of the coupon.
    • burn_date: Date the coupon was used.
    • branch_id: ID of the branch where the coupon was used.
  5. Branches Table

    • branch_id: Unique identifier for the branch.
    • merchant_id: ID of the merchant that owns the branch.
  6. Merchants Table

    • merchant_id: Unique identifier for the merchant.
    • merchant_name: Name of the merchant.

Data Processing Workflow

  1. Import essential libraries and load the dataset.
  2. Load the data.
  3. Convert transaction_date to datetime objects.
  4. Compute the most recent transaction date (recency) for each customer.
  5. Compute the transaction frequency for each customer.
  6. Aggregate the number of coupons claimed and used.
  7. Merge all data into a single DataFrame.
  8. Handle missing values.
  9. Feature selection: select recency, transaction_count, coupons_claimed, coupons_burned, city_id, gender_id.
  10. Feature scaling to ensure comparable scales.
  11. Determine the optimal number of clusters using the elbow method; the optimal number is 3.
  12. Apply KMeans clustering.
  13. Print inertia and silhouette score.
  14. Visualize scatter plot matrix using seaborn and matplotlib, comparing three clusters based on recency, transaction_count, coupons_claimed, coupons_burned.
  15. Conduct segmentation analysis and provide recommendations.
  16. Create a 3D scatter plot of customer clusters using Plotly and reduce dimensionality with PCA.

Conclusions and Recommendations

1. Cluster 0

  • Recency: Customers have high recency values (they have not visited the store for a while).
  • Transaction Count: Low transaction frequency.
  • Coupons Claimed: Some coupons were claimed, but activity is lower than other clusters.
  • Coupons Burned: Low usage.
  • Behavioral Insight: Low engagement; risk of churn.
  • Recommendation: Send attractive, high‑value coupons to re‑engage them, offering personalized, time‑limited offers.

2. Cluster 1

  • Recency: Low to medium recency (frequent store visits).
  • Transaction Count: Higher than Cluster 0 but lower than Cluster 2.
  • Coupons Claimed: Substantial number of coupons claimed.
  • Coupons Burned: Medium usage, lower than Cluster 2.
  • Behavioral Insight: Moderate engagement, high purchase frequency, active coupon usage.
  • Recommendation: Offer bundle discounts or coupons for products they have not recently purchased to increase basket size.

3. Cluster 2

  • Recency: Lowest recency (very frequent visits).
  • Transaction Count: Highest transaction frequency.
  • Coupons Claimed: Most coupons claimed.
  • Coupons Burned: Highest usage, possibly indicating coupon fatigue.
  • Behavioral Insight: Most loyal and active shoppers; however, the large number of used coupons may indicate they are overwhelmed.
  • Recommendation: Optimize coupon types for this group, avoid fatigue, provide personalized coupons based on past purchases, and offer loyalty rewards or exclusive benefits.

AI studio

Generate PPTs instantly with Nano Banana Pro.

Generate PPT Now

Access Dataset

Login to Access

Please login to view download links and access full dataset details.

Topics

E-commerce
Dataset Structure

Source

Organization: github

Created: 9/16/2024

Power Your Data Analysis with Premium AI Models

Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.

Enjoy a free trial and save 20%+ compared to official pricing.