Open Source Community
Brazilian E-Commerce Public Dataset by Olist
The Brazilian E-Commerce Public Dataset by Olist contains information on 100,000 orders placed from 2016 to 2018 at multiple marketplaces in Brazil. Its features allow orders to be analyzed along several dimensions: status, price, payment, shipping performance, customer location, product attributes, and customer reviews. A geolocation dataset that maps Brazilian postal codes to latitude-longitude coordinates is also provided.
Updated 5/13/2024
Description
Dataset Overview
Characteristics
- Anonymization: This is real commercial data that has been anonymized. References to companies and partners in the review text have been replaced with the names of great houses from Game of Thrones.
- Multi‑Dimensional Analysis: Supports analyses such as order status, price trends, customer satisfaction, sales forecasting, delivery performance optimization, and product quality assessment.
- Geographic Data: Provides latitude‑longitude coordinates for Brazilian postal codes.
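One way the geographic data might be used is to estimate distances between postal codes. The sketch below is illustrative only: the sample rows, the postal code values, and the function names `centroids` and `haversine_km` are assumptions made for this example, not part of the dataset. It averages all coordinates that share a postal code and then applies the standard haversine formula.

```python
import math
from collections import defaultdict

# Hypothetical rows: (postal_code, latitude, longitude).
# Real column names and values should be checked against the downloaded files.
geo_rows = [
    ("01000", -23.55, -46.63),
    ("01000", -23.57, -46.65),
    ("20000", -22.91, -43.17),
]

def centroids(rows):
    """Average all coordinates that share a postal code into one (lat, lng) point."""
    grouped = defaultdict(list)
    for code, lat, lng in rows:
        grouped[code].append((lat, lng))
    return {
        code: (
            sum(lat for lat, _ in pts) / len(pts),
            sum(lng for _, lng in pts) / len(pts),
        )
        for code, pts in grouped.items()
    }

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    # 6371.0 km is the commonly used mean Earth radius.
    return 2 * 6371.0 * math.asin(math.sqrt(h))

points = centroids(geo_rows)
print(round(haversine_km(points["01000"], points["20000"]), 1))
```

A distance computed this way could serve as an input feature when studying delivery times, although straight-line distance is only a rough proxy for the actual shipping route.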
Uses
- NLP: Analyze customer review texts.
- Clustering: Group customers who did not write a review to explore what drives their satisfaction or dissatisfaction.
- Sales Forecasting: Predict future sales using purchase dates.
- Delivery Performance: Optimize delivery times.
- Product Quality: Identify product categories that lead to customer dissatisfaction.
- Feature Engineering: Create new features or combine external public data.
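As a starting point for the sales forecasting use listed above, purchase dates can be aggregated into a monthly series. The following is a minimal sketch under stated assumptions: the timestamps are made-up sample values, the timestamp format is assumed, and the moving average is a naive baseline chosen for illustration, not a method prescribed by the dataset.

```python
from collections import Counter
from datetime import datetime

# Hypothetical purchase timestamps; the real column name and format may differ.
purchase_timestamps = [
    "2017-01-05 10:12:00",
    "2017-01-20 18:30:00",
    "2017-02-03 09:00:00",
    "2017-02-14 14:45:00",
    "2017-02-27 21:10:00",
    "2017-03-08 11:05:00",
]

def monthly_order_counts(timestamps):
    """Count orders per calendar month; returns (YYYY-MM, count) pairs sorted by month."""
    counts = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m")
        for ts in timestamps
    )
    return sorted(counts.items())

def moving_average_forecast(series, window=3):
    """Naive baseline: predict the next value as the mean of the last `window` values."""
    if not series:
        raise ValueError("series must contain at least one value")
    recent = series[-window:]
    return sum(recent) / len(recent)

monthly = monthly_order_counts(purchase_timestamps)
forecast = moving_average_forecast([count for _, count in monthly])
print(monthly)
print(forecast)
```

A baseline like this is mainly useful as a reference point: any more elaborate forecasting model should at least outperform it.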
Structure
- Splits: The data is divided into several separate tables for easier understanding and organization.
- Customer Identification: Each order is assigned its own customer ID (`customer_id`), so the same customer receives a different ID for each order. A global customer identifier (`customer_unique_id`) is provided so that repeat purchases by the same customer can be detected.
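The customer identification scheme above can be illustrated with a short sketch of repeat-purchase detection. The sample rows are made up, the function name `repeat_customers` is hypothetical, and the column names `customer_id` (per-order ID) and `customer_unique_id` (global identifier) are assumed here and should be verified against the downloaded files.

```python
from collections import defaultdict

# Hypothetical sample rows: (order_id, customer_id, customer_unique_id).
orders = [
    ("o1", "c1", "u1"),
    ("o2", "c2", "u1"),  # same person as o1, but a different per-order customer_id
    ("o3", "c3", "u2"),
]

def repeat_customers(rows):
    """Return {customer_unique_id: [order_ids]} for customers with more than one order."""
    by_customer = defaultdict(list)
    for order_id, _customer_id, unique_id in rows:
        by_customer[unique_id].append(order_id)
    return {uid: ids for uid, ids in by_customer.items() if len(ids) > 1}

repeats = repeat_customers(orders)
print(repeats)
```

Grouping on the per-order ID instead would report no repeat customers at all, since each order carries a distinct value in that column.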
Version Notes
- A previously released classified dataset was removed in version 6. It is planned for re-release as a new dataset with a new data structure.
Topics
E-commerce
Data Analysis
Source
Organization: github
Created: 4/28/2024