Dataset asset, Open Source Community, Data Analysis, E-commerce
Brazilian E-Commerce Public Dataset by Olist
The Brazilian E‑Commerce Public Dataset by Olist contains information on roughly 100,000 orders placed between 2016 and 2018 on multiple marketplaces in Brazil. Its features support analysis of each order along several dimensions: status, price, payment, shipping performance, customer location, product attributes, and customer reviews. A geographic dataset that links Brazilian postal codes to latitude‑longitude coordinates is also provided.
Source
github
Created
Apr 28, 2024
Updated
May 13, 2024
Overview
Dataset description and usage context
Dataset Overview
Dataset Name
- Name: Brazilian E‑Commerce Public Dataset by Olist
Dataset Description
- Description: The dataset contains information on roughly 100,000 orders placed between 2016 and 2018 on multiple marketplaces in Brazil. Its features support analysis of each order along several dimensions: status, price, payment, shipping performance, customer location, product attributes, and customer reviews. Geographic data linking Brazilian postal codes to latitude‑longitude coordinates is also provided.
Characteristics
- Anonymization: All store and partner names have been replaced with the names of noble houses from Game of Thrones.
- Multi‑Dimensional Analysis: Supports analysis of order status, price trends, and customer satisfaction, as well as sales forecasting, delivery performance optimization, and product quality assessment.
- Geographic Data: Provides latitude‑longitude coordinates for Brazilian postal codes.
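One way to use the geographic data is to reduce it to a single coordinate pair per postal code. The sketch below is a minimal illustration under stated assumptions: the field names `postal_code`, `lat`, and `lng` are hypothetical, the sample rows are made up, and the averaging step assumes a postal code may appear with more than one coordinate pair.

```python
from collections import defaultdict


def postal_code_centroids(geo_rows):
    """Average all coordinates recorded for each postal code.

    `geo_rows` is an iterable of dicts with the hypothetical keys
    'postal_code', 'lat', and 'lng' (values may be strings, as read from CSV).
    Returns {postal_code: (mean_lat, mean_lng)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # lat total, lng total, row count
    for row in geo_rows:
        acc = sums[row["postal_code"]]
        acc[0] += float(row["lat"])
        acc[1] += float(row["lng"])
        acc[2] += 1
    return {code: (lat / n, lng / n) for code, (lat, lng, n) in sums.items()}


# Made-up sample rows, for illustration only.
sample_geo = [
    {"postal_code": "01037", "lat": "-23.5", "lng": "-46.5"},
    {"postal_code": "01037", "lat": "-23.75", "lng": "-46.75"},
    {"postal_code": "20040", "lat": "-22.9", "lng": "-43.2"},
]
centroids = postal_code_centroids(sample_geo)
```

The resulting lookup can then be joined to customer locations by postal code, for example to plot orders on a map.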
Uses
- NLP: Analyze customer review texts.
- Clustering: Group customers who did not leave a review to infer how satisfied they were.
- Sales Forecasting: Predict future sales using purchase dates.
- Delivery Performance: Optimize delivery times.
- Product Quality: Identify product categories that lead to customer dissatisfaction.
- Feature Engineering: Create new features from the existing fields, or join the dataset with external public data.
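As a starting point for the sales forecasting and delivery performance uses listed above, the following sketch derives two simple inputs: order counts per purchase month and the share of delivered orders that arrived late. It is an illustration only. The field names `purchased_at`, `delivered_at`, and `estimated_at`, the timestamp layout, and the sample rows are all assumptions, not the dataset's actual schema.

```python
from collections import Counter
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def monthly_order_counts(orders):
    """Count orders per purchase month ('YYYY-MM'), sorted chronologically."""
    counts = Counter()
    for row in orders:
        purchased = datetime.strptime(row["purchased_at"], TS_FORMAT)
        counts[purchased.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


def late_delivery_rate(orders):
    """Fraction of delivered orders that arrived after the estimated date.

    Orders with an empty 'delivered_at' value are treated as undelivered
    and skipped. Returns 0.0 when no order has been delivered.
    """
    delivered = [r for r in orders if r["delivered_at"]]
    if not delivered:
        return 0.0
    late = sum(
        datetime.strptime(r["delivered_at"], TS_FORMAT)
        > datetime.strptime(r["estimated_at"], TS_FORMAT)
        for r in delivered
    )
    return late / len(delivered)


# Made-up sample rows, for illustration only.
sample_orders = [
    {"purchased_at": "2017-01-05 10:00:00",
     "delivered_at": "2017-01-12 09:00:00",
     "estimated_at": "2017-01-15 00:00:00"},
    {"purchased_at": "2017-01-20 08:30:00",
     "delivered_at": "2017-02-10 14:00:00",
     "estimated_at": "2017-02-01 00:00:00"},
    {"purchased_at": "2017-02-03 16:45:00",
     "delivered_at": "",
     "estimated_at": "2017-02-20 00:00:00"},
]
print(monthly_order_counts(sample_orders))  # {'2017-01': 2, '2017-02': 1}
print(late_delivery_rate(sample_orders))    # 0.5
```

Monthly counts are only a baseline series; a real forecast would feed them into a time-series model of your choice.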
Structure
- Splits: The data is divided into several linked subsets for easier understanding and organization.
- Customer Identification: Each order is assigned its own customer ID, so the same customer receives a different ID for each order. A separate global customer identifier ties these IDs together and makes repeat purchases detectable.
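The customer identification scheme above can be sketched as follows. This is a minimal illustration rather than official tooling: the field names `customer_id` (per-order ID) and `customer_unique_id` (global identifier) are assumptions here, and the sample rows are made up.

```python
from collections import defaultdict


def repeat_customers(customer_rows):
    """Return {global_id: order_count} for customers with more than one order.

    Each row maps a per-order customer ID to a global customer identifier.
    The key names 'customer_id' and 'customer_unique_id' are assumptions.
    """
    per_order_ids = defaultdict(set)
    for row in customer_rows:
        per_order_ids[row["customer_unique_id"]].add(row["customer_id"])
    return {uid: len(ids) for uid, ids in per_order_ids.items() if len(ids) > 1}


# Made-up sample rows: u1 placed two orders, u2 placed one.
sample_customers = [
    {"customer_id": "c1", "customer_unique_id": "u1"},
    {"customer_id": "c2", "customer_unique_id": "u1"},
    {"customer_id": "c3", "customer_unique_id": "u2"},
]
print(repeat_customers(sample_customers))  # {'u1': 2}
```

Counting by the per-order ID alone would report every customer as a one-time buyer, which is why the grouping key is the global identifier.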
Version Notes
- A previously released classified dataset was removed in version 6; it is planned for re-release as a new dataset with a revised data structure.