Dataset asset: Open Source Community, Data Analysis, Esports
CS:GO Pro Matches Comprehensive Dataset
The dataset comprises all professional CS:GO matches from 2012 to 2023, totaling 126,872 matches, each with 155 distinct data points.
Source
github
Created
Feb 1, 2024
Updated
Feb 1, 2024
Dataset Overview
Dataset Name
CS:GO Pro Matches Comprehensive Dataset
Dataset Description
This dataset contains professional CS:GO game data from 2012 to 2023 and is currently the largest public CS:GO professional match dataset.
Dataset Size
- Shape: (126,872, 155)
- File size: approximately 87 MB
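A quick sanity check after downloading is to confirm that the file has the stated shape. The sketch below counts data rows and columns with the standard library only; it assumes the CSV has a single header row, and the column names in the in-memory sample are made up for illustration.

```python
import csv
import io


def csv_shape(handle):
    """Return (data_rows, columns) for a CSV that starts with one header row."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None:  # empty file
        return (0, 0)
    rows = sum(1 for _ in reader)
    return (rows, len(header))


# Tiny in-memory stand-in for the real file (hypothetical column names).
sample = io.StringIO("match_id,team_1,team_2\n1,A,B\n2,C,D\n")
shape = csv_shape(sample)
print(shape)
```

Against the real file, opening csgp_pro_games_data.csv with `newline=""` and passing the handle to `csv_shape` should give (126872, 155) if the download is complete; the file's text encoding is not stated in the source, so it may need to be set explicitly.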
Data Content
- Detailed data for 126,872 matches, each recording 155 different features.
Data Collection
- Data were scraped from the HLTV website using Python scripts with Selenium and BeautifulSoup.
- The scraping process ran for 12 days on a Google Cloud Platform VM.
- Three CSV files were generated:
  - historic_games_list.csv: list of games with corresponding webpages and basic game information.
  - game_data.csv: detailed game data extracted from individual game webpages.
  - exception_data: records of all exceptions encountered during scraping.
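The split between a data file and an exception file suggests a scraping loop that keeps going when a page fails and logs the failure for later review. The following is a minimal sketch of that pattern, not the author's actual script: `scrape_all`, `fetch`, `parse`, and the column names are hypothetical, and the fetch and parse steps are passed in as plain functions where the real pipeline would presumably wrap Selenium and BeautifulSoup.

```python
import csv
import io


def scrape_all(urls, fetch, parse, fieldnames, data_out, errors_out):
    """Scrape each URL, writing parsed rows to data_out and failures to errors_out.

    fetch(url) returns page text and parse(text) returns a dict of fields;
    both are supplied by the caller. Returns (succeeded, failed) counts.
    """
    data_writer = csv.DictWriter(data_out, fieldnames=fieldnames,
                                 extrasaction="ignore")
    data_writer.writeheader()
    error_writer = csv.writer(errors_out)
    error_writer.writerow(["url", "exception_type", "message"])

    succeeded = failed = 0
    for url in urls:
        try:
            row = parse(fetch(url))
        except Exception as exc:
            # Record the failure and move on instead of aborting a long run.
            error_writer.writerow([url, type(exc).__name__, str(exc)])
            failed += 1
            continue
        data_writer.writerow({"url": url, **row})
        succeeded += 1
    return succeeded, failed


# Stand-ins for the real fetch/parse steps (purely illustrative).
def fake_fetch(url):
    if "bad" in url:
        raise TimeoutError("page did not load")
    return "16-14"


def fake_parse(text):
    return {"score": text.strip()}


data_out, errors_out = io.StringIO(), io.StringIO()
counts = scrape_all(
    ["https://example.com/match/1", "https://example.com/match/bad"],
    fake_fetch, fake_parse, ["url", "score"], data_out, errors_out,
)
```

Logging the URL alongside the exception type makes it possible to retry only the failed pages after a multi-day run.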
Data Processing
- historic_games_list.csv and game_data.csv were merged, cleaned, and feature‑engineered to produce csgp_pro_games_data.csv.
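The merge step can be pictured as a join of the two files on a shared key. The sketch below is an assumption-laden illustration using only the standard library: the join key `game_url` and the other column names are hypothetical, since the source does not list the actual columns, and the cleaning and feature-engineering steps are not shown.

```python
import csv
import io


def merge_on_key(list_rows, detail_rows, key):
    """Inner-join two lists of dicts on `key`; detail fields extend list fields."""
    details = {row[key]: row for row in detail_rows}
    merged = []
    for row in list_rows:
        match = details.get(row[key])
        if match is not None:  # drop games that have no scraped detail page
            merged.append({**row, **match})
    return merged


# In-memory stand-ins for historic_games_list.csv and game_data.csv.
games_list = list(csv.DictReader(io.StringIO(
    "game_url,date\n/match/1,2020-01-01\n/match/2,2020-01-02\n")))
game_data = list(csv.DictReader(io.StringIO(
    "game_url,map\n/match/1,Mirage\n")))

merged = merge_on_key(games_list, game_data, "game_url")
print(merged)
```

An inner join is used here so that games whose detail page failed to scrape are left out; whether the original pipeline kept or dropped such rows is not stated.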
Technology Stack
- Data processing and analysis: Jupyter Notebook, Python
- Database: MySQL
- Cloud service: Google Cloud
- Web‑scraping tools: Selenium, BeautifulSoup
Data Usage
Researchers, scientists, machine‑learning engineers, data engineers, and enthusiasts are encouraged to use this dataset for model building, analysis, and research, and to share findings with the esports and data communities.
Contact
- Project link: CS:GO Pro Matches Comprehensive Dataset
- Contact: @tedtay