complete_ufc_data.csv
This dataset integrates 30 years of UFC match history (starting from 1994), individual fighter statistics, and nine years of historical betting odds (starting from November 2014). It includes detailed information such as match date, name, weight class, fighter information, betting data, match outcomes, and methods of victory.
Source: GitHub
Created: Sep 19, 2023
Updated: Dec 28, 2023
Signals: 1,844 views
Availability: Linked source ready
Dataset Overview
Dataset Contents
- File name: /data/complete_ufc_data.csv
- Description: This dataset aggregates 30 years of UFC match history (since 1994), fighter statistics, and nine years of historical betting odds (since November 2014).
Data Dictionary
| Column | Example | Description | Source |
|---|---|---|---|
| event_date | 2023-09-16 | UFC event date | Extracted from UFC match history |
| event_name | UFC Fight Night: Grasso vs. Shevchenko 2 | UFC event name | Extracted from UFC match history |
| weight_class | Womens Flyweight | UFC weight class | Extracted from UFC match history |
| fighter1, fighter2 | Alexa Grasso, Valentina Shevchenko | Fighter names | Extracted from UFC match history |
| favourite, underdog | Valentina Shevchenko, Alexa Grasso, NaN | Favourite and underdog fighters | Historical odds from betmma.tips |
| favourite_odds, underdog_odds | 1.67, 2.88, NaN | Betting odds (decimal) | Historical odds from betmma.tips |
| betting_outcome | favourite, underdog, NaN | Betting outcome | Historical odds from betmma.tips |
| outcome | fighter1, fighter2, Draw | Match result | Extracted from UFC match history |
| method | S-DEC, U-DEC, KO/TKO Punches | Victory method | Extracted from UFC match history |
| round | 5 | Winning round | Extracted from UFC match history |
| fighter1_*, fighter2_* | | Fighter attributes | Extracted from UFC fighter statistics |
| events_extract_ts, odds_extract_ts, fighter_extract_ts | 2023-09-21 02:02:55.178363 | Data extraction timestamp | |
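The column layout above can be read with nothing but the Python standard library. The following is a minimal sketch, not the project's own loading code: the helper name `load_fights` and the two sample rows (placeholder fighters and event names) are assumptions made for illustration, and it assumes missing values appear as empty cells or the literal text `NaN`.

```python
import csv
import io
from datetime import date

# Hypothetical sample rows following the documented column layout.
# Fighter names, event names, and results are placeholders, not real records.
SAMPLE_CSV = """event_date,event_name,weight_class,fighter1,fighter2,favourite,underdog,favourite_odds,underdog_odds,betting_outcome,outcome,method,round
2023-09-16,Sample Event 2,Womens Flyweight,Fighter A,Fighter B,Fighter B,Fighter A,1.67,2.88,underdog,fighter1,U-DEC,5
2013-05-25,Sample Event 1,Womens Flyweight,Fighter C,Fighter D,NaN,NaN,NaN,NaN,NaN,fighter2,KO/TKO Punches,1
"""

def load_fights(handle):
    """Parse rows into dicts, mapping empty or NaN cells to None."""
    fights = []
    for raw in csv.DictReader(handle):
        row = {key: (None if value in ("", "NaN") else value)
               for key, value in raw.items()}
        row["event_date"] = date.fromisoformat(row["event_date"])
        for col in ("favourite_odds", "underdog_odds"):
            row[col] = float(row[col]) if row[col] is not None else None
        # int(float(...)) tolerates values stored as "5" or "5.0".
        row["round"] = int(float(row["round"])) if row["round"] is not None else None
        fights.append(row)
    return fights

fights = load_fights(io.StringIO(SAMPLE_CSV))
```

To read the real file, pass an open file handle instead of the in-memory sample, for example `open("complete_ufc_data.csv", newline="", encoding="utf-8")`. Fights that predate the odds history come back with `None` in the betting columns.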
Data Extraction
- Code: Python scripts were used for web scraping and data preprocessing.
- Functionality: Completed UFC data scraping (fighter stats and match results), historical betting odds scraping, and data cleaning.
Exploratory Data Analysis (EDA) / Data Visualization
- Insight: Historical win rates correlate with both age and average strikes per minute. The younger fighter, or the fighter with the higher strike output, holds a statistical advantage and wins about 60% of matches.
- Insight: The historical probability that the favourite wins rises from slightly above 50% to over 75% once the difference in decimal odds exceeds 2.0. It keeps rising as the odds gap widens, reaching about 90% when the difference exceeds 4.5.
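The favourite win rate at a given odds gap can be recomputed along these lines. This is a rough sketch under stated assumptions: the "odds difference" is taken to be `underdog_odds - favourite_odds`, the function name `favourite_win_rate` is hypothetical, and the sample records are invented for illustration, so the resulting rates say nothing about the real data.

```python
def favourite_win_rate(fights, min_gap):
    """Share of fights won by the favourite among fights whose
    decimal odds gap (underdog minus favourite) exceeds min_gap.
    Returns None when no fight qualifies."""
    wins = 0
    total = 0
    for fight in fights:
        fav = fight["favourite_odds"]
        dog = fight["underdog_odds"]
        result = fight["betting_outcome"]
        # Skip fights without odds (NaN in the dataset, None here).
        if fav is None or dog is None or result is None:
            continue
        if dog - fav > min_gap:
            total += 1
            if result == "favourite":
                wins += 1
    return wins / total if total else None

# Invented records for illustration only.
sample = [
    {"favourite_odds": 1.67, "underdog_odds": 2.88, "betting_outcome": "underdog"},
    {"favourite_odds": 1.25, "underdog_odds": 4.00, "betting_outcome": "favourite"},
    {"favourite_odds": 1.10, "underdog_odds": 7.00, "betting_outcome": "favourite"},
    {"favourite_odds": 1.30, "underdog_odds": 3.50, "betting_outcome": "underdog"},
    {"favourite_odds": None, "underdog_odds": None, "betting_outcome": None},
]

rate_any_gap = favourite_win_rate(sample, 0.0)
rate_wide_gap = favourite_win_rate(sample, 4.5)
```

Calling the function with a series of thresholds (0.0, 2.0, 4.5, and so on) on the full dataset would trace the curve described above.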
Predictive Modeling
- Development status: Ongoing; machine-learning models are being tested to predict match outcomes from fighter statistics.
- Preliminary test: Initial models (GBM, logistic regression) achieve roughly 65% accuracy without betting odds features.
- Future iterations: Planned testing of additional features such as win streaks, finish rates, derived attributes (endurance, wrestler/striker/slugger tags) and whether a fighter is a betting favourite.
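One of the planned features, the win streak a fighter carries into a match, could be derived from the match history roughly as follows. This is an illustrative sketch, not the project's implementation: the function name `add_win_streaks` and the output keys `fighter1_win_streak` and `fighter2_win_streak` are hypothetical, and it assumes that a draw (or any outcome other than `fighter1` or `fighter2`) resets both fighters' streaks.

```python
from collections import defaultdict

def add_win_streaks(fights):
    """Return fights in chronological order, each annotated with the number
    of consecutive wins each fighter had before that fight."""
    streak = defaultdict(int)
    enriched = []
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    for fight in sorted(fights, key=lambda f: f["event_date"]):
        f1, f2 = fight["fighter1"], fight["fighter2"]
        enriched.append({
            **fight,
            "fighter1_win_streak": streak[f1],  # hypothetical column name
            "fighter2_win_streak": streak[f2],  # hypothetical column name
        })
        if fight["outcome"] == "fighter1":
            streak[f1] += 1
            streak[f2] = 0
        elif fight["outcome"] == "fighter2":
            streak[f2] += 1
            streak[f1] = 0
        else:
            # Assumption: a draw or other result resets both streaks.
            streak[f1] = 0
            streak[f2] = 0
    return enriched

# Invented history for illustration only.
history = [
    {"event_date": "2023-06-01", "fighter1": "B", "fighter2": "A", "outcome": "fighter1"},
    {"event_date": "2023-01-01", "fighter1": "A", "fighter2": "B", "outcome": "fighter1"},
    {"event_date": "2023-09-01", "fighter1": "A", "fighter2": "C", "outcome": "Draw"},
    {"event_date": "2023-03-01", "fighter1": "A", "fighter2": "C", "outcome": "fighter1"},
]

enriched = add_win_streaks(history)
```

Recording the streak before updating it keeps the feature free of leakage: each row only reflects results that were known when the fight started.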
Setup
- Dependency management: Managed with Poetry or pip.