Dataset asset, Open Source Community, Sports Data Analysis, UFC

complete_ufc_data.csv

This dataset integrates 30 years of UFC match history (starting from 1994), individual fighter statistics, and nine years of historical betting odds (starting from November 2014). It includes detailed information such as match date, name, weight class, fighter information, betting data, match outcomes, and methods of victory.

Source: GitHub
Created: Sep 19, 2023
Updated: Dec 28, 2023

Dataset Overview

Dataset Contents

  • File name: /data/complete_ufc_data.csv
  • Description: This dataset aggregates 30 years of UFC match history (since 1994), fighter statistics, and nine years of historical betting odds (since November 2014).

Data Dictionary

| Column | Example | Description | Source |
|---|---|---|---|
| event_date | 2023-09-16 | UFC event date | Extracted from UFC match history |
| event_name | UFC Fight Night: Grasso vs. Shevchenko 2 | UFC event name | Extracted from UFC match history |
| weight_class | Womens Flyweight | UFC weight class | Extracted from UFC match history |
| fighter1, fighter2 | Alexa Grasso, Valentina Shevchenko | Fighter names | Extracted from UFC match history |
| favourite, underdog | Valentina Shevchenko, Alexa Grasso, NaN | Favourite and underdog fighters | Historical odds from betmma.tips |
| favourite_odds, underdog_odds | 1.67, 2.88, NaN | Betting odds (decimal) | Historical odds from betmma.tips |
| betting_outcome | favourite, underdog, NaN | Betting outcome | Historical odds from betmma.tips |
| outcome | fighter1, fighter2, Draw | Match result | Extracted from UFC match history |
| method | S-DEC, U-DEC, KO/TKO Punches | Victory method | Extracted from UFC match history |
| round | 5 | Winning round | Extracted from UFC match history |
| fighter1_*, fighter2_* | | Fighter attributes | Extracted from UFC fighter statistics |
| events_extract_ts, odds_extract_ts, fighter_extract_ts | 2023-09-21 02:02:55.178363 | Data extraction timestamp | |
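
As a rough illustration of working with the columns in the data dictionary, the sketch below reads rows using only the Python standard library. The inline sample is assembled from the example values above plus one made-up row; it is not real data from the file. The helper names `parse_float`, `parse_row` and `load_rows` are hypothetical, and the code assumes that missing odds appear either as the text NaN or as empty cells.

```python
import csv
import io
from datetime import date

# Tiny inline sample for illustration only. The first row reuses example
# values from the data dictionary; the second row is entirely made up.
# In practice you would open complete_ufc_data.csv instead.
SAMPLE_CSV = """event_date,event_name,weight_class,fighter1,fighter2,favourite,underdog,favourite_odds,underdog_odds,outcome
2023-09-16,UFC Fight Night: Grasso vs. Shevchenko 2,Womens Flyweight,Alexa Grasso,Valentina Shevchenko,Valentina Shevchenko,Alexa Grasso,1.67,2.88,Draw
1994-03-11,Hypothetical Event,Open Weight,Fighter A,Fighter B,NaN,NaN,NaN,NaN,fighter1
"""


def parse_float(text):
    """Return a float, or None for empty cells and NaN markers (assumed encoding)."""
    if text is None or text.strip() == "" or text.strip().lower() == "nan":
        return None
    return float(text)


def parse_row(row):
    """Convert the string fields of one CSV row into typed values."""
    parsed = dict(row)
    parsed["event_date"] = date.fromisoformat(row["event_date"])
    for column in ("favourite_odds", "underdog_odds"):
        parsed[column] = parse_float(row[column])
    return parsed


def load_rows(handle):
    """Read every row from an open CSV file handle."""
    return [parse_row(row) for row in csv.DictReader(handle)]


rows = load_rows(io.StringIO(SAMPLE_CSV))
# Odds only start in November 2014, so earlier fights have no odds values.
with_odds = [r for r in rows if r["favourite_odds"] is not None]
print(len(rows), "rows,", len(with_odds), "with betting odds")
```

To read the real file, `load_rows(open(path, newline="", encoding="utf-8"))` should work in the same way, provided the file's encoding and NaN convention match these assumptions.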

Data Extraction

  • Code: Python scripts were used for web scraping and data preprocessing.
  • Functionality: Scraping of UFC data (fighter statistics and match results) and of historical betting odds is complete, as is data cleaning.

Exploratory Data Analysis (EDA) / Data Visualization

  • Insight: Historical results show that age and average strikes per minute both correlate strongly with match success. The younger fighter, or the fighter with the higher strike output, has a statistical advantage and wins about 60% of matches.
  • Insight: The historical probability that the favourite wins rises from slightly above 50% to over 75% once the difference in decimal odds exceeds 2.0. It keeps rising as the odds gap widens, reaching about 90% when the difference exceeds 4.5.
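
The odds-gap insight can be checked with a calculation along the following lines. This is a sketch under stated assumptions, not the project's own analysis code: it assumes the "odds difference" means `underdog_odds - favourite_odds`, the function names are hypothetical, and the toy rows are invented purely to show the mechanics.

```python
def implied_probability(decimal_odds):
    """Implied win probability of decimal odds, ignoring the bookmaker margin."""
    return 1.0 / decimal_odds


def favourite_win_rate(fights, min_gap=0.0):
    """Share of fights won by the favourite among fights whose odds gap
    (underdog_odds - favourite_odds, an assumed definition) exceeds min_gap.
    Returns None when no fight qualifies."""
    selected = [
        f for f in fights
        if f["favourite_odds"] is not None
        and f["underdog_odds"] is not None
        and f["betting_outcome"] in ("favourite", "underdog")
        and f["underdog_odds"] - f["favourite_odds"] > min_gap
    ]
    if not selected:
        return None
    wins = sum(1 for f in selected if f["betting_outcome"] == "favourite")
    return wins / len(selected)


# Made-up rows for illustration only; they are not taken from the dataset.
toy_fights = [
    {"favourite_odds": 1.80, "underdog_odds": 2.00, "betting_outcome": "underdog"},
    {"favourite_odds": 1.67, "underdog_odds": 2.88, "betting_outcome": "favourite"},
    {"favourite_odds": 1.25, "underdog_odds": 4.00, "betting_outcome": "favourite"},
    {"favourite_odds": 1.20, "underdog_odds": 4.50, "betting_outcome": "underdog"},
    {"favourite_odds": None, "underdog_odds": None, "betting_outcome": None},
]

for gap in (0.0, 2.0):
    print("gap >", gap, "->", favourite_win_rate(toy_fights, min_gap=gap))
```

Rows without odds or without a favourite/underdog result are skipped rather than counted as losses, so the rate is computed only over fights where the betting outcome is known.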

Predictive Modeling

  • Development status: Ongoing; machine‑learning models are being tested for predicting match outcomes based on fighter statistics.
  • Preliminary test: Initial models (gradient boosting machine and logistic regression) achieve roughly 65% accuracy without using betting odds as features.
  • Future iterations: Planned testing of additional features such as win streaks, finish rates, derived attributes (endurance, wrestler/striker/slugger tags) and whether a fighter is a betting favourite.
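
One of the planned features, win streaks, could be derived from the `event_date`, `fighter1`, `fighter2` and `outcome` columns roughly as follows. This is a hypothetical sketch, not the project's implementation: the function name `add_win_streaks` and the output columns `fighter1_win_streak` and `fighter2_win_streak` are invented for illustration, and treating a draw as a streak reset is an assumption.

```python
from collections import defaultdict


def add_win_streaks(fights):
    """Return copies of the fights, sorted by date, each annotated with the
    win streak both fighters carried into that fight."""
    streaks = defaultdict(int)  # fighter name -> current consecutive wins
    enriched = []
    for fight in sorted(fights, key=lambda f: f["event_date"]):
        f1, f2 = fight["fighter1"], fight["fighter2"]
        row = dict(fight)
        # Record streaks before updating them, so the feature does not
        # leak the result of the fight it describes.
        row["fighter1_win_streak"] = streaks[f1]
        row["fighter2_win_streak"] = streaks[f2]
        enriched.append(row)
        if fight["outcome"] == "fighter1":
            streaks[f1] += 1
            streaks[f2] = 0
        elif fight["outcome"] == "fighter2":
            streaks[f2] += 1
            streaks[f1] = 0
        else:
            # Draws and any other result reset both streaks (an assumption).
            streaks[f1] = 0
            streaks[f2] = 0
    return enriched


# Made-up fights for illustration; ISO date strings sort chronologically.
toy = [
    {"event_date": "2020-03-01", "fighter1": "A", "fighter2": "C", "outcome": "fighter1"},
    {"event_date": "2020-01-01", "fighter1": "A", "fighter2": "B", "outcome": "fighter1"},
    {"event_date": "2020-06-01", "fighter1": "B", "fighter2": "A", "outcome": "fighter1"},
    {"event_date": "2020-09-01", "fighter1": "A", "fighter2": "B", "outcome": "Draw"},
]

for row in add_win_streaks(toy):
    print(row["event_date"], row["fighter1_win_streak"], row["fighter2_win_streak"])
```

Because `sorted` is stable, fights on the same date keep their input order. A fighter who appears more than once on a single card would therefore need the rows to arrive already ordered within that event.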

Setup

  • Dependency management: Managed with Poetry or pip.