FIFA 23 Players Dataset

This dataset contains player statistics from the EA FIFA 23 video game. Key attributes include name, age, height, overall rating, and club, all of which support analysis of player performance and characteristics.

Source: GitHub
Created: Sep 2, 2024
Updated: Sep 3, 2024

FIFA23 Player Data Analysis

Description

This project applies supervised learning methods to a dataset of FIFA 23 player statistics. The goal is to characterize the best and worst players in terms of their attributes.

Main Activities

  • Data Preprocessing: Prepare the dataset for analysis by transforming variables and converting formats.
  • Supervised Learning Techniques: Apply classification and regression methods such as Random Forest, Decision Tree, Logistic Regression, and KNN to analyze player statistics.

Dataset

The dataset includes extensive statistical data of players from the EA FIFA23 video game. Key attributes include:

  • Name
  • Age
  • Height
  • Overall Rating
  • Club
  • and many others...

These attributes are crucial for precise and effective research of player performance and characteristics.

Preprocessing

The preprocessing stage employs several techniques to refine and enhance the dataset:

  • Dimensionality Reduction: Remove irrelevant variables to simplify analysis.
  • Feature Engineering: Convert variable types (e.g., from string to numeric) for better compatibility with analytical tools.
  • Data Visualization: Use visualization tools to better understand the dataset and identify key patterns and trends.
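
The steps above can be sketched on a toy table. This is only an illustration: the column names (`photo_url`, `height` stored as text such as "180cm") and all values are assumptions and do not come from the actual dataset.

```r
# Toy stand-in for the player table; columns and values are hypothetical.
players <- data.frame(
  name      = c("Player A", "Player B", "Player C", "Player D"),
  age       = c("23", "31", "27", "19"),
  height    = c("180cm", "175cm", "191cm", "168cm"),
  overall   = c(84L, 79L, 88L, 66L),
  photo_url = c("a.png", "b.png", "c.png", "d.png"),
  stringsAsFactors = FALSE
)

# Remove variables that carry no information for modeling.
irrelevant <- c("photo_url")
players <- players[, setdiff(names(players), irrelevant)]

# Convert string columns to numeric: strip the unit suffix, then cast.
players$height_cm <- as.numeric(sub("cm$", "", players$height))
players$age       <- as.integer(players$age)
players$height    <- NULL

# Quick visual check of the rating distribution (interactive sessions only).
if (interactive()) {
  hist(players$overall, main = "Overall rating", xlab = "Overall")
}

str(players)
```

On the real data, the set of columns to drop and the text formats to parse would need to be checked against the file itself.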

Modeling

Classification Techniques

  • LDA (Linear Discriminant Analysis): Finds linear combinations of features that best separate classes.
  • QDA (Quadratic Discriminant Analysis): Similar to LDA but allows quadratic decision boundaries.
  • Binary Classification (Logistic Regression): Applied to predict binary outcomes such as whether a player is top‑tier.
  • Penalized Logistic Regression: Reduces overfitting by penalizing large coefficients.
  • Cost‑Sensitive Learning: Accounts for unequal costs of different misclassification errors.
  • Risk‑Aware Learning: Focuses on minimizing prediction‑related risk.
  • Decision Tree: Used for classification and regression, providing interpretable models.
  • Random Forest: An ensemble of decision trees, each trained on a bootstrap sample, that typically improves predictive accuracy over a single tree.
  • Gradient Boosting: Builds an ensemble sequentially, with each weak learner fitted to correct the errors of the previous ones.
  • Sub‑sampling Techniques: Resample the training data to balance class frequencies and improve model performance on the minority class.
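
As a rough sketch of two of the classifiers listed above, the example below fits logistic regression and LDA to simulated data. The features (`pace`, `passing`), the rule that generates the `top_tier` label, and the 0.5 cutoff are all assumptions made for illustration. They are not taken from the project.

```r
library(MASS)  # provides lda()

set.seed(42)
n <- 400

# Simulated attributes; names and generating rule are hypothetical.
players <- data.frame(
  pace    = rnorm(n, mean = 70, sd = 10),
  passing = rnorm(n, mean = 65, sd = 12)
)
score <- 0.08 * (players$pace - 70) + 0.10 * (players$passing - 65)
players$top_tier <- factor(
  ifelse(score + rnorm(n, sd = 0.8) > 1, "yes", "no"),
  levels = c("no", "yes")
)

# Hold out 100 rows for evaluation.
train_idx <- sample(n, size = 300)
train <- players[train_idx, ]
test  <- players[-train_idx, ]

# Logistic regression: models P(top_tier = "yes") through a logit link.
logit_fit  <- glm(top_tier ~ pace + passing, data = train, family = binomial)
logit_prob <- predict(logit_fit, newdata = test, type = "response")
logit_pred <- factor(ifelse(logit_prob > 0.5, "yes", "no"),
                     levels = c("no", "yes"))

# LDA: assumes Gaussian classes that share one covariance matrix.
lda_fit  <- lda(top_tier ~ pace + passing, data = train)
lda_pred <- predict(lda_fit, newdata = test)$class

# Hold-out accuracy of both models.
accuracy <- c(
  logistic = mean(logit_pred == test$top_tier),
  lda      = mean(lda_pred == test$top_tier)
)
print(accuracy)
```

Moving the 0.5 cutoff up or down is one simple way to reflect unequal misclassification costs, in the spirit of the cost‑sensitive item above.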

Regression Techniques

  • Linear Regression: Basic model for predicting continuous outcomes.
  • Over‑fitted Linear Regression: Explores the impact of over‑fitting on model performance.
  • Forward and Backward Regression: Stepwise methods for feature selection.
  • Ridge and Lasso Regression: Regularization techniques that prevent over‑fitting by penalizing large coefficients.
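  • KNN (K‑Nearest Neighbors): A non‑parametric method that predicts a value from the k most similar training observations; usable for both classification and regression.
  • Random Forest: Also applied to regression tasks, where the prediction is the average over many decision trees.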
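
To make the linear and stepwise items above concrete, here is a minimal sketch using only base R. The data are simulated, and the variable names, coefficients, and the deliberately irrelevant `noise_var` column are assumptions. The project's actual model specification is not documented here.

```r
set.seed(1)
n <- 300

# Simulated player attributes; names and coefficients are hypothetical.
players <- data.frame(
  age       = round(runif(n, 17, 38)),
  pace      = rnorm(n, mean = 70, sd = 10),
  passing   = rnorm(n, mean = 65, sd = 12),
  noise_var = rnorm(n)  # deliberately unrelated to the target
)
players$overall <- 20 + 0.4 * players$pace + 0.3 * players$passing +
  rnorm(n, sd = 3)

# Hold out 100 rows for evaluation.
train_idx <- sample(n, size = 200)
train <- players[train_idx, ]
test  <- players[-train_idx, ]

# Full linear model with every candidate predictor.
full_fit <- lm(overall ~ age + pace + passing + noise_var, data = train)

# Backward stepwise selection: drop terms while doing so lowers the AIC.
back_fit <- step(full_fit, direction = "backward", trace = 0)

# Root mean squared error on a given data set.
rmse <- function(fit, data) {
  sqrt(mean((data$overall - predict(fit, newdata = data))^2))
}

print(formula(back_fit))
print(c(full = rmse(full_fit, test), backward = rmse(back_fit, test)))
```

Comparing the hold-out error of the full and reduced models is one way to see whether the dropped predictors were contributing signal or only noise.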

Required Packages

To run the code, the following R packages are required:

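```r
required_packages <- c("tidyverse", "plyr", "ggplot2", "MASS", "caret", "e1071", "skimr", "mice", "VIM", "glmnet", "rpart", "pROC", "class", "randomForest")
install.packages(setdiff(required_packages, rownames(installed.packages())))  # install only the missing ones
```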