Tags: Open Source Community, Fraud Detection, Simulated Data

PaySim

The PaySim dataset contains over 6 million simulated transactions, each with 9 features, generated by the PaySim mobile money simulator. It is used for fraud and anomaly detection: the simulated fraudulent agents try to profit by transferring funds out of customer accounts and then cashing out of the system.

Source: GitHub
Created: Feb 23, 2018
Updated: May 7, 2024

Dataset Overview

Dataset Name

Fraud and Anomaly Detection using Synthetic Transactional Data

Dataset Goal

Develop a method that minimizes false negatives (fraudulent transactions classified as legitimate) when evaluating new data points.
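To make this goal concrete: if some model (hypothetical here) assigns each transaction a fraud score and every transaction at or above a threshold is flagged, a false negative is a fraud case scored below that threshold. A minimal Python sketch, with made-up labels and scores and hypothetical function names, that picks the highest threshold whose recall, TP / (TP + FN), reaches a target:

```python
from typing import Sequence


def confusion_counts(labels: Sequence[int], scores: Sequence[float], threshold: float) -> dict[str, int]:
    """Count outcomes when every transaction with score >= threshold is flagged as fraud."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(labels, scores):
        flagged = score >= threshold
        if label == 1:
            # A fraud case that is not flagged is a false negative.
            counts["tp" if flagged else "fn"] += 1
        else:
            counts["fp" if flagged else "tn"] += 1
    return counts


def pick_threshold(labels: Sequence[int], scores: Sequence[float], target_recall: float) -> float:
    """Return the highest observed score whose recall reaches target_recall."""
    if not any(label == 1 for label in labels):
        raise ValueError("need at least one fraudulent example to measure recall")
    # Scan thresholds from high to low; recall can only rise as the threshold drops.
    for threshold in sorted(set(scores), reverse=True):
        c = confusion_counts(labels, scores, threshold)
        if c["tp"] / (c["tp"] + c["fn"]) >= target_recall:
            return threshold
    raise ValueError("target_recall must be in [0, 1]")


# Made-up labels (isFraud) and model scores, for illustration only.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.90, 0.70]
threshold = pick_threshold(labels, scores, target_recall=1.0)
print(threshold, confusion_counts(labels, scores, threshold))
# 0.35 {'tp': 3, 'fp': 2, 'tn': 1, 'fn': 0}
```

Taking the highest qualifying threshold is one reasonable design choice: it meets the false-negative constraint while flagging as few legitimate transactions as the scores allow.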

Dataset Source

PaySim dataset, generated by the PaySim mobile money simulator and containing over 6 million data points.

Dataset Location

Kaggle
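With over 6 million rows, the downloaded file is large enough that a single streaming pass is a sensible first look. A minimal sketch, assuming a CSV whose header contains the `type` and `isFraud` columns described on this page; `fraud_by_type` and the `paysim.csv` path are hypothetical:

```python
import csv
from collections import Counter
from typing import Iterable


def fraud_by_type(lines: Iterable[str]) -> dict[str, tuple[int, int]]:
    """Return {type: (n_transactions, n_fraud)} from one pass over CSV lines.

    csv.DictReader reads rows lazily, so an open file handle can be passed in
    without loading every row into memory at once.
    """
    totals: Counter = Counter()
    frauds: Counter = Counter()
    for row in csv.DictReader(lines):
        totals[row["type"]] += 1
        frauds[row["type"]] += int(row["isFraud"])
    return {t: (totals[t], frauds[t]) for t in totals}


# Usage ("paysim.csv" is a placeholder for the downloaded Kaggle file):
# with open("paysim.csv", newline="") as f:
#     print(fraud_by_type(f))
```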

Dataset Features

  1. type: Transaction type; one of CASH-IN, CASH-OUT, DEBIT, PAYMENT, or TRANSFER.
  2. amount: Transaction amount in local currency.
  3. nameOrig: Customer who initiated the transaction.
  4. oldbalanceOrg: Initiating customer's balance before the transaction.
  5. newbalanceOrig: Initiating customer's balance after the transaction.
  6. nameDest: Recipient of the transaction.
  7. oldbalanceDest: Recipient's balance before the transaction. Note: no balance information is recorded for recipients whose names start with M (merchants).
  8. newbalanceDest: Recipient's balance after the transaction. Note: no balance information is recorded for recipients whose names start with M (merchants).
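The columns above map naturally onto a typed record. A minimal Python sketch, assuming a CSV header that uses exactly these column names plus `step` and `isFraud`; the class, the helper names, and the two sample rows are made up for illustration. It applies the merchant note by storing recipient balances as `None` when `nameDest` starts with M, and adds a hypothetical consistency check that only makes sense for types that move money out of the initiating account:

```python
import csv
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    """One PaySim row, using the column names listed above."""
    step: int
    type: str
    amount: float
    name_orig: str
    old_balance_orig: float
    new_balance_orig: float
    name_dest: str
    old_balance_dest: Optional[float]  # None when the recipient is a merchant
    new_balance_dest: Optional[float]  # None when the recipient is a merchant
    is_fraud: int


def is_merchant(name: str) -> bool:
    # Per the feature notes, merchant recipients are prefixed with "M".
    return name.startswith("M")


def parse_row(row: dict[str, str]) -> Transaction:
    merchant = is_merchant(row["nameDest"])
    return Transaction(
        step=int(row["step"]),
        type=row["type"],
        amount=float(row["amount"]),
        name_orig=row["nameOrig"],
        old_balance_orig=float(row["oldbalanceOrg"]),  # column is "Org", not "Orig"
        new_balance_orig=float(row["newbalanceOrig"]),
        name_dest=row["nameDest"],
        # Merchants have no recipient balance information; store None instead of
        # whatever placeholder value the file contains.
        old_balance_dest=None if merchant else float(row["oldbalanceDest"]),
        new_balance_dest=None if merchant else float(row["newbalanceDest"]),
        is_fraud=int(row["isFraud"]),
    )


def origin_balance_gap(tx: Transaction) -> float:
    """Hypothetical check for money-out types: 0.0 when old - amount == new."""
    return tx.old_balance_orig - tx.amount - tx.new_balance_orig


# Two made-up rows for illustration; they are not taken from the dataset.
SAMPLE_LINES = [
    "step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud",
    "1,TRANSFER,1000.0,C111,1000.0,0.0,C222,0.0,0.0,1",
    "1,PAYMENT,250.5,C333,5000.0,4749.5,M444,0.0,0.0,0",
]
txs = [parse_row(row) for row in csv.DictReader(SAMPLE_LINES)]
print([origin_balance_gap(tx) for tx in txs])  # [0.0, 0.0]
```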

Target Variable

  1. isFraud: 1 if the transaction was made by a fraudulent agent in the simulation, 0 otherwise.
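Because `isFraud` is a 0/1 label and fraud is usually a small minority of transactions, a common first step toward the false-negative goal is to measure the class balance and weight the rare class more heavily during training. The sketch below uses the same "balanced" formula as scikit-learn's `class_weight="balanced"` option, n_samples / (n_classes * count[c]), but in plain Python; the helper names and labels are hypothetical:

```python
from collections import Counter
from typing import Iterable


def fraud_rate(labels: Iterable[int]) -> float:
    """Share of transactions with isFraud == 1."""
    labels = list(labels)
    if not labels:
        raise ValueError("no labels given")
    return sum(labels) / len(labels)


def balanced_class_weights(labels: Iterable[int]) -> dict[int, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count[c]).

    The rarer class gets the larger weight, so a missed fraud case
    (a false negative) costs more in a weighted loss.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


labels = [0] * 8 + [1] * 2  # made-up labels, not drawn from PaySim
print(fraud_rate(labels))  # 0.2
print(balanced_class_weights(labels))  # {0: 0.625, 1: 2.5}
```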

Additional Feature

step: Simulated time step at which the transaction occurred. Per the dataset's Kaggle description, one step corresponds to one hour of simulated time.
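Because `step` orders transactions in time, it can be decoded into calendar-like features and used for a chronological train/test split, which mirrors scoring genuinely new data points better than a random split. A minimal sketch, assuming one step equals one simulated hour and that steps start at 1 (both worth checking against the downloaded file); the function names are hypothetical:

```python
HOURS_PER_DAY = 24


def step_to_day_hour(step: int) -> tuple[int, int]:
    """Map a step to (day, hour_of_day).

    Assumes one step is one simulated hour and the first step is 1;
    days are numbered from 1 and hours run from 0 to 23.
    """
    if step < 1:
        raise ValueError("steps are assumed to start at 1")
    zero_based = step - 1
    return zero_based // HOURS_PER_DAY + 1, zero_based % HOURS_PER_DAY


def time_split(steps: list[int], cutoff: int) -> tuple[list[int], list[int]]:
    """Return (train_indices, test_indices): steps <= cutoff train, later steps test."""
    train = [i for i, s in enumerate(steps) if s <= cutoff]
    test = [i for i, s in enumerate(steps) if s > cutoff]
    return train, test


print(step_to_day_hour(25))  # (2, 0)
print(time_split([1, 5, 30, 2], cutoff=5))  # ([0, 1, 3], [2])
```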
