
Vehicle Claim

This is a synthetic dataset, created from the DVI dataset, for vehicle claim auditing. Its attributes are vehicle make, model, color, registration year, body type, mileage, engine size, transmission type, fuel type, price, seat count, door count, damage type, specific damage, repair complexity, repair hours, and repair cost.

Source: GitHub
Created: Sep 26, 2022
Updated: Dec 24, 2022

Dataset Overview

Dataset List

  1. Vehicle Claim - Synthetic dataset created using the DVI dataset.
  2. Car Insurance - Dataset from Kaggle.
  3. Vehicle Insurance - Dataset from GitHub.

Vehicle Claim dataset details

  • Creation code: Code to create the dataset.
  • Dataset storage location: Dataset storage location.
  • Attribute list:
    • Maker - Categorical, vehicle make.
    • GenModel - Categorical, vehicle model.
    • Color - Categorical, vehicle color.
    • Reg_Year - Categorical, registration year.
    • Body_Type - Categorical, e.g., SUV, Convertible.
    • Runned_Miles - Numerical, vehicle mileage.
    • Engin_Size - Categorical, engine size.
    • GearBox - Categorical, automatic or manual.
    • FuelType - Categorical, gasoline or diesel.
    • Price - Numerical, vehicle price.
    • Seat_num - Numerical, number of seats.
    • Door_num - Numerical, number of doors.
    • issue - Categorical, damage type.
    • issue_id - Categorical, specific damage.
    • repair_complexity - Categorical, repair difficulty.
    • repair_hours - Numerical, repair time required.
    • repair_cost - Numerical, repair cost.
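
The attribute list above can be written down as a small schema for sanity-checking rows after loading. This is a minimal sketch using only the Python standard library. The column names and the categorical/numerical split come from the list. The `coerce_row` and `split_columns` helpers, the choice to parse numerical columns as `float`, and the sample row are assumptions for illustration; the sample values are invented placeholders, not taken from the dataset.

```python
import csv
import io

# Column kinds taken from the attribute list: "cat" = categorical, "num" = numerical.
SCHEMA = {
    "Maker": "cat", "GenModel": "cat", "Color": "cat", "Reg_Year": "cat",
    "Body_Type": "cat", "Runned_Miles": "num", "Engin_Size": "cat",
    "GearBox": "cat", "FuelType": "cat", "Price": "num", "Seat_num": "num",
    "Door_num": "num", "issue": "cat", "issue_id": "cat",
    "repair_complexity": "cat", "repair_hours": "num", "repair_cost": "num",
}


def split_columns():
    """Return (categorical_names, numerical_names) in schema order."""
    cat = [name for name, kind in SCHEMA.items() if kind == "cat"]
    num = [name for name, kind in SCHEMA.items() if kind == "num"]
    return cat, num


def coerce_row(row):
    """Return a copy of `row` with numerical columns parsed as float.

    Raises KeyError if a schema column is missing from the row and
    ValueError if a numerical value cannot be parsed.
    """
    out = {}
    for name, kind in SCHEMA.items():
        value = row[name]
        out[name] = float(value) if kind == "num" else value
    return out


# Invented placeholder row (hypothetical values, for illustration only).
SAMPLE_CSV = (
    ",".join(SCHEMA) + "\n"
    "Ford,Focus,Blue,2015,Hatchback,42000,1.6L,Manual,Diesel,8500,5,5,"
    "dent,dent_door,1,2.5,300\n"
)

if __name__ == "__main__":
    cat_cols, num_cols = split_columns()
    print(len(cat_cols), "categorical columns;", len(num_cols), "numerical columns")
    for raw in csv.DictReader(io.StringIO(SAMPLE_CSV)):
        print(coerce_row(raw))
```

Keeping `Reg_Year` and `Engin_Size` as strings follows the list, which marks both as categorical even though their values look numeric.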

Training and Evaluation Parameters

  • Training parameters:
    • dataset - Training dataset selection (vehicle_claims, car_insurance, vehicle_insurance).
    • data - Data type (normal or mixed).
    • encoding - Categorical feature encoding method.
    • numerical - Whether to use only numerical features.
    • batch_size - Batch size.
    • epoch - Number of training epochs.
    • latent_dim - Latent space dimension.
  • Evaluation parameters:
    • threshold - Evaluation threshold.
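
The parameters above could be collected with a command-line parser along the following lines. This is a sketch under stated assumptions, not the project's actual interface. The parameter names and the allowed values for `dataset` and `data` come from the list. The `--flag` style, the `build_parser` helper, and every default value (including the `one_hot` encoding name) are hypothetical placeholders.

```python
import argparse


def build_parser():
    """Build a parser mirroring the listed training and evaluation parameters."""
    parser = argparse.ArgumentParser(
        description="Sketch of the listed training/evaluation parameters."
    )
    # Training parameters. Choices come from the list; defaults are placeholders.
    parser.add_argument(
        "--dataset",
        choices=["vehicle_claims", "car_insurance", "vehicle_insurance"],
        default="vehicle_claims",
        help="training dataset selection",
    )
    parser.add_argument(
        "--data", choices=["normal", "mixed"], default="normal", help="data type"
    )
    # The list does not enumerate encoding methods, so any string is accepted here.
    parser.add_argument(
        "--encoding", default="one_hot", help="categorical feature encoding method"
    )
    parser.add_argument(
        "--numerical", action="store_true", help="use only numerical features"
    )
    parser.add_argument("--batch_size", type=int, default=32, help="batch size")
    parser.add_argument(
        "--epoch", type=int, default=10, help="number of training epochs"
    )
    parser.add_argument(
        "--latent_dim", type=int, default=8, help="latent space dimension"
    )
    # Evaluation parameter.
    parser.add_argument(
        "--threshold", type=float, default=0.5, help="evaluation threshold"
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Restricting `dataset` and `data` with `choices` makes a mistyped name fail at parse time rather than partway through a training run.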

Citation

  • Paper citation:

@article{
  Author = {Ajay Chawda and Stefanie Grimm and Marius Kloft},
  Title = {Unsupervised Anomaly Detection for Auditing Data and Impact of Categorical Encodings},
  Journal = {https://arxiv.org/abs/2210.14056},
  Year = {2022},
}
