
new_york_citibike

This public dataset contains two BigQuery tables. The one used here is `citybike_trips`, which holds over 58 million records. Its `tripduration` field gives the duration of each bike rental in seconds and serves as the label. The remaining fields serve as potential features.

Updated 6/27/2024

github

Description

Dataset Overview

Dataset Information

Dataset Name: new_york_citibike
Data Table: citybike_trips
Data Volume: Over 58 million records
Label: tripduration (ride duration, in seconds)
Features: Other fields

Data Processing

Preprocessing: Data cleaning, handling of missing values, conversion of datetime variables, and feature scaling
Data Splitting: The dataset is split by month into three parts, used for model selection, evaluation, and testing
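The preprocessing and month-based splitting steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the original pipeline. The field names `starttime`, `start_station_id`, `end_station_id`, and `birth_year` are assumed to match the public table. The timestamp format is also assumed. Which months go to evaluation and test (`EVAL_MONTHS`, `TEST_MONTHS`) is hypothetical, because the source only says the split is by month. Median imputation of age and z-score scaling are choices made for this sketch, and all function names are hypothetical.

```python
from datetime import datetime
from statistics import mean, median, pstdev

# Hypothetical month assignment: the source only says the split is by month.
EVAL_MONTHS = {9, 10}
TEST_MONTHS = {11, 12}


def preprocess(rows):
    """Clean raw trip dicts, derive datetime features, and impute missing ages."""
    # Cleaning: keep rows with a positive label and a start timestamp.
    kept = [r for r in rows if (r.get("tripduration") or 0) > 0 and r.get("starttime")]
    records = []
    for r in kept:
        # Datetime conversion (assumed "YYYY-MM-DD HH:MM:SS" strings).
        start = datetime.strptime(r["starttime"], "%Y-%m-%d %H:%M:%S")
        birth_year = r.get("birth_year")
        records.append({
            "tripduration": r["tripduration"],  # label, in seconds
            "start_station_id": r.get("start_station_id"),
            "end_station_id": r.get("end_station_id"),
            "month": start.month,
            "dayofweek": start.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "hour": start.hour,
            "age": None if birth_year is None else start.year - birth_year,
        })
    # Missing values: replace unknown ages with the median observed age.
    known = [rec["age"] for rec in records if rec["age"] is not None]
    fill = median(known) if known else 0.0
    for rec in records:
        if rec["age"] is None:
            rec["age"] = fill
    return records


def split_by_month(records):
    """Assign each record to train/eval/test using only its month."""
    splits = {"train": [], "eval": [], "test": []}
    for rec in records:
        if rec["month"] in TEST_MONTHS:
            splits["test"].append(rec)
        elif rec["month"] in EVAL_MONTHS:
            splits["eval"].append(rec)
        else:
            splits["train"].append(rec)
    return splits


def fit_scaler(train, columns):
    """Z-score parameters (mean, std) computed on the training split only."""
    params = {}
    for col in columns:
        values = [rec[col] for rec in train]
        sd = pstdev(values)
        params[col] = (mean(values), sd if sd > 0 else 1.0)  # guard constant columns
    return params


def apply_scaler(records, params):
    """Return copies of the records with the chosen columns standardized."""
    return [{**rec, **{c: (rec[c] - mu) / sd for c, (mu, sd) in params.items()}}
            for rec in records]


# Tiny made-up sample (not real Citi Bike data).
sample = [
    {"tripduration": 600, "starttime": "2016-03-07 08:15:00",
     "start_station_id": 72, "end_station_id": 79, "birth_year": 1980},
    {"tripduration": 900, "starttime": "2016-03-08 17:40:00",
     "start_station_id": 79, "end_station_id": 72, "birth_year": None},
    {"tripduration": None, "starttime": "2016-09-01 12:00:00",
     "start_station_id": 72, "end_station_id": 82, "birth_year": 1990},
    {"tripduration": 1200, "starttime": "2016-11-20 10:05:00",
     "start_station_id": 82, "end_station_id": 72, "birth_year": 1975},
]
splits = split_by_month(preprocess(sample))
scaler = fit_scaler(splits["train"], ["age", "hour"])
test_scaled = apply_scaler(splits["test"], scaler)
```

The scaling parameters are fitted on the training months only, so that evaluation and test data do not leak into the features. For brevity, the age imputation uses all rows. A stricter pipeline would compute that median on the training months as well.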

Model Selection and Evaluation

Model Choice: Linear regression model
Evaluation Metric: Mean Squared Error (MSE)
Model Optimization: Iteratively extend the feature set (stations, then day, then age) and compare the resulting models by MSE
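To make the model choice and the metric concrete, here is a minimal linear regression fitted by ordinary least squares (solving the normal equations), together with MSE. This is a standalone sketch, not the tooling the source used to train its models. The function names are hypothetical, and the demo data is synthetic, generated exactly from y = 2 + 3·x1 − x2.

```python
def solve(matrix, rhs):
    """Solve matrix @ w = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        w[i] = (aug[i][n] - sum(aug[i][j] * w[j] for j in range(i + 1, n))) / aug[i][i]
    return w


def fit_linear_regression(X, y):
    """OLS with intercept: solve (A^T A) w = A^T y, where A = [1 | X]."""
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    ata = [[sum(a[i] * a[j] for a in A) for j in range(k)] for i in range(k)]
    aty = [sum(a[i] * t for a, t in zip(A, y)) for i in range(k)]
    return solve(ata, aty)  # [intercept, w1, w2, ...]


def predict(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic demo data: y = 2 + 3*x1 - x2 exactly.
X_demo = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_demo = [2, 5, 1, 4, 7]
weights = fit_linear_regression(X_demo, y_demo)
demo_mse = mean_squared_error(y_demo, [predict(weights, row) for row in X_demo])
```

On this noise-free data, the recovered weights are approximately [2, 3, −1] and the MSE is close to zero. Categorical inputs such as station IDs or day of week would typically be one-hot encoded into numeric columns before a fit like this.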

Model Evaluation Results

Model 1: trip_duration_by_stations, MSE = 111.2176
Model 2: trip_duration_by_stations_and_day, MSE = 98.0522
Model 3: trip_duration_by_stations_day_age, MSE = 110.8004
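Selecting a model from the results above is a matter of picking the lowest MSE. The snippet uses only the reported values and assumes that all three models were evaluated on the same evaluation months, which the source implies but does not state. It also computes each model's relative MSE reduction against the stations-only model.

```python
# MSE values as reported for the three linear regression models.
reported_mse = {
    "trip_duration_by_stations": 111.2176,
    "trip_duration_by_stations_and_day": 98.0522,
    "trip_duration_by_stations_day_age": 110.8004,
}

# Model selection: the lowest MSE wins.
best = min(reported_mse, key=reported_mse.get)

# Relative MSE reduction of each model versus the stations-only baseline.
baseline = reported_mse["trip_duration_by_stations"]
reduction = {name: (baseline - mse) / baseline for name, mse in reported_mse.items()}
```

Here `best` is `trip_duration_by_stations_and_day`, whose MSE is about 11.8% lower than the stations-only model. Adding age (Model 3) improves on the baseline by only about 0.4%.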

Conclusion

Prediction Outcome: A total of 1,548,371 predictions were made. A prediction counts as accurate when it differs from the actual ride duration by less than 15 minutes.
Accuracy: In 89.6% of cases, the predicted ride duration is within 15 minutes of the actual value. The mean absolute error of the predictions used to estimate ride cost is 6.8 minutes.
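The two figures above are a within-threshold rate (89.6%) and a mean absolute error (6.8 minutes). The sketch below computes both on made-up values. Because the label is in seconds, durations are first converted to minutes. The strict "less than 15 minutes" test follows the wording of the prediction outcome. The function names are hypothetical.

```python
def to_minutes(seconds):
    return [s / 60.0 for s in seconds]


def within_threshold_rate(y_true, y_pred, threshold=15.0):
    """Share of predictions whose absolute error is below `threshold` (same units)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) < threshold)
    return hits / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up durations in seconds (not real results).
actual_min = to_minutes([600, 1200, 3000, 900])
predicted_min = to_minutes([720, 900, 1800, 960])
rate = within_threshold_rate(actual_min, predicted_min)  # errors: 2, 5, 20, 1 minutes
mae = mean_absolute_error(actual_min, predicted_min)

# Scale of the reported figure, assuming 89.6% applies to all 1,548,371 predictions.
approx_within_15 = round(1_548_371 * 0.896)
```

In this toy example, three of the four errors fall under 15 minutes, so `rate` is 0.75 and `mae` is 7.0. If the 89.6% applies to all 1,548,371 predictions, that corresponds to roughly 1.39 million predictions within the threshold.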

Recommended Strategy

Pricing Model Strategy: Adopt quarterly ticket pricing and a periodic payment model
User Story: When a user picks up a bike at a start station on a given date and specifies a destination, the model predicts the ride duration and cost
Model Performance: The model predicts ride duration within 15 minutes of the actual value in 89.6% of cases, and the predictions used to estimate ride cost have a mean absolute error of 6.8 minutes

Topics

Bike Sharing

Data Analysis

Source

Organization: github

Created: 6/12/2024
