DATASET
Open Source Community
new_york_citibike
This public dataset contains two BigQuery tables; the table used is `citibike_trips`, which contains over 58 million records. The `tripduration` field gives the duration of each bike rental in seconds and serves as the label; the other fields serve as potential features.
Updated 6/27/2024
github
Description
Dataset Overview
Dataset Information
- Dataset Name: new_york_citibike
- Data Table: citibike_trips
- Data Volume: Over 58 million records
- Label: tripduration (ride duration, in seconds)
- Features: Other fields
Data Processing
- Preprocessing: Cleaning, handling missing values, converting datetime variables, feature scaling
- Data Splitting: Divide the dataset into three parts for model selection, evaluation, and testing, using month as the split criterion
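The month-based split and the feature scaling step can be sketched as follows. This is a minimal illustration, not the original pipeline: the `starttime` field name, its timestamp format, and the choice of which months go to which part are assumptions, and the helper names are hypothetical.

```python
from datetime import datetime


def parse_start(ts: str) -> datetime:
    # Assumed timestamp format; adjust to match the actual column.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def split_by_month(rows, evaluation_months, testing_months):
    """Assign each row to one of three parts based on its start month."""
    parts = {"selection": [], "evaluation": [], "testing": []}
    for row in rows:
        month = parse_start(row["starttime"]).month  # assumed field name
        if month in testing_months:
            parts["testing"].append(row)
        elif month in evaluation_months:
            parts["evaluation"].append(row)
        else:
            parts["selection"].append(row)
    return parts


def standardize(values):
    """Scale a numeric feature to zero mean and unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5 or 1.0  # avoid dividing by zero for constant features
    return [(v - mean) / std for v in values]
```

Splitting on the month rather than at random keeps all rides from a given month in the same part, so the evaluation and testing parts cover periods the model has not seen.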
Model Selection and Evaluation
- Model Choice: Linear regression model
- Evaluation Metric: Mean Squared Error (MSE)
- Model Optimization: Iterative adjustments to improve performance
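To make the metric concrete, the sketch below fits a one-feature linear regression by ordinary least squares and scores it with MSE. It is an illustration only: the models described here use several features, and the function names `fit_line` and `mse` are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(actual, predicted):
    """Mean Squared Error: the average of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy data: fit on one set of points, then score predictions on another.
slope, intercept = fit_line([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
predictions = [slope * x + intercept for x in [4.0, 5.0]]
score = mse([8.0, 11.0], predictions)
```

Because the errors are squared, MSE penalizes a few large misses more heavily than many small ones, which is worth keeping in mind when comparing models.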
Model Evaluation Results
- Model 1: trip_duration_by_stations, MSE = 111.2176
- Model 2: trip_duration_by_stations_and_day, MSE = 98.0522
- Model 3: trip_duration_by_stations_day_age, MSE = 110.8004
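The comparison of the three reported scores can be expressed as a short sketch. The MSE values are taken from the list above; the source does not state their units, so the root mean squared error computed here is in whatever unit the label was expressed in during evaluation.

```python
import math

# Reported MSE per model, copied from the results above.
mse_by_model = {
    "trip_duration_by_stations": 111.2176,
    "trip_duration_by_stations_and_day": 98.0522,
    "trip_duration_by_stations_day_age": 110.8004,
}

# A lower MSE is better, so pick the model with the smallest value.
best_model = min(mse_by_model, key=mse_by_model.get)

# RMSE is easier to read because it is in the same unit as the label.
rmse_by_model = {name: math.sqrt(value) for name, value in mse_by_model.items()}
```

Here `best_model` evaluates to `trip_duration_by_stations_and_day`, the model with the lowest reported MSE.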
Conclusion
- Prediction Outcome: A total of 1,548,371 predictions were made; in most cases the predicted value differs from the actual value by less than 15 minutes
- Accuracy: In 89.6% of cases, the model predicts ride duration within 15 minutes of the actual value; the mean absolute error reported for the ride cost prediction is 6.8 minutes
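The two figures quoted above are a within-threshold rate and a mean absolute error. A minimal sketch of how such figures can be computed from paired actual and predicted durations follows; the function names and the toy values are hypothetical, and durations are assumed to be in minutes.

```python
def within_threshold_rate(actual, predicted, threshold=15.0):
    """Share of predictions whose absolute error is below the threshold."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) < threshold)
    return hits / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of the absolute prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy example in minutes: absolute errors are 2, 2, 20 and 1.
actual_minutes = [10.0, 20.0, 30.0, 60.0]
predicted_minutes = [12.0, 18.0, 50.0, 61.0]
rate = within_threshold_rate(actual_minutes, predicted_minutes)
mae = mean_absolute_error(actual_minutes, predicted_minutes)
```

In the toy example, three of the four predictions fall within 15 minutes, so `rate` is 0.75 and `mae` is 6.25.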
Recommended Strategy
- Pricing Model Strategy: Adopt quarterly ticket pricing with a periodic payment mode
- User Story: When a user selects a bike at a start station on a specific date and specifies a destination, the model can predict ride duration and cost
- Model Performance: The model predicts ride duration within 15 minutes of the actual value in 89.6% of cases and predicts ride cost with a mean absolute error of 6.8 minutes
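The user story can be sketched as a small quoting function. Everything specific here is an assumption made for illustration: the `predict_minutes` callable stands in for the trained model, the per-minute rate is a hypothetical pricing parameter, and the source does not describe how cost is derived from duration.

```python
from datetime import date
from typing import Callable

# Stand-in signature for the trained model:
# (start station, end station, ISO day of week) -> predicted minutes.
Predictor = Callable[[str, str, int], float]


def quote_trip(
    predict_minutes: Predictor,
    start_station: str,
    end_station: str,
    ride_date: date,
    rate_per_minute: float,  # hypothetical pricing parameter
) -> dict:
    """Return the predicted duration and a cost estimate derived from it."""
    day_of_week = ride_date.isoweekday()  # 1 = Monday ... 7 = Sunday
    minutes = predict_minutes(start_station, end_station, day_of_week)
    return {"minutes": minutes, "cost": round(minutes * rate_per_minute, 2)}


# Usage with a stub predictor that always returns 12 minutes.
quote = quote_trip(lambda s, e, d: 12.0, "A", "B", date(2024, 6, 27), 0.25)
```

With the stub predictor and a rate of 0.25 per minute, the quote is 12.0 minutes and a cost of 3.0.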
Topics
Bike Sharing
Data Analysis
Source
Organization: github
Created: 6/12/2024