PEMS_SF UCI Machine Learning Dataset

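The PEMS_SF UCI Machine Learning Dataset contains traffic occupancy sensor data of the kind published at https://pems.dot.ca.gov, packaged for training and testing machine learning models. Training and test data ship as separate files, each with a matching label file (PEMS_train, PEMS_trainlabels.txt, PEMS_test.txt, PEMS_testlabels.txt), alongside per-day guess files and a list of sensor IDs.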

Updated 11/21/2024
github

Description

PEMSF_Project Dataset Overview

Dataset Files

  • PEMS_train: Training data file. Because of its size, it is not uploaded to GitLab and must be downloaded separately (link: PEMS_train).
  • PEMS_trainlabels.txt: Training data label file
  • PEMS_test.txt: Test data file
  • PEMS_testlabels.txt: Test data label file
  • First_Day_Guess_label.txt: First‑day guess label file
  • First_Day_Guess_test.txt: First‑day guess test file
  • Second_Day_Guess_label.txt: Second‑day guess label file
  • Second_Day_Guess_test.txt: Second‑day guess test file
  • Third_Day_Guess_label.txt: Third‑day guess label file
  • Third_Day_Guess_test.txt: Third‑day guess test file
  • stations_list.txt: Text file containing all sensor IDs for data extraction
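
As a rough illustration of how the data and label files might be read, the sketch below assumes the layout used by the original UCI PEMS-SF release: each line of a data file is one MATLAB-style matrix such as [a b; c d], and a label file holds a single bracketed list of integers. That layout is an assumption and is not confirmed by this project's description; the helper names parse_matrix_line and parse_label_line are hypothetical.

```python
import numpy as np


def parse_matrix_line(line: str) -> np.ndarray:
    """Parse one MATLAB-style matrix, e.g. "[1 2; 3 4]", into a 2-D array.

    Assumed format: rows separated by ';', values separated by whitespace.
    """
    body = line.strip().strip("[]")
    rows = [
        [float(value) for value in row.split()]
        for row in body.split(";")
        if row.strip()
    ]
    return np.array(rows, dtype=float)


def parse_label_line(line: str) -> list[int]:
    """Parse a bracketed label list, e.g. "[3 4 5]", into integers.

    int(float(...)) tolerates labels written as "3.0" as well as "3".
    """
    body = line.strip().strip("[]")
    return [int(float(token)) for token in body.split()]


def load_matrices(path: str) -> list[np.ndarray]:
    """Read a data file, returning one matrix per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [parse_matrix_line(line) for line in handle if line.strip()]
```

If the files in this repository use a different layout, only the two parsing helpers would need to change; load_matrices simply applies the parser line by line.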

Code Files

  • project_group2.ipynb: Python notebook for model training
  • Group2_Project_Prototype.ipynb: Prototype notebook for the project
  • Project_Data_Extractions.ipynb: Python notebook for extracting occupancy data from https://pems.dot.ca.gov

Usage Instructions

  1. Run project_group2.ipynb:

    • Download and place PEMS_train in the same directory as the notebook.
    • Download PEMS_trainlabels.txt, PEMS_test.txt, and PEMS_testlabels.txt and ensure they are in the same directory.
  2. Run Group2_Project_Prototype.ipynb:

    • Download the notebook and the associated files (PEMS_test.txt, PEMS_trainlabels.txt, First_Day_Guess_label.txt, First_Day_Guess_test.txt, Second_Day_Guess_label.txt, Second_Day_Guess_test.txt, Third_Day_Guess_label.txt, Third_Day_Guess_test.txt) and place them alongside the notebook.
  3. Run Project_Data_Extractions.ipynb:

    • Create an account at https://pems.dot.ca.gov and enter the username and password on lines 110‑111 of the notebook.
    • Download stations_list.txt and keep it in the notebook’s directory.
    • Execute the notebook to collect and preprocess occupancy sensor data into self_test.txt.
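
Each notebook expects its input files to sit in the same directory, so a quick check before launching can save a failed run. The sketch below is a minimal, optional helper and not part of the project: the file names come from the instructions above, while the REQUIRED_FILES mapping and the missing_files function are hypothetical names introduced for illustration.

```python
from pathlib import Path

# File names are copied from the usage instructions; grouping them in a
# dictionary keyed by notebook is only an illustration.
REQUIRED_FILES = {
    "project_group2.ipynb": [
        "PEMS_train",
        "PEMS_trainlabels.txt",
        "PEMS_test.txt",
        "PEMS_testlabels.txt",
    ],
    "Group2_Project_Prototype.ipynb": [
        "PEMS_test.txt",
        "PEMS_trainlabels.txt",
        "First_Day_Guess_label.txt",
        "First_Day_Guess_test.txt",
        "Second_Day_Guess_label.txt",
        "Second_Day_Guess_test.txt",
        "Third_Day_Guess_label.txt",
        "Third_Day_Guess_test.txt",
    ],
    "Project_Data_Extractions.ipynb": ["stations_list.txt"],
}


def missing_files(notebook: str, directory: str = ".") -> list[str]:
    """Return the required files for `notebook` that are absent from `directory`."""
    folder = Path(directory)
    return [
        name for name in REQUIRED_FILES[notebook]
        if not (folder / name).is_file()
    ]


if __name__ == "__main__":
    # Report the status of every notebook for the current directory.
    for notebook in REQUIRED_FILES:
        missing = missing_files(notebook)
        status = "ready" if not missing else "missing: " + ", ".join(missing)
        print(f"{notebook}: {status}")
```

Run from the notebook's directory, the script prints one line per notebook, either "ready" or the list of files still to download.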

Future Work

  • Project_Data_Extractions.ipynb is still under development. The goal is to automate the entire data collection and organization process so that its output can be fed directly into the model.

Topics

Traffic Management
Machine Learning

Source

Organization: github

Created: 11/18/2024
