Tags: Open Source Community, Machine Learning, Traffic Management

PEMS_SF UCI Machine Learning Dataset

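The PEMS_SF dataset from the UCI Machine Learning Repository records the occupancy rate of freeway lanes in the San Francisco Bay Area. Each record covers one day and is labelled with the day of the week. The data ships as separate training and test files for model development and validation: PEMS_train and PEMS_test.txt, their label files, and several smaller guess files.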

Source: GitHub
Created: Nov 18, 2024
Updated: Nov 21, 2024

PEMSF_Project Dataset Overview

Dataset Files

  • PEMS_train: Training data file; due to its large size it is not uploaded to GitLab. It can be downloaded from the following link: PEMS_train
  • PEMS_trainlabels.txt: Training data label file
  • PEMS_test.txt: Test data file
  • PEMS_testlabels.txt: Test data label file
  • First_Day_Guess_label.txt: First‑day guess label file
  • First_Day_Guess_test.txt: First‑day guess test file
  • Second_Day_Guess_label.txt: Second‑day guess label file
  • Second_Day_Guess_test.txt: Second‑day guess test file
  • Third_Day_Guess_label.txt: Third‑day guess label file
  • Third_Day_Guess_test.txt: Third‑day guess test file
  • stations_list.txt: Text file containing all sensor IDs for data extraction
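
The data and label files listed above can be read with a few lines of Python. The sketch below assumes the layout used by the UCI distribution of PEMS-SF: each line of a data file is one day written as a MATLAB-style matrix (`[a b c; d e f]`, rows separated by `;`), and a label file holds a single bracketed vector of integers. The function names are hypothetical and the notebooks may load the files differently, so check the layout against the actual files first.

```python
import numpy as np


def parse_matrix_line(line: str) -> np.ndarray:
    """Parse one '[a b c; d e f]' record into a 2-D float array.

    Assumed layout: rows separated by ';', values by whitespace.
    """
    body = line.strip().lstrip("[").rstrip("]")
    rows = [
        [float(value) for value in row.split()]
        for row in body.split(";")
        if row.strip()
    ]
    return np.array(rows, dtype=float)


def parse_label_line(line: str) -> np.ndarray:
    """Parse a '[3 4 5 ...]' label vector into a 1-D integer array."""
    body = line.strip().lstrip("[").rstrip("]")
    return np.array([int(value) for value in body.split()], dtype=int)


def load_records(path: str) -> list:
    """Read a data file and return one array per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [parse_matrix_line(line) for line in handle if line.strip()]


def load_labels(path: str) -> np.ndarray:
    """Read a label file that holds a single bracketed vector."""
    with open(path, encoding="utf-8") as handle:
        return parse_label_line(handle.read())
```

A typical call would be `records = load_records("PEMS_test.txt")` followed by `labels = load_labels("PEMS_testlabels.txt")`, after which `len(records)` should match `len(labels)` if the assumed layout holds.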

Code Files

  • project_group2.ipynb: Python notebook for model training
  • Group2_Project_Prototype.ipynb: Prototype notebook for the project
  • Project_Data_Extractions.ipynb: Python notebook for extracting occupancy data from https://pems.dot.ca.gov

Usage Instructions

  1. Run project_group2.ipynb:

    • Download and place PEMS_train in the same directory as the notebook.
    • Download PEMS_trainlabels.txt, PEMS_test.txt, and PEMS_testlabels.txt and ensure they are in the same directory.
  2. Run Group2_Project_Prototype.ipynb:

    • Download the notebook and the associated files (PEMS_test.txt, PEMS_trainlabels.txt, First_Day_Guess_label.txt, First_Day_Guess_test.txt, Second_Day_Guess_label.txt, Second_Day_Guess_test.txt, Third_Day_Guess_label.txt, Third_Day_Guess_test.txt) and place them alongside the notebook.
  3. Run Project_Data_Extractions.ipynb:

    • Create an account at https://pems.dot.ca.gov and enter the username and password on lines 110‑111 of the notebook.
    • Download stations_list.txt and keep it in the notebook’s directory.
    • Execute the notebook to collect and preprocess occupancy sensor data into self_test.txt.
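
Because every notebook expects its input files in its own directory, it can help to check for them before opening the notebook. The sketch below is a hypothetical helper, not part of the project: the file lists are copied from the steps above, while `REQUIRED_FILES` and `missing_files` are illustrative names.

```python
from pathlib import Path

# Input files each notebook expects next to it, as listed in the usage steps.
REQUIRED_FILES = {
    "project_group2.ipynb": [
        "PEMS_train",
        "PEMS_trainlabels.txt",
        "PEMS_test.txt",
        "PEMS_testlabels.txt",
    ],
    "Group2_Project_Prototype.ipynb": [
        "PEMS_test.txt",
        "PEMS_trainlabels.txt",
        "First_Day_Guess_label.txt",
        "First_Day_Guess_test.txt",
        "Second_Day_Guess_label.txt",
        "Second_Day_Guess_test.txt",
        "Third_Day_Guess_label.txt",
        "Third_Day_Guess_test.txt",
    ],
    "Project_Data_Extractions.ipynb": ["stations_list.txt"],
}


def missing_files(notebook: str, directory: str = ".") -> list:
    """Return the required files for `notebook` that are absent from `directory`."""
    base = Path(directory)
    return [
        name
        for name in REQUIRED_FILES[notebook]
        if not (base / name).is_file()
    ]


if __name__ == "__main__":
    # Report the status of every notebook for the current directory.
    for notebook in REQUIRED_FILES:
        missing = missing_files(notebook)
        status = "ready" if not missing else "missing: " + ", ".join(missing)
        print(f"{notebook}: {status}")
```

Running the script from the notebook directory prints one line per notebook, naming any file that still needs to be downloaded.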

Future Work

  • Project_Data_Extractions.ipynb is still under development. The goal is to automate the full data collection and organization process so that its output can be fed directly to the model.