Microplastics in Drinking Water
This dataset records microplastic presence in drinking water. Each row represents a water‑sample record, containing microplastic material and type, color, water source type (tap or bottled), and the sampling location's latitude and longitude. The dataset focuses on polyethylene (PE) material for predicting PE levels across different geographic locations.
Updated 2/24/2024
github
Description
Dataset Overview
Dataset Name
- The dataset is named “Microplastics in Drinking Water,” with the specific file called “Microplastics Sample Data (wide).”
Dataset Source
- Released by the California State Water Resources Control Board, accessible via the “Microplastics in Drinking Water” dataset page.
Dataset Content
- Each row represents a water‑sample record with associated information.
- Key columns include microplastic material and type (content per sample), color, tap vs. bottled water, sampling location, and approximate coordinates.
- The project focuses on PE (polyethylene); other “material” columns are removed.
Data Processing
- The original dataset had over 100 columns; columns with fewer than 40 values were removed.
- Additional cleaning removed unnecessary columns such as Sample_ID and handled all NAN or Present values.
- Samples from Chinese reservoirs with extreme values were excluded.
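The column-level cleaning steps above can be sketched in pandas. This is a minimal illustration, not the project's actual code: the function name `clean_samples`, the choice to treat the literal strings NAN and Present as missing values, and the order of the steps are all assumptions, since the description does not say how those values were handled.

```python
import pandas as pd


def clean_samples(df: pd.DataFrame, min_values: int = 40,
                  drop_cols: tuple = ("Sample_ID",)) -> pd.DataFrame:
    """Drop sparse and unneeded columns (hypothetical helper)."""
    # Assumption: the strings "NAN" and "Present" carry no usable
    # measurement, so they are converted to missing values.
    df = df.mask(df.isin(["NAN", "Present"]))
    # Keep only columns with at least `min_values` non-missing entries.
    df = df.dropna(axis=1, thresh=min_values)
    # Remove identifier columns; ignore names that are not present.
    return df.drop(columns=list(drop_cols), errors="ignore")
```

Calling `clean_samples(raw_df)` on the wide sample table would return a narrower frame; filtering out extreme-value samples by location would be a separate row-level step.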
Dataset Usage
- Random Forest, k‑NN regression, and Decision Tree regression models were employed for prediction.
- Model evaluation indicated Decision Tree regression performed best, though its predictive power is limited by sample size and data quality.
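A comparison of the three model families named above might look like the following scikit-learn sketch. Everything specific here is an assumption: the feature set (latitude, longitude, a tap/bottled flag), the mean-absolute-error metric, the 5-fold split, and the hyperparameters are illustrative, and the data is synthetic placeholder data rather than the real samples.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def compare_models(X, y, cv=5, seed=0):
    """Return the mean cross-validated MAE per model (lower is better)."""
    models = {
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
        "knn": KNeighborsRegressor(n_neighbors=5),
        "decision_tree": DecisionTreeRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        # scikit-learn returns negated MAE so that higher is better; flip the sign.
        neg_mae = cross_val_score(model, X, y, cv=cv,
                                  scoring="neg_mean_absolute_error")
        scores[name] = float(-neg_mae.mean())
    return scores


# Synthetic stand-in sized like the roughly 60 usable samples (not real data).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(32.0, 42.0, 60),      # latitude (placeholder range)
    rng.uniform(-124.0, -114.0, 60),  # longitude (placeholder range)
    rng.integers(0, 2, 60),           # hypothetical encoding: 1 = bottled, 0 = tap
])
y = rng.gamma(2.0, 1.0, 60)           # placeholder PE values

scores = compare_models(X, y)
```

With so few samples, cross-validated scores vary noticeably between splits, so a ranking from a run like this should be read as indicative at best.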
Dataset Limitations
- The dataset suffers from many missing values and mismatched data types; after cleaning, only about 60 samples remain usable.
- The dataset has been updated continuously since 21 July 2022, but its reliability and standardization are still insufficient for robust predictive modeling.
Conclusion
- Despite testing multiple models, the dataset’s quality prevents reliable predictions of drinking‑water safety based on microplastic content. Further data collection and standardization are required.
Topics
Water Quality Monitoring
Plastic Pollution
Source
Organization: github
Created: 2/22/2024