JUHE API Marketplace

Microplastics in Drinking Water

This dataset records the presence of microplastics in drinking water. Each row is one water sample and includes the microplastic material and type, color, water source type (tap or bottled), and the latitude and longitude of the sampling location. The accompanying analysis focuses on polyethylene (PE), with the goal of predicting PE levels across geographic locations.

Updated 2/24/2024
github

Description

Dataset Overview

Dataset Name

  • The dataset is named “Microplastics in Drinking Water,” with the specific file called “Microplastics Sample Data (wide).”

Dataset Source

Dataset Content

  • Each row represents a water‑sample record with associated information.
  • Key columns include microplastic material and type (content per sample), color, tap vs. bottled water, sampling location, and approximate coordinates.
  • The project focuses on PE (polyethylene); the other material columns were removed.
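Keeping only the PE material column can be sketched in Python as below. This is a minimal illustration, not the project's code: it assumes each row is a dict, that material columns share a hypothetical "Material_" prefix, and that PE is stored in a hypothetical "Material_PE" column. The real headers in "Microplastics Sample Data (wide)" may differ.

```python
# Hypothetical column names: the real "Microplastics Sample Data (wide)"
# headers may differ. Material columns are assumed to share this prefix.
MATERIAL_PREFIX = "Material_"
TARGET_MATERIAL = "Material_PE"


def keep_pe_only(row: dict) -> dict:
    """Drop every material column except PE; keep all non-material columns."""
    return {
        col: value
        for col, value in row.items()
        if not col.startswith(MATERIAL_PREFIX) or col == TARGET_MATERIAL
    }


row = {
    "Sample_ID": "S1",
    "Water_Type": "Tap",
    "Latitude": 40.7,
    "Longitude": -74.0,
    "Material_PE": 3.0,
    "Material_PP": 1.0,
    "Material_PET": None,
}
print(keep_pe_only(row))
```

Non-material columns such as location and water type pass through untouched, so later cleaning steps still see them.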

Data Processing

  • The original dataset had more than 100 columns; columns with fewer than 40 non-missing values were removed.
  • Further cleaning removed unneeded columns such as Sample_ID and handled every NaN or “Present” entry.
  • Samples from Chinese reservoirs with extreme values were excluded.
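A minimal sketch of these cleaning steps in plain Python follows. The 40-value threshold and the Sample_ID column come from the description above; the "Material_PE" target name is hypothetical, and treating “Present” as unusable (detection recorded without a count) is an assumption about how those entries were handled. The exclusion of extreme Chinese reservoir samples is not reproduced.

```python
import math

# Hypothetical column names; the published file's headers may differ.
TARGET = "Material_PE"
DROP_COLUMNS = {"Sample_ID"}
MIN_VALUES = 40  # from the description: drop columns with fewer than 40 values

# Assumption: "Present" records detection without a count, so it is treated
# as missing rather than converted to a guessed number.
MISSING_TOKENS = {"", "na", "nan", "present"}


def is_missing(value) -> bool:
    """True for None, float NaN, or placeholder strings such as "NAN" or "Present"."""
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    return isinstance(value, str) and value.strip().lower() in MISSING_TOKENS


def clean(rows, min_values=MIN_VALUES):
    """Drop sparse and unneeded columns, then rows without a numeric PE value."""
    # Count non-missing entries per column across all rows.
    counts = {}
    for row in rows:
        for col, value in row.items():
            if not is_missing(value):
                counts[col] = counts.get(col, 0) + 1
    keep = {col for col, n in counts.items()
            if n >= min_values and col not in DROP_COLUMNS}

    cleaned = []
    for row in rows:
        # The PE target must parse as a real number; "Present", None, and NaN fail.
        try:
            pe = float(row.get(TARGET))
        except (TypeError, ValueError):
            continue
        if math.isnan(pe):
            continue
        new_row = {col: (None if is_missing(v) else v)
                   for col, v in row.items() if col in keep}
        new_row[TARGET] = pe  # always keep the target, stored as a float
        cleaned.append(new_row)
    return cleaned


rows = [
    {"Sample_ID": "A", "Color": "Blue", "Notes": None, "Material_PE": "2.5"},
    {"Sample_ID": "B", "Color": "Red", "Notes": None, "Material_PE": "Present"},
    {"Sample_ID": "C", "Color": "NAN", "Notes": "x", "Material_PE": 4},
]
print(clean(rows, min_values=2))
# [{'Color': 'Blue', 'Material_PE': 2.5}, {'Color': None, 'Material_PE': 4.0}]
```

The toy call lowers the threshold to 2 so the effect is visible on three rows: "Notes" is dropped as too sparse, Sample_ID is dropped by name, and the row whose PE value is only “Present” is discarded.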

Dataset Usage

  • Random Forest, k‑NN regression, and Decision Tree regression models were employed for prediction.
  • Model evaluation indicated Decision Tree regression performed best, though its predictive power is limited by sample size and data quality.
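To illustrate the k-NN approach (this is not the project's code, and not the Decision Tree model that performed best), the sketch below predicts PE at a location by averaging the PE values of the k nearest samples. Great-circle (haversine) distance on latitude/longitude, the default k=3, and leave-one-out mean absolute error as the evaluation metric are all assumptions. Leave-one-out is a common choice when only about 60 samples remain, since every sample serves once as a test point.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))  # clamp guards rounding error


def knn_predict(train, lat, lon, k=3):
    """Average PE of the k training samples closest to (lat, lon).

    train: list of (latitude, longitude, pe) tuples (hypothetical layout).
    """
    nearest = sorted(train, key=lambda s: haversine_km(lat, lon, s[0], s[1]))[:k]
    return sum(s[2] for s in nearest) / len(nearest)


def loo_mae(samples, k=3):
    """Leave-one-out mean absolute error: predict each sample from all the others."""
    errors = []
    for i, (lat, lon, pe) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        errors.append(abs(knn_predict(train, lat, lon, k) - pe))
    return sum(errors) / len(errors)


samples = [  # hypothetical (latitude, longitude, PE) triples
    (0.0, 0.0, 1.0),
    (0.0, 1.0, 2.0),
    (0.0, 3.0, 4.0),
]
print(knn_predict(samples, 0.0, 0.2, k=2))  # 1.5
print(loo_mae(samples, k=1))
```

Comparing loo_mae across several k values (or against another model scored the same way) is one simple way to run the kind of model comparison described above on a small sample.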

Dataset Limitations

  • The dataset suffers from many missing values and mismatched data types; after cleaning, only about 60 samples remain usable.
  • The dataset has been updated continuously since 21 July 2022, but its reliability and standardization are still insufficient for robust predictive modeling.

Conclusion

  • Despite testing multiple models, the dataset’s quality prevents reliable predictions of drinking‑water safety based on microplastic content. Further data collection and standardization are required.


Topics

Water Quality Monitoring
Plastic Pollution

Source

Organization: github

Created: 2/22/2024
