Categories: Open Source Community, Machine Learning, Real Estate Analysis

Houses.csv

Dataset for a machine learning project on Polish house price prediction, containing detailed information such as location, size, and floor.

Source: GitHub
Created: Dec 9, 2023
Updated: Dec 18, 2023

Polish House Price Prediction Dataset Overview

Dataset Structure

  • data/: Contains the raw dataset Houses.csv and preprocessed data files X_train.csv, X_test.csv, y_train.csv, y_test.csv.
  • models/: Stores trained models, including linear_regression_model.pkl and knn_model.pkl.
  • src/: Contains source code for data preprocessing, model training, and evaluation, such as preprocessing.py, linear_regression.py, knn.py, main.py.
  • notebooks/: Contains Jupyter notebooks for exploratory data analysis and model building, EDA.ipynb and Modeling.ipynb.
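
As a rough illustration of how the preprocessed files listed above might be read, here is a minimal sketch using pandas. The `load_splits` helper and the `SPLIT_FILES` constant are hypothetical names introduced for this example, and the code assumes the four CSV files sit directly inside `data/` with a header row; the actual column layout is not described here.

```python
from pathlib import Path

import pandas as pd

# File names taken from the data/ listing above.
SPLIT_FILES = ("X_train.csv", "X_test.csv", "y_train.csv", "y_test.csv")


def load_splits(data_dir="data"):
    """Read the four preprocessed splits into a dict keyed by file stem.

    Hypothetical helper: assumes each file is a plain CSV with a header row.
    """
    data_dir = Path(data_dir)
    return {Path(name).stem: pd.read_csv(data_dir / name) for name in SPLIT_FILES}
```

Calling `load_splits()` from the repository root would then return a dictionary with the keys `X_train`, `X_test`, `y_train`, and `y_test`.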

Model Information

  1. Linear Regression Model:

    • Trained using scikit‑learn's LinearRegression.
    • Model saved as models/linear_regression_model.pkl.
    • Evaluation metrics include mean squared error, R² score, and cross‑validation score.
  2. K‑Nearest Neighbors (KNN) Model:

    • Trained using scikit‑learn's KNeighborsRegressor.
    • Model saved as models/knn_model.pkl.
    • Evaluation metrics include mean squared error, R² score, and cross‑validation score.
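
The training and evaluation flow described for both models could look roughly like the sketch below. It is not the project's own code: the `evaluate` and `save_model` helpers are hypothetical, synthetic data from `make_regression` stands in for Houses.csv, and `n_neighbors=5`, 5-fold cross-validation, R² as the cross-validation score, and `pickle` as the serializer are all assumptions.

```python
import pickle

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsRegressor


def evaluate(model, X_train, y_train, X_test, y_test, cv=5):
    """Fit the model, then report test MSE, test R², and mean cross-validated R²."""
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # cross_val_score clones the estimator, so the fitted model is left untouched.
    cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="r2")
    return {
        "mse": mean_squared_error(y_test, predictions),
        "r2": r2_score(y_test, predictions),
        "cv_r2_mean": cv_scores.mean(),
    }


def save_model(model, path):
    """Serialize a fitted model to a .pkl file (pickle is an assumption here)."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


# Synthetic stand-in data; the real features come from the preprocessed splits.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

results = {
    "linear_regression": evaluate(
        LinearRegression(), X_train, y_train, X_test, y_test
    ),
    "knn": evaluate(
        KNeighborsRegressor(n_neighbors=5), X_train, y_train, X_test, y_test
    ),
}

for name, metrics in results.items():
    print(name, metrics)
```

A fitted model could then be written out with, for example, `save_model(model, "models/knn_model.pkl")`.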

Future Improvement Directions

  • Hyperparameter tuning: Try different hyperparameter configurations (for example, the number of neighbors in the KNN model) to improve model performance.
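
One possible way to approach the hyperparameter tuning mentioned above is a grid search over the KNN settings. The sketch below uses scikit-learn's `GridSearchCV` on synthetic data; the parameter grid, the 5-fold setting, and the scoring choice are illustrative assumptions, not values taken from the project.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data; replace with the preprocessed training split.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Illustrative grid: these candidate values are assumptions.
param_grid = {
    "n_neighbors": [3, 5, 7, 9],
    "weights": ["uniform", "distance"],
}

search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes, so MSE is negated
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated MSE:", -search.best_score_)
```

Because `GridSearchCV` refits the best configuration on the full data by default, `search.best_estimator_` can be used directly for prediction afterwards.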