Dataset asset: Open Source Community, Machine Learning, Real Estate Analysis
Houses.csv
Dataset for a machine learning project on Polish house price prediction, containing detailed information such as location, size, and floor.
Source
github
Created
Dec 9, 2023
Updated
Dec 18, 2023
Signals
144 views
Availability
Linked source ready
Overview
Dataset description and usage context
Polish House Price Prediction Dataset Overview
Dataset Structure
- data/: Contains the raw dataset Houses.csv and the preprocessed data files X_train.csv, X_test.csv, y_train.csv, and y_test.csv.
- models/: Stores trained models, including linear_regression_model.pkl and knn_model.pkl.
- src/: Contains source code for data preprocessing, model training, and evaluation, such as preprocessing.py, linear_regression.py, knn.py, and main.py.
- notebooks/: Contains Jupyter notebooks for exploratory data analysis and model building: EDA.ipynb and Modeling.ipynb.
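As a rough illustration of how the preprocessed split files listed above might be read, here is a minimal sketch. It assumes pandas is available and that the files sit in a local data/ directory. The helper name `load_splits` is hypothetical, and the column layout of the files is not documented in the description, so nothing is assumed about it beyond the target files holding a single column.

```python
from pathlib import Path

import pandas as pd


def load_splits(data_dir="data"):
    """Load the four preprocessed split files from data_dir.

    The file names come from the dataset description; load_splits itself
    is a hypothetical helper, not part of the repository.
    """
    data_dir = Path(data_dir)
    X_train = pd.read_csv(data_dir / "X_train.csv")
    X_test = pd.read_csv(data_dir / "X_test.csv")
    # Assumption: each y file holds one column, so it is squeezed to a Series.
    y_train = pd.read_csv(data_dir / "y_train.csv").squeeze("columns")
    y_test = pd.read_csv(data_dir / "y_test.csv").squeeze("columns")
    return X_train, X_test, y_train, y_test
```

Calling `load_splits("data")` returns the four objects in the order scikit-learn's `train_test_split` uses, which keeps later training code familiar.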
Model Information
- Linear Regression Model:
  - Trained using scikit‑learn's LinearRegression.
  - Model saved as models/linear_regression_model.pkl.
  - Evaluation metrics include mean squared error, R² score, and cross‑validation score.
- K‑Nearest Neighbors (KNN) Model:
  - Trained using scikit‑learn's KNeighborsRegressor.
  - Model saved as models/knn_model.pkl.
  - Evaluation metrics include mean squared error, R² score, and cross‑validation score.
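The two models and three metrics described above could be exercised along the following lines. This is a sketch under stated assumptions, not the repository's own training code: it uses synthetic data from `make_regression` as a stand-in for the real split files, `fit_and_score` is a hypothetical helper, and the 5-fold R² setting for cross-validation and `n_neighbors=5` are illustrative choices that the description does not specify.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsRegressor


def fit_and_score(model, X_train, X_test, y_train, y_test, cv=5):
    """Fit the model and return MSE, R², and a mean cross-validation score."""
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # cross_val_score fits clones of the model, so the fitted one is unchanged.
    cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="r2")
    return {
        "mse": mean_squared_error(y_test, predictions),
        "r2": r2_score(y_test, predictions),
        "cv_r2_mean": cv_scores.mean(),
    }


# Synthetic stand-in data (assumption); swap in the real split files.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

results = {
    "linear_regression": fit_and_score(
        LinearRegression(), X_train, X_test, y_train, y_test
    ),
    "knn": fit_and_score(
        KNeighborsRegressor(n_neighbors=5), X_train, X_test, y_train, y_test
    ),
}
for name, metrics in results.items():
    print(name, metrics)
```

Reporting the same three numbers for both models makes them directly comparable. Note that KNN is distance-based, so on real data with mixed units (area, floor, coordinates) feature scaling would likely matter.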
Future Improvement Directions
- Hyperparameter tuning: Try different hyperparameter configurations, such as the number of neighbors in the KNN model, to improve model performance.
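One possible way to approach this tuning is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` on the KNN regressor. The search method, the parameter grid, and the scoring choice are all assumptions made for illustration, and the data is again synthetic rather than the real Houses.csv splits.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (assumption); swap in the real training split.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

# Illustrative grid; the values are not taken from the project.
param_grid = {
    "n_neighbors": [3, 5, 7, 9],
    "weights": ["uniform", "distance"],
}

search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid,
    cv=5,
    # scikit-learn maximizes scores, so MSE is exposed in negated form.
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated MSE:", -search.best_score_)
```

After fitting, `search.best_estimator_` holds a model refit on all the supplied data with the winning configuration.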