Skip to content

Latest commit

 

History

History
24 lines (21 loc) · 1.17 KB

README.md

File metadata and controls

24 lines (21 loc) · 1.17 KB

Predicting House Prices - Kaggle Competition

This notebook is my submission for the Kaggle House Prices Competition, which contains a dataset containing the the features of 1460 houses, plus their selling prices.

This notebook uses several different regression methodologies in order to predict the price of a house given its features (its area, how many bedrooms it has, the year it was built, etc). My methodology is outlined as follows:

  • Load the dataset into a Pandas dataframe
  • Feature selection:
    • Identify which numerical features to keep based on their correlation with the sales price
    • Identify which features to keep from those based on their correlation with each other
    • Identify which categorical features to keep
  • Handle missing data
  • Feature Engineering
  • Data Pre-Processing:
    • One-hot encode the categorical features
    • Standardise and normalise the numerical features
  • Perform different regression models on the data and compare their performance (in terms of MRSE):
    • Linear Regression
    • Ridge Regression
    • Lasso Regression
    • Random Forest
    • Support Vector Machine
    • Extreme Gradient Boosting
  • And finally, make the submission on the test data.