Automated Valuation Machine Learning Model for Lima House Pricing

PBenavides/Automated-Valuation-Model-Lima


Automated Valuation Model - Lima

A spatial machine learning project to price houses

This project mainly contains three components:

  • Scraper 🕷️
  • notebooks 📓
  • webapp 💻

Exploratory Spatial Analysis

This notebook uses spatial analysis to take advantage of the coordinate features extracted from the webpage, checking whether spatial features could improve model performance. The results are in the Spatial Autocorrelations Notebook. The conclusions about these features can be summarized as follows:

  • Moran’s test indicates spatial autocorrelation in the target variable: there is a relationship between house prices and location.
  • This relationship is not strong, but it exists, which means the spatial features could add some value to the model.
  • The Local Spatial Autocorrelation test supports the well-known hypothesis that Lima is a centric city, since the clusters are spread around the city center.
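The notebook's own code is not reproduced here. As a rough, self-contained sketch of what the global and local Moran's statistics compute, the example below uses NumPy only; the notebook may instead rely on a dedicated library such as PySAL's esda package, which is not confirmed. The k-nearest-neighbour weighting, the choice of `k`, and all function names are assumptions for illustration.

```python
import numpy as np

def knn_weights(coords, k=4):
    """Row-standardised k-nearest-neighbour weights (dense, fine for small n)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise Euclidean distances between all points.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour
    w = np.zeros((n, n))
    nearest = np.argsort(dist, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    w[rows, nearest.ravel()] = 1.0
    return w / w.sum(axis=1, keepdims=True)  # each row sums to 1

def global_morans_i(y, w):
    """I = (n / S0) * (z' W z) / (z' z), with z the mean-centred target."""
    z = np.asarray(y, dtype=float) - np.mean(y)
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)

def local_morans_i(y, w):
    """Local I_i = z_i * sum_j w_ij z_j / m2, with m2 = sum(z^2) / n."""
    z = np.asarray(y, dtype=float) - np.mean(y)
    m2 = (z @ z) / len(z)
    return z * (w @ z) / m2

def morans_i_permutation_test(y, w, permutations=999, seed=0):
    """Global I plus a one-sided pseudo p-value from random relabelling."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = global_morans_i(y, w)
    sims = np.array([global_morans_i(rng.permutation(y), w)
                     for _ in range(permutations)])
    p_value = (np.sum(sims >= observed) + 1) / (permutations + 1)
    return observed, p_value
```

With row-standardised weights, a global I well above its null expectation of -1/(n-1), together with a small pseudo p-value, points to positive spatial autocorrelation. The local values are the per-listing contributions that are mapped to find clusters.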

You can also find choropleths and other exploratory images in the Exploratory Data Analysis Notebook.

Choropleth

Moran Plot:

About the Machine Learning Model

Results are stored mainly in the 2020_Notebook04_Model_Selection notebook, and PyCaret was used for rapid development of model and feature selection. The main problem with this dataset is that it is apparently too small to cope with outliers. Handling outliers was the main factor in outperforming the first benchmarks we tested.
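The README does not say how outliers were handled. PyCaret's regression `setup` exposes a `remove_outliers` option, which may be what was used, but that is not confirmed. As a transparent alternative shown purely as an assumption, the sketch below applies Tukey's IQR fences to price per square metre rather than raw price, so that large but fairly priced houses are not dropped. The helper names and the example listings are hypothetical.

```python
import numpy as np

def iqr_inlier_mask(values, factor=1.5):
    """True for values inside Tukey's fences [Q1 - factor*IQR, Q3 + factor*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - factor * iqr) & (values <= q3 + factor * iqr)

def filter_price_outliers(prices, areas_m2, factor=1.5):
    """Hypothetical helper: keep listings whose price per m2 is not an outlier."""
    price_per_m2 = np.asarray(prices, dtype=float) / np.asarray(areas_m2, dtype=float)
    return iqr_inlier_mask(price_per_m2, factor=factor)

# Made-up listings: the last one has an implausible price per m2.
prices = [100_000, 120_000, 110_000, 105_000, 115_000, 5_000_000]
areas = [100, 100, 100, 100, 100, 100]
keep = filter_price_outliers(prices, areas)
print(keep.tolist())  # [True, True, True, True, True, False]
```

On a small dataset every dropped row is costly, so `factor` is worth tuning against validation error rather than fixing at 1.5.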

Several additional features were also integrated into the model, such as Point of Interest clusters and Crime clusters. However, taking these into production carries a high development cost compared with the value they added to the benchmark metrics, so the ML model is kept as a basic version in the API. It is also important to note that the value of the ML model depends entirely on the quality of the data. The model has only been trained on a single period of housing data; there is much potential in studying trends over time, but the Urbania webpage does not allow its information to be scraped recurrently.
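The README does not describe how the Point of Interest or Crime clusters were built. As a simpler stand-in, and not the project's clustering method, a location can be turned into a numeric feature by counting nearby points using the haversine distance. The coordinates, the 1 km radius, and the function names below are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, NumPy-broadcastable."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def count_within_radius(house_latlon, poi_latlon, radius_km=1.0):
    """For each house, count points (POIs, crime reports, ...) within radius_km."""
    houses = np.asarray(house_latlon, dtype=float)
    pois = np.asarray(poi_latlon, dtype=float)
    # Broadcast to an (n_houses, n_points) distance matrix.
    d = haversine_km(houses[:, None, 0], houses[:, None, 1],
                     pois[None, :, 0], pois[None, :, 1])
    return (d <= radius_km).sum(axis=1)

# Hypothetical coordinates (latitude, longitude) around Lima.
houses = [(-12.0464, -77.0428), (-12.12, -77.03)]
pois = [(-12.0464, -77.0428), (-12.0414, -77.0428), (-12.2, -77.0)]
print(count_within_radius(houses, pois).tolist())  # [2, 0]
```

The same function works for any point layer, so one call per layer (POIs, crime reports) yields one density feature each.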

Webapp

For deployment, the stack is:

  • [CSS & HTML] - The basic building blocks of web apps.
  • [Flask] - Backend. (API done, form interface in progress)
  • [Heroku] - App hosting. (In progress)
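The API code itself is not shown in the README. As a hedged sketch of what a Flask prediction endpoint can look like, the example below validates a JSON body and returns a price. The `/predict` route, the field names, and `dummy_model` (a placeholder linear rule standing in for the trained model) are all hypothetical.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical input schema; the real API's fields are not documented here.
REQUIRED_FIELDS = ("area_m2", "bedrooms", "latitude", "longitude")

def dummy_model(features):
    """Placeholder for the trained model (assumption): a simple linear rule."""
    return 1500.0 * features["area_m2"] + 10000.0 * features["bedrooms"]

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True)  # None if the body is not valid JSON
    if payload is None:
        return jsonify(error="Request body must be JSON"), 400
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        return jsonify(error="Missing fields", fields=missing), 400
    try:
        features = {f: float(payload[f]) for f in REQUIRED_FIELDS}
    except (TypeError, ValueError):
        return jsonify(error="All fields must be numeric"), 400
    return jsonify(predicted_price=dummy_model(features))
```

Flask's test client makes the endpoint easy to exercise without a server, e.g. `app.test_client().post("/predict", json={...})`. In production the placeholder would be replaced by the trained model, loaded once at startup rather than per request.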

Architecture

The architecture is in progress.

License

MIT
