Rental Analysis - Pune: Project Overview

Secured a tool that estimates the rental price for Pune city which will help a person to estimates rent according to the parameters like comfort, location & many more.
Scrapped over 16000+ houses descriptions from 99acres.com using Python and Beautiful Soup.
Generated features from total facilities provided by the owner, also created valuable features related cost from.
Engineered Geospatial features using text address data.
Optimised Linear, Random forest regressor using Randomsearchcv.
Built a client-facing API using flask.

Code & Resources used

Python Version: 3.8

Packages: Pandas, Scikit-learn, Numpy, Geocoder, Folium, BeautifulSoup, flask, JSON, pickle, Matplotlib, seaborn.

For WebFrame requirements: pip install requirements.txt

rent Traffic: Check map.html to check current rent traffic.

Flask deployment Tutorial - https://towardsdatascience.com/productionize-a-machine-learning-model-with-flask-and-heroku-8201260503d2

Motivation

DS Project from Scratch by Ken Jee - https://www.youtube.com/playlist?list=PL2zq7klxX5ASFejJj80ob9ZAnBHdz5O1t
#66daysofdata - https://www.youtube.com/watch?v=iiSZqsQKNX8&ab_channel=KenJee

Web Scrapping

This code helps you to find different house's data which is the available web.

I referred to 99acres.com for this project. The interface of the web page looks like this:

Run the web scraper to scrape 15000 housing data from 99acres.com Each sample(row) has the following points:

rent (Target Variable)
Number of bedrooms
Number of bathrooms
Number of Balconies
Brokerage amount
Deposit Amount
Maintenance amount
Built-Up Area
Super Built-Up Area
Type of Furnishing
Availability for
Address
Floor Number
Home Facing
Floor-type
Gate Community
Corner Property
Parking Count
WheelChairFacility
Pet-Friendly
Agreement Duration
Electricity Bill
Power Backup Facility
Property age

Data Cleaning

After scrapping 16000+ data from the web. Data contained a large amount of text data combined with numerical data. I needed to clean it so that it was usable for the model for ML model.

I performed few operations of data cleaning are listed below:

Derived locality of address by cleaning text parameter.
Parsed numerical values from the area.
Removed rows which doesn't value.
Parsed facilities 4 features from a list of text data.
Made cost-wise columns such as deposit(security amount), maintenance amount & brokerage amount from the text data list.
Parking count has simplified, merged open & closed parking values.
Parsed numeric data from agreement duration, notice durations.

EDA

I used folium & geocoder packages to create & analyze rent quantitative distribution area-wise in the city.

MAP Visual

Continuous Data

Also, Based on a list of the continuous data variables did bivariate.

Ordinal Varibale

You can check the bivariate analysis of data for ordinal variables.

Model Building

Following step I have before building model
Used Binary Encoding for Binary variables
Used Ordinal Encoding on Ordinal Variables

I have 2 differnt models & evaulted them with using mean absolute error & mean_log_squared_error. I chose MAE & MLSE for because it is easy to interpret.

Model Listing :

Linear Regression: Create a baseline for model.
Random forest regressor: This model is selected due to good fitment on data.

Apart from that I use lasso & Ridge to check performace imporvement. But this part is not included in notebook.

Model Performace :

Linear Regression : r2_score : 0.69.

Random_forest : r2_score : 0.98 on train data & ~0.90 on test data / MSLE is 0.23.

Productionization

In this step, I built a flask API endpoint that was hosted on a local webserver by following along with the TDS tutorial in the reference section above. The API endpoint takes in a request with a list of values from a job listing and returns an estimated salary.

deployment

This is a client-facing API using flask run into cmd

To use please clone repository first in local enviroment.

Install requirements from pip install -r requirements.txt

activate venv : activate env

requests.py : contain the get request file.

data_input : Input file of requests.

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
Code		Code
Data		Data
Scrapping output		Scrapping output
deployement files		deployement files
images		images
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Rental Analysis - Pune: Project Overview

Code & Resources used

Motivation

Web Scrapping

Data Cleaning

EDA

Model Building

Model Performace :

Productionization

deployment

Thank You

About

Releases

Packages

Languages

senhorinfinito/rental_price_analysis

Folders and files

Latest commit

History

Repository files navigation

Rental Analysis - Pune: Project Overview

Code & Resources used

Motivation

Web Scrapping

Data Cleaning

EDA

Model Building

Model Performace :

Productionization

deployment

Thank You

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages