Measuring the impact proximity to subway stations has on rental prices in Manhattan and Brooklyn

slieb74/nyc-subways-impact-rental-prices

Predicting NYC Rental Prices Based on Proximity and Access to Subways

Goal

Measure the impact that proximity to subway stations has on rental prices in Manhattan and Brooklyn, and estimate how future station openings and/or closures will affect neighborhood prices.

ETL

We gathered data from four sources:

  • Location data for station entrances and line access from the MTA's API
  • Apartment sales from the NYC Department of Finance
  • Median neighborhood rental prices and sale-to-rent ratios from Zillow
  • Apartment coordinates from the Google Maps API

Our apartment data consisted of sales rather than rentals, so we used Zillow's median neighborhood rental prices to convert each apartment's sale price into a rent estimate that better suited our goal. We focused on rentals instead of sales because the rental market is less static and should therefore respond more strongly to changes in subway access.
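
The README does not spell out the exact conversion, so the following is only a minimal sketch of one plausible approach. It assumes the Zillow sale-to-rent ratio is defined as sale price divided by annual rent; the function name `estimate_monthly_rent` and the example figures are hypothetical.

```python
def estimate_monthly_rent(sale_price, sale_to_rent_ratio):
    """Estimate a monthly rent from a sale price.

    Assumption (not confirmed by the project): the neighborhood
    sale-to-rent ratio equals sale price / annual rent.
    """
    if sale_to_rent_ratio <= 0:
        raise ValueError("sale_to_rent_ratio must be positive")
    annual_rent = sale_price / sale_to_rent_ratio
    return annual_rent / 12


# Hypothetical numbers, for illustration only.
monthly_rent = estimate_monthly_rent(1_200_000, 25)
print(f"Estimated rent: ${monthly_rent:,.0f} per month")
```

Because the ratio is a neighborhood-level figure, every apartment in a neighborhood is scaled by the same factor under this assumption, so relative differences in sale price carry over to the rent estimates.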

To get the distance from each apartment to each subway entrance, we used the Google Maps API to convert addresses into coordinates, then calculated the distance in miles using the Haversine formula. From there, for each apartment we found every station with unique subway access within 0.55 miles (roughly a 10-minute walk).
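
The distance step can be sketched roughly as follows. This is an illustrative version rather than the project's code: the helper names, the dictionary layout, and the coordinates (approximate, for demonstration only) are assumptions. It also does not show how stations were deduplicated by "unique subway access".

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def stations_within(apartment, stations, max_miles=0.55):
    """Return (name, lines, distance) for stations in range, nearest first."""
    nearby = []
    for station in stations:
        dist = haversine_miles(apartment["lat"], apartment["lon"],
                               station["lat"], station["lon"])
        if dist <= max_miles:
            nearby.append((station["name"], station["lines"], dist))
    return sorted(nearby, key=lambda item: item[2])


# Approximate, illustrative coordinates (not taken from the project's data).
apartment = {"lat": 40.7580, "lon": -73.9855}
stations = [
    {"name": "Times Sq-42 St", "lines": ["1", "2", "3"], "lat": 40.7553, "lon": -73.9870},
    {"name": "Bedford Av", "lines": ["L"], "lat": 40.7173, "lon": -73.9567},
]
for name, lines, dist in stations_within(apartment, stations):
    print(f"{name} ({'/'.join(lines)}): {dist:.2f} mi")
```

The 0.55-mile default mirrors the 10-minute walking radius described above. Haversine gives straight-line distance, so actual walking distance along the street grid will be somewhat longer.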

Mapping using GeoPandas

To get a sense of where our apartments were located, and to ensure that they were not concentrated in only a few neighborhoods, we used GeoPandas to map each neighborhood, apartment, and subway line.

Each apartment plotted in its neighborhood using GeoPandas

Each subway station entrance plotted using GeoPandas

Machine Learning Models

We used four classification models to predict whether an apartment's rental price would be above or below its neighborhood median, given its access and proximity to different subway lines.
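
As a rough illustration of how such a classification problem might be set up, the sketch below builds a binary target and a simple feature row per apartment. The feature encoding (one access flag per line plus the distance to the nearest station) is an assumption, as are the function names and sample values. The README does not describe the actual feature set.

```python
def line_features(nearby, all_lines):
    """Build one feature row for an apartment.

    nearby: list of (lines, distance_miles) for stations within walking range.
    Returns a 0/1 access flag per subway line plus the nearest-station distance.
    """
    served = {line for lines, _ in nearby for line in lines}
    features = {f"has_{line}": int(line in served) for line in all_lines}
    features["nearest_station_miles"] = min((d for _, d in nearby), default=None)
    return features


def above_median_label(rent, neighborhood, median_rents):
    """Return 1 if the rent estimate exceeds the neighborhood median, else 0."""
    return int(rent > median_rents[neighborhood])


# Hypothetical values, for illustration only.
median_rents = {"Williamsburg": 3000}
row = line_features([(["L"], 0.2), (["G"], 0.4)], all_lines=["G", "L", "1"])
label = above_median_label(3200, "Williamsburg", median_rents)
print(row, label)
```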

The four models we used were:

  • Logistic Regression
  • Random Forest
  • Gradient Boosting
  • AdaBoost

The best-performing model was the Random Forest classifier, with an accuracy of 74.52% and an AUC of 81.57%.
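
A comparison along these lines could be run with scikit-learn as sketched below. The train/test split, default hyperparameters, and synthetic data from `make_classification` are assumptions made so the example is self-contained. It will not reproduce the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def compare_models(X, y, random_state=0):
    """Fit the four classifiers and return accuracy and AUC on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(random_state=random_state),
        "Gradient Boosting": GradientBoostingClassifier(random_state=random_state),
        "AdaBoost": AdaBoostClassifier(random_state=random_state),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # AUC needs the predicted probability of the positive class.
        proba = model.predict_proba(X_test)[:, 1]
        scores[name] = {
            "accuracy": accuracy_score(y_test, model.predict(X_test)),
            "auc": roc_auc_score(y_test, proba),
        }
    return scores


# Synthetic stand-in for the apartment features and above/below-median labels.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
scores = compare_models(X, y)
for name, s in scores.items():
    print(f"{name}: accuracy={s['accuracy']:.3f}, AUC={s['auc']:.3f}")
```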

The optimal hyperparameters for the Random Forest Classifier, cross-validated using GridSearchCV, were:

  • 250 estimators
  • Gini impurity
  • Minimum of 5 samples to split a node
  • Minimum of 5 samples per leaf
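
A grid search of this kind might look like the sketch below. Only the reported optimum (250 estimators, Gini impurity, and a minimum of 5 samples for both splits and leaves) comes from the README. The other candidate values in `PARAM_GRID`, the `roc_auc` scoring choice, the fold count, and the synthetic demo data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Candidate values are illustrative; only the reported optimum is from the README.
PARAM_GRID = {
    "n_estimators": [100, 250],
    "criterion": ["gini", "entropy"],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 5],
}


def tune_random_forest(X, y, param_grid=PARAM_GRID, cv=5):
    """Cross-validated grid search over Random Forest hyperparameters."""
    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid,
        scoring="roc_auc",  # assumed metric; the README does not name one
        cv=cv,
    )
    search.fit(X, y)
    return search


# Demo on synthetic data with a reduced grid so it runs quickly.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
small_grid = {
    "n_estimators": [25],
    "criterion": ["gini"],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [5],
}
demo = tune_random_forest(X_demo, y_demo, param_grid=small_grid, cv=3)
print(demo.best_params_, round(demo.best_score_, 3))
```

Calling `tune_random_forest(X, y)` with the full `PARAM_GRID` searches 16 combinations, and `best_params_` on the returned object holds the winning settings.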

Results

Confusion Matrix

ROC Curve

Next Steps

  • Due to time constraints, we had to limit the scope of our project to Manhattan and Brooklyn, but in the future we would like to explore the Bronx and Queens as well
  • Predict how the upcoming L train shutdown will affect rental prices in Williamsburg
  • Add Citi Bike data to our project and measure the impact dock openings have had on rental prices
  • Make the maps interactive so that when users hover over an apartment, they see its rental price, address, and neighborhood median price
