wl2522/MSR2

Predicting the 2018 United States House of Representatives Elections and Voting Patterns with Polling Data

This repository contains the relevant model and visualization code for Team MSR2's Columbia University Spring 2018 Data Science Capstone Project with industry partner Microsoft Research.

Post-Election Update

In hindsight, our model's predictions were far off from the actual 2018 midterm election results: the Democratic Party gained 41 seats, whereas our model predicted that it would lose 19. We therefore want to highlight some possible explanations for the heavily Republican-favored predictions. When we discussed this issue with our mentors from Microsoft Research, we learned that they had obtained similarly Republican-favored predictions from the same dataset despite using more advanced modeling techniques than those used here. This led us to believe that the main culprit was a Republican-leaning bias in the surveying methodology used to collect the polling data. Section 5.2 of our capstone report discusses in more detail how this bias may have been introduced.

Abstract

Nationwide surveys are widely used for gathering household information, evaluating public policies, and predicting elections. However, no unofficial survey can achieve census-level coverage. Using survey data gathered by PredictWise through Pollfish, a mobile survey platform, we fit a multilevel regression and poststratification (MRP) model to predict the two-party vote shares for each of the 435 congressional districts and the District of Columbia in the forthcoming 2018 United States House of Representatives elections. We use these predictions to investigate the changes in voter turnout among specific demographic groups that would be needed to change the balance of power in the House of Representatives. In addition to widely used demographic and geographic information, we incorporate responses to psychometric survey questions using three weighting schemes and evaluate their effects on the model. We identify several question topics that improve our model’s predictive accuracy and find evidence that adding multiple topics simultaneously produces approximately linear improvements in accuracy.
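As a rough illustration of the poststratification half of MRP, the sketch below averages per-cell vote-share estimates, weighted by how many people fall in each demographic cell. It is not taken from this repository: the function name `poststratify`, the cell labels, and every number are hypothetical, and in a real MRP workflow the per-cell estimates would come from a fitted multilevel regression rather than being typed in by hand.

```python
# Illustrative sketch only; not the repository's implementation.
# All names and numbers below are hypothetical.

def poststratify(cell_estimates, cell_counts):
    """Return the population-weighted average of per-cell estimates.

    cell_estimates: dict mapping a demographic cell to an estimated
                    vote share (e.g. from a multilevel regression).
    cell_counts:    dict mapping the same cells to population counts.
    """
    total = sum(cell_counts[cell] for cell in cell_estimates)
    if total == 0:
        raise ValueError("cell counts sum to zero")
    weighted = sum(cell_estimates[cell] * cell_counts[cell]
                   for cell in cell_estimates)
    return weighted / total


# Two made-up cells for a single made-up district.
estimates = {("18-29", "college"): 0.60, ("65+", "no college"): 0.40}
counts = {("18-29", "college"): 100, ("65+", "no college"): 300}

district_share = poststratify(estimates, counts)
print(f"Estimated district vote share: {district_share:.3f}")
```

Because the weights come from population counts rather than from the survey sample, an unrepresentative sample can still yield a district-level estimate, provided the per-cell estimates themselves are reasonable.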

Contents

  • baseline: Python code for our baseline prediction model and our predicted outcomes for the 2018 Midterm Elections
  • demographics: supplementary data that was used to augment our model or impute missing data
  • dynamic: an experimental dynamic model of Trump's approval rating
  • plots: exploratory data analysis code
  • psychometric: a collection of models that incorporate various psychometric variables, along with the models without psychometric variables that were used as a baseline for comparison
  • authoritarianism: Python code for our prediction model that includes identification with authoritarianism as an additional variable and updated predictions for the 2018 Midterm Elections produced by this model
  • report_map: the HTML, CSS, and JavaScript (D3.js) code used for the interactive map of our predictions
  • turnout_adjustment: Python code that modifies the poststratification space by adjusting voter turnout for specific demographic groups along with updated predictions produced by those adjustments
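
The turnout adjustment described in the last item can be pictured as rescaling the counts of selected cells in the poststratification space and then recomputing the weighted average. The sketch below is a minimal illustration under that assumption, not the code in `turnout_adjustment`: the helper names `adjust_turnout` and `poststratify`, the cell labels, the multiplier, and all numbers are hypothetical.

```python
# Illustrative sketch only; not the repository's implementation.
# All names and numbers below are hypothetical.

def poststratify(cell_estimates, cell_counts):
    """Population-weighted average of per-cell vote-share estimates."""
    total = sum(cell_counts[cell] for cell in cell_estimates)
    if total == 0:
        raise ValueError("cell counts sum to zero")
    return sum(cell_estimates[cell] * cell_counts[cell]
               for cell in cell_estimates) / total


def adjust_turnout(cell_counts, multipliers):
    """Return a new dict of counts with selected cells rescaled.

    multipliers maps a cell to a turnout factor; cells that are not
    listed keep their original count (factor 1.0).
    """
    return {cell: count * multipliers.get(cell, 1.0)
            for cell, count in cell_counts.items()}


# Made-up per-cell estimates and expected-voter counts.
estimates = {"young": 0.60, "old": 0.40}
counts = {"young": 100, "old": 300}

baseline_share = poststratify(estimates, counts)

# Hypothetical scenario: turnout among the "young" cell doubles.
adjusted_counts = adjust_turnout(counts, {"young": 2.0})
adjusted_share = poststratify(estimates, adjusted_counts)

print(f"baseline: {baseline_share:.3f}, adjusted: {adjusted_share:.3f}")
```

Returning a new dictionary instead of modifying the counts in place keeps the baseline poststratification space available for comparison across scenarios.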
