
afnanhaq/political-data-builder


political-data-builder

INTRODUCTION VIDEO: https://youtu.be/gscW91zEgXA

Back-end code: MAIN_FLASK_SERVER.py

The Jupyter notebooks and other supporting files are also included in the repository.


The primary goal of our website is to generate methodologically sound surrogate datasets for use in research: clean, sampled, and anonymized datasets that researchers can use for sophisticated analysis, available in just a few clicks. Through our website, users can draw on datasets tailored to their needs that would normally be unattainable. Once users define their sample, they can explore it on a data visualization platform that offers a hands-on experience through zooming, plotting, layering, and further querying. They can also save the resulting data frame in several file formats.
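The export step can be sketched with Python's standard library. This is a minimal sketch, not code taken from MAIN_FLASK_SERVER.py: it assumes the user's sample is held as a list of dictionaries (one per row) and shows two possible formats, CSV and JSON. The helper `export_sample` and the example columns are hypothetical.

```python
import csv
import io
import json


def export_sample(rows, fmt):
    """Serialize a list of row dictionaries to CSV or JSON text.

    Hypothetical helper for illustration; assumes a non-empty sample
    in which every row has the same keys.
    """
    if fmt == "csv":
        buffer = io.StringIO()
        # Use the first row's keys as the CSV header.
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    if fmt == "json":
        return json.dumps(rows, indent=2)
    raise ValueError(f"unsupported format: {fmt}")


# Made-up example rows.
sample = [{"county": "A", "turnout": 0.61}, {"county": "B", "turnout": 0.55}]
print(export_sample(sample, "csv"))
```

Returning text rather than writing files keeps the helper usable both from a notebook and from a web handler that streams the result as a download.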

Time Frame Submissions

  • [X] Declare team (Saturday)
  • [X] Idea video (Sunday, 4:00 pm)
  • [X] Submission video
  • [X] Submission (Friday, 1:00 pm)

Technology

  • Python
  • React
  • Flask
  • GIS
  • Jupyter Notebook
  • ArcGIS REST API: interactive mapping of the extracted, sampled, aggregated, and queried data
  • Git

Opening Data, Jittering

Goal:

  1. Jupyter notebook
  2. Download and store data (CSV, tab-delimited, and shapefile formats)
  • Look into storage efficiency, runtime, and how best to combine the tables
  • Drop columns whose share of NaN values exceeds a threshold (~75%)
  • Drop columns that contain personal information
  3. Join datasets on shared identifiers
  4. Set up methods for sampling and for producing aggregated datasets
  • Research the best methods for each
  5. Data jittering
  • By surrogates, we mean files that are jittered, aggregated, or obscured enough to provide a reasonable level of privacy; files that are true random national samples; and samples that follow other established methodologies.
  6. Interactive data visualization and further data analysis
  7. KNN classification
  8. Variance observation using PCA
  9. Data anonymization
  10. Aggregation based on user input
  11. Data cleaning
  12. Making it usable for the common person
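The column-dropping rules in the outline above (remove mostly-empty columns and columns with personal information) can be sketched as follows. This is a minimal standard-library sketch, not the notebook code: it assumes rows are dictionaries with `None` or `float("nan")` marking a missing value, uses the ~75% threshold from the outline, and takes the personal-information columns as an input because the README does not name them. `clean_columns` and the column names are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_columns(rows, personal_columns, max_missing=0.75):
    """Drop columns whose share of missing values exceeds max_missing,
    plus every column listed in personal_columns."""
    keep = []
    for col in rows[0].keys():
        if col in personal_columns:
            continue  # never keep personal information
        missing_share = sum(is_missing(row.get(col)) for row in rows) / len(rows)
        if missing_share <= max_missing:
            keep.append(col)
    return [{col: row.get(col) for col in keep} for row in rows]


# Made-up example rows: "notes" is 100% missing, "age" only 25%.
rows = [
    {"voter_name": "X", "precinct": "P1", "age": 34, "notes": None},
    {"voter_name": "Y", "precinct": "P2", "age": None, "notes": float("nan")},
    {"voter_name": "Z", "precinct": "P1", "age": 58, "notes": None},
    {"voter_name": "W", "precinct": "P3", "age": 41, "notes": None},
]
cleaned = clean_columns(rows, personal_columns={"voter_name"})
print(cleaned[0])  # {'precinct': 'P1', 'age': 34}
```

In a pandas notebook the same threshold rule maps onto `DataFrame.dropna(axis=1, thresh=...)`, where `thresh` is the minimum number of non-missing values a column needs to be kept, and personal columns can be removed with `DataFrame.drop(columns=...)`.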
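Joining datasets on shared identifiers is an inner join on a key column. A minimal sketch, assuming both tables are lists of dictionaries that share a key column; the `fips` column and the values below are made up for illustration, since the README does not say which identifiers were used.

```python
def inner_join(left, right, key):
    """Inner-join two lists of row dictionaries on a shared key column.

    A hash index on the right table keeps the join linear in the input
    sizes (plus the size of the output) instead of quadratic.
    """
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            # On duplicate column names, the right-hand value wins.
            joined.append({**row, **match})
    return joined


census = [{"fips": "01001", "population": 1000}, {"fips": "01003", "population": 2000}]
results = [{"fips": "01001", "turnout": 0.62}, {"fips": "01005", "turnout": 0.51}]
print(inner_join(census, results, "fips"))
# [{'fips': '01001', 'population': 1000, 'turnout': 0.62}]
```

Rows without a match on both sides are dropped; a left join would instead keep every left row. In pandas this corresponds to `pd.merge(left, right, on="fips", how="inner")`.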
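For the sampling step, two common designs are a simple random sample and a stratified sample that draws the same fraction from each group (for example, each state). The sketch below is illustrative, not the project's documented method: it assumes rows are dictionaries, and the `state` stratum column and seeds are hypothetical choices.

```python
import random
from collections import defaultdict


def simple_random_sample(rows, n, seed=None):
    """Draw n rows without replacement, each row equally likely."""
    rng = random.Random(seed)
    return rng.sample(rows, n)


def stratified_sample(rows, stratum_key, fraction, seed=None):
    """Draw the same fraction from each stratum (at least one row per stratum)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[stratum_key]].append(row)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Made-up population: 80 rows in one state, 20 in another.
population = [{"id": i, "state": "CA" if i < 80 else "WY"} for i in range(100)]
srs = simple_random_sample(population, 10, seed=1)
strat = stratified_sample(population, "state", 0.1, seed=1)
print(len(srs), len(strat))  # 10 10
```

The stratified version guarantees the small group is represented in proportion (here 2 of 10 rows), whereas a simple random sample of 10 can miss it by chance. Passing a seed makes a user's sample reproducible.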
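Data jittering adds small random noise to sensitive values so that an individual record cannot be pinned to an exact location, while aggregate spatial patterns survive. A minimal sketch, assuming point records given as (latitude, longitude) pairs and uniform noise up to a chosen offset in degrees; the noise distribution and the 0.01° default are assumptions, since the README does not specify them.

```python
import random


def jitter_points(points, max_offset=0.01, seed=None):
    """Return copies of (lat, lon) points shifted by independent uniform
    noise in [-max_offset, +max_offset] degrees on each axis."""
    rng = random.Random(seed)
    return [
        (lat + rng.uniform(-max_offset, max_offset),
         lon + rng.uniform(-max_offset, max_offset))
        for lat, lon in points
    ]


points = [(40.7128, -74.0060), (34.0522, -118.2437)]
jittered = jitter_points(points, max_offset=0.01, seed=42)
print(jittered)
```

The offset trades privacy against accuracy: a larger `max_offset` hides locations better but blurs finer-grained maps. Note that one degree of longitude shrinks toward the poles, so a fixed offset in degrees is not a fixed distance on the ground.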
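One standard way to check anonymization is k-anonymity: after generalizing quasi-identifiers (for example, exact age into ten-year bands), every combination of them should describe at least k records. The README does not say which anonymization method the project uses, so this is an illustrative sketch; `generalize_age`, `is_k_anonymous`, and the `zip3` column are hypothetical.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(c >= k for c in counts.values())


# Made-up records: exact ages make every record unique.
rows = [{"age": a, "zip3": z} for a, z in
        [(31, "100"), (35, "100"), (38, "100"), (52, "945"), (57, "945")]]
print(is_k_anonymous(rows, ["age", "zip3"], k=2))    # False

banded = [{**r, "age": generalize_age(r["age"])} for r in rows]
print(is_k_anonymous(banded, ["age", "zip3"], k=2))  # True
```

Generalization is one option among several; jittering and aggregation, also listed in the outline, are others, and they can be combined.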
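KNN classification labels a new record by a majority vote among its k nearest labeled records. The sketch below is a plain standard-library version for illustration; the README does not say which features, distance metric, or k the project used, so Euclidean distance, k = 3, and the two-feature toy data are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points
    (Euclidean distance). On a vote tie, the label that appears first
    among the nearest neighbors wins."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Made-up training data: two well-separated clusters.
train = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train, labels, (1.5, 1.5)))  # A
print(knn_predict(train, labels, (6.5, 6.5)))  # B
```

Because KNN relies on raw distances, features on very different scales (for example, income versus a percentage) should usually be standardized first; scikit-learn's `KNeighborsClassifier` is the usual library alternative.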
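Variance observation with PCA amounts to measuring how much of the total variance each principal component explains: the eigenvalues of the covariance matrix divided by their sum, which equals the matrix trace. In practice a library routine would be the usual choice (scikit-learn's `PCA` reports this as `explained_variance_ratio_`). The sketch below stays in the standard library and finds eigenvalues by power iteration with deflation, which is adequate for a handful of columns but is an illustrative substitute, not the project's code.

```python
import math
import random


def covariance_matrix(data):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def explained_variance_ratio(data, iterations=200, seed=0):
    """Share of total variance explained by each principal component, largest first."""
    cov = covariance_matrix(data)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))  # trace = sum of all eigenvalues
    rng = random.Random(seed)
    eigenvalues = []
    for _ in range(d):
        # Random unit start vector, almost surely not orthogonal to the target eigenvector.
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v is the eigenvalue for a unit eigenvector v.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        eigenvalues.append(lam)
        # Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return sorted((lam / total for lam in eigenvalues), reverse=True)


# Made-up data lying almost exactly on the line y = 2x.
data = [(x, 2 * x + (0.1 if x % 2 == 0 else -0.1)) for x in range(10)]
ratios = explained_variance_ratio(data)
print([round(r, 4) for r in ratios])
```

Here the first component explains nearly all of the variance, signaling that the two columns are almost redundant; that is the kind of observation this step is meant to surface.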

Making Data More Attainable

Goal:

  1. Create a website hosted on GitHub
  2. Page for the Jupyter code
  3. Home page with search bars
  • Also host a data visualization platform for further analysis
  4. Aggregate data to reduce file size and processing time
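Aggregation reduces size because many individual rows collapse into one row per group, and it also covers "Aggregation based on user input" from the outline above, since the user picks the grouping and value columns. A minimal group-by sketch; `aggregate` and the column names are hypothetical, and the README does not say which summary statistics the site computes (count and mean are assumed here).

```python
from collections import defaultdict


def aggregate(rows, group_key, value_key):
    """Collapse rows to one record per group with the count and mean of value_key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return [
        {group_key: g, "count": counts[g], f"mean_{value_key}": sums[g] / counts[g]}
        for g in sums
    ]


# Made-up example rows.
rows = [
    {"county": "A", "age": 30},
    {"county": "A", "age": 50},
    {"county": "B", "age": 40},
]
print(aggregate(rows, "county", "age"))
# [{'county': 'A', 'count': 2, 'mean_age': 40.0}, {'county': 'B', 'count': 1, 'mean_age': 40.0}]
```

In pandas the equivalent is `df.groupby("county")["age"].agg(["count", "mean"])`. Keeping the count alongside the mean lets small groups be flagged or suppressed, which also helps privacy.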

Bonus Video: Classification and ML Possibilities

https://youtu.be/U_ekKl-uQO8

Contributors 4