Big Data Project

Team Members:

Matthew Avallone, Garima Chaudhary, Dinesh Sreekanthan

Objective:

The goal of the project was to gain hands-on experience with multiple steps of the data lifecycle that benefit from big data infrastructure. The project is broken down into two tasks: Data Cleaning/Profiling and Semantic Profiling. All of the datasets used come from the NYC Open Data initiative (https://opendata.cityofnewyork.us/).

The code was run on the NYU Hadoop cluster using Python 3.6.5 and Spark 2.4.0.
