Repository for projects completed during the UC Berkeley Master of Information and Data Science (MIDS) W261 course, "Machine Learning at Scale".

Course website: https://www.ischool.berkeley.edu/courses/datasci/261

The course covers the fundamental concepts of MapReduce parallel computing through the lens of Hadoop, mrjob, and Spark, diving deep into Spark Core, DataFrames, the Spark shell, Spark Streaming, Spark SQL, and MLlib. It also emphasizes hands-on algorithm design and development in parallel computing environments (Spark), including decision tree learning, graph processing (PageRank and shortest path), gradient descent (support vector machines), and matrix factorization.

The projects in this repository involve implementing the following from scratch:

MapReduce on the Command Line
Naive Bayes Implementation in Hadoop
Synonym Detection in PySpark
Distributed Linear Regression in PySpark
Search engine optimization (SEO) using PageRank on Wikipedia pages in PySpark
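
As a rough illustration of the map, shuffle, and reduce phases these projects build on, the sketch below runs all three in plain Python on a single machine. It is not taken from any of the HW folders: the function names (`mapper`, `shuffle`, `reducer`, `word_count`) are hypothetical, and a real Hadoop or Spark job would distribute each phase across many workers instead of running it in one process.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical helper names for illustration; not code from this repository.

def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Map phase: emit (word, 1) for every whitespace-separated token
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort phase: group values by key, as the framework does between map and reduce
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce phase: sum all counts emitted for a single key
    return key, sum(values)

def word_count(lines: Iterable[str]) -> List[Tuple[str, int]]:
    # Chain the three phases together on one machine
    mapped = chain.from_iterable(mapper(line) for line in lines)
    grouped = shuffle(mapped)
    return [reducer(key, values) for key, values in grouped.items()]

if __name__ == "__main__":
    print(word_count(["the cat sat", "The dog sat"]))
    # → [('cat', 1), ('dog', 1), ('sat', 2), ('the', 2)]
```

Because the reducer only needs the grouped values for one key, each key can be reduced independently, which is what lets frameworks like Hadoop and Spark parallelize this phase.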

For the final project, I used MLlib in PySpark to build an ML model that uses 2015-2019 airport and weather data (>30,000 million rows stored in HDFS) to predict airline flight delays more than two hours before the scheduled departure time, achieving over 70% accuracy.

Repository contents:

Final Project_Predicting_Airport_Delays_in_PySpark
HW1_MapReduce_in_the_CommandLine
HW2_NaiveBayes_with_MapReduce_in_Hadoop
HW3_Synonym_Detection_in_PySpark
HW4_Distributed_LinearRegression_in_PySpark
HW5_PageRank_with_WikipediaDataset_in_PySpark
README.md
docker-compose.yaml
docker-compose.yml

lschroyer/W261_Machine_Learning_at_Scale_Projects

Folders and files

Latest commit

History

Repository files navigation

Repository for projects completed during the UC Berkeley Masters of Information and Data Science (MIDS) W261 course, "Machine Learning at Scale"

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages