A principal component analysis in Scala and the Spark framework to compute and visualize the eigenvectors of an image dataset

chrlen/spark-pca

This repository contains an implementation of principal component analysis (PCA) in Scala and Spark. The PCA was part of a simple face detector trained on the Faces in the Wild dataset. The implementation was written for a lecture on big-data analytics that required a final project with a free choice of topic and programming languages. It had to run on the university cluster, which used the Cloudera distribution of Spark 1.6. That version was already old at the time of the project and did not provide any functions to load and decode images. Therefore, the images were converted to grayscale and stored as CSV files with a Python script, and then loaded as text files.

Check the file main.scala for the implementation.
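
For orientation, here is a minimal sketch of the workflow described above: loading grayscale CSV data as text and computing principal components with Spark 1.6's MLlib. It is not the code from main.scala. The object name `PcaSketch`, the helper `parseImage`, the input path, and the value of `k` are hypothetical, and the sketch assumes that each line of text holds one flattened image as comma-separated pixel values, which may differ from the layout the Python script actually produced.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object PcaSketch {
  // Parse one CSV line of grayscale pixel values into a dense MLlib vector.
  // Assumption: one flattened image per line.
  def parseImage(line: String): Vector =
    Vectors.dense(line.split(",").map(_.trim.toDouble))

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("pca-sketch"))

    // Hypothetical input location; replace with the real CSV directory.
    val images = sc.textFile("hdfs:///path/to/faces/*.csv").map(parseImage).cache()

    // Each row of the distributed matrix is one image.
    val matrix = new RowMatrix(images)

    // Number of principal components to keep (arbitrary choice for this sketch).
    val k = 10

    // Local matrix of size (pixels x k); each column is one eigenvector
    // of the covariance matrix, i.e. one "eigenface".
    val components: Matrix = matrix.computePrincipalComponents(k)

    // Project every image onto the k-dimensional principal subspace.
    val projected: RowMatrix = matrix.multiply(components)

    println(s"components: ${components.numRows} x ${components.numCols}")
    println(s"projected rows: ${projected.numRows()}")

    sc.stop()
  }
}
```

As far as I know, `computePrincipalComponents` assembles the covariance matrix on the driver, so this approach probably only suits images with a modest number of pixels. Each column of `components` can be reshaped to the image dimensions to visualize the corresponding eigenvector.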
