Projects on Big Data, Data Analysis and Sentiment Analysis (implemented in R and Python language using Spark and Hadoop frameworks)

deep-mishra/CSE-587-Data_Intensive_Computing

CSE-587-Data_Intensive_Computing

Project 1 - Analyse Twitter reaction vs CDC surveillance report on influenza activity in US States. Repo..

The project collects tweets about influenza and plots a heat map to compare Twitter activity with the influenza-affected states reported by the CDC. The entire project is implemented in R, using the twitteR package and a geocode API to collect the tweets.

TwtsVsCdsAnalysis.R

HeatMap

Project 2 - Sentiment Analysis on Gun Violence using Hadoop. Repo..

Performed sentiment analysis of public reaction to gun violence using Twitter data and compared the results with NYTimes articles. Hadoop is used to compute the word counts and the co-occurrence of the top words in the two data sets. I used d3 for the word cloud and Python to implement the Hadoop mapper and reducer.
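The word count and co-occurrence steps can be sketched in Python roughly as follows. This is a minimal illustration, not the project's actual mapper and reducer: the function names, the tokenizer, and the choice to count two top words as co-occurring when they appear in the same input line (one tweet or one article line) are all assumptions. It runs locally by sorting the mapper output, which stands in for Hadoop's shuffle phase.

```python
import re
from itertools import groupby

# Hypothetical tokenizer: lowercase alphabetic tokens (apostrophes allowed).
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(line):
    """Split one line of text into lowercase word tokens."""
    return TOKEN_RE.findall(line.lower())


def map_word_count(lines):
    """Mapper: emit (word, 1) for every token in every line."""
    for line in lines:
        for word in tokenize(line):
            yield word, 1


def map_cooccurrence(lines, top_words):
    """Mapper: emit ("w1,w2", 1) for each unordered pair of distinct
    top words found in the same line (assumed co-occurrence window)."""
    for line in lines:
        present = sorted(set(tokenize(line)) & top_words)
        for i, first in enumerate(present):
            for second in present[i + 1:]:
                yield f"{first},{second}", 1


def reduce_counts(sorted_pairs):
    """Reducer: sum the counts for each key.
    The input must be sorted by key, as it is after Hadoop's shuffle."""
    for key, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_local(mapper_output):
    """Simulate shuffle + reduce on a single machine."""
    return dict(reduce_counts(sorted(mapper_output)))


if __name__ == "__main__":
    sample = ["Gun violence is rising", "gun control and gun violence"]
    print(run_local(map_word_count(sample)))
    print(run_local(map_cooccurrence(sample, {"gun", "violence", "control"})))
```

With Hadoop Streaming, the same logic would normally be split into two scripts that read lines from standard input and write tab-separated `key<TAB>count` lines to standard output.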

TopWordComp.html

Comparison1

CooccurTopWordComp.html

Comparison2

Project 3 - Document Classification using Spark Infrastructure. Repo..

News articles fall into different categories such as sports, business, etc. This project uses Spark infrastructure with machine learning to predict the category of an article. The first step is to train the model on the training set, then test it, and finally predict the categories of the unknown set of articles and evaluate the performance of the trained model.
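A train-and-test workflow of this kind could look roughly like the PySpark sketch below. It is an illustration under assumptions rather than the project's code: the column names `text` and `category`, the TF-IDF features, the 80/20 split, and the hyperparameters are all hypothetical. The two classifier options mirror the Random Forest and Logistic Regression results listed under Prediction Result.

```python
# Supported classifier keys (hypothetical names for this sketch).
CLASSIFIERS = ("lr", "rf")


def build_pipeline(classifier_name="lr"):
    """Return an unfitted Spark ML pipeline:
    label indexing -> tokenizing -> TF-IDF -> classifier."""
    if classifier_name not in CLASSIFIERS:
        raise ValueError(f"unknown classifier: {classifier_name!r}")

    # Imported lazily so this module loads even where Spark is not installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import (
        LogisticRegression,
        RandomForestClassifier,
    )
    from pyspark.ml.feature import HashingTF, IDF, StringIndexer, Tokenizer

    # Assumed input columns: "text" (article body), "category" (string label).
    label_indexer = StringIndexer(inputCol="category", outputCol="label")
    tokenizer = Tokenizer(inputCol="text", outputCol="words")
    term_freq = HashingTF(
        inputCol="words", outputCol="raw_features", numFeatures=1 << 12
    )
    idf = IDF(inputCol="raw_features", outputCol="features")

    if classifier_name == "rf":
        classifier = RandomForestClassifier(
            labelCol="label", featuresCol="features", numTrees=50
        )
    else:
        classifier = LogisticRegression(
            labelCol="label", featuresCol="features", maxIter=20
        )
    return Pipeline(
        stages=[label_indexer, tokenizer, term_freq, idf, classifier]
    )


def train_and_evaluate(labeled_df, classifier_name="lr", seed=42):
    """Fit on 80% of the labeled articles and report accuracy on the rest."""
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    train_df, test_df = labeled_df.randomSplit([0.8, 0.2], seed=seed)
    model = build_pipeline(classifier_name).fit(train_df)
    predictions = model.transform(test_df)
    evaluator = MulticlassClassificationEvaluator(
        labelCol="label", predictionCol="prediction", metricName="accuracy"
    )
    return model, evaluator.evaluate(predictions)
```

Calling `train_and_evaluate(df, "rf")` and `train_and_evaluate(df, "lr")` on the same labeled DataFrame gives a like-for-like comparison of the two models.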

ArticleCollection Python Code

DocumentClassification Python Code

Prediction Result:

Prediction using Random Forest classification: RandomForestClassification

Prediction using the Logistic Regression model: LogisticRegression
