Generates analytics on Twitter tweet data about the US elections, organized in partitions. This project is part of the Big Data Analytics using Spark course from edx.com.

vicmar57/Tweets-analysis-ETL-pipeline-using-pySpark-

Tweets analysis ETL pipeline using PySpark

In this project, we generate analytics on Twitter data, specifically tweets about the US elections. We parse the JSON data and extract the tweet content, partition the data into groups, count the number of posts from each partition, and finally find the popular tokens in each partition's tweets.
This project is part of the Big Data Analytics using Spark course from edx.com.
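
The steps above (parse, partition, count, find popular tokens) can be sketched in plain Python to show the logic without a Spark cluster. This is a minimal illustration, not the notebook's actual code: the sample records, the `user` and `text` field names, the `user_partition` mapping, and the helper names `parse_tweet` and `analyze` are all assumptions made for this example. In PySpark, the same steps would typically be expressed with RDD operations such as `map`, `filter`, and `reduceByKey`.

```python
import json
from collections import Counter, defaultdict

# Hypothetical sample input: one JSON string per tweet.
# The field names "user" and "text" are assumptions for this sketch.
raw_lines = [
    '{"user": "u1", "text": "vote early vote often"}',
    '{"user": "u2", "text": "election day is here"}',
    '{"user": "u3", "text": "vote on election day"}',
    'not valid json',
]

# Hypothetical mapping from a user to a partition (group) id.
user_partition = {"u1": 0, "u2": 1, "u3": 0}


def parse_tweet(line):
    """Return (user, text) for a well-formed record, or None otherwise."""
    try:
        record = json.loads(line)
        return record["user"], record["text"]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON or missing fields: skip the record.
        return None


def analyze(lines, partition_of, top_n=2):
    """Count posts per partition and collect each partition's top tokens."""
    posts_per_partition = Counter()
    tokens_per_partition = defaultdict(Counter)
    for line in lines:
        parsed = parse_tweet(line)
        if parsed is None:
            continue
        user, text = parsed
        if user not in partition_of:
            continue  # users without a known partition are ignored
        pid = partition_of[user]
        posts_per_partition[pid] += 1
        # Naive whitespace tokenization on lowercased text.
        tokens_per_partition[pid].update(text.lower().split())
    top_tokens = {
        pid: [token for token, _ in counts.most_common(top_n)]
        for pid, counts in tokens_per_partition.items()
    }
    return dict(posts_per_partition), top_tokens


if __name__ == "__main__":
    counts, top = analyze(raw_lines, user_partition)
    print(counts)
    print(top)
```

The whitespace tokenizer is deliberately simple; a real run on tweets would likely need to handle punctuation, hashtags, and mentions.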

Prerequisites

Python, Jupyter Notebook, and PySpark

Usage

  1. Run as a regular Jupyter notebook.

Credits

Credit to the Big Data Analytics using Spark course on edx.com.
