Tweets analysis ETL pipeline using PySpark

In this project, we generate analytics on Twitter data, specifically tweets about the US elections. We parse the JSON data and extract the tweets, partition them into groups, count the number of posts from each partition, and finally find the popular tokens in each partition's tweets.
This project is part of the Big Data Analytics using Spark course from edx.com.
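
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. It assumes each input line is one tweet in the standard Twitter JSON layout (with `user.id_str` and `text` fields) and that the user partition is a plain dict mapping user id to group id. The names `parse_tweet`, `tokenize`, and `analyze` are hypothetical helpers introduced here, and the whitespace tokenizer and the fallback group -1 for unknown users are simplifying assumptions.

```python
import json
import re
from operator import add


def parse_tweet(line):
    """Return (user_id, text) for a valid tweet JSON line, or None if malformed."""
    try:
        tweet = json.loads(line)
        return tweet["user"]["id_str"], tweet["text"]
    except (ValueError, KeyError, TypeError):
        # Assumption: broken or incomplete records are simply skipped.
        return None


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately simple tokenizer)."""
    return re.findall(r"\S+", text.lower())


def analyze(lines_rdd, user_group, top_n=10):
    """Count tweets per group and list each group's most frequent tokens.

    lines_rdd  -- a PySpark RDD of JSON strings, one tweet per element
    user_group -- dict mapping user id to group id (assumed shape)
    """
    groups = lines_rdd.context.broadcast(user_group)

    # Extract: parse JSON, drop bad records, tag each tweet with its group.
    tweets = (lines_rdd.map(parse_tweet)
              .filter(lambda t: t is not None)
              .map(lambda t: (groups.value.get(t[0], -1), t[1]))
              .cache())

    # Count the number of posts in each group.
    counts = tweets.map(lambda t: (t[0], 1)).reduceByKey(add).collectAsMap()

    # Count tokens per group, then keep the top_n most frequent per group.
    top_tokens = (tweets
                  .flatMap(lambda t: [((t[0], tok), 1) for tok in tokenize(t[1])])
                  .reduceByKey(add)
                  .map(lambda kv: (kv[0][0], (kv[0][1], kv[1])))
                  .groupByKey()
                  .mapValues(lambda vs: sorted(vs, key=lambda v: -v[1])[:top_n])
                  .collectAsMap())
    return counts, top_tokens
```

Given an existing SparkContext `sc`, a call such as `analyze(sc.parallelize(json_lines), partition)` would return two dicts keyed by group id. Raw token frequency is only one possible reading of "popular"; a relative-popularity score per group would need an extra normalization step.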

Prerequisites

Python and Jupyter Notebook, with PySpark available to the notebook kernel

Usage

Open tweet analysis using spark.ipynb in Jupyter and run it as a regular notebook.

Credits

Credit to the Big Data Analytics using Spark course on edx.com.

vicmar57/Tweets-analysis-ETL-pipeline-using-pySpark-

Folders and files

Latest commit

History

Repository files navigation

Tweets analysis ETL pipeline using PySpark

Prerequisites

Usage

Credits

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages