NetGopher/Real-Time-Data-Analytics

Real-Time-Data-Analytics

Interactive analytics for Reddit with real-time data analysis. The purpose of this project was to compute metrics about Reddit posts in real time using Angular, Apache Kafka, Spring KStream, Apache Spark, Spring Kafka, and Spring WebFlux with Reactor.
It contains five main components:

  • client, the front-end app, built with Angular.

  • a web service (Reddit-producer) that calls the Reddit API to fetch posts, then sends them to a Kafka topic. Example of a REST call to the Reddit API: [image: Rapport_stage_d'application67]

  • two consumers (Kafka-Stream-Consumer and Spark-Consumer), both stream processors. They read the data from Kafka as a stream and process it in real time, producing metrics and statistics that are written back to a Kafka metrics topic to be consumed later.

  • Spring-Kafka-Reactive-Backend, a service that subscribes to the Reddit metrics topic, waits for new metrics, and sends them to the front end over a WebSocket.
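
The data flow through these components can be illustrated with a minimal, dependency-free Java sketch. It is not the project's code: in-memory queues stand in for the Kafka topics, and the `Post` fields, the per-subreddit post count metric, and every class and method name here are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PipelineSketch {

    /** Hypothetical post shape; the README does not list the real fields. */
    static final class Post {
        final String subreddit;
        final String title;

        Post(String subreddit, String title) {
            this.subreddit = subreddit;
            this.title = title;
        }
    }

    /** Producer role: publish fetched posts to the "posts" topic (here, a queue). */
    static void produce(List<Post> fetched, BlockingQueue<Post> postsTopic) {
        postsTopic.addAll(fetched);
    }

    /**
     * Consumer role: read pending posts, update a running count per subreddit,
     * and publish each updated count to the "metrics" topic.
     */
    static void process(BlockingQueue<Post> postsTopic,
                        Map<String, Long> counts,
                        BlockingQueue<String> metricsTopic) {
        Post post;
        while ((post = postsTopic.poll()) != null) {
            long updated = counts.merge(post.subreddit, 1L, Long::sum);
            metricsTopic.add(post.subreddit + "=" + updated);
        }
    }

    /** Backend role: collect pending metrics, i.e. what would be pushed over the WebSocket. */
    static List<String> drain(BlockingQueue<String> metricsTopic) {
        List<String> out = new ArrayList<>();
        metricsTopic.drainTo(out);
        return out;
    }

    /** Runs three sample posts through all stages and returns the emitted metrics. */
    static List<String> runDemo() {
        BlockingQueue<Post> postsTopic = new LinkedBlockingQueue<>();
        BlockingQueue<String> metricsTopic = new LinkedBlockingQueue<>();
        Map<String, Long> counts = new HashMap<>();

        List<Post> fetched = new ArrayList<>();
        fetched.add(new Post("java", "first"));
        fetched.add(new Post("kafka", "second"));
        fetched.add(new Post("java", "third"));

        produce(fetched, postsTopic);
        process(postsTopic, counts, metricsTopic);
        return drain(metricsTopic);
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints [java=1, kafka=1, java=2]
    }
}
```

In the real system each stage would be a separate service and the queues would be Kafka topics, so the stages could run and scale independently. The sketch only shows the order in which data moves from producer to stream processor to backend.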

Architecture

[image: architecture diagram]

Note: Hive wasn't used in this project.

Screenshots

[images: nine application screenshots]