
Sentiment analysis of COVID-19 vaccine tweets using PySpark and MongoDB.


aungkhantmyat/Vaccine-Tweets-Sentiment-Analysis


Vaccine-Tweets-Sentiment-Analysis 💉

Brief

The development and distribution of vaccines have been pivotal milestones in the collective journey toward normalcy during the COVID-19 pandemic. As discussions about COVID-19 vaccines unfold across platforms, Twitter has become an important real-time hub for public discourse. Using Natural Language Processing (NLP) and data analysis, this project explores tweets for insights and sentiment patterns related to COVID-19 vaccination. Sentiment analysis, often called opinion mining, is an NLP technique that identifies the emotional tone or sentiment expressed in text.

Dataset

  • The dataset contains 11,020 rows and 17 columns. The columns hold a mix of textual, categorical, and numerical data.
  • More info about the dataset can be found here.

System Architecture

[Figure: system architecture diagram]

Step By Step Approach

Our sentiment analysis approach involves the following steps:

  1. Data Preprocessing
  2. Assessing sentiment using the polarity function of TextBlob
  3. Feature extraction
  4. Splitting the Dataset into Training and Testing Sets
  5. Applying Machine Learning Models
  6. Fine-tuning Parameters
  7. Evaluation Metrics
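
As a rough illustration of steps 1 and 2, the sketch below cleans a tweet and turns a TextBlob polarity score into a sentiment label. It is not the project's actual code: the helper names (`clean_tweet`, `label_from_polarity`, `textblob_polarity`), the specific cleaning rules, and the zero threshold between classes are all assumptions made for this example. TextBlob reports polarity as a float in the range [-1.0, 1.0].

```python
import re


def clean_tweet(text: str) -> str:
    """Hypothetical preprocessing: strip URLs, @mentions, '#', and non-letters."""
    text = re.sub(r"https?://\S+", " ", text)   # remove links
    text = re.sub(r"@\w+", " ", text)           # remove user mentions
    text = text.replace("#", " ")               # keep the hashtag word, drop '#'
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # drop digits, punctuation, emoji
    return re.sub(r"\s+", " ", text).strip().lower()


def label_from_polarity(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a label (zero threshold is an assumption)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def textblob_polarity(text: str) -> float:
    """Return TextBlob's polarity for the text (requires `pip install textblob`)."""
    from textblob import TextBlob  # imported lazily so the helpers above work without it
    return TextBlob(text).sentiment.polarity


if __name__ == "__main__":
    tweet = "Got my #vaccine today! https://t.co/abc @WHO"
    cleaned = clean_tweet(tweet)
    print(cleaned)
    try:
        print(label_from_polarity(textblob_polarity(cleaned)))
    except ImportError:
        print("TextBlob is not installed; skipping the polarity step.")
```

Labels produced this way could then serve as the target column for the feature extraction and model training steps that follow.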

Paper

  • The paper for this project was published at ICCR2023 (2023 International Conference Communication and Research), under the category "B4. Medical Informatics and Applications".
  • The paper can be found here.

More information about the project can be found here.
