We carry out sentiment analysis on tweets about problems with each major U.S. airline. The work covers exploratory data analysis and modelling to classify the tweets as positive, negative, or neutral.
The motivation was to apply the NLP knowledge gained in the Coursera course "Natural Language Processing in TensorFlow" to a real-world problem and to get exposure to working with textual data.
This is a sentiment analysis task framed as a classification problem and solved with machine learning algorithms: Logistic Regression, Support Vector Machines, XGBoost, Random Forest classifier, and Naive Bayes. The tweets were cleaned using lowercasing, stopword removal, and lemmatization. Word2Vec was applied to generate word embeddings from pre-trained vectors. The models were then trained and used to classify the tweets.
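The cleaning and embedding steps described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's actual code: the `STOPWORDS` set, the `LEMMAS` lookup table, the `toy_embeddings` vectors, and the function names `clean_tweet` and `average_vector` are all hypothetical stand-ins. A real run would take the stopword list and lemmatizer from NLTK or spaCy and the vectors from a pre-trained Word2Vec model loaded with Gensim. Averaging the word vectors of a tweet is one common way to turn embeddings into a fixed-length feature vector; the source does not say which pooling was used.

```python
import re

# Tiny illustrative resources (assumptions). A real pipeline would use the
# NLTK or spaCy stopword list and lemmatizer instead of these stand-ins.
STOPWORDS = {"a", "an", "the", "is", "was", "were", "my", "to", "and",
             "for", "of", "on", "in", "it", "i"}
LEMMAS = {"flights": "flight", "delayed": "delay", "bags": "bag", "hours": "hour"}


def clean_tweet(text):
    """Lowercase, strip URLs and @mentions, drop stopwords, then lemmatize."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # remove links and mentions
    tokens = re.findall(r"[a-z']+", text)           # keep alphabetic tokens only
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOPWORDS]


def average_vector(tokens, embeddings, dim):
    """Mean of the vectors of known tokens; a zero vector if none are known."""
    known = [embeddings[tok] for tok in tokens if tok in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Toy 2-dimensional "embeddings", made up for illustration only.
toy_embeddings = {"flight": [1.0, 0.0], "delay": [0.0, 1.0]}

tokens = clean_tweet("@united my flights were DELAYED for 3 hours! http://t.co/x")
vector = average_vector(tokens, toy_embeddings, dim=2)
print(tokens)  # ['flight', 'delay', 'hour']
print(vector)  # [0.5, 0.5]
```

The resulting vectors, one per tweet, are the kind of input that classifiers such as Logistic Regression or Random Forest can be trained on.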
Result Metrics: classification performance was checked using precision, recall, and F1 score.
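For reference, the three metrics can be computed per class in a one-vs-rest fashion, as in the sketch below. The function name `per_class_metrics` and the sample labels are hypothetical and only illustrate the definitions; in practice scikit-learn's `classification_report` produces the same figures, and the source does not say which routine the project used.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)}, treating each label one-vs-rest."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against division by zero when a class is never predicted or absent.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Made-up labels for illustration only.
y_true = ["negative", "negative", "neutral", "positive"]
y_pred = ["negative", "neutral", "neutral", "positive"]
scores = per_class_metrics(y_true, y_pred, ["negative", "neutral", "positive"])
for label, (precision, recall, f1) in scores.items():
    print(label, round(precision, 2), round(recall, 2), round(f1, 2))
```

Reporting the metrics per class matters here because a classifier can reach high overall accuracy while performing poorly on the less frequent sentiment classes.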
- Python 3.6
- spaCy
- Gensim
- NLTK
- Pandas
- Matplotlib
- scikit-learn
- Seaborn
- Plotly
The dataset consists of tweets about problems with each major U.S. airline. The Twitter data was scraped in February 2015, and contributors were asked first to classify each tweet as positive, negative, or neutral, and then to categorize the reasons for negative tweets (such as "late flight" or "rude service").
Link to dataset: https://www.kaggle.com/crowdflower/twitter-airline-sentiment
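As a starting point for exploratory analysis, the sketch below counts tweets per airline and sentiment. It assumes the column names `airline`, `airline_sentiment`, `negativereason`, and `text` as they appear in the Kaggle CSV; the rows in `SAMPLE_CSV` are made up for illustration, and the function name `sentiment_counts` is hypothetical. With the real data, the in-memory sample would be replaced by an open handle to the downloaded file.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """airline,airline_sentiment,negativereason,text
United,negative,Late Flight,"@united two hours late again"
United,positive,,"@united thanks for the smooth trip"
Delta,neutral,,"@delta what time does boarding start?"
Delta,negative,Customer Service Issue,"@delta nobody answers the phone"
"""


def sentiment_counts(csv_file):
    """Count rows per (airline, sentiment) pair from an open CSV file object."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[(row["airline"], row["airline_sentiment"])] += 1
    return counts


counts = sentiment_counts(io.StringIO(SAMPLE_CSV))
for (airline, sentiment), n in sorted(counts.items()):
    print(airline, sentiment, n)
```

The same grouping can be done with Pandas, which the project lists among its dependencies; the standard-library version is used here only to keep the sketch self-contained.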
- https://machinelearningmastery.com/what-are-word-embeddings/
- https://towardsdatascience.com/nlp-in-python-data-cleaning-6313a404a470
- https://www.analyticsvidhya.com/blog/2020/11/text-cleaning-nltk-library/
- https://machinelearningmastery.com/clean-text-machine-learning-python/