
RaseemAhamed/Sentiment-Analysis-on-Women-s-Clothes-Reviews


Sentiment-Analysis-on-Women-s-Clothes-Reviews

Capstone Project on E-commerce

ABSTRACT

Electronic commerce (e-commerce) is a well-established subject in today's academic literature, and much of the strongest research on it focuses on developing economies. Customer behaviour analysis is gaining prominence as consumers move from visiting retail shops to shopping online.

In this project, the review text in the dataset is used for sentiment analysis in order to implement, evaluate and compare the most effective techniques. Probabilistic models currently play a major role in many research efforts, including text classification and information retrieval. With the growth of e-commerce, online shopping is becoming a central platform for shoppers, which makes it important to forecast the scale of e-commerce transactions.

This project focuses on using Natural Language Processing (NLP) techniques to find broad trends in customers' written opinions. The goal is to predict whether customers recommend the product they purchased, using the information in their review text.

One challenge is to extract useful information from the "Review Text" variable using text mining techniques. The other is that the text must be converted into numeric feature vectors before machine learning algorithms can be run on it.
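As an illustration of turning review text into numeric feature vectors, here is a minimal bag-of-words sketch using only the Python standard library. The function names and the two sample reviews are hypothetical and do not come from the project's notebook, which may well rely on a library vectorizer instead.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return a word-count vector for one document; unknown tokens are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical reviews, not taken from the dataset.
reviews = ["Love this dress, love the fit", "The fit is poor"]
vocab = build_vocabulary(reviews)
vectors = [vectorize(review, vocab) for review in reviews]
```

Each review becomes a fixed-length list of counts, one position per vocabulary word, which is the kind of input a classifier can consume.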

1.1 TITLE & OBJECTIVE OF THE PROJECT

The project, titled "SENTIMENT ANALYSIS ON WOMEN'S CLOTHES REVIEWS", falls under the "E-commerce" category. It focuses on using Natural Language Processing (NLP) techniques to find broad trends in customers' written opinions. The goal is to predict whether customers recommend the product they purchased, using the information in their review text. This is a binary classification task, addressed with supervised machine learning classification algorithms and a deep learning algorithm.

OBJECTIVE:

- Extract useful information from the "Review Text" variable using text mining techniques.
- Implement supervised ML classification algorithms.
- Implement deep learning with an RNN based on the Gated Recurrent Unit (GRU).

1.2 NEED OF THE PROJECT

Electronic commerce (e-commerce) is a well-established subject in today's academic literature, and much of the strongest research on it focuses on developing economies. Customer behaviour analysis is gaining prominence as consumers move from visiting retail shops to shopping online. In this project, the review text is used for sentiment analysis in order to implement, evaluate and compare the most effective techniques. Probabilistic models currently play a major role in many research efforts, including text classification and information retrieval. With the growth of e-commerce, online shopping is becoming a central platform for shoppers, which makes it important to forecast the scale of e-commerce transactions.

1.3 PROBLEM STATEMENT

In this context, the basic goal of this project is to predict whether customers, who are assumed to be women, recommend the product they purchased, using the information in their Review Text. Note that the project is expected to use only the "Review Text" variable and to ignore the others. The other variables can, of course, be explored separately.

The data is a collection of 22641 rows and 10 column variables. Each row corresponds to a customer review and includes a written comment as well as additional customer information. Because this is real commercial data, it has been anonymized, and references to the company in the review text and body have been replaced with "retailer".

1.4 DATASET DESCRIPTION

This dataset includes 23486 rows and 10 feature variables. Each row corresponds to a customer review and includes the following variables:

- Clothing ID: Integer categorical variable that refers to the specific piece being reviewed.
- Age: Positive integer variable for the reviewer's age.
- Title: String variable for the title of the review.
- Review Text: String variable for the review body.
- Rating: Positive ordinal integer variable for the product score granted by the customer, from 1 (worst) to 5 (best).
- Recommended IND: Binary variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.
- Positive Feedback Count: Positive integer documenting the number of other customers who found this review positive.
- Division Name: Categorical name of the product's high-level division.
- Department Name: Categorical name of the product department.
- Class Name: Categorical name of the product class.
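To show how the "Review Text" and "Recommended IND" variables described above might be pulled out for modelling, here is a minimal sketch with the standard `csv` module. The sample rows and the `load_reviews` helper are hypothetical. The real file name and loading code are not given in this README, and skipping rows with an empty review is an assumption rather than a documented step.

```python
import csv
import io

# Hypothetical sample rows that reuse column names from the dataset description.
SAMPLE_CSV = """Clothing ID,Age,Review Text,Rating,Recommended IND
1001,34,Lovely fabric and a flattering cut,5,1
1002,45,,3,0
1003,29,Runs small and the seams came apart,2,0
"""


def load_reviews(handle):
    """Return (review_text, label) pairs, skipping rows with an empty review."""
    pairs = []
    for row in csv.DictReader(handle):
        text = (row.get("Review Text") or "").strip()
        if not text:
            continue  # assumption: rows without review text are dropped
        pairs.append((text, int(row["Recommended IND"])))
    return pairs


pairs = load_reviews(io.StringIO(SAMPLE_CSV))
```

With a real file, the same function could be called on an opened file object, for example `open(path, newline="", encoding="utf-8")`.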

1.5 ANALYTICS TOOLS

Python Notebook (Jupyter Notebook / Google Colab)

1.6 ANALYTICS APPROACH

Machine Learning Algorithms:

- Model built with Logistic Regression
- Model built with Naive Bayes
- Model built with Support Vector Machine (SVM)
- Model built with Random Forest Classifier (RF)
- Model built with AdaBoost (AB)
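To give a feel for one of the listed classifiers, the sketch below implements a small multinomial Naive Bayes with add-one smoothing in plain Python. It is an illustration under assumptions, not the project's implementation: the class name, the training sentences and their labels are all hypothetical, and the notebook most likely uses a library model.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, text):
        """Return the unnormalised log posterior of each class for one text."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(doc_count / total_docs)  # class prior
            for tok in tokenize(text):
                if tok not in self.vocabulary:
                    continue  # tokens never seen in training are skipped
                score += math.log((counts[tok] + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical training data: 1 = recommended, 0 = not recommended.
train_texts = [
    "love the fit and the fabric",
    "perfect dress love it",
    "poor quality returned it",
    "too small and poor fabric",
]
train_labels = [1, 1, 0, 0]
model = NaiveBayes().fit(train_texts, train_labels)
```

Calling `model.predict("love this dress")` scores both classes and returns the label with the higher log score.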

Deep Learning Algorithm:

Model built with a Recurrent Neural Network (RNN) using Gated Recurrent Units (GRU)
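For reference, one common formulation of the GRU update at time step $t$ is sketched below. The symbols are assumptions introduced here, not notation from the project: $x_t$ is the input (for example a word embedding), $h_{t-1}$ the previous hidden state, $W$, $U$ and $b$ learned parameters, $\sigma$ the logistic sigmoid and $\odot$ element-wise multiplication. Some libraries swap the roles of $z_t$ and $1 - z_t$ in the last line.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)} \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{(candidate state)} \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
\end{aligned}
```

For binary recommendation prediction, the final hidden state would typically be passed through a sigmoid output unit.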

CONCLUSION

In this project we used sentiment analysis to determine whether a product is recommended or not. We applied several machine learning algorithms to obtain more accurate predictions, and a deep learning algorithm to compare against the machine learning models. The following classification algorithms were used: Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Random Forest and AdaBoost. In general, it is hard to pick a single best model among the top five because their scores are very close to each other. However, the scores of AdaBoost, Naive Bayes, Deep Learning, SVM and Logistic Regression tend to look better than those of the other models. There is no simple answer to the question of which one is better; each works better on different datasets and under different conditions, and each modelling algorithm has its own pros and cons. We can therefore select one of these algorithms according to what we need most, such as accuracy or precision.
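Since the choice between models depends on whether accuracy or precision matters more, the following sketch shows how the two metrics are computed from predictions. The label lists are hypothetical and are not results from this project.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true, y_pred):
    """Share of predicted 'recommended' labels that are truly recommended."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical labels: 1 = recommended, 0 = not recommended.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
acc = accuracy(y_true, y_pred)
prec = precision(y_true, y_pred)
```

Two models with the same accuracy can differ in precision, which is why the metric should be chosen to match the business need before ranking the models.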
