omarragi9/Tweets-preprocessing

Tweets-preprocessing

Preprocessing for a tweets dataset using NLTK.

As we all know, we are in the era of data, and most of this data is unstructured. According to an article on MongoDB:

From 80 to 90 percent of data generated and collected by organizations is unstructured, and its volumes are growing rapidly, many times faster than the rate of growth for structured databases.

So part of our work is to handle and clean this data so that it becomes useful and meaningful.
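To make "handle and clean" concrete, here is a minimal sketch of the kind of cleaning a tweet usually needs. It is not the code from this repository: it uses only Python's standard `re` module rather than NLTK, and the function name `clean_tweet` and the exact rules (lowercasing, dropping URLs, mentions, and punctuation) are assumptions chosen for illustration.

```python
import re
from typing import List

# Hypothetical patterns for illustration; real tweets may need more rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # @user handles
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")        # punctuation, emoji, symbols


def clean_tweet(text: str) -> List[str]:
    """Lowercase a tweet, strip URLs, mentions and punctuation, return tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # remove links first, before punctuation
    text = MENTION_RE.sub(" ", text)   # remove @handles
    text = text.replace("#", " ")      # keep the hashtag word, drop the '#'
    text = NON_WORD_RE.sub(" ", text)  # drop everything that is not a-z, 0-9
    return text.split()                # whitespace tokenization


if __name__ == "__main__":
    print(clean_tweet("Go VOTE today!! @user https://t.co/abc #Election2020"))
```

In an NLTK-based pipeline, the whitespace split would typically be replaced by a tokenizer such as `nltk.tokenize.TweetTokenizer`, followed by stop-word removal and stemming or lemmatization.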

Here is my work, done as part of an assignment on natural language preprocessing.

I'm a beginner, so any improvements, even small ones, will be appreciated.

Link to the dataset: https://www.kaggle.com/manchunhui/us-election-2020-tweets

Link to the article: https://www.mongodb.com/unstructured-data