Twitter is an online news and social networking site where people communicate in short messages called tweets. The data Twitter provides, and the insights that can be gleaned from it, can be genuinely valuable in more ways than most people realise. Given the role tweets play in daily life and the volume of data they generate, it is worth developing a solution that offers some insight into what people in a region are thinking about.
To that end, we built a solution that uses geo-tagged tweets to give users an idea of the most discussed subjects in a given area (this example focuses on the USA as the study area, since most tweets there are in English), the polarity of tweets, tweets shared during a specific time window, and other information.
- Get Following Statistics For Bounding Box
- State Name | Total Tweets | Area of bbox
- Tweets intersection OSM Roads
- Wordcloud
- Sentiment over time
- Device Usage
- Sentiment For 5 Time Durations For USA
- Sample Location of Positive/Negative/Neutral Tweets
- Firstly, to collect geo-tagged tweets, the user needs some API keys provided by Twitter (see the "config.ini" file);
- We used the Twitter API to fetch the tweets we need ("twitter_api.py" file);
- we cleaned the data (organised in a DataFrame) and converted it to a GeoDataFrame (with a geometry column);
- we used "KeyBERT/YAKE" as a library for Natural Language Processing in Python to extract keywords from tweets and store them in the "keywords" column;
- we then performed a sentiment analysis: we computed the subjectivity and polarity of each tweet using the "textblob" library in Python and, according to the resulting values, classified the tweets as "positive", "negative" or "neutral";
- we did some analysis to understand the results;
- we established a connection to a spatial database under PostgreSQL;
- The next step was writing SQL queries to extract keywords and polarity and to count the people tweeting from each device (Android/iPhone) inside a polygon chosen by the user. Determining the dominant polarity and building the wordcloud were done in Python;
- Also using SQL, we created queries that select tweets from the database by time of sharing or by polarity;
- The visualisation of the interactive map was done using HTML and JavaScript.
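The collection step can be sketched with Tweepy as below. Function names, the query term, and the geocode radius are illustrative assumptions, not the exact contents of "twitter_api.py":

```python
def tweet_point(tweet):
    """Return (lon, lat) for a tweet dict carrying a GeoJSON 'coordinates'
    field, or None when the tweet has no exact point geometry."""
    geo = tweet.get("coordinates")
    if not geo or geo.get("type") != "Point":
        return None
    lon, lat = geo["coordinates"]
    return lon, lat


def fetch_geo_tweets(api_key, api_key_secret, access_token, access_token_secret,
                     query="the", geocode="39.8,-98.5,2500km", count=100):
    """Fetch recent tweets near a point via Tweepy's v1.1 search endpoint.
    `geocode` is 'lat,lon,radius'; this radius roughly covers the USA."""
    import tweepy  # imported lazily so the helper above stays dependency-free

    auth = tweepy.OAuth1UserHandler(api_key, api_key_secret,
                                    access_token, access_token_secret)
    api = tweepy.API(auth)
    tweets = api.search_tweets(q=query, geocode=geocode, count=count)
    # Keep only tweets that carry an exact point geometry.
    return [t._json for t in tweets if tweet_point(t._json)]
```

The credentials are the four values from the [twitter] section of "config.ini".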
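The cleaning and conversion step might look like the following sketch. Column names are assumptions; the actual code produces a GeoDataFrame, so the geopandas route is noted in a comment:

```python
import pandas as pd


def to_geo_frame(records):
    """Turn raw tweet records into a frame with a WKT geometry column.
    Rows without coordinates are dropped and URLs are stripped from text."""
    df = pd.DataFrame(records)
    df = df.dropna(subset=["lon", "lat"])  # keep geo-tagged rows only
    df["text"] = (df["text"]
                  .str.replace(r"https?://\S+", "", regex=True)
                  .str.strip())
    df["geometry"] = df.apply(lambda r: f"POINT({r.lon} {r.lat})", axis=1)
    # With geopandas installed, the same frame becomes a true GeoDataFrame:
    #   gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.lon, df.lat),
    #                    crs="EPSG:4326")
    return df
```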
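The sentiment step can be sketched as follows. The thresholds (polarity above 0 is positive, below 0 negative, otherwise neutral) are an assumption about how the filtering was done:

```python
def polarity_label(polarity):
    """Map a TextBlob polarity score in [-1, 1] to a class label
    (assumed thresholds: >0 positive, <0 negative, else neutral)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def analyse(text):
    """Return (subjectivity, polarity, label) for one tweet."""
    from textblob import TextBlob  # lazy import; pip install textblob
    sentiment = TextBlob(text).sentiment
    return (sentiment.subjectivity,
            sentiment.polarity,
            polarity_label(sentiment.polarity))
```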
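The database step could be sketched as below. The table name follows the configured `table_name = "geo_tweets"`, but the column names and exact query text are assumptions:

```python
# Device usage inside a user-drawn polygon (WKT, EPSG:4326). The `source`
# column holding "Twitter for Android" / "Twitter for iPhone" is assumed.
DEVICE_QUERY = """
    SELECT source, COUNT(*) AS n
    FROM geo_tweets
    WHERE ST_Contains(ST_GeomFromText(%s, 4326), geometry)
      AND source IN ('Twitter for Android', 'Twitter for iPhone')
    GROUP BY source;
"""

# Dominant polarity inside the same polygon.
POLARITY_QUERY = """
    SELECT polarity_label, COUNT(*) AS n
    FROM geo_tweets
    WHERE ST_Contains(ST_GeomFromText(%s, 4326), geometry)
    GROUP BY polarity_label
    ORDER BY n DESC
    LIMIT 1;
"""


def run_polygon_stats(conn, polygon_wkt):
    """Execute both queries over an open psycopg2 connection."""
    with conn.cursor() as cur:
        cur.execute(DEVICE_QUERY, (polygon_wkt,))
        devices = cur.fetchall()
        cur.execute(POLARITY_QUERY, (polygon_wkt,))
        dominant = cur.fetchone()
    return devices, dominant
```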
The following figure summarises the steps of the creation of this solution:
Figure 1. Steps of the creation of this solution
- Postgres 14.1
- Python 3.10
Configure the following API keys in config.ini
for the Twitter connection using Tweepy
[twitter]
api_key =
api_key_secret =
access_token =
access_token_secret =
The following parameters can be configured in init.py
and app.py
database = "gps"
user = "postgres"
password = "postgres"
host = "localhost"
port = 5432
table_name = "geo_tweets"
Setup Python Environment
git clone https://github.com/mareyam0/Regional-Thoughts
conda create -n py10 python=3.10
conda activate py10
pip3 install -r requirements.txt
Load Data
cd Regional-Thoughts
python init.py
Launch Application
set FLASK_APP=app.py (Windows)
export FLASK_APP=app.py (Linux)
flask run
View Web Page
http://localhost:5000
Jaskaran Singh PURI
Master's degree in Geospatial Technologies at NOVA University of Lisbon, WWU Münster and UJI
Master's degree in Geospatial Technologies at NOVA University of Lisbon, WWU Münster and UJI
Maryeme Akhatar
Master's degree in Geospatial Technologies at NOVA University of Lisbon, WWU Münster and UJI