The repository has now been moved to https://git.leximpact.dev/leximpact/tweet-archiveur/

For the latest version of the project, please visit https://git.leximpact.dev/leximpact/tweet-archiveur/


Tweet Archiveur

This project aims at storing tweets in a database, but you can also use it without a database.

  • Input: Twitter user IDs in a CSV file (see the sample below)
  • Output: a database of tweets and hashtags
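For example, the input CSV could look like the sample below. Only the twitter_id column is used by the code examples in this README; the other column name is an assumption, not the actual format of tests/sample-users.csv.

twitter_id,name
123456789,Jane Doe
987654321,John Smith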

Our goal is to store the tweets of all members of the French Parliament to get an idea of the trending topics.

But you could use the project for other purposes with other accounts.

How to install the package

TODO: publish it to PyPI. Once it is published:

pip install tweetarchiveur

How to use the package in your project

There are two classes:

  • A Scrapper() to use the Twitter API
  • A Database() to store tweets and hashtags in it
from tweet_archiveur.scrapper import Scrapper
from tweet_archiveur.database import Database

# Set the connection variables manually when running outside Docker
from os import environ
environ["DATABASE_PORT"] = '8479'
environ["DATABASE_HOST"] = 'localhost'
environ["DATABASE_USER"] = 'tweet_archiveur_user'
environ["DATABASE_PASS"] = '1234leximpact'
environ["DATABASE_NAME"] = 'tweet_archiveur'

scrapper = Scrapper()
# Load the Twitter accounts to archive from a CSV file
df_users = scrapper.get_users_accounts('../tests/sample-users.csv')
users_id = df_users.twitter_id.tolist()
database = Database()
database.create_tables_if_not_exist()
database.insert_twitter_users(df_users)
# Fetch the tweets of the first two users and store them in the database
scrapper.get_all_tweet_and_store_them(database, users_id[0:2])
del database
del scrapper
2021-03-22 10:21:59,837 -  tweet-archiveur INFO     Scrapper ready
2021-03-22 10:21:59,841 -  tweet-archiveur INFO     Loading database module...
2021-03-22 10:21:59,842 -  tweet-archiveur DEBUG    DEBUG : connect(user=tweet_archiveur_user, password=XXXX, host=localhost, port=8479, database=tweet_archiveur, url=None)
2021-03-22 10:22:03,915 -  tweet-archiveur INFO     Done scrapping, we got 400 tweets from 2 tweetos.

How we use it

We fetch the tweets of the 577 members of the French Parliament every 8 hours and store them in a PostgreSQL database.

We then explore them with Apache Superset.

How we deploy it

Prepare the environment:

git clone https://github.com/leximpact/tweet-archiveur.git
cd tweet-archiveur
cp docker/docker.env .env

Edit the .env to your needs.
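For reference, a minimal .env could contain the database settings used in the Python example above. The name of the Twitter credential variable below is an assumption; check docker/docker.env for the actual variable names.

DATABASE_HOST=localhost
DATABASE_PORT=8479
DATABASE_USER=tweet_archiveur_user
DATABASE_PASS=1234leximpact
DATABASE_NAME=tweet_archiveur
# Assumption: the actual Twitter API credential variable may have a different name
TWITTER_BEARER_TOKEN=<your-token>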

Run the application:

docker-compose up -d

To view what's going on:

docker logs tweet-archiveur_tweet_archiveur_1 -f

The archiveur.py script uses the package to get the parliament accounts from https://github.com/regardscitoyens/twitter-parlementaires

The parameters are read from a .env file.

It is launched by the entrypoint.sh script every 8 hours.
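As a rough idea of what one run does, here is a minimal sketch built only from the package calls shown above. It is not the actual archiveur.py, and the local CSV filename is an assumption:

import sys

from tweet_archiveur.scrapper import Scrapper
from tweet_archiveur.database import Database


def main():
    # Assumption: the parliament accounts CSV from regardscitoyens was downloaded locally beforehand
    scrapper = Scrapper()
    df_users = scrapper.get_users_accounts("twitter-parlementaires.csv")
    users_id = df_users.twitter_id.tolist()

    # Create the tables on the first run, then store the users and their tweets
    database = Database()
    database.create_tables_if_not_exist()
    database.insert_twitter_users(df_users)
    scrapper.get_all_tweet_and_store_them(database, users_id)
    return 0


if __name__ == "__main__":
    sys.exit(main())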

To stop it:

docker-compose down

The data is kept in a Docker volume; to remove it:

docker-compose down -v

What to do with it?

  • Most used hashtags per period or per person (see the query sketch after this list)
  • Most/least active users
  • Timeline of
  • NLP Topic detection
  • Word cloud
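As an illustration of the first idea, a most-used-hashtags query could look like the sketch below. The hashtags table and its columns are assumptions; the actual schema is the one created by Database.create_tables_if_not_exist().

import pandas as pd
from sqlalchemy import create_engine

# Connection settings taken from the Python example above
engine = create_engine(
    "postgresql://tweet_archiveur_user:1234leximpact@localhost:8479/tweet_archiveur"
)

# Assumption: a hashtags table with one row per (tweet, hashtag) pair
query = """
    SELECT hashtag, COUNT(*) AS occurrences
    FROM hashtags
    GROUP BY hashtag
    ORDER BY occurrences DESC
    LIMIT 20
"""
print(pd.read_sql(query, engine))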

Annexes

Exit codes:

  • 1: Unknown error when storing tweets
  • 2: Unknown error when getting tweets
  • 3: Failed more than 3 consecutive times
  • 4: No environment configuration found

If any step fails, no tweet is saved.

Status code 429: a 'Too Many Requests' error is returned when you exceed the maximum number of requests allowed by the Twitter API.
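The package's actual retry logic is not documented here; purely as an illustration, a generic back-off around a hypothetical request helper could look like this:

import time

import requests


def get_with_backoff(url, headers=None, max_retries=3):
    # Retry a GET request, pausing longer each time the API answers 429
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            return response
        # Too Many Requests: wait before retrying, longer on each attempt
        time.sleep(60 * (attempt + 1))
    raise RuntimeError("Rate limit still exceeded after retries")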
