
A tailored program that scrapes data from the provider's website and updates my local database

Tezcatlipoca0000/trevi-spider


Trevi Spider

What it does:

It can execute three distinct functions to populate the local business database with carefully extracted data from the provider's website:
- Scrapes the price data from the provider's website.
- Retrieves the placed order, sent via email to the provider, from GMAIL.
- Creates a dataframe with the new data when available.
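The price-scraping step has to turn raw text from the provider's pages into numeric values. A minimal sketch of that cleanup, not the program's actual code: the sample rows, the Brazilian-style `1.234,56` price format, and the column names are all assumptions.

```python
import pandas as pd

def clean_prices(rows):
    """Turn raw (code, price_text) pairs scraped from the provider's
    pages into a typed dataframe.

    Assumes prices use the Brazilian format, e.g. 'R$ 1.234,56'.
    """
    df = pd.DataFrame(rows, columns=["code", "price_text"])
    df["price"] = (
        df["price_text"]
        .str.replace(r"[^\d.,]", "", regex=True)  # drop currency symbols
        .str.replace(".", "", regex=False)        # drop thousands separator
        .str.replace(",", ".", regex=False)       # decimal comma -> point
        .astype(float)
    )
    return df[["code", "price"]]

# Example with made-up scraped rows:
sample = [("A-100", "R$ 1.234,56"), ("B-200", "R$ 99,90")]
print(clean_prices(sample))
```

The browser automation itself (navigating the site with Selenium) is omitted here; only the parsing logic is shown because it is what feeds `trevi_full.xlsx`.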

What technologies does it need:

- Python
- google-api-python-client 
- google-auth-httplib2 
- google-auth-oauthlib
- numpy
- Selenium
- Pandas
- openpyxl 
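The list above maps onto a pip-installable requirements file. A sketch only, with version pins omitted and package names as published on PyPI:

```text
google-api-python-client
google-auth-httplib2
google-auth-oauthlib
numpy
selenium
pandas
openpyxl
```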

What files does it need (input):

- To get placed-order:
	- token.json ~ GMAIL API **git ignored**
	- credentials.json ~ GMAIL API **git ignored**
	- mygmailaccount.inbox.message_with_pedido.xlsx

- To update:
	- Provedores Todos.xlsm ~ Local database **git ignored**
	- pedido.xlsx ~ Placed-order to provider, retrieved from GMAIL **git ignored**
	- trevi_full.xlsx ~ Scraped data from provider's website **git ignored**
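The update step combines the scraped prices with the retrieved order into one dataframe. A minimal sketch under stated assumptions: in-memory sample frames stand in for `trevi_full.xlsx` and `pedido.xlsx`, and the column names, the left-merge, and the line totals are illustrative guesses, not the program's actual logic.

```python
import pandas as pd

# Stand-ins for the real spreadsheets, which the program would load with
# pd.read_excel("trevi_full.xlsx") and pd.read_excel("pedido.xlsx").
trevi_full = pd.DataFrame({"code": ["A-100", "B-200", "C-300"],
                           "price": [1234.56, 99.90, 10.00]})
pedido = pd.DataFrame({"code": ["A-100", "C-300"],
                       "quantity": [3, 5]})

def build_final(prices, order):
    """Attach current prices to the placed order and total each line."""
    final = order.merge(prices, on="code", how="left")
    final["total"] = final["quantity"] * final["price"]
    return final

final = build_final(trevi_full, pedido)
# The real program would then write the result out:
# final.to_excel("Final.xlsx", index=False)
print(final)
```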

What files does it create (output):

- After scraping the data:
	- trevi_full.xlsx ~ Scraped data from provider's website **git ignored**

- After retrieving placed-order:
	- pedido.xlsx ~ Placed-order to provider, retrieved from GMAIL **git ignored**

- After creating an updated dataframe:
	- Final.xlsx ~ Updated dataframe with the necessary information **git ignored**

What I'm thinking about:

- Uploading all the necessary sample_files. 
