ETL-webscraper

ETL-webscraper combines a web scraper, which collects "recently added" release data from the aboveboard distribution website, with a helper "cleaner" module that converts that data into the required format.

Features

Gather new release data and save it to a .json file
Clean, reformat, and modify the data
Save the cleaned data to an Excel spreadsheet
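
The first two steps above can be sketched roughly as follows. This is an illustration only, not the project's actual code: the field names (title, price), the sample record, and the clean_record and clean_all helpers are all assumptions, and the final Excel export (which the project handles with Pandas and Openpyxl) is left out so that the sketch runs on the Python standard library alone.

```python
import json

# Hypothetical raw scraper output; the real .json layout may differ.
RAW_JSON = '[{"title": "  Some Release EP ", "price": "12.50 GBP"}]'


def clean_record(record):
    """Trim whitespace and turn the price string into a float."""
    title = record["title"].strip()
    # Keep only the numeric part of the price, e.g. "12.50 GBP" -> 12.5
    price = float(record["price"].split()[0])
    return {"title": title, "price": price}


def clean_all(raw_json):
    """Parse the scraped JSON text and clean every record."""
    return [clean_record(r) for r in json.loads(raw_json)]


if __name__ == "__main__":
    print(clean_all(RAW_JSON))
```

In the real pipeline the cleaned records would then be handed to the Excel export step rather than printed.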

Tech

ETL-webscraper uses a number of open source projects to work properly:

Node.js
Puppeteer
Python3
pandas
NumPy
openpyxl

Pre-requisites

You need a B2B account with aboveboarddist (KRD). You also need Microsoft Excel to open the output data file.

Installation (macOS / Linux)

Update config Files

In ./CONFIG.py, set venv_path to the path of the virtual environment you will use, and set the "discount" variable to your actual discount percentage as a string (e.g. 50% = "50").
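
As a rough sketch, the two settings in CONFIG.py might look like the following. The path is a placeholder, and the discount_multiplier helper is hypothetical: it is not part of the project and is included only to show one way a percentage string such as "50" could be turned into a price multiplier.

```python
# Sketch of CONFIG.py; both values are placeholders to replace with your own.
venv_path = "/path/to/your/venv"  # assumed example path
discount = "50"                   # a 50% discount, stored as a string


def discount_multiplier(discount_str):
    """Hypothetical helper: convert "50" into the price multiplier 0.5."""
    percent = float(discount_str)
    if not 0 <= percent <= 100:
        raise ValueError(f"discount must be between 0 and 100, got {discount_str!r}")
    return 1 - percent / 100
```

Keeping the discount as a string matches the format described above; any conversion to a number would happen wherever the value is actually used.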

In ./modules/scraper, update the CONSTS.js file with your own username and password.

Dependencies

ETL-webscraper requires Node.js and Python3 to run.

Install Python3 dependencies...

cd ETL-webscraper
pip3 install -r requirements.txt

...and Node.js dependencies...

cd ./modules/scraper/
npm i

Start the application

...and run the application...

cd ETL-webscraper
python3 scrape.py

Output

The cleaned and transformed data is exported as an Excel file to ./output/DATAFILE.xlsx.
