The Scrapy program in this repo lets the user
scrape data from multiple Jumia websites simultaneously with a single command.
Data is retrieved from sites based in the following countries:
- Kenya
- Nigeria
- Uganda
- Algeria
- Tunisia
- Morocco
- Ivory Coast
- Senegal
Jumia is a pan-African technology company built around a marketplace, a logistics service and a payment service. The logistics service delivers packages through a network of local partners, while the payment service facilitates online transactions within Jumia’s ecosystem. It has partnered with more than 100,000 active sellers and individuals and is a direct competitor to Konga in Nigeria and Amazon in Egypt.
- Python 3.10 or 3.8
Create a virtual environment:
virtualenv venv
Then activate it:
source venv/bin/activate
- Clone the repo
- Open the JUMIA_INTER folder
- Install the dependencies:
pip install -r requirements.txt
or
pip install scrapy
To scrape all sites simultaneously, run the following from the root of the project:
python run_spider.py
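The contents of run_spider.py are not shown here, so the following is only a minimal sketch of how a script of this kind could start every spider of a Scrapy project in one process. It relies on Scrapy's `CrawlerProcess` and `get_project_settings`; the helper `select_spiders` and the function `run_spiders` are hypothetical names introduced for illustration, and the sketch assumes it is run from inside the Scrapy project so that the project settings can be found.

```python
def select_spiders(available, wanted=None):
    """Return the spider names to run, keeping the order of `available`.

    Hypothetical helper: with wanted=None every available spider is selected.
    """
    if wanted is None:
        return list(available)
    unknown = sorted(set(wanted) - set(available))
    if unknown:
        raise ValueError(f"Unknown spider(s): {', '.join(unknown)}")
    return [name for name in available if name in wanted]


def run_spiders(wanted=None):
    """Schedule the selected spiders in a single process and block until done."""
    # Imported here so that select_spiders can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    # Load the project's settings.py (assumes the current directory is inside the project).
    process = CrawlerProcess(get_project_settings())

    # spider_loader.list() returns the names of all spiders found in the project.
    for name in select_spiders(process.spider_loader.list(), wanted):
        process.crawl(name)

    # Starts the Twisted reactor; returns once every scheduled crawl has finished.
    process.start()
```

Calling `run_spiders()` would crawl every spider in the project concurrently, while `run_spiders(["jumia_kenya"])` would restrict the run to one site. Scheduling all spiders in one process lets them share a single event loop, which is usually lighter than launching one `scrapy crawl` command per country.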
To run a single spider:
- from the root:
scrapy crawl <spidername>
where <spidername> is, for example, jumia_kenya or jumia_senegal
- from the spiders folder:
go to the spiders folder
cd JUMIA_INTER/JUMIA_INTER/spiders
choose your spider and run it:
scrapy runspider jumia_kenya.py