The Scrapy program in this repo lets the user scrape data from multiple Jumia websites simultaneously by running a single command.
Data is retrieved from sites based in the following countries:
- Kenya
- Nigeria
- Uganda
- Algeria
- Tunisia
- Morocco
- Ivory Coast
- Senegal
Jumia is a Pan-African technology company built around a marketplace, a logistics service and a payment service. The logistics service enables the delivery of packages through a network of local partners, while the payment service facilitates online transactions within Jumia’s ecosystem. It has partnered with more than 100,000 active sellers and individuals and is a direct competitor to Konga in Nigeria and Amazon in Egypt.
- Python 3.10 or 3.8
- Create a virtual environment:

    virtualenv venv

- Activate it:

    source venv/bin/activate

- Clone the repo
- Open the JUMIA_INTER folder
- Install dependencies:

    pip install -r requirements.txt

  or

    pip install scrapy

To scrape all sites simultaneously, run from the root of the project:

    python run_spider.py

To scrape a single spider:
- From the root:

    scrapy crawl <spidername>

  where <spidername> is, for example, jumia_kenya or jumia_senegal.
- From the spiders folder: go to the spiders folder

    cd JUMIA_INTER/JUMIA_INTER/spiders

  then choose your spider and run it:

    scrapy runspider jumia_kenya.py
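
For readers curious how one command can launch every spider at once, here is a minimal sketch using Scrapy's `CrawlerProcess`. It is not the repo's actual `run_spider.py`, whose contents are not shown in this README. The helper names `spider_name` and `run_all` are hypothetical. The `jumia_<country>` naming pattern is an assumption extrapolated from the two names mentioned above (jumia_kenya, jumia_senegal).

```python
"""Hypothetical sketch of a run-all script; not the repo's actual run_spider.py."""

# Countries listed at the top of this README.
COUNTRIES = [
    "Kenya", "Nigeria", "Uganda", "Algeria",
    "Tunisia", "Morocco", "Ivory Coast", "Senegal",
]


def spider_name(country: str) -> str:
    """Build a spider name such as 'jumia_kenya' (assumed naming convention)."""
    return "jumia_" + country.lower().replace(" ", "_")


def run_all(names):
    """Schedule every named spider in one process and block until all finish."""
    # Imported here so spider_name() can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    # get_project_settings() needs to run inside the Scrapy project
    # so that the project's settings and spider modules can be found.
    process = CrawlerProcess(get_project_settings())
    for name in names:
        process.crawl(name)  # spiders are looked up by name in the project
    process.start()  # starts the Twisted reactor; returns when all crawls end


# Example call, from the project root:
#     run_all([spider_name(c) for c in COUNTRIES])
```

Because all crawls share one Twisted reactor, they run concurrently in a single process instead of one after another.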