MahdiRahmani80/divar-scraper


DivarScraper

Divar (دیوار)

1. Clone the Repository

First, clone the repository:

$ git clone https://github.com/MahdiRahmani80/divar-scraper.git

2. Create a Virtual Environment

Next, create and activate a virtual environment, then install the dependencies from inside the cloned directory. On Linux:

$ cd divar-scraper
$ virtualenv .venv/
$ source .venv/bin/activate
$ pip install -r requirements.txt  # Install required packages

3. Configure the Project

After setting up the environment, we can configure the project in src/config/Setting.py:

  • SAND_BOX_MODE: If True, the project will run in debug mode with more logs and limited scraping.
  • USER_AGENT: Set your user agent here. It’s easier to manage it in one place.
  • IS_URL_UNIQUE_IN_DATA_BASE: If True, the scraper will avoid collecting duplicate entries (as URLs must be unique).
  • IRAN_CITIES_JSON_PATH: Path to the data file with Iran’s cities and provinces, which are stored in the database.
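
As a rough illustration, the four settings above might look like the sketch below. Only the setting names come from this README; every value, the use of pathlib, and the file path are assumptions, so check the real src/config/Setting.py before copying anything.

```python
# Hypothetical sketch of src/config/Setting.py.
# Only the names are taken from the README; all values are placeholders.
from pathlib import Path

# Debug mode: more logs and limited scraping.
SAND_BOX_MODE = True

# One place to manage the user agent string (placeholder value).
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"

# When True, URLs are treated as unique, so duplicate ads are skipped.
IS_URL_UNIQUE_IN_DATA_BASE = True

# Location of the cities and provinces data (assumed path and type).
IRAN_CITIES_JSON_PATH = Path("data/iran_cities.json")
```

Starting with SAND_BOX_MODE set to True is a reasonable way to try the scraper on a small run before collecting a full dataset.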

4. Constants

In utils/Constant.py, we have several constants:

  • DIVAR_ADDON: Query options appended to the scraper URL to filter ads, for example IDENTITY_VERIFIED.
  • Other settings include scroll speed and behavior during scraping.
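
To show what adding a query option to a URL can look like, here is a minimal sketch using only the standard library. The helper with_addons, the dictionary form of IDENTITY_VERIFIED, its key and value, and the example.com URL are all hypothetical; the README does not show how DIVAR_ADDON is actually defined or applied.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical addon: the real key/value used by the project is not shown here.
IDENTITY_VERIFIED = {"identity-verified": "true"}


def with_addons(url: str, *addons: dict) -> str:
    """Return url with each addon's key/value pairs merged into its query string."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # keep options already on the URL
    for addon in addons:
        query.update(addon)
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_addons("https://example.com/s/tehran?page=2", IDENTITY_VERIFIED))
```

Merging into the existing query string, rather than concatenating text, avoids duplicated keys and stray separators when a URL already carries options.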

5. Main Entry Point: main.py

This is the main part of the program:

asyncio.run(main(
    Setting.DEFAULT_SAVE_METHOD,
    check_page=check_page,
    pages=get_page(start=1, page=2, step=1),
    interval_sec=10,
    max_iterations=None
))

Here, you can configure how the scraper operates. For example, you may want it to keep revisiting pages 1 and 2. Why? New ads can be posted while the scraper is scrolling through older ones, so rechecking the first pages helps keep your dataset fresh. 😊
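
The idea of revisiting the first pages on a timer can be sketched as a small polling loop. This is not the project's main(): page_range, poll, and record are hypothetical names, and the meanings given to interval_sec and max_iterations (seconds between passes, and number of passes with None meaning no limit) are assumptions based on the parameter names.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Optional


def page_range(start: int, stop: int, step: int = 1) -> list[int]:
    """Pages to revisit on every pass, inclusive of both ends."""
    return list(range(start, stop + 1, step))


async def poll(
    visit: Callable[[int], Awaitable[None]],
    pages: Iterable[int],
    interval_sec: float,
    max_iterations: Optional[int] = None,
) -> None:
    """Visit every page, wait, and repeat. None means run until cancelled."""
    pages = list(pages)
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        for page in pages:
            await visit(page)
        iteration += 1
        # Skip the wait once the final pass is done.
        if max_iterations is None or iteration < max_iterations:
            await asyncio.sleep(interval_sec)


visited: list[int] = []


async def record(page: int) -> None:
    """Stand-in for real scraping: just remember which page was visited."""
    visited.append(page)


asyncio.run(poll(record, page_range(1, 2), interval_sec=0, max_iterations=3))
print(visited)
```

With max_iterations left as None the loop never ends on its own, which suits a long-running collector; a small number is handy for testing.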

About

With this project, you can easily scrape Divar.
