First, we should clone the repository (obviously!):
$ git clone https://github.com/MahdiRahmani80/divar-scraper.git
Next, we typically want to create a virtual environment. On Linux:
$ virtualenv .venv/
$ source .venv/bin/activate
$ pip install -r requirements.txt # Install required packages
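To confirm the virtual environment is actually active before installing anything, a quick standard-library check works on any platform (this is a general Python fact, not something specific to this project):

```python
import sys

# Inside an active virtualenv, sys.prefix points into .venv/,
# while sys.base_prefix still points at the system Python.
in_venv = sys.prefix != sys.base_prefix
print("virtualenv active:", in_venv)
```

If this prints `False`, re-run the `source .venv/bin/activate` step before `pip install`.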
After setting up the environment, we can configure the project in src/config/Setting.py:
- SAND_BOX_MODE: If True, the project runs in debug mode with more logs and limited scraping.
- USER_AGENT: Set your user agent here. It's easier to manage it in one place.
- IS_URL_UNIQUE_IN_DATA_BASE: If True, the scraper avoids collecting duplicate entries (URLs must be unique).
- IRAN_CITIES_JSON_PATH: The path to Iran's cities and provinces data, which is stored in the database.
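Put together, a Setting.py along these lines would cover the options above. The setting names come from the list; every value here is an illustrative assumption, not a project default:

```python
# Hypothetical sketch of src/config/Setting.py — values are illustrative only.
SAND_BOX_MODE = True                 # debug mode: more logs, limited scraping
USER_AGENT = "my-scraper/1.0"        # assumed placeholder; set your own UA string
IS_URL_UNIQUE_IN_DATA_BASE = True    # skip ads whose URL is already stored
IRAN_CITIES_JSON_PATH = "data/iran_cities.json"  # assumed path to the cities/provinces JSON
```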
In utils/Constant.py, we have several constants:
- DIVAR_ADDON: Use this to add query options to the scraper URL, like IDENTITY_VERIFIED, to filter ads accordingly.
- Other constants control scroll speed and behavior during scraping.
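As a rough illustration of how a query addon filters the listing URL — the constant name follows the text, but the exact query syntax and helper function are assumptions, not the repository's actual code:

```python
# Hypothetical example: appending a query addon to a listing URL.
BASE_URL = "https://divar.ir/s/tehran"          # assumed base listing URL
DIVAR_ADDON = "?identity-verified=true"          # assumed query format

def build_url(base: str, addon: str = "") -> str:
    """Join the base listing URL with an optional query addon."""
    return base + addon

print(build_url(BASE_URL, DIVAR_ADDON))
```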
This is the main part of the program:
asyncio.run(main(
    Setting.DEFAULT_SAVE_METHOD,              # how scraped ads are saved
    check_page=check_page,
    pages=get_page(start=1, page=2, step=1),  # revisit pages 1 through 2
    interval_sec=10,                          # wait 10 seconds between iterations
    max_iterations=None                       # no iteration limit; run until stopped
))
Here, you can configure how the scraper operates. For example, why would you want it to always revisit pages 1 and 2? Because new ads can arrive while you scroll during scraping, so rechecking the first pages helps ensure you're collecting fresh data for your dataset. 😊
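To make the pages argument concrete, here is a minimal sketch of what a generator like get_page(start=1, page=2, step=1) might yield — the real implementation lives in the repository; this version simply produces the page numbers the scraper revisits:

```python
# Hypothetical sketch of get_page — yields page numbers from `start`
# through `page` (inclusive), stepping by `step`.
def get_page(start: int = 1, page: int = 2, step: int = 1):
    yield from range(start, page + 1, step)

print(list(get_page(start=1, page=2, step=1)))  # → [1, 2]
```

With these defaults, pages 1 and 2 are rechecked on every iteration, which matches the fresh-data behavior described above.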