Commit db61b2d

Merge pull request #38 from lenarsaitov/significant-refactor-code
update file name, add suburban type, refactor, update description on …
2 parents 926db71 + 36660c2 commit db61b2d

9 files changed: +68 -38 lines changed

README.md

Lines changed: 15 additions & 9 deletions
@@ -21,7 +21,7 @@ print(data[0])
 ```
 Preparing to collect information from pages..
 The absolute path to the file:
-/Users/macbook/some_project/cianparser/cian_parsing_result_flat_sale_1_2_moskva_12_Jan_2024_21_48_43_100892.csv
+/Users/macbook/some_project/cianparser/cian_flat_sale_1_2_moskva_12_Jan_2024_21_48_43_100892.csv

 The page from which the collection of information begins:
 https://cian.ru/cat.php?engine_version=2&p=1&with_neighbors=0&region=1&deal_type=sale&offer_type=flat&room1=1&room2=1
@@ -53,7 +53,7 @@ Total number of parsed offers: 56.
 ```
 ### Initialization
 Parameters used when initializing the parser via the CianParser function:
-* __location__ - the listing location, e.g. _Санкт-Петербург_ (to see the available locations, use _cianparser.list_locations()_)
+* __location__ - the listing location, e.g. _Москва_ (to see the available locations, use _cianparser.list_locations()_)
 * __proxies__ - proxies (see the __Cloudflare, CloudScraper, Proxy__ section), _None_ by default

 ### The get_flats method
@@ -64,13 +64,19 @@ Total number of parsed offers: 56.
 * __with_extra_data__ - whether to collect additional data, at the cost of a several-fold longer run time (see the __Note__ below), _False_ by default
 * __additional_settings__ - additional search settings (see __Additional search settings__ below), _None_ by default

-An example of using this method is shown above
+Example:
+```python
+import cianparser
+
+moscow_parser = cianparser.CianParser(location="Москва")
+data = moscow_parser.get_flats(deal_type="sale", rooms=(1, 2), additional_settings={"start_page":1, "end_page":2})
+```

 The project stops gracefully when it runs out of pages. For details, see the __Limitations__ section

 ### The get_suburban method (collecting listings for houses, land plots, townhouses, etc.)
 This method accepts the following arguments:
-* __object_type__ - the building type, e.g. house/dacha, part of a house, land plot, townhouse _("house", "house-part", "land-plot", "townhouse")_
+* __suburban_type__ - the building type, e.g. house/dacha, part of a house, land plot, townhouse _("house", "house-part", "land-plot", "townhouse")_
 * __deal_type__ - the deal type, e.g. long-term rent, sale _("rent_long", "sale")_
 * __with_saving_csv__ - whether to save the collected data (in real time, as it is collected), _False_ by default
 * __with_extra_data__ - whether to collect additional data, at the cost of a several-fold longer run time, _False_ by default
@@ -81,7 +87,7 @@ Total number of parsed offers: 56.
 import cianparser

 moscow_parser = cianparser.CianParser(location="Москва")
-data = moscow_parser.get_suburban(object_type="townhouse", deal_type="sale", with_saving_csv=True, additional_settings={"start_page":1, "end_page":1})
+data = moscow_parser.get_suburban(suburban_type="townhouse", deal_type="sale", additional_settings={"start_page":1, "end_page":1})
 ```

 ### The get_newobjects method (collecting data on new-build developments)
@@ -93,7 +99,7 @@ data = moscow_parser.get_suburban(object_type="townhouse", deal_type="sale", wit
 import cianparser

 moscow_parser = cianparser.CianParser(location="Москва")
-data = moscow_parser.get_newobjects(with_saving_csv=True)
+data = moscow_parser.get_newobjects()
 ```

 ### Additional search settings
@@ -227,7 +233,7 @@ __with_saving_csv__ value ___True___.
 #### Example of the file produced by calling __get_flats__ with __with_extra_data__ = __True__:

 ```bash
-cian_parsing_result_flat_sale_1_1_moskva_12_Jan_2024_22_29_48_117413.csv
+cian_flat_sale_1_1_moskva_12_Jan_2024_22_29_48_117413.csv
 ```
 | author | author_type | url | location | deal_type | accommodation_type | floor | floors_count | rooms_count | total_meters | price_per_m2 | price | year_of_construction | object_type | house_material_type | heating_type | finish_type | living_meters | kitchen_meters | phone | district | street | house_number | underground | residential_complex
 | ------ | ------ | ------ | ------ | ------ | ------ | ----------- | ---- | ---- | --------- | ------------------ | ----- | ------------ | ----------- | ------------ | --------------- | ----------- | ----------- | -------------------- | --- | --- | --- | --- | --- | ---
@@ -238,7 +244,7 @@ cian_parsing_result_flat_sale_1_1_moskva_12_Jan_2024_22_29_48_117413.csv
 #### Example of the file produced by calling __get_suburban__ with __with_extra_data__ = __True__:

 ```bash
-cian_parsing_result_suburban_sale_15_15_moskva_13_Jan_2024_04_30_47_963046.csv
+cian_suburban_townhouse_sale_15_15_moskva_13_Jan_2024_04_30_47_963046.csv
 ```
 | author | author_type | url | location | deal_type | accommodation_type | price | year_of_construction | house_material_type | land_plot | land_plot_status | heating_type | gas_type | water_supply_type | sewage_system | bathroom | living_meters | floors_count | phone | district | underground | street | house_number
 | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ------------ | ----------- | ------------ | --------------- | ----------- | ----------- | -------------------- | --- | --- | --- | --- | --- | ---
@@ -249,7 +255,7 @@ cian_parsing_result_suburban_sale_15_15_moskva_13_Jan_2024_04_30_47_963046.csv
 #### Example of the file produced by calling __get_newobjects__:

 ```bash
-cian_parsing_result_newobject_sale_1_1_moskva_13_Jan_2024_01_27_32_734734.csv
+cian_newobject_13_Jan_2024_01_27_32_734734.csv
 ```
 | name | location | accommodation_type | url | full_location_address | year_of_construction | house_material_type | finish_type | ceiling_height | class | parking_type | floors_from | floors_to | builder
 | ----- | ------------ | ----------- | ------------ | --------------- | ----------- | ----------- | -------------------- | --- | --- | --- | --- | --- | ---

cianparser/base.py

Lines changed: 8 additions & 7 deletions
@@ -1,10 +1,7 @@
 import math
 import csv
-import pathlib
-from datetime import datetime
-import transliterate

-from cianparser.constants import FILE_NAME_BASE, SPECIFIC_FIELDS_FOR_RENT_LONG, SPECIFIC_FIELDS_FOR_RENT_SHORT, SPECIFIC_FIELDS_FOR_SALE
+from cianparser.constants import SPECIFIC_FIELDS_FOR_RENT_LONG, SPECIFIC_FIELDS_FOR_RENT_SHORT, SPECIFIC_FIELDS_FOR_SALE


 class BaseListPageParser:
@@ -41,9 +38,13 @@ def is_rent_short(self):
         return self.deal_type == "rent" and self.rent_period_type == 2

     def build_file_path(self):
-        now_time = datetime.now().strftime("%d_%b_%Y_%H_%M_%S_%f")
-        file_name = FILE_NAME_BASE.format(self.accommodation_type, self.deal_type, self.start_page, self.end_page, transliterate.translit(self.location_name.lower(), reversed=True), now_time)
-        return pathlib.Path(pathlib.Path.cwd(), file_name.replace("'", ""))
+        pass
+
+    def define_average_price(self, price_data):
+        if "price" in price_data:
+            self.average_price = (self.average_price * self.count_parsed_offers + price_data["price"]) / self.count_parsed_offers
+        elif "price_per_month" in price_data:
+            self.average_price = (self.average_price * self.count_parsed_offers + price_data["price_per_month"]) / self.count_parsed_offers

     def print_parse_progress(self, page_number, count_of_pages, offers, ind):
         total_planed_offers = len(offers) * count_of_pages
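
Review note on `define_average_price`: as written in the hunk above, the previous average is multiplied by `count_parsed_offers`, and the call sites in `flat_list.py` and `suburban_list.py` increment that counter before calling the method. If the intent is a running arithmetic mean, the previous average would normally be weighted by `count - 1`, because it covers one offer fewer than the new total. The sketch below shows that update rule in isolation. `RunningMean` and its attribute names are hypothetical and are not part of cianparser.

```python
class RunningMean:
    """Hypothetical helper (not part of cianparser): incremental mean of prices."""

    def __init__(self):
        self.count = 0
        self.average = 0.0

    def add(self, price):
        # Increment first, mirroring parse_offer, which bumps
        # count_parsed_offers before calling define_average_price.
        self.count += 1
        # The previous average covers count - 1 items, so weight it by that.
        self.average = (self.average * (self.count - 1) + price) / self.count
        return self.average


mean = RunningMean()
for price in (100, 200, 300):
    mean.add(price)

print(mean.average)  # 200.0
```

With the weighting used in the commit, the same three prices would give a value larger than the true mean, since the old average is counted once too often at every step.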

cianparser/cianparser.py

Lines changed: 16 additions & 15 deletions
@@ -118,22 +118,22 @@ def get_flats(self, deal_type: str, rooms, with_saving_csv=False, with_extra_dat
                                additional_settings=additional_settings))
         return self.__parser__.result

-    def get_suburban(self, object_type: str, deal_type: str, with_saving_csv=False, with_extra_data=False, additional_settings=None):
+    def get_suburban(self, suburban_type: str, deal_type: str, with_saving_csv=False, with_extra_data=False, additional_settings=None):
         """
         Parse information of suburbans from cian website
         Examples:
             >>> moscow_parser = cianparser.CianParser(location="Москва")
-            >>> data = moscow_parser.get_suburbans(object_type="house",deal_type="rent_long")
-            >>> data = moscow_parser.get_suburbans(object_type="house",deal_type="rent_short", with_saving_csv=True)
-            >>> data = moscow_parser.get_suburbans(object_type="townhouse",deal_type="sale", additional_settings={"start_page": 1, "end_page": 1, "sort_by":"price_from_min_to_max"})
-        :param object_type: type of object, e.g. "house", "house-part", "land-plot", "townhouse"
+            >>> data = moscow_parser.get_suburbans(suburban_type="house",deal_type="rent_long")
+            >>> data = moscow_parser.get_suburbans(suburban_type="house",deal_type="rent_short", with_saving_csv=True)
+            >>> data = moscow_parser.get_suburbans(suburban_type="townhouse",deal_type="sale", additional_settings={"start_page": 1, "end_page": 1, "sort_by":"price_from_min_to_max"})
+        :param suburban_type: type of suburban building, e.g. "house", "house-part", "land-plot", "townhouse"
         :param deal_type: type of deal, e.g. "rent_long", "rent_short", "sale"
         :param with_saving_csv: is it necessary to save data in csv, default False
         :param with_extra_data: is it necessary to collect additional data (but with increasing time duration), default False
         :param additional_settings: additional settings such as min_price, sort_by and others, default None
         """

-        __validation_get_suburban__(deal_type=deal_type, object_type=object_type)
+        __validation_get_suburban__(suburban_type=suburban_type, deal_type=deal_type)
         deal_type, rent_period_type = __define_deal_type__(deal_type)
         self.__parser__ = SuburbanListPageParser(
             session=self.__session__,
@@ -143,10 +143,11 @@ def get_suburban(self, object_type: str, deal_type: str, with_saving_csv=False,
             with_saving_csv=with_saving_csv,
             with_extra_data=with_extra_data,
             additional_settings=additional_settings,
+            object_type=suburban_type,
         )
         self.__run__(
             __build_url_list__(location_id=self.__location_id__, deal_type=deal_type, accommodation_type="suburban",
-                               rooms=None, rent_period_type=rent_period_type, object_type=object_type,
+                               rooms=None, rent_period_type=rent_period_type, suburban_type=suburban_type,
                                additional_settings=additional_settings))
         return self.__parser__.result

@@ -213,18 +214,18 @@ def __validation_get_flats__(deal_type, rooms):
                          f'It is correct int, str and tuple types. Example 1, (1,3, "studio"), "studio, "all".')


-def __validation_get_suburban__(deal_type, object_type):
+def __validation_get_suburban__(suburban_type, deal_type):
+    if suburban_type not in OBJECT_SUBURBAN_TYPES.keys():
+        raise ValueError(f'You entered suburban_type={suburban_type}, which is not valid value. '
+                         f'Try entering one of these values: "house", "house-part", "land-plot", "townhouse".')
+
     if deal_type not in DEAL_TYPES:
         raise ValueError(f'You entered deal_type={deal_type}, which is not valid value. '
                          f'Try entering one of these values: "rent_long", "sale".')

-    if object_type not in OBJECT_SUBURBAN_TYPES.keys():
-        raise ValueError(f'You entered object_type={object_type}, which is not valid value. '
-                         f'Try entering one of these values: "house", "house-part", "land-plot", "townhouse".')
-

 def __build_url_list__(location_id, deal_type, accommodation_type, rooms=None, rent_period_type=None,
-                       object_type=None, additional_settings=None):
+                       suburban_type=None, additional_settings=None):
     url_builder = URLBuilder(accommodation_type == "newobject")
     url_builder.add_location(location_id)
     url_builder.add_deal_type(deal_type)
url_builder.add_deal_type(deal_type)
@@ -236,8 +237,8 @@ def __build_url_list__(location_id, deal_type, accommodation_type, rooms=None, r
     if rent_period_type is not None:
         url_builder.add_rent_period_type(rent_period_type)

-    if object_type is not None:
-        url_builder.add_object_suburban_type(object_type)
+    if suburban_type is not None:
+        url_builder.add_object_suburban_type(suburban_type)

     if additional_settings is not None:
         url_builder.add_additional_settings(additional_settings)
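
Review note on the `get_suburban` changes: the docstring examples in this file still call `get_suburbans`, while the method defined in the hunk is `get_suburban`, and the docstring lists `"rent_short"` although the validation message only names `"rent_long"` and `"sale"`. The sketch below illustrates the new validation order (suburban type first, then deal type) in a self-contained form. The function name `validate_get_suburban` and the contents of the two constants are assumptions reconstructed from the error messages above, and the real `OBJECT_SUBURBAN_TYPES` and `DEAL_TYPES` in cianparser may differ.

```python
# Stand-in constants, assumed from the error messages in the diff.
# The dictionary values are placeholders with no meaning of their own.
OBJECT_SUBURBAN_TYPES = {"house": 1, "house-part": 2, "land-plot": 3, "townhouse": 4}
DEAL_TYPES = {"rent_long", "sale"}


def validate_get_suburban(suburban_type, deal_type):
    """Hypothetical equivalent of __validation_get_suburban__ after this commit."""
    # The suburban type is checked first, matching the reordered function.
    if suburban_type not in OBJECT_SUBURBAN_TYPES:
        raise ValueError(
            f"You entered suburban_type={suburban_type}, which is not a valid value. "
            f"Try one of: {sorted(OBJECT_SUBURBAN_TYPES)}."
        )
    if deal_type not in DEAL_TYPES:
        raise ValueError(
            f"You entered deal_type={deal_type}, which is not a valid value. "
            f"Try one of: {sorted(DEAL_TYPES)}."
        )


validate_get_suburban("townhouse", "sale")  # valid input, no exception

try:
    validate_get_suburban("castle", "lease")  # both arguments are invalid
except ValueError as error:
    message = str(error)
    print(message)  # reports suburban_type, because it is checked first
```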

cianparser/constants.py

Lines changed: 3 additions & 1 deletion
@@ -8,7 +8,9 @@

 FLOATS_NUMBERS_REG_EXPRESSION = r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

-FILE_NAME_BASE = 'cian_parsing_result_{}_{}_{}_{}_{}_{}.csv'
+FILE_NAME_FLAT_FORMAT = 'cian_{}_{}_{}_{}_{}_{}.csv'
+FILE_NAME_SUBURBAN_FORMAT = 'cian_{}_{}_{}_{}_{}_{}_{}.csv'
+FILE_NAME_NEWOBJECT_FORMAT = 'cian_{}_{}_{}.csv'

 BASE_URL = "https://cian.ru"
 DEFAULT_POSTFIX_PATH = "/cat.php?"
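
Review note on the three format strings: the sketch below fills each one with the argument order used by the `build_file_path` methods in this commit, so the resulting names can be compared with the README samples. The argument values (`"flat"`, `"townhouse"`, `"moskva"`, the page numbers) and the fixed timestamp string are assumptions chosen for illustration. The real parsers take the timestamp from `datetime.now()` and the location from `translit`.

```python
# Format strings copied from the constants.py hunk above.
FILE_NAME_FLAT_FORMAT = 'cian_{}_{}_{}_{}_{}_{}.csv'
FILE_NAME_SUBURBAN_FORMAT = 'cian_{}_{}_{}_{}_{}_{}_{}.csv'
FILE_NAME_NEWOBJECT_FORMAT = 'cian_{}_{}_{}.csv'

# Assumed, fixed timestamp so the result is reproducible.
now_time = "12_Jan_2024_21_48_43_100892"

# accommodation_type, deal_type, start_page, end_page, location, timestamp
flat_name = FILE_NAME_FLAT_FORMAT.format("flat", "sale", 1, 2, "moskva", now_time)

# accommodation_type, object_type, deal_type, start_page, end_page, location, timestamp
suburban_name = FILE_NAME_SUBURBAN_FORMAT.format(
    "suburban", "townhouse", "sale", 1, 1, "moskva", now_time
)

# accommodation_type, location, timestamp
newobject_name = FILE_NAME_NEWOBJECT_FORMAT.format("newobject", "moskva", now_time)

for name in (flat_name, suburban_name, newobject_name):
    print(name)
```

Under these assumptions the new-object name includes the location segment (`cian_newobject_moskva_...`), whereas the README sample in this commit shows `cian_newobject_13_Jan_2024_...` without it, so one of the two may be out of date.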

cianparser/flat_list.py

Lines changed: 10 additions & 0 deletions
@@ -1,12 +1,21 @@
 import bs4
 import time
+import pathlib
+from datetime import datetime
+from transliterate import translit

+from cianparser.constants import FILE_NAME_FLAT_FORMAT
 from cianparser.helpers import union_dicts, define_author, define_location_data, define_specification_data, define_deal_url_id, define_price_data
 from cianparser.flat import FlatPageParser
 from cianparser.base import BaseListPageParser


 class FlatListPageParser(BaseListPageParser):
+    def build_file_path(self):
+        now_time = datetime.now().strftime("%d_%b_%Y_%H_%M_%S_%f")
+        file_name = FILE_NAME_FLAT_FORMAT.format(self.accommodation_type, self.deal_type, self.start_page, self.end_page, translit(self.location_name.lower(), reversed=True), now_time)
+        return pathlib.Path(pathlib.Path.cwd(), file_name.replace("'", ""))
+
     def parse_list_offers_page(self, html, page_number: int, count_of_pages: int, attempt_number: int):
         list_soup = bs4.BeautifulSoup(html, 'html.parser')

@@ -55,6 +64,7 @@ def parse_offer(self, offer):
             time.sleep(4)

         self.count_parsed_offers += 1
+        self.define_average_price(price_data=price_data)
         self.result_set.add(define_deal_url_id(common_data["url"]))
         self.result.append(union_dicts(author_data, common_data, specification_data, price_data, page_data, location_data))

cianparser/newobject_list.py

Lines changed: 3 additions & 3 deletions
@@ -4,10 +4,10 @@
 import csv
 import pathlib
 from datetime import datetime
-import transliterate
+from transliterate import translit
 import urllib.parse

-from cianparser.constants import FILE_NAME_BASE
+from cianparser.constants import FILE_NAME_NEWOBJECT_FORMAT
 from cianparser.helpers import union_dicts
 from cianparser.newobject import NewObjectPageParser

@@ -30,7 +30,7 @@ def __init__(self, session, location_name: str, with_saving_csv=False):

     def build_file_path(self):
         now_time = datetime.now().strftime("%d_%b_%Y_%H_%M_%S_%f")
-        file_name = FILE_NAME_BASE.format(self.accommodation_type, self.deal_type, self.start_page, self.end_page, transliterate.translit(self.location_name.lower(), reversed=True), now_time)
+        file_name = FILE_NAME_NEWOBJECT_FORMAT.format(self.accommodation_type, translit(self.location_name.lower(), reversed=True), now_time)
         return pathlib.Path(pathlib.Path.cwd(), file_name.replace("'", ""))

     def print_parse_progress(self, page_number, count_of_pages, offers, ind):

cianparser/suburban_list.py

Lines changed: 11 additions & 1 deletion
@@ -1,12 +1,21 @@
 import bs4
 import time
+import pathlib
+from datetime import datetime
+from transliterate import translit

+from cianparser.constants import FILE_NAME_SUBURBAN_FORMAT
 from cianparser.helpers import union_dicts, define_author, parse_location_data, define_price_data, define_deal_url_id
 from cianparser.suburban import SuburbanPageParser
 from cianparser.base import BaseListPageParser


 class SuburbanListPageParser(BaseListPageParser):
+    def build_file_path(self):
+        now_time = datetime.now().strftime("%d_%b_%Y_%H_%M_%S_%f")
+        file_name = FILE_NAME_SUBURBAN_FORMAT.format(self.accommodation_type, self.object_type, self.deal_type, self.start_page, self.end_page, translit(self.location_name.lower(), reversed=True), now_time)
+        return pathlib.Path(pathlib.Path.cwd(), file_name.replace("'", ""))
+
     def parse_list_offers_page(self, html, page_number: int, count_of_pages: int, attempt_number: int):
         list_soup = bs4.BeautifulSoup(html, 'html.parser')

@@ -39,7 +48,7 @@ def parse_offer(self, offer):
         common_data["location"] = self.location_name
         common_data["deal_type"] = self.deal_type
         common_data["accommodation_type"] = self.accommodation_type
-        common_data["object_type"] = self.object_type
+        common_data["suburban_type"] = self.object_type

         author_data = define_author(block=offer)
         location_data = parse_location_data(block=offer)
@@ -55,6 +64,7 @@ def parse_offer(self, offer):
             time.sleep(4)

         self.count_parsed_offers += 1
+        self.define_average_price(price_data=price_data)
         self.result_set.add(define_deal_url_id(common_data["url"]))
         self.result.append(union_dicts(author_data, common_data, price_data, page_data, location_data))

setup.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [metadata]
 name = cianparser
-version = 1.0.0
+version = 1.0.1
 description = Parser information from Cian website
 url = https://github.com/lenarsaitov/cianparser
 author = Lenar Saitov

setup.py

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@

 setup(
     name='cianparser',
-    version='1.0.0',
+    version='1.0.1',
     description='Parser information from Cian website',
     url='https://github.com/lenarsaitov/cianparser',
     author='Lenar Saitov',
