
Commit 63812d6

Merge pull request #775 from Rahuls66/zomato
Zomato Restaurant Scraper
2 parents 973396e + 164bea6 commit 63812d6


4 files changed: +77 -0 lines changed

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
# Zomato Dine-in Restaurant Scraper

This Python script scrapes the Name, Cuisine, Area, and Rate for Two details of dine-in restaurants from Zomato.
The script contains a user-defined function which scrapes the restaurant details for a given city and returns a DataFrame of the fetched details.

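For orientation, a minimal sketch of what that function returns (the column names are taken from `zomato_scraper.py`; `soup` is the BeautifulSoup object the script builds from the loaded page):

```python
df = zomato(soup)
print(df.columns.tolist())
# ['Name', 'Cuisine', 'Area', 'Rate for Two']
```
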
## Setup instructions

Steps to run the script:

1. Clone this repository.
2. Download the [Chrome Webdriver](https://chromedriver.chromium.org/downloads) for your current Google Chrome version. Save the downloaded file to the cloned `zomato_dinein_restaurant_scraper` folder (see the note after these steps if Selenium does not pick up the driver automatically).
3. Install the required dependencies by running `pip install -r requirements.txt`.
4. Run the `zomato_scraper.py` file.

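If `webdriver.Chrome()` cannot find the driver saved in the cloned folder (for example, because that folder is not on your `PATH`), Selenium 3.x also accepts an explicit driver path. A minimal sketch, assuming the chromedriver binary sits next to `zomato_scraper.py`; the exact filename (`chromedriver` vs `chromedriver.exe`) depends on your platform:

```python
import os
from selenium import webdriver

# Assumed location: the chromedriver binary saved in the cloned folder.
driver_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "chromedriver")

# Selenium 3.141.0 accepts the driver location via executable_path.
browser = webdriver.Chrome(executable_path=driver_path)
```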

## Additional Information about the script

In the Python script, we scrape the restaurants for `Indore`. You can scrape restaurants of other cities by changing the `url` variable.
For example, if you want to scrape the restaurants for `Mumbai`, change the url to `https://www.zomato.com/mumbai/dine-out`.
Over time, the number of restaurants may grow, so try increasing the number of iterations in the `for` loop if you think not all the restaurants are being fetched. Both changes are sketched below.

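For reference, a minimal sketch of those two edits (the scroll count of 50 is only an illustrative value; tune it as needed):

```python
# Point the scraper at Mumbai instead of Indore.
url = 'https://www.zomato.com/mumbai/dine-out'

# Scroll more times so that more restaurants are loaded before the
# page source is extracted (50 is an arbitrary example).
for i in range(0, 50):
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight*0.81);")
    time.sleep(5)
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight*0.86);")
    time.sleep(1)
```
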
## Output
![Sample](https://user-images.githubusercontent.com/43356237/137814799-c9180b73-0163-4f93-a230-b7fdb0a2b00a.png)
## Author(s)
Rahul Shah
## Disclaimers, if any
The author shall not be held responsible for any misuse of this script.
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
pandas==1.3.2
selenium==3.141.0
beautifulsoup4==4.10.0
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
# -- IMPORTING LIBRARIES --
import pandas as pd
from selenium import webdriver
from bs4 import BeautifulSoup
import time


# -- STARTING CHROME WITH WEBDRIVER --
browser = webdriver.Chrome()


# -- OPENING URL IN BROWSER --
url = 'https://www.zomato.com/indore/dine-out'
browser.get(url)


# -- ITERATING THROUGH THE PAGE TO GET ALL THE RESTAURANTS --
for i in range(0, 25):
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight*0.81);")
    time.sleep(5)
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight*0.86);")
    time.sleep(1)

# -- EXTRACTING PAGE SOURCE --
html = browser.page_source

# -- CREATING BeautifulSoup OBJECT --
soup = BeautifulSoup(html, 'html.parser')

# -- CLOSING THE BROWSER --
browser.quit()


# -- DEFINING FUNCTION FOR EXTRACTING RESTAURANT DETAILS --
def zomato(soup):
    name = [i.text.strip() for i in soup.find_all('h4', class_='sc-1hp8d8a-0 sc-dpiBDp iFpvOr')]
    cuisine = [i.text.strip() for i in soup.find_all('p', class_='sc-1hez2tp-0 sc-hENMEE ffqcCI')]
    area = [i.text.strip() for i in soup.find_all('p', class_='sc-1hez2tp-0 sc-dCaJBF jughZz')]
    rate = [i.text.strip() for i in soup.find_all('p', class_='sc-1hez2tp-0 sc-hENMEE crfqyB')]
    return pd.DataFrame({'Name': name, 'Cuisine': cuisine, 'Area': area, 'Rate for Two': rate})


# -- DISPLAYING AND EXPORTING RESULTS --
df = zomato(soup)
print(df.head())
df.to_csv('Zomato Restaurants.csv')
