web-crawler-scrapy

Terminal Commands for running Scrapy

Install package
```
pip install scrapy
```

Set up a Scrapy project
```
cd {where you want to create the project folder}
scrapy startproject {your pj name}
```

Change directory to the project folder
```
cd {your pj name}
```

Create a spider
```
scrapy genspider {spider name} {the domain you want to scrape the data from}

# e.g.
# scrapy genspider getquotes quotes.toscrape.com
```

Edit the spider (for example in Jupyter Notebook); the generated file is in the project's spiders/ folder

See getbooks.py for details.

Run the spider
```
scrapy crawl {spider name}

# e.g. pass spider arguments with -a and write the scraped items to a file with -o
scrapy crawl {spider name} -a address="40-18 48th st" -a borough="4" -o output.csv
```

getbooks.py

This Python script uses Scrapy to scrape book information from books.toscrape.com. It defines a spider called getbooks that visits each book page on the site and extracts the book's title, price, rating, and availability. The results can be saved to a CSV file whose name you choose (for example with the -o option of scrapy crawl). The script uses CSS selectors to extract data from the site's HTML pages and shows how to navigate a website with Scrapy and extract data from it.
