WebScraper-FlaskDB

Project description

This project uses a Python script to scrape the "Books to Scrape" website (https://books.toscrape.com/) and extract information about books, including their names, prices, and other features. The data is then stored in an SQLite database and presented through a Flask web application. The code is provided in both .py and .ipynb formats.

Data Collection

The data for this project was collected from the 'http://books.toscrape.com/' website by web scraping with the BeautifulSoup library.
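As a rough illustration of this step, the sketch below parses a listing page with BeautifulSoup. It is not the code in data_collection.py: the CSS selectors (article.product_pod, p.price_color, p.star-rating, p.availability), the SAMPLE_HTML snippet, and the function names parse_books and fetch_listing are assumptions made for the example. The parser runs on an inline sample so it works offline; fetch_listing shows how a live page could be downloaded with requests.

```python
from bs4 import BeautifulSoup

# Trimmed sample that mirrors the markup ASSUMED for a listing page.
SAMPLE_HTML = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="catalogue/a-light-in-the-attic_1000/index.html"
         title="A Light in the Attic">A Light in the ...</a></h3>
  <div class="product_price">
    <p class="price_color">£51.77</p>
    <p class="instock availability">In stock</p>
  </div>
</article>
"""


def parse_books(html):
    """Return one dict per book card found in a listing page."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for card in soup.select("article.product_pod"):
        link = card.select_one("h3 a")
        rating_tag = card.select_one("p.star-rating")
        # The rating is assumed to be the second CSS class, e.g. "star-rating Three".
        rating_classes = rating_tag.get("class", []) if rating_tag else []
        rating = next((c for c in rating_classes if c != "star-rating"), None)
        books.append({
            "name": link["title"],  # the title attribute holds the untruncated name
            "price": card.select_one("p.price_color").get_text(strip=True),
            "availability": card.select_one("p.availability").get_text(strip=True),
            "rating": rating,
        })
    return books


def fetch_listing(url="https://books.toscrape.com/"):
    """Download a listing page (hypothetical helper; needs network access)."""
    import requests  # imported lazily so parse_books works without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.content  # bytes, so BeautifulSoup can detect the encoding


books = parse_books(SAMPLE_HTML)
print(books)
```

To scrape the live site instead of the sample, pass the result of fetch_listing() to parse_books.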

Data Processing

The collected data was processed with basic cleaning steps such as text normalisation, case adjustment, and numerical value extraction.
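The three cleaning steps named above might look like the following minimal sketch. The field names (name, price, availability) and the helper names are assumptions for illustration; the actual rules in data_preprocessing.py may differ.

```python
import re


def clean_text(value):
    """Text normalisation: collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


def extract_number(value):
    """Numerical value extraction: first decimal number in a string, or None."""
    match = re.search(r"\d+(?:\.\d+)?", value)
    return float(match.group()) if match else None


def clean_record(raw):
    """Apply the cleaning steps to one scraped record (hypothetical field names)."""
    return {
        "name": clean_text(raw["name"]),
        "price": extract_number(raw["price"]),
        # Case adjustment: store availability in lower case.
        "availability": clean_text(raw["availability"]).lower(),
    }


raw_record = {
    "name": "  A Light in\n the Attic ",
    "price": "£51.77",
    "availability": "In stock",
}
cleaned = clean_record(raw_record)
print(cleaned)
```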

Database

The data was stored in an SQLite database named "books.db", which has two tables: one containing the book data and the other containing the feature details.
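A two-table layout of this kind could be created with Python's built-in sqlite3 module, as sketched below. The table names (books, features), the column names, and the build_database helper are assumptions, not the schema used in database.py. The example writes to an in-memory database so that it leaves the real books.db untouched.

```python
import sqlite3


def build_database(path, books, features):
    """Create both tables (hypothetical schema) and load the given rows."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS books "
            "(name TEXT, price REAL, availability TEXT)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS features "
            "(feature TEXT PRIMARY KEY, description TEXT)"
        )
        conn.executemany(
            "INSERT INTO books (name, price, availability) VALUES (?, ?, ?)",
            books,
        )
        conn.executemany(
            "INSERT OR REPLACE INTO features (feature, description) VALUES (?, ?)",
            features,
        )
    return conn


conn = build_database(
    ":memory:",  # pass "books.db" to write a file instead
    books=[("A Light in the Attic", 51.77, "in stock")],
    features=[
        ("name", "Title of the book"),
        ("price", "Listed price as a number"),
        ("availability", "Stock status shown on the site"),
    ],
)
book_count = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
feature_count = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
print(book_count, feature_count)
```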

Website

The Flask framework, together with the SQLAlchemy package, was used to create a simple website that presents the scraped data as a table. A simple template from w3schools was chosen as the outline of the website. There is also an About page that describes the data and defines its variables.
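The sketch below shows one way a Flask route could read rows through SQLAlchemy and render them as an HTML table. It is an illustration under stated assumptions rather than the code in website.py: the books table with name and price columns, the inline TABLE_TEMPLATE, and the create_app factory are all hypothetical, and the project may well use the Flask-SQLAlchemy extension and the files in templates/ instead. The demo runs against a throwaway database and Flask's test client, so no server is started.

```python
import sqlite3
import tempfile
from pathlib import Path

from flask import Flask, render_template_string
from sqlalchemy import create_engine, text

# Inline stand-in for a file in templates/ (hypothetical layout).
TABLE_TEMPLATE = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  {% for book in books %}
  <tr><td>{{ book["name"] }}</td><td>{{ book["price"] }}</td></tr>
  {% endfor %}
</table>
"""


def create_app(db_path):
    """Build a Flask app that lists the rows of an assumed 'books' table."""
    app = Flask(__name__)
    engine = create_engine(f"sqlite:///{db_path}")

    @app.route("/")
    def index():
        with engine.connect() as conn:
            result = conn.execute(text("SELECT name, price FROM books"))
            books = [{"name": name, "price": price} for name, price in result]
        return render_template_string(TABLE_TEMPLATE, books=books)

    return app


# Demo against a throwaway database; the real project reads books.db.
demo_db = Path(tempfile.mkdtemp()) / "demo.db"
setup_conn = sqlite3.connect(demo_db)
with setup_conn:  # commits the inserted row
    setup_conn.execute("CREATE TABLE books (name TEXT, price REAL)")
    setup_conn.execute("INSERT INTO books VALUES ('A Light in the Attic', 51.77)")
setup_conn.close()

app = create_app(demo_db)
response = app.test_client().get("/")
print(response.status_code)
```

Calling app.run() on the returned app would serve the same page locally.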

Project Structure

- WebScraper-FlaskDB
    - templates/
    - README.md
    - book_data.csv
    - books_updated.csv
    - books.db
    - data_collection.ipynb
    - data_collection.py
    - data_preprocessing.ipynb
    - data_preprocessing.py
    - database.ipynb
    - database.py
    - features.csv
    - requirements.txt
    - website.ipynb
    - website.py

Code Usage

Setting up the environment

Clone the repository

git clone https://github.com/jibnorge/WebScraper-FlaskDB.git
cd WebScraper-FlaskDB

Create a virtual Python environment from the Anaconda Prompt

conda create --name venv python=3.9

Activate the environment and install the dependencies listed in requirements.txt

conda activate venv
pip install -r requirements.txt

Run the web app from the command line, or open Jupyter Notebook and run the website.ipynb file.

python website.py

Conclusion

This project demonstrates the end-to-end process of web scraping, data processing, database management, and website development. Its aim is to build a working understanding of these concepts through a practical implementation.
