Working with data from the recipeall website and plotting key graphs
This project works with data scraped from food websites and uses that data to plot relevant graphs. The main distributions include frequency-rank distributions and recipe size distributions.
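As a rough illustration of the two distributions named above, here is a minimal sketch that computes them from a small, made-up list of recipes. The sample data and the function names `frequency_rank` and `recipe_size_distribution` are assumptions for illustration and do not come from the notebooks.

```python
from collections import Counter

# Hypothetical sample data: each recipe is a list of ingredient names.
recipes = [
    ["tomato", "onion", "salt"],
    ["onion", "garlic", "salt", "pepper"],
    ["salt", "flour"],
]

def frequency_rank(recipes):
    """Return (rank, ingredient, frequency) tuples, most frequent first."""
    counts = Counter(ing for recipe in recipes for ing in recipe)
    ranked = counts.most_common()  # sorted by descending frequency
    return [(rank, name, freq) for rank, (name, freq) in enumerate(ranked, start=1)]

def recipe_size_distribution(recipes):
    """Map recipe size (number of ingredients) to the number of recipes of that size."""
    size_counts = Counter(len(recipe) for recipe in recipes)
    return dict(sorted(size_counts.items()))

ranks = frequency_rank(recipes)
sizes = recipe_size_distribution(recipes)
print(ranks[:2])
print(sizes)
```

The rank and frequency columns can then be passed to matplotlib, for example on log-log axes, and the size counts to a bar chart.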
I use BeautifulSoup for web scraping because it provides a powerful yet simple interface for navigating and extracting data from HTML and XML documents. Its flexibility allows me to parse complex web pages efficiently, extract relevant information such as text, links, and images, and convert it into structured formats. Jupyter Notebook complements this by offering an interactive environment where I can write and execute code in real time, visualize data instantly, and document my thought process. This combination makes it easier to experiment with, debug, and refine scraping scripts quickly, while also providing a platform for sharing and presenting results effectively.
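To make the parsing step concrete, here is a minimal sketch of extracting text and links with BeautifulSoup. The HTML snippet, the class names (`recipe-title`, `ingredients`), and the helper `parse_recipe` are hypothetical; the real site's markup will differ, and the notebooks may structure this differently.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a recipe page; real markup will differ.
SAMPLE_HTML = """
<html><body>
  <h1 class="recipe-title">Tomato Soup</h1>
  <ul class="ingredients">
    <li>tomato</li>
    <li>onion</li>
    <li>salt</li>
  </ul>
  <a href="/recipes/2">Next recipe</a>
</body></html>
"""

def parse_recipe(html):
    """Extract the title, ingredient list, and links from one page of HTML."""
    # "html.parser" is Python's built-in parser; lxml or html5lib could be used instead.
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("h1", class_="recipe-title")
    title = title_tag.get_text(strip=True) if title_tag else None
    ingredients = [li.get_text(strip=True) for li in soup.select("ul.ingredients li")]
    links = [a["href"] for a in soup.find_all("a", href=True)]
    return {"title": title, "ingredients": ingredients, "links": links}

recipe = parse_recipe(SAMPLE_HTML)
print(recipe)
```

In a real run the HTML would come from an HTTP response (for example via `requests`) rather than an inline string.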
Main Components
.ipynb files: These Jupyter notebooks contain the code for the problem statement.
.csv and .txt files: The output of the data extraction is stored in these files.
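As a sketch of how extracted records might end up in a .csv file, the example below writes a small table with pandas and reads it back. The column names and records are made up for illustration, and an in-memory buffer stands in for a file on disk; the actual output files in this repository may use a different layout.

```python
import io

import pandas as pd

# Hypothetical scraped records; the real columns may differ.
records = [
    {"title": "Tomato Soup", "n_ingredients": 3},
    {"title": "Garlic Bread", "n_ingredients": 4},
]

df = pd.DataFrame(records)

# Write to an in-memory buffer; pass a file path such as "output.csv" to write to disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the CSV back to confirm the round trip.
df_back = pd.read_csv(io.StringIO(csv_text))
print(df_back)
```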
git clone https://github.com/akshatrajsaxena/CGAS_Assignment_1.git
Install the necessary libraries used in Q1.ipynb. They are listed below:
pip install requests
pip install html5lib
pip install bs4
pip install pandas
pip install lxml
pip install spacy
pip install scikit-learn
pip install transformers
pip install torch
pip install matplotlib
If you find any issues or have suggestions for improvements, feel free to open an issue or submit a pull request. Contributions are welcome!
If you have any questions or would like to get in touch, you can reach Akshat Raj Saxena or Aditya Sharma.