Commit 7a09eeb

Merge pull request #407 from vaishali-sharma-20/vaishali-sharma-20-webscraping
Webscraping with Beautifulsoup
2 parents 6c3cdbc + 6b1e52a commit 7a09eeb

3 files changed

+789
-0
lines changed

Lines changed: 244 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,244 @@
1+
<!DOCTYPE html>
2+
<html lang="en">
3+
<head>
4+
<meta charset="utf-8" />
5+
<meta
6+
name="viewport"
7+
content="width=device-width, initial-scale=1, shrink-to-fit=no" />
8+
<meta
9+
name="description"
10+
content="An article on using the Beautiful Soup library: build a web scraping application with Python's Beautiful Soup that extracts data from a website and presents it in a structured format (e.g., CSV, JSON)." />
11+
<meta
12+
name="keywords"
13+
content="Web Scraping, Python, BeautifulSoup, web scraping with BeautifulSoup, open source, coding" />
14+
<title>Example of a Web Scraping Application with BeautifulSoup - CSEdge</title>
15+
<meta name="vaishali-sharma" content="CSEdge" />
16+
<!-- Favicon-->
17+
<link
18+
rel="icon"
19+
type="image/x-icon"
20+
href="https://csedge.courses/Images/CSEDGE-LOGO32X32.png" />
21+
<!-- Core theme CSS (includes Bootstrap)-->
22+
<link href="../styles.css" rel="stylesheet" />
23+
</head>
24+
25+
<body>
26+
<!-- Responsive navbar-->
27+
<nav class="navbar navbar-expand-lg navbar-dark bg-dark">
28+
<div class="container">
29+
<img
30+
height="32px"
31+
width="32px"
32+
src="https://csedge.courses/Images/CSEDGE-LOGO32X32.png"
33+
alt="logo" />
34+
<a class="navbar-brand" href="../.././index.html">CSEdge Learn</a>
35+
<button
36+
class="navbar-toggler"
37+
type="button"
38+
data-bs-toggle="collapse"
39+
data-bs-target="#navbarSupportedContent"
40+
aria-controls="navbarSupportedContent"
41+
aria-expanded="false"
42+
aria-label="Toggle navigation">
43+
<span class="navbar-toggler-icon"></span>
44+
</button>
45+
<div class="collapse navbar-collapse" id="navbarSupportedContent">
46+
<ul class="navbar-nav ms-auto mb-2 mb-lg-0">
47+
<li class="nav-item">
48+
<a class="nav-link" href="https://learn.csedge.courses">Home</a>
49+
</li>
50+
<li class="nav-item">
51+
<a class="nav-link" href="https://csedge.courses/about">About</a>
52+
</li>
53+
<li class="nav-item">
54+
<a class="nav-link" href="https://csedge.courses#contact"
55+
>Contact</a
56+
>
57+
</li>
58+
<li class="nav-item">
59+
<a
60+
class="nav-link active"
61+
aria-current="page"
62+
href="https://learn.csedge.courses"
63+
>Blog</a>
64+
</li>
65+
</ul>
66+
</div>
67+
</div>
68+
</nav>
69+
70+
<div class="container mt-5">
71+
<div class="row">
72+
<!-- Blog entries-->
73+
<div class="col-lg-8">
74+
<h1>Building a Web Scraping Application with BeautifulSoup</h1>
75+
<!-- Featured blog post-->
76+
<div class="card mb-4">
77+
<img class="card-img-top" src="https://learn.csedge.courses/posts/images/Monitors.png" alt="Web scraping" />
78+
<div class="card-body">
79+
<main class="container card mb-6">
80+
<section>
81+
<p>Web scraping is the process of extracting data from websites: you collect information from one or more web pages and present it in a structured format. In this article, we’ll build a simple web scraping application using Python and the BeautifulSoup library.</p>
82+
<h3>Prerequisites</h3>
83+
Before we begin, make sure you have the following installed:
84+
<ul>
85+
<li><strong>Python:</strong> You’ll need Python installed on your system. You can download it from the official Python website.</li>
86+
<li><strong>BeautifulSoup and Requests: </strong> The example script imports both libraries, so install them together using pip:<br>
<code>pip install beautifulsoup4 requests</code></li>
88+
</ul>
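<p>As a quick check that the prerequisites are in place, a sketch like the following reports which packages Python cannot find. The helper name <code>missing_packages</code> is made up for this illustration; note that <code>beautifulsoup4</code> is imported under the module name <code>bs4</code>.</p>

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # beautifulsoup4 installs the module "bs4"; requests installs "requests"
    missing = missing_packages(["bs4", "requests"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All prerequisites are installed.")
```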
89+
<h3>Steps to Create the Web Scraping Application</h3>
90+
<ul>
91+
<li><strong>Choose a Website to Scrape: </strong> Decide which website you want to scrape. For this example, let’s scrape product information from an e-commerce site.</li>
92+
<li><strong>Inspect the HTML Structure: </strong> Open the website in your browser and inspect the HTML structure. Identify the elements (tags, classes, or IDs) that contain the data you want to extract.</li>
93+
<li><strong>Write Python Code: </strong> Create a Python script to fetch the HTML content of the webpage and parse it using BeautifulSoup. Here’s a basic example:
94+
<br>
95+
96+
<code>import requests<br>
from bs4 import BeautifulSoup<br>
<br>
# URL of the page to scrape (placeholder; replace it with a real page)<br>
url = 'https://example.com/products'<br>
<br>
# Send an HTTP request and stop early on an error status<br>
response = requests.get(url, timeout=10)<br>
response.raise_for_status()<br>
<br>
# Parse the HTML content<br>
soup = BeautifulSoup(response.content, 'html.parser')<br>
<br>
# Find relevant elements (e.g., product names, prices)<br>
product_names = soup.find_all('h2', class_='product-name')<br>
product_prices = soup.find_all('span', class_='product-price')<br>
<br>
# Pair each name with its price and print the result<br>
for name, price in zip(product_names, product_prices):<br>
&nbsp;&nbsp;&nbsp;&nbsp;print(f"Product: {name.text.strip()}, Price: {price.text.strip()}")<br>
<br>
# You can save this data to a CSV or JSON file<br></code></li>
117+
<li><strong>Run the Script: </strong> Execute your Python script, and it will scrape the product information from the specified website.</li>
118+
<li><strong>Data Storage: </strong> Depending on your requirements, you can store the extracted data in a CSV file, JSON file, or a database.</li>
119+
</ul>
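<p>The data storage step can be sketched with Python's standard <code>csv</code> and <code>json</code> modules. This is a minimal illustration, not the only way to do it: the function names and the sample rows are made up, and the rows stand in for whatever name and price text your scraper extracted.</p>

```python
import csv
import io
import json


def rows_to_csv(rows, fieldnames=("name", "price")):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def rows_to_json(rows):
    """Serialize a list of dicts to pretty-printed JSON text."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical sample data standing in for scraped results
    products = [
        {"name": "Sample Widget", "price": "$9.99"},
        {"name": "Sample Gadget", "price": "$19.50"},
    ]
    print(rows_to_csv(products))
    print(rows_to_json(products))
```

<p>To write files instead of printing, pass each returned string to <code>open(path, "w", encoding="utf-8")</code> and call <code>write()</code> on the file object.</p>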
120+
</section>
121+
<section>
122+
<h4>Conclusion</h4>
123+
<p>Web scraping with BeautifulSoup is a powerful technique for extracting data from websites. Remember to respect the website’s terms of use and robots.txt file.<br> Happy scraping!</p>
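<p>One way to act on the robots.txt advice is to check a URL against the site's rules before requesting it. The sketch below uses the standard library's <code>urllib.robotparser</code>; the helper name <code>is_allowed</code>, the user agent string, and the sample rules are assumptions made for illustration.</p>

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, written inline so the example runs without network access
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for page in ("https://example.com/products",
                 "https://example.com/private/data"):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS, "my-scraper", page) else "blocked"
        print(page, "is", verdict)
```

<p>For a live site, <code>RobotFileParser</code> can also download the rules itself: call <code>set_url()</code> with the address of the site's robots.txt file, then <code>read()</code>, before calling <code>can_fetch()</code>.</p>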
124+
</section>
125+
</main>
126+
</div>
127+
</div>
128+
</div>
129+
<!-- Side widgets-->
130+
<div class="col-lg-4">
131+
<!-- Search widget-->
132+
<div class="card mb-4">
133+
<div class="card-header">Search</div>
134+
<div class="card-body">
135+
<div class="input-group">
136+
<input
137+
class="form-control"
138+
type="text"
139+
id="searchInput"
140+
placeholder="Enter search term..."
141+
aria-label="Enter search term..."
142+
aria-describedby="button-search" />
143+
<button
144+
class="btn btn-primary"
145+
id="button-search"
146+
type="button"
147+
onclick="search()">
148+
Go!
149+
</button>
150+
</div>
151+
</div>
152+
<!-- Search Results -->
153+
<div id="searchResults"></div>
154+
</div>
155+
<!-- Categories widget-->
156+
<div class="card mb-4">
157+
<div class="card-header">Categories</div>
158+
<div class="card-body">
159+
<div class="row">
160+
<div class="col-sm-6">
161+
<ul class="list-unstyled mb-0">
162+
<li><a href="#!">Web Design</a></li>
163+
<li><a href="#!">HTML</a></li>
164+
<li><a href="#!">Freebies</a></li>
165+
</ul>
166+
</div>
167+
<div class="col-sm-6">
168+
<ul class="list-unstyled mb-0">
169+
<li><a href="#!">JavaScript</a></li>
170+
<li><a href="#!">CSS</a></li>
171+
<li><a href="#!">Tutorials</a></li>
172+
</ul>
173+
</div>
174+
</div>
175+
</div>
176+
</div>
177+
<!-- Side widget-->
178+
<div class="card mb-4">
179+
<div class="card-header">Recent Posts</div>
180+
<div class="card-body">
181+
<p>Coming Soon..!</p>
182+
</div>
183+
</div>
184+
<div class="card mb-4">
185+
<div class="card-body">
186+
<script
187+
async
188+
src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-8930077947690409"
189+
crossorigin="anonymous"></script>
190+
<ins
191+
class="adsbygoogle"
192+
style="display: block"
193+
data-ad-format="fluid"
194+
data-ad-layout-key="-fb+5w+4e-db+86"
195+
data-ad-client="ca-pub-8930077947690409"
196+
data-ad-slot="9866674087"></ins>
197+
<script>
198+
(adsbygoogle = window.adsbygoogle || []).push({});
199+
</script>
200+
<script
201+
async
202+
src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-8930077947690409"
203+
crossorigin="anonymous"></script>
204+
<ins
205+
class="adsbygoogle"
206+
style="display: block"
207+
data-ad-format="fluid"
208+
data-ad-layout-key="-fb+5w+4e-db+86"
209+
data-ad-client="ca-pub-8930077947690409"
210+
data-ad-slot="9866674087"></ins>
211+
<script>
212+
(adsbygoogle = window.adsbygoogle || []).push({});
213+
</script>
214+
<script
215+
async
216+
src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-8930077947690409"
217+
crossorigin="anonymous"></script>
218+
<ins
219+
class="adsbygoogle"
220+
style="display: block"
221+
data-ad-format="fluid"
222+
data-ad-layout-key="-fb+5w+4e-db+86"
223+
data-ad-client="ca-pub-8930077947690409"
224+
data-ad-slot="9866674087"></ins>
225+
<script>
226+
(adsbygoogle = window.adsbygoogle || []).push({});
227+
</script>
228+
</div>
229+
</div>
230+
</div>
231+
<!-- Footer-->
232+
<footer class="py-5 bg-dark">
233+
<div class="container">
234+
<p class="m-0 text-center text-white">
235+
Copyright &copy; CSEdge Learn 2024
236+
</p>
237+
</div>
238+
</footer>
239+
<!-- Bootstrap core JS-->
240+
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/js/bootstrap.bundle.min.js"></script>
241+
<!-- Core theme JS-->
242+
<script src="../script.js"></script>
243+
</body>
244+
</html>
