feat: use the Python course as a template for a new JavaScript course #1584

Draft
wants to merge 26 commits into
base: master
0c087a7
fix: update intro, lesson titles, and descriptions to mention JS
honzajavorek May 16, 2025
5cfd912
feat: decide about the technologies
honzajavorek May 16, 2025
2ec5a83
fix: update intro to be about JS
honzajavorek May 16, 2025
27ba023
fix: update devtools 1 to be about JS
honzajavorek May 16, 2025
62ace76
fix: update devtools 2 to be about JS
honzajavorek May 16, 2025
75121bc
fix: update devtools 3 to be about JS
honzajavorek May 16, 2025
8f107d6
fix: update downloading to be about JS
honzajavorek Jun 23, 2025
6cbd29d
fix: update parsing to be about JS
honzajavorek Jun 24, 2025
87f5b96
fix: update locating to be about JS
honzajavorek Jun 24, 2025
e53cc09
feat: make the example longer, because Congo can uncover some mistake…
honzajavorek Jun 24, 2025
8b7117a
fix: update extracting to be about JS
honzajavorek Jun 24, 2025
7f46f01
style: change order, first json, then csv
honzajavorek Jun 27, 2025
4cdf396
fix: use --save with npm install in the parsing lesson
honzajavorek Jun 27, 2025
15e8c3f
fix: various improvements to the Python lesson about saving data
honzajavorek Jun 27, 2025
d8973ff
fix: update saving to be about JS
honzajavorek Jun 27, 2025
360165b
fix: missing semicolon in saving data
honzajavorek Jun 30, 2025
fe2528c
feat: update getting links to be about JS
honzajavorek Jun 30, 2025
0c94e58
fix: make it clearer in saving data that we append more code
honzajavorek Jun 30, 2025
ce90190
feat: change naming of JS variables, update crawling to be about JS
honzajavorek Jun 30, 2025
7b3a1f7
feat: update crawling exercises to be about JS
honzajavorek Jun 30, 2025
4566f8d
feat: explain dollar sign variable names
honzajavorek Jul 1, 2025
16715fa
style: better readability for a code example in crawling lesson
honzajavorek Jul 1, 2025
ce403ac
feat: update first half of scraping variants to be about JS
honzajavorek Jul 1, 2025
3e65fb0
feat: update the rest of scraping variants to be about JS
honzajavorek Jul 2, 2025
20278bd
feat: update one scraping variants exercise to be about JS
honzajavorek Jul 2, 2025
36ad10b
feat: update another scraping variants exercise to be about JS
honzajavorek Jul 2, 2025
Original file line number Diff line number Diff line change
@@ -30,7 +30,7 @@ Now let's peek behind the scenes of a real-world website—say, Wikipedia. We'll

![Wikipedia with Chrome DevTools open](./images/devtools-wikipedia.png)

-Websites are built with three main technologies: HTML, CSS, and JavaScript. In the **Elements** tab, DevTools shows the HTML and CSS of the current page:
+Apart from JavaScript, websites are built with two main technologies: HTML and CSS. In the **Elements** tab, DevTools shows the HTML and CSS of the current page:

![Elements tab in Chrome DevTools](./images/devtools-elements-tab.png)

@@ -58,9 +58,9 @@ HTML, a markup language, describes how everything on a page is organized, how el
}
```

-While HTML and CSS describe what the browser should display, [JavaScript](https://developer.mozilla.org/en-US/docs/Learn/JavaScript) is a general-purpose programming language that adds interaction to the page.
+While HTML and CSS describe what the browser should display, JavaScript adds interaction to the page. In DevTools, the **Console** tab allows ad-hoc experimenting with JavaScript.

-In DevTools, the **Console** tab allows ad-hoc experimenting with JavaScript. If you don't see it, press **ESC** to toggle the Console. Running commands in the Console lets us manipulate the loaded page—we’ll try this shortly.
+If you don't see it, press **ESC** to toggle the Console. Running commands in the Console lets us manipulate the loaded page—we’ll try this shortly.

![Console in Chrome DevTools](./images/devtools-console.png)

@@ -104,13 +104,13 @@ Encyclopedia

## Interacting with an element

-We won't be creating Python scrapers just yet. Let's first get familiar with what we can do in the JavaScript console and how we can further interact with HTML elements on the page.
+We won't be creating Node.js scrapers just yet. Let's first get familiar with what we can do in the DevTools console and how we can further interact with HTML elements on the page.

In the **Elements** tab, with the subtitle element highlighted, let's right-click the element to open the context menu. There, we'll choose **Store as global variable**. The **Console** should appear, with a `temp1` variable ready.

![Global variable in Chrome DevTools Console](./images/devtools-console-variable.png)

-The Console allows us to run JavaScript in the context of the loaded page, similar to Python's [interactive REPL](https://realpython.com/interacting-with-python/). We can use it to play around with elements.
+The Console allows us to run code in the context of the loaded page. We can use it to play around with elements.

For a start, let's access some of the subtitle's properties. One such property is `textContent`, which contains the text inside the HTML element. The last line in the Console is where your cursor is. We'll type the following and hit **Enter**:

Original file line number Diff line number Diff line change
@@ -56,9 +56,7 @@ The `class` attribute can hold multiple values separated by whitespace. This par

## Programmatically locating a product card

-Let's jump into the **Console** and write some JavaScript. Don't worry—we don't need to know the language, and yes, this is a helpful step on our journey to creating a scraper in Python.

-In browsers, JavaScript represents the current page as the [`Document`](https://developer.mozilla.org/en-US/docs/Web/API/Document) object, accessible via `document`. This object offers many useful methods, including [`querySelector()`](https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector). This method takes a CSS selector as a string and returns the first HTML element that matches. We'll try typing this into the **Console**:
+Let's jump into the **Console** and write some code. In browsers, JavaScript represents the current page as the [`Document`](https://developer.mozilla.org/en-US/docs/Web/API/Document) object, accessible via `document`. This object offers many useful methods, including [`querySelector()`](https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector). This method takes a CSS selector as a string and returns the first HTML element that matches. We'll try typing this into the **Console**:

```js
document.querySelector('.product-item');
@@ -136,14 +134,14 @@ We'll expand the result by clicking the small arrow, then hover our cursor over

![Highlighting a querySelectorAll() result](./images/devtools-hover-queryselectorall.png)

-To save the subwoofer in a variable for further inspection, we can use index access with brackets, just like in Python lists (or JavaScript arrays):
+To save the subwoofer in a variable for further inspection, we can use index access with brackets, just like with regular JavaScript arrays:

```js
products = document.querySelectorAll('.product-item');
subwoofer = products[2];
```
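
A note on the comparison with regular arrays: `querySelectorAll()` returns a `NodeList`, which supports index access, `length`, and `forEach()`, but not array methods such as `map()` or `filter()`. The sketch below illustrates that difference in plain Node.js. Because `NodeList` exists only in the browser, the `products` object here is a hypothetical array-like stand-in, and the product names are made up for illustration.

```js
// Hypothetical stand-in for a NodeList: indexed entries plus a length,
// but none of the Array.prototype methods.
const products = { 0: 'speaker', 1: 'headphones', 2: 'subwoofer', length: 3 };

// Index access with brackets works the same way as with an array.
const subwoofer = products[2];

// Array methods such as map() become available only after converting
// the array-like collection into a real array.
const names = Array.from(products).map((name) => name.toUpperCase());

console.log(subwoofer, names);
```

In the browser Console, the same conversion would be `Array.from(document.querySelectorAll('.product-item'))`.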

-Even though we're just playing with JavaScript in the browser's **Console**, we're inching closer to figuring out what our Python program will need to do. In the next lesson, we'll dive into accessing child elements and extracting product details.
+Even though we're just playing in the browser's **Console**, we're inching closer to figuring out what our Node.js program will need to do. In the next lesson, we'll dive into accessing child elements and extracting product details.

---

Original file line number Diff line number Diff line change
@@ -41,7 +41,7 @@ We'll use the **Elements** tab of DevTools to inspect all child elements of the

![Finding child elements](./images/devtools-product-details.png)

-JavaScript represents HTML elements as [Element](https://developer.mozilla.org/en-US/docs/Web/API/Element) objects. Among properties we've already played with, such as `textContent` or `outerHTML`, it also has the [`querySelector()`](https://developer.mozilla.org/en-US/docs/Web/API/Element/querySelector) method. Here the method looks for matches only within children of the element:
+Browser JavaScript represents HTML elements as [Element](https://developer.mozilla.org/en-US/docs/Web/API/Element) objects. Among properties we've already played with, such as `textContent` or `outerHTML`, it also has the [`querySelector()`](https://developer.mozilla.org/en-US/docs/Web/API/Element/querySelector) method. Here the method looks for matches only within children of the element:

```js
title = subwoofer.querySelector('.product-item__title');
@@ -69,17 +69,17 @@ It works, but the price isn't alone in the result. Before we'd use such data, we

![Extracting product price](./images/devtools-extracting-price.png)

-But for now that's okay. We're just testing the waters now, so that we have an idea about what our scraper will need to do. Once we'll get to extracting prices in Python, we'll figure out how to get the values as numbers.
+But for now that's okay. We're just testing the waters now, so that we have an idea about what our scraper will need to do. Once we'll get to extracting prices in Node.js, we'll figure out how to get the values as numbers.

-In the next lesson, we'll start with our Python project. First we'll be figuring out how to download the Sales page without browser and make it accessible in a Python program.
+In the next lesson, we'll start with our Node.js project. First we'll be figuring out how to download the Sales page without browser and make it accessible in a Node.js program.
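
As a rough preview of the "values as numbers" step the lesson defers, here is a minimal sketch that runs in plain Node.js. The price strings and the `parsePrice()` helper are hypothetical and not taken from the course; the real page text may look different. Note that `parseInt()` alone would return `NaN` for text starting with a currency symbol, so the sketch strips non-numeric characters first.

```js
// Hypothetical price texts; the actual textContent on the page may differ.
const priceTexts = ['$74.95', 'From $1,398.00'];

// Hypothetical helper: drop everything except digits and the decimal point
// (currency symbols, thousands separators, words), then parse the rest.
function parsePrice(text) {
  const cleaned = text.replace(/[^0-9.]/g, '');
  return Number.parseFloat(cleaned);
}

const prices = priceTexts.map(parsePrice);
console.log(prices);
```

With these two inputs the helper yields `74.95` and `1398`. Floating-point numbers are a questionable fit for money, so a real scraper might prefer to store prices as integer cents.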

---

<Exercises />

### Extract the price of IKEA's most expensive artificial plant

-At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/artificial-plants-flowers-20492/), use CSS selectors and HTML elements manipulation in the **Console** to extract the price of the most expensive artificial plant (sold in Sweden, as you'll be browsing their Swedish offer). Before opening DevTools, use your judgment to adjust the page to make the task as straightforward as possible. Finally, use JavaScript's [`parseInt()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt) function to convert the price text into a number.
+At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/artificial-plants-flowers-20492/), use CSS selectors and HTML elements manipulation in the **Console** to extract the price of the most expensive artificial plant (sold in Sweden, as you'll be browsing their Swedish offer). Before opening DevTools, use your judgment to adjust the page to make the task as straightforward as possible. Finally, use the [`parseInt()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt) function to convert the price text into a number.

<details>
<summary>Solution</summary>
@@ -98,7 +98,7 @@ At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/a

### Extract the name of the top wiki on Fandom Movies

-On Fandom's [Movies page](https://www.fandom.com/topics/movies), use CSS selectors and HTML element manipulation in the **Console** to extract the name of the top wiki. Use JavaScript's [`trim()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/trim) method to remove white space around the name.
+On Fandom's [Movies page](https://www.fandom.com/topics/movies), use CSS selectors and HTML element manipulation in the **Console** to extract the name of the top wiki. Use the [`trim()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/trim) method to remove white space around the name.
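
For reference, a minimal sketch of what `trim()` does, runnable in plain Node.js. The `rawName` string is a hypothetical example of how `textContent` often carries the indentation and newlines of the surrounding HTML; it is not the actual text from the Fandom page.

```js
// Hypothetical textContent value with leading and trailing whitespace.
const rawName = '\n      Example Wiki\n    ';

// trim() removes whitespace (spaces, tabs, newlines) from both ends
// and leaves whitespace inside the string untouched.
const name = rawName.trim();

// JSON.stringify() makes any remaining whitespace visible in the output.
console.log(JSON.stringify(name));
```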

![Fandom's Movies page](./images/devtools-exercise-fandom.png)
