work

sharav12 · sharav12 · commit 7a726bc57273 · 2025-02-17T11:25:56.000-05:00
diff --git a/inst/tutorials/24-web-scraping/tutorial.Rmd b/inst/tutorials/24-web-scraping/tutorial.Rmd
@@ -499,6 +499,11 @@ Examine the [webpage](https://rvest.tidyverse.org/articles/starwars.html) for th
   read_html()
 ```
 
+Before going further with web scraping, we should discuss whether it is legal and ethical to do so. The situation is complicated, and the legalities depend heavily on where you live. As a general rule of thumb, though, if the data is public, non-personal, and factual, you're most likely OK. These three factors matter because they correspond, respectively, to the site's terms and conditions, personally identifiable information, and copyright.
+If the data is not public, non-personal, and factual, or if you're scraping specifically to make money, it's a good idea to talk to a lawyer. In any case, be respectful of the resources of the server hosting the page(s): if you're scraping many pages, wait a bit between requests.
+
+
+
 ### 
 
 The structure of the underlying HTML looks like this:
@@ -663,6 +668,10 @@ now we're going to learn how to streamline this process. Copy https://rvest.tidy
 "https://rvest.tidyverse.org/articles/starwars.html" 
 ```
 
+
+
+The goal here is to avoid re-running the pipe on the web page every time we scrape. Instead, we create an object that holds the result of the pipe, which we can use whenever needed.
+
 ### 
 
 ### Exercise 7
@@ -687,6 +696,10 @@ Pipe this to `read_html()`.
          
 ```
 
+
+Notice that the steps so far are similar to what we did before, but this approach will need fewer steps overall.
+
+
 ### 
 
 ### Exercise 8