The National Weather Service provides an hour-by-hour forecast for the
next seven days at the website forecast.weather.gov.
The file scraper.R contains the function get_forecast, which downloads
the hourly forecast for the specified location and converts the XML
format into an R data frame.
source("scraper.R", verbose=F)
head(get_forecast)
##
## 1 function (latitude, longitude)
## 2 {
## 3 url <- sprintf("https://forecast.weather.gov/MapClick.php?lat=%f&lon=%f&FcstType=digitalDWML",
## 4 latitude, longitude)
## 5 xml_data <- read_xml(url)
## 6 write_xml(xml_data, file = "download.xml")
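The listing is cut off after six lines by head, but its first statement already shows how the request URL is built. As a small illustration, the same sprintf call can be run on its own. The coordinates are the Lincoln, NE values used elsewhere in this document, and the name url_lincoln exists only for this example.

```r
# Rebuild the request URL the same way the first statement of get_forecast does.
# url_lincoln is an illustrative name, not part of scraper.R.
url_lincoln <- sprintf(
  "https://forecast.weather.gov/MapClick.php?lat=%f&lon=%f&FcstType=digitalDWML",
  40.8164, -96.6882
)
print(url_lincoln)
```

Because %f prints six decimal places, the coordinates appear as 40.816400 and -96.688200 in the query string.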
The get_forecast function can be used to get data for Lincoln, NE or
Ames, IA as shown below. Note that this code chunk is set to not
evaluate automatically (eval = FALSE), since we do not want to download
new data every single time we render the current document. Instead, the
results are saved in rds format and loaded in the next code chunk.
lincoln <- get_forecast(40.8164, -96.6882)
saveRDS(lincoln, "lincoln.rds")
ames <- get_forecast(42.04, -93.68)
saveRDS(ames, "ames.rds")
Load previously saved data for Lincoln:
lincoln <- readRDS("lincoln.rds")
head(lincoln)
## # A tibble: 6 × 9
## parameter type units values times latitude longitude conditions source
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 temperature hourly <NA> 70 2025-08-… 40.8164… -96.6882… rain (cha… WFO O…
## 2 temperature hourly <NA> 72 2025-08-… 40.8164… -96.6882… rain (cha… WFO O…
## 3 temperature hourly <NA> 73 2025-08-… 40.8164… -96.6882… rain (cha… WFO O…
## 4 temperature hourly <NA> 74 2025-08-… 40.8164… -96.6882… rain (cha… WFO O…
## 5 temperature hourly <NA> 74 2025-08-… 40.8164… -96.6882… thunderst… WFO O…
## 6 temperature hourly <NA> 74 2025-08-… 40.8164… -96.6882… thunderst… WFO O…
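If you would rather not rely on eval = FALSE, one alternative is to guard the download with a check for the cached file. The sketch below assumes a hypothetical helper, cached_forecast, which is not part of scraper.R. It takes the download step as a function so the pattern can be tried without network access.

```r
# Hypothetical helper (not part of scraper.R): download only when no cached copy exists.
cached_forecast <- function(path, fetch) {
  if (!file.exists(path)) {
    saveRDS(fetch(), path)  # first call: run the download and cache the result
  }
  readRDS(path)             # every call: return the cached copy
}

# Intended use, with get_forecast() coming from scraper.R:
# lincoln <- cached_forecast("lincoln.rds", function() get_forecast(40.8164, -96.6882))
```

With this pattern the chunk can be evaluated on every render, because the download only happens when the rds file is missing. Delete the file to force a fresh forecast.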
Your job is to get a handle on the data structure.
- Create a Quarto document called index.qmd, add it to the repository, and include all of your answers and code in that file. Structure it so that it is obvious which question you are addressing in which section of the file.
- Pick a place of your choice in the US and identify its latitude and longitude. Use the function get_forecast to get the 7-day forecast for that location. Save the data you scraped in an rds file and add the file to the homework repo.
- Find examples in the data for violations of two different principles of tidy data. Describe which principle they violate and how.
- Is the set of variables latitude, longitude, times, parameter a key to the data? Show (with code) why or why not.
- You are charged with re-structuring the data set such that the variables latitude, longitude, and times are a key.
  - describe in words the format of the data set that has latitude, longitude, and times as key.
  - write the code that reshapes the data set and execute it. Name the result data_lat_long_times.
  - validate (with code) that latitude, longitude, and times are the key of the data_lat_long_times data set.
- Why is data_lat_long_times not in 2nd normal form? Find at least two problems with split keys.
- Address the split key problems by separating the data_lat_long_times data set into three:
  - pick suitable names for each of the three data sets and specify which variables should go into each.
  - write the code to split the data_lat_long_times data into three data sets.
  - discuss if the three data sets are in 2nd normal form.
- The choice of using latitude and longitude as part of the key is potentially problematic. Discuss why and suggest a solution.
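Several of the questions above ask whether a set of variables is a key, that is, whether each combination of their values identifies exactly one row. A minimal sketch of such a check in base R follows. The data frame toy and the helper is_key are made up for illustration, so the results say nothing about what you will find in the real forecast data.

```r
# Toy data in a long layout similar to the forecast (all values are invented).
toy <- data.frame(
  latitude  = c("40.8", "40.8", "40.8", "40.8"),
  longitude = c("-96.7", "-96.7", "-96.7", "-96.7"),
  times     = c("t1", "t1", "t2", "t2"),
  parameter = c("temperature", "wind", "temperature", "wind"),
  values    = c("70", "5", "72", "7"),
  stringsAsFactors = FALSE
)

# Illustrative helper: a set of columns is a key if no combination of
# their values occurs in more than one row.
is_key <- function(data, cols) {
  anyDuplicated(data[, cols, drop = FALSE]) == 0
}

is_key(toy, c("latitude", "longitude", "times", "parameter"))  # TRUE for the toy data
is_key(toy, c("latitude", "longitude", "times"))               # FALSE: each time occurs twice
```

The same idea carries over to tidyverse code, for example by counting rows per combination and checking that no count exceeds one.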
Make sure that your file index.qmd includes all details to make your
answers fully reproducible. Ensure that the file renders properly. Add
all relevant(!) files to the repository, commit, and push!