Skip to content

Commit df1f784

Browse files
authored
Merge pull request #399 from albhasan/m2022q1_e14
Episode 14 updated by replacing calls to raster and rgdal packages
2 parents 240acc4 + edf7901 commit df1f784

File tree

1 file changed

+93
-90
lines changed

1 file changed

+93
-90
lines changed

episodes/14-extract-ndvi-from-rasters-in-r.Rmd

Lines changed: 93 additions & 90 deletions
Original file line numberDiff line numberDiff line change
@@ -25,19 +25,21 @@ source("setup.R")
2525
::::::::::::::::::::::::::::::::::::::::::::::::::
2626

2727
```{r load-libraries, echo=FALSE, results="hide", message=FALSE, warning=FALSE}
28-
library(raster)
29-
library(rgdal)
28+
library(terra)
3029
library(ggplot2)
3130
library(dplyr)
3231
```
3332

3433
```{r load-data, echo=FALSE, results="hide"}
3534
# learners will have this data loaded from the previous episode
3635
37-
all_NDVI_HARV <- list.files("data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI", full.names = TRUE, pattern = ".tif$")
36+
all_NDVI_HARV <- list.files("data/NEON-DS-Landsat-NDVI/HARV/2011/NDVI",
37+
full.names = TRUE, pattern = ".tif$")
3838
3939
# Create a time series raster stack
40-
NDVI_HARV_stack <- stack(all_NDVI_HARV)
40+
NDVI_HARV_stack <- rast(all_NDVI_HARV)
41+
# NOTE: Fix the bands' names so they don't start with a number!
42+
names(NDVI_HARV_stack) <- paste0("X", names(NDVI_HARV_stack))
4143
4244
# apply scale factor
4345
NDVI_HARV_stack <- NDVI_HARV_stack/10000
@@ -47,72 +49,68 @@ NDVI_HARV_stack <- NDVI_HARV_stack/10000
4749

4850
## Things You'll Need To Complete This Episode
4951

50-
See the [lesson homepage](.) for detailed information about the software,
51-
data, and other prerequisites you will need to work through the examples in this episode.
52+
See the [lesson homepage](.) for detailed information about the software, data,
53+
and other prerequisites you will need to work through the examples in this
54+
episode.
5255

5356

5457
::::::::::::::::::::::::::::::::::::::::::::::::::
5558

56-
In this episode, we will extract NDVI values from a
57-
raster time series dataset and plot them using the
58-
`ggplot2` package.
59+
In this episode, we will extract NDVI values from a raster time series dataset
60+
and plot them using the `ggplot2` package.
5961

6062
## Extract Summary Statistics From Raster Data
6163

62-
We often want to extract summary values from raster data. For
63-
example, we might want to understand overall greeness across a field site or at
64-
each plot within a field site. These values can then be compared between
65-
different field sites and combined with other
66-
related metrics to support modeling and further analysis.
64+
We often want to extract summary values from raster data. For example, we might
65+
want to understand overall greeness across a field site or at each plot within
66+
a field site. These values can then be compared between different field sites
67+
and combined with other related metrics to support modeling and further
68+
analysis.
6769

6870
## Calculate Average NDVI
6971

70-
Our goal in this episode is to create a dataframe that contains a single,
71-
mean NDVI value for each raster in our time series. This value represents the
72-
mean NDVI value for this area on a given day.
72+
Our goal in this episode is to create a dataframe that contains a single, mean
73+
NDVI value for each raster in our time series. This value represents the mean
74+
NDVI value for this area on a given day.
7375

74-
We can calculate the mean for each raster using the `cellStats()` function. The
75-
`cellStats()` function produces a named numeric vector, where each value
76-
is associated with the name of raster stack it was derived from.
76+
We can calculate the mean for each raster using the `global()` function. The
77+
`global()` function produces a named numeric vector, where each value is
78+
associated with the name of raster stack it was derived from.
7779

7880
```{r}
79-
avg_NDVI_HARV <- cellStats(NDVI_HARV_stack, mean)
81+
avg_NDVI_HARV <- global(NDVI_HARV_stack, mean)
8082
avg_NDVI_HARV
8183
```
8284

83-
We can then convert our output to a data frame using `as.data.frame()`. It's a good
84-
idea to view the first few rows of our data frame with `head()` to make sure the
85-
structure is what we expect.
85+
The output is a data frame (othewise, we could use `as.data.frame()`). It's a
86+
good idea to view the first few rows of our data frame with `head()` to make
87+
sure the structure is what we expect.
8688

8789
```{r}
88-
avg_NDVI_HARV <- as.data.frame(avg_NDVI_HARV)
8990
head(avg_NDVI_HARV)
9091
```
9192

92-
We now have a data frame with row names that are based on the original file name and
93-
a mean NDVI value for each file. Next, let's clean up the column names in our
94-
data frame to make it easier for colleagues to work with our code.
93+
We now have a data frame with row names that are based on the original file
94+
name and a mean NDVI value for each file. Next, let's clean up the column names
95+
in our data frame to make it easier for colleagues to work with our code.
9596

96-
It is a bit confusing to have duplicate object \& column names (`avg_NDVI_HARV`). Additionally the "avg" does not clearly indicate what the value in that
97-
particular column is. Let's change the NDVI column name to `meanNDVI`.
97+
Let's change the NDVI column name to `meanNDVI`.
9898

9999
```{r view-dataframe-output}
100100
names(avg_NDVI_HARV) <- "meanNDVI"
101101
head(avg_NDVI_HARV)
102102
```
103103

104-
By renaming the column, we lose the "HARV" in the header that reminds us what
105-
site our data are from. While we are only working with one site now, we
106-
might want to compare several sites worth of data in the future. Let's add a
107-
column to our dataframe called "site".
104+
The new column name doesn't reminds us what site our data are from. While we
105+
are only working with one site now, we might want to compare several sites
106+
worth of data in the future. Let's add a column to our dataframe called "site".
108107

109108
```{r insert-site-name}
110109
avg_NDVI_HARV$site <- "HARV"
111110
```
112111

113-
We can populate this column with the
114-
site name - HARV. Let's also create a year column and populate it with 2011 -
115-
the year our data were collected.
112+
We can populate this column with the site name - HARV. Let's also create a year
113+
column and populate it with 2011 - the year our data were collected.
116114

117115
```{r}
118116
avg_NDVI_HARV$year <- "2011"
@@ -128,14 +126,15 @@ We'd like to produce a plot where Julian days (the numeric day of the year,
128126
0 - 365/366) are on the x-axis and NDVI is on the y-axis. To create this plot,
129127
we'll need a column that contains the Julian day value.
130128

131-
One way to create a Julian day column is to use `gsub()` on the file name in each
132-
row. We can replace both the `X` and the `_HARV_NDVI_crop` to extract the Julian
133-
Day value, just like we did in the [previous episode](13-plot-time-series-rasters-in-r/).
129+
One way to create a Julian day column is to use `gsub()` on the file name in
130+
each row. We can replace both the `X` and the `_HARV_NDVI_crop` to extract the
131+
Julian Day value, just like we did in the
132+
[previous episode](13-plot-time-series-rasters-in-r/).
134133

135-
This time we will use one additional trick to do both of these steps at the same
136-
time. The vertical bar character ( `|` ) is equivalent to the word "or". Using
137-
this character in our search pattern
138-
allows us to search for more than one pattern in our text strings.
134+
This time we will use one additional trick to do both of these steps at the
135+
same time. The vertical bar character ( `|` ) is equivalent to the word "or".
136+
Using this character in our search pattern allows us to search for more than
137+
one pattern in our text strings.
139138

140139
```{r extract-julian-day}
141140
julianDays <- gsub("X|_HARV_ndvi_crop", "", row.names(avg_NDVI_HARV))
@@ -158,16 +157,17 @@ class(avg_NDVI_HARV$julianDay)
158157
## Convert Julian Day to Date Class
159158

160159
Currently, the values in the Julian day column are stored as class `character`.
161-
Storing this data as a date object is better - for plotting, data subsetting and
162-
working with our data. Let's convert. We worked with data conversions
163-
[in an earlier episode](12-time-series-raster/). For a more
164-
introduction to date-time classes, see the NEON Data Skills tutorial
160+
Storing this data as a date object is better - for plotting, data subsetting
161+
and working with our data. Let's convert. We worked with data conversions
162+
[in an earlier episode](12-time-series-raster/). For an introduction to
163+
date-time classes, see the NEON Data Skills tutorial
165164
[Convert Date \& Time Data from Character Class to Date-Time Class (POSIX) in R](https://www.neonscience.org/dc-convert-date-time-POSIX-r).
166165

167-
To convert a Julian day number to a date class, we need to set the origin, which is
168-
the day that our Julian days start counting from. Our data is from 2011 and we
169-
know that the USGS Landsat Team created Julian day values for this year.
170-
Therefore, the first day or "origin" for our Julian day count is 01 January 2011.
166+
To convert a Julian day number to a date class, we need to set the origin,
167+
which is the day that our Julian days start counting from. Our data is from
168+
2011 and we know that the USGS Landsat Team created Julian day values for this
169+
year. Therefore, the first day or "origin" for our Julian day count is 01
170+
January 2011.
171171

172172
```{r}
173173
origin <- as.Date("2011-01-01")
@@ -183,11 +183,12 @@ Once we set the Julian day origin, we can add the Julian day value (as an
183183
integer) to the origin date.
184184

185185
Note that when we convert our integer class `julianDay` values to dates, we
186-
subtracted 1. This is because the origin day is 01 January 2011, so the extracted day
187-
is 01. The Julian Day (or year day) for this is also 01. When we convert from the
188-
integer 05 `julianDay` value (indicating 5th of January), we cannot simply add
189-
`origin + julianDay` because `01 + 05 = 06` or 06 January 2011. To correct, this
190-
error we then subtract 1 to get the correct day, January 05 2011.
186+
subtracted 1. This is because the origin day is 01 January 2011, so the
187+
extracted day is 01. The Julian Day (or year day) for this is also 01. When we
188+
convert from the integer 05 `julianDay` value (indicating 5th of January), we
189+
cannot simply add `origin + julianDay` because `01 + 05 = 06` or 06 January
190+
2011. To correct, this error we then subtract 1 to get the correct day, January
191+
05 2011.
191192

192193
```{r}
193194
avg_NDVI_HARV$Date<- origin + (avg_NDVI_HARV$julianDay - 1)
@@ -206,14 +207,12 @@ class(avg_NDVI_HARV$Date)
206207
## Challenge: NDVI for the San Joaquin Experimental Range
207208

208209
We often want to compare two different sites. The National Ecological
209-
Observatory Network (NEON) also has a field site in Southern California
210-
at the
210+
Observatory Network (NEON) also has a field site in Southern California at the
211211
[San Joaquin Experimental Range (SJER)](https://www.neonscience.org/field-sites/field-sites-map/SJER).
212212

213-
For this challenge, create a dataframe containing the mean NDVI values
214-
and the Julian days the data was collected (in date format)
215-
for the NEON San
216-
Joaquin Experimental Range field site. NDVI data for SJER are located in the
213+
For this challenge, create a dataframe containing the mean NDVI values and the
214+
Julian days the data was collected (in date format) for the NEON San Joaquin
215+
Experimental Range field site. NDVI data for SJER are located in the
217216
`NEON-DS-Landsat-NDVI/SJER/2011/NDVI` directory.
218217

219218
::::::::::::::: solution
@@ -229,15 +228,16 @@ all_NDVI_SJER <- list.files(NDVI_path_SJER,
229228
full.names = TRUE,
230229
pattern = ".tif$")
231230
232-
NDVI_stack_SJER <- stack(all_NDVI_SJER)
231+
NDVI_stack_SJER <- rast(all_NDVI_SJER)
232+
names(NDVI_stack_SJER) <- paste0("X", names(NDVI_stack_SJER))
233233
234234
NDVI_stack_SJER <- NDVI_stack_SJER/10000
235235
```
236236

237237
Then we can calculate the mean values for each day and put that in a dataframe.
238238

239239
```{r}
240-
avg_NDVI_SJER <- as.data.frame(cellStats(NDVI_stack_SJER, mean))
240+
avg_NDVI_SJER <- as.data.frame(global(NDVI_stack_SJER, mean))
241241
```
242242

243243
Next we rename the NDVI column, and add site and year columns to our data.
@@ -272,9 +272,9 @@ plot our data.
272272
```{r ggplot-data}
273273
ggplot(avg_NDVI_HARV, aes(julianDay, meanNDVI)) +
274274
geom_point() +
275-
ggtitle("Landsat Derived NDVI - 2011", subtitle = "NEON Harvard Forest Field Site") +
275+
ggtitle("Landsat Derived NDVI - 2011",
276+
subtitle = "NEON Harvard Forest Field Site") +
276277
xlab("Julian Days") + ylab("Mean NDVI")
277-
278278
```
279279

280280
::::::::::::::::::::::::::::::::::::::: challenge
@@ -316,7 +316,8 @@ Now we can plot both datasets on the same plot.
316316
ggplot(NDVI_HARV_SJER, aes(x = julianDay, y = meanNDVI, colour = site)) +
317317
geom_point(aes(group = site)) +
318318
geom_line(aes(group = site)) +
319-
ggtitle("Landsat Derived NDVI - 2011", subtitle = "Harvard Forest vs San Joaquin") +
319+
ggtitle("Landsat Derived NDVI - 2011",
320+
subtitle = "Harvard Forest vs San Joaquin") +
320321
xlab("Julian Day") + ylab("Mean NDVI")
321322
```
322323

@@ -335,9 +336,9 @@ on the x-axis.
335336
ggplot(NDVI_HARV_SJER, aes(x = Date, y = meanNDVI, colour = site)) +
336337
geom_point(aes(group = site)) +
337338
geom_line(aes(group = site)) +
338-
ggtitle("Landsat Derived NDVI - 2011", subtitle = "Harvard Forest vs San Joaquin") +
339+
ggtitle("Landsat Derived NDVI - 2011",
340+
subtitle = "Harvard Forest vs San Joaquin") +
339341
xlab("Date") + ylab("Mean NDVI")
340-
341342
```
342343

343344
:::::::::::::::::::::::::
@@ -352,10 +353,10 @@ towards 0 during a time period when we might expect the vegetation to have a
352353
higher greenness value. Is the vegetation truly senescent or gone or are these
353354
outlier values that should be removed from the data?
354355

355-
We've seen in [an earlier episode](12-time-series-raster/) that
356-
data points with very low NDVI values can be associated with
357-
images that are filled with clouds. Thus, we can attribute the low NDVI values
358-
to high levels of cloud cover. Is the same thing happening at SJER?
356+
We've seen in [an earlier episode](12-time-series-raster/) that data points
357+
with very low NDVI values can be associated with images that are filled with
358+
clouds. Thus, we can attribute the low NDVI values to high levels of cloud
359+
cover. Is the same thing happening at SJER?
359360

360361
```{r view-all-rgb-SJER, echo=FALSE}
361362
# code not shown, demonstration only
@@ -373,15 +374,14 @@ par(mfrow = c(5, 4))
373374
# that one image. You could automate this by testing the range of each band in each image
374375
375376
for (aFile in rgb.allCropped.SJER)
376-
{NDVI.rastStack <- stack(aFile)
377+
{NDVI.rastStack <- rast(aFile)
377378
if (aFile =="data/NEON-DS-Landsat-NDVI/SJER/2011/RGB//254_SJER_landRGB.tif")
378379
{plotRGB(NDVI.rastStack) }
379380
else { plotRGB(NDVI.rastStack, stretch="lin") }
380381
}
381382
382383
# reset layout
383384
par(mfrow=c(1, 1))
384-
385385
```
386386

387387
Without significant additional processing, we will not be able to retrieve a
@@ -399,12 +399,12 @@ episode. We can then use the subset function to remove outlier datapoints
399399

400400
## Data Tip
401401

402-
Thresholding, or removing outlier data,
403-
can be tricky business. In this case, we can be confident that some of our NDVI
404-
values are not valid due to cloud cover. However, a threshold value may not
405-
always be sufficient given that 0.1 could be a valid NDVI value in some areas. This
406-
is where decision-making should be fueled by practical scientific knowledge of
407-
the data and the desired outcomes!
402+
Thresholding, or removing outlier data, can be tricky business. In this case,
403+
we can be confident that some of our NDVI values are not valid due to cloud
404+
cover. However, a threshold value may not always be sufficient given that 0.1
405+
could be a valid NDVI value in some areas. This is where decision-making should
406+
be fueled by practical scientific knowledge of the data and the desired
407+
outcomes!
408408

409409

410410
::::::::::::::::::::::::::::::::::::::::::::::::::
@@ -419,7 +419,8 @@ Now we can create another plot without the suspect data.
419419
```{r plot-clean-HARV}
420420
ggplot(avg_NDVI_HARV_clean, aes(x = julianDay, y = meanNDVI)) +
421421
geom_point() +
422-
ggtitle("Landsat Derived NDVI - 2011", subtitle = "NEON Harvard Forest Field Site") +
422+
ggtitle("Landsat Derived NDVI - 2011",
423+
subtitle = "NEON Harvard Forest Field Site") +
423424
xlab("Julian Days") + ylab("Mean NDVI")
424425
```
425426

@@ -437,16 +438,16 @@ We will use `write.csv()` to write a specified dataframe to a `.csv` file.
437438
Unless you designate a different directory, the output file will be saved in
438439
your working directory.
439440

440-
Before saving our file, let's view the format to make sure it is what we
441-
want as an output format.
441+
Before saving our file, let's view the format to make sure it is what we want
442+
as an output format.
442443

443444
```{r write-csv}
444445
head(avg_NDVI_HARV_clean)
445446
```
446447

447-
It looks like we have a series of `row.names` that we do not need because we have
448-
this information stored in individual columns in our data frame. Let's remove
449-
the row names.
448+
It looks like we have a series of `row.names` that we do not need because we
449+
have this information stored in individual columns in our data frame. Let's
450+
remove the row names.
450451

451452
```{r drop-rownames-write-csv}
452453
row.names(avg_NDVI_HARV_clean) <- NULL
@@ -485,9 +486,11 @@ write.csv(avg_NDVI_SJER_clean, file = "meanNDVI_SJER_2011.csv")
485486

486487
:::::::::::::::::::::::::::::::::::::::: keypoints
487488

488-
- Use the `cellStats()` function to calculate summary statistics for cells in a raster object.
489+
- Use the `global()` function to calculate summary statistics for cells in a
490+
raster object.
489491
- The pipe (`|`) operator means `or`.
490-
- Use the `rbind()` function to combine data frames that have the same column names.
492+
- Use the `rbind()` function to combine data frames that have the same column
493+
names.
491494

492495
::::::::::::::::::::::::::::::::::::::::::::::::::
493496

0 commit comments

Comments
 (0)