Commit 23f226e

Merge pull request #178 from ropensci/fix/hydat_path
Update version
2 parents f9ebbc4 + cc9eaad commit 23f226e

17 files changed, with 383 additions and 335 deletions

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
  Package: tidyhydat
  Title: Extract and Tidy Canadian 'Hydrometric' Data
- Version: 0.5.5
+ Version: 0.5.6
  Authors@R: c(person("Sam", "Albers", email = "sam.albers@gmail.com", role = c("aut", "cre"),
  comment = c(ORCID = "0000-0002-9270-7884")),
  person("David", "Hutchinson", email = "david.hutchinson@canada.ca", role = "ctb"),

NEWS.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+ # tidyhydat 0.5.6
+ - fixed CRAN document issue
+ - fixed bug created by HYDAT database name (#175)
+
  # tidyhydat 0.5.5

  ### MINOR IMPROVEMENTS

R/download.R

Lines changed: 7 additions & 2 deletions
@@ -28,7 +28,6 @@
  #'

  download_hydat <- function(dl_hydat_here = NULL, ask = TRUE) {
-
  if(is.null(dl_hydat_here)){
  dl_hydat_here <- hy_dir()
  } else {
@@ -97,7 +96,7 @@ download_hydat <- function(dl_hydat_here = NULL, ask = TRUE) {


  ## temporary path to save
- tmp <- tempfile("hydat_")
+ tmp <- tempfile("hydat_", fileext = ".zip")

  ## Download the zip file
  res <- httr::GET(url, httr::write_disk(tmp), httr::progress("down"),
@@ -110,6 +109,12 @@ download_hydat <- function(dl_hydat_here = NULL, ask = TRUE) {

  utils::unzip(tmp, exdir = dl_hydat_here, overwrite = TRUE)

+ ## rename to consistent name
+ file.rename(
+ list.files(dl_hydat_here, pattern = "\\.sqlite3$", full.names = TRUE),
+ hydat_path
+ )
+

  if (file.exists(hydat_path)){
  congrats("HYDAT successfully downloaded")
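
The two changes in this file (a temporary file that carries a `.zip` extension, and renaming whatever `.sqlite3` file the archive yields to one fixed path) can be tried in isolation. The following is a minimal base R sketch, not the package's code: the demo directory, the versioned file name `Hydat_sqlite3_20220721.sqlite3`, and the target name `Hydat.sqlite3` are hypothetical stand-ins, and the single-match guard is an addition that the diff itself does not contain.

```r
# tempfile() with fileext yields a path ending in ".zip", as in the patched line.
tmp <- tempfile("hydat_", fileext = ".zip")

# Hypothetical stand-in for the download directory (hy_dir() in the package).
dl_dir <- tempfile("hydat_demo_")
dir.create(dl_dir)

# Assumed target path; the real hydat_path is defined elsewhere in download_hydat().
hydat_path <- file.path(dl_dir, "Hydat.sqlite3")

# Simulate an unzipped database whose name differs from the expected one.
# This file name is made up for illustration.
file.create(file.path(dl_dir, "Hydat_sqlite3_20220721.sqlite3"))

# Find whatever .sqlite3 file the archive produced.
extracted <- list.files(dl_dir, pattern = "\\.sqlite3$", full.names = TRUE)

# Guard (not in the diff): file.rename() expects `from` and `to` to have the
# same length, so rename only when exactly one database file was found.
if (length(extracted) == 1L) {
  renamed <- file.rename(extracted, hydat_path)
} else {
  renamed <- FALSE
}

file.exists(hydat_path)
```

If the directory could hold more than one `.sqlite3` file (for example, a leftover from an earlier download), a guard of this kind keeps the rename from being attempted with mismatched arguments; whether the package needs it depends on how `dl_hydat_here` is managed, which this diff does not show.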

cran-comments.md

Lines changed: 22 additions & 0 deletions
@@ -1,3 +1,25 @@
+ tidyhydat 0.5.6
+ =========================
+
+ There were zero WARNINGS and zero ERRORS.
+
+ There was one NOTE: 'Note: found 122 marked UTF-8 strings'. These strings are necessary for testing as the data source that this package accesses includes data with UTF-8 strings (french language accents)
+
+ ## NEWS
+ - fixed CRAN document issue
+ - fixed bug created by HYDAT database name (#175)
+
+ ## Test environments
+ * win-builder (via `devtools::check_win_devel()` and `devtools::check_win_release()`)
+ * local macOS, R 4.2.1 (via R CMD check --as-cran)
+ * ubuntu-20.04, r: 'release' (github actions)
+ * ubuntu-20.04, r: 'devel' (github actions)
+ * macOS, r: 'release' (github actions)
+ * windows, r: 'release' (github actions)
+ * Fedora Linux, R-devel, clang, gfortran - r-hub
+ * Debian Linux, R-release, GCC (debian-gcc-release) - r-hub
+ * Windows Server 2008 R2 SP1, R-devel, 32/64 bit - r-hub
+
  tidyhydat 0.5.5
  =========================

data-raw/HYDAT_internal_data/allstations.csv

Lines changed: 280 additions & 273 deletions
Large diffs are not rendered by default.

data/allstations.rda

228 Bytes
Binary file not shown.

data/hy_data_symbols.rda

7 Bytes
Binary file not shown.

data/hy_data_types.rda

1 Byte
Binary file not shown.

inst/test_db/tinyhydat.sqlite3

28 KB
Binary file not shown.

vignettes/tidyhydat_an_introduction.Rmd

Lines changed: 39 additions & 34 deletions
@@ -1,7 +1,7 @@
  ---
  title: "tidyhydat: An Introduction"
  author: "Sam Albers"
- date: "2022-03-17"
+ date: "2022-08-19"
  output:
    html_vignette:
      keep_md: true
@@ -42,15 +42,15 @@ hy_daily_flows(station_number = "08LA001")
  ```

  ```
- ## Queried from version of HYDAT released on 2022-01-17
- ## Observations: 30,255
- ## Measurement flags: 6,020
+ ## Queried from version of HYDAT released on 2022-07-21
+ ## Observations: 31,351
+ ## Measurement flags: 6,166
  ## Parameter(s): Flow
- ## Date range: 1914-01-01 to 2018-12-31
+ ## Date range: 1914-01-01 to 2021-12-31
  ## Station(s) returned: 1
  ## Stations requested but not returned:
  ## All stations returned.
- ## # A tibble: 30,255 x 5
+ ## # A tibble: 31,351 × 5
  ## STATION_NUMBER Date Parameter Value Symbol
  ## <chr> <date> <chr> <dbl> <chr>
  ## 1 08LA001 1914-01-01 Flow 144 <NA>
@@ -63,7 +63,8 @@ hy_daily_flows(station_number = "08LA001")
  ## 8 08LA001 1914-01-08 Flow 140 <NA>
  ## 9 08LA001 1914-01-09 Flow 140 <NA>
  ## 10 08LA001 1914-01-10 Flow 140 <NA>
- ## # ... with 30,245 more rows
+ ## # … with 31,341 more rows
+ ## # ℹ Use `print(n = ...)` to see more rows
  ```

  Another method is to use `hy_stations()` to generate your vector which is then given the `station_number` argument. For example, we could take a subset for only those active stations within Prince Edward Island (Province code:PE) and then create vector for `hy_daily_flows()`:
@@ -79,23 +80,24 @@ PEI_stns
  ```

  ```
- ## [1] "01CA003" "01CB002" "01CB004" "01CB018" "01CC002" "01CC005" "01CC010" "01CC011" "01CD005"
+ ## [1] "01CA003" "01CB002" "01CB004" "01CB018" "01CC002" "01CC005" "01CC010" "01CC011"
+ ## [9] "01CD005"
  ```

  ```r
  hy_daily_flows(station_number = PEI_stns)
  ```

  ```
- ## Queried from version of HYDAT released on 2022-01-17
- ## Observations: 113,507
- ## Measurement flags: 20,357
+ ## Queried from version of HYDAT released on 2022-07-21
+ ## Observations: 114,605
+ ## Measurement flags: 20,524
  ## Parameter(s): Flow
  ## Date range: 1961-08-01 to 2020-12-31
  ## Station(s) returned: 9
  ## Stations requested but not returned:
  ## All stations returned.
- ## # A tibble: 113,507 x 5
+ ## # A tibble: 114,605 × 5
  ## STATION_NUMBER Date Parameter Value Symbol
  ## <chr> <date> <chr> <dbl> <chr>
  ## 1 01CA003 1961-08-01 Flow NA <NA>
@@ -108,7 +110,8 @@ hy_daily_flows(station_number = PEI_stns)
  ## 8 01CB002 1961-08-04 Flow NA <NA>
  ## 9 01CA003 1961-08-05 Flow NA <NA>
  ## 10 01CB002 1961-08-05 Flow NA <NA>
- ## # ... with 113,497 more rows
+ ## # … with 114,595 more rows
+ ## # ℹ Use `print(n = ...)` to see more rows
  ```

  We can also merge our station choice and data extraction into one unified pipe which accomplishes a single goal. For example if for some reason we wanted all the stations in Canada that had the name "Canada" in them we unify that selection and data extraction process into a single pipe:
@@ -120,15 +123,15 @@ search_stn_name("canada") %>%
  ```

  ```
- ## Queried from version of HYDAT released on 2022-01-17
- ## Observations: 84,594
- ## Measurement flags: 25,617
+ ## Queried from version of HYDAT released on 2022-07-21
+ ## Observations: 86,056
+ ## Measurement flags: 26,218
  ## Parameter(s): Flow
- ## Date range: 1918-08-01 to 2020-12-31
+ ## Date range: 1918-08-01 to 2021-12-31
  ## Station(s) returned: 7
  ## Stations requested but not returned:
  ## All stations returned.
- ## # A tibble: 84,594 x 5
+ ## # A tibble: 86,056 × 5
  ## STATION_NUMBER Date Parameter Value Symbol
  ## <chr> <date> <chr> <dbl> <chr>
  ## 1 01AK001 1918-08-01 Flow NA <NA>
@@ -141,7 +144,8 @@ search_stn_name("canada") %>%
  ## 8 01AK001 1918-08-08 Flow 1.78 <NA>
  ## 9 01AK001 1918-08-09 Flow 1.5 <NA>
  ## 10 01AK001 1918-08-10 Flow 1.78 <NA>
- ## # ... with 84,584 more rows
+ ## # … with 86,046 more rows
+ ## # ℹ Use `print(n = ...)` to see more rows
  ```

  We saw above that if we were only interested in a subset of dates we could use the `start_date` and `end_date` arguments. A date must be supplied to both these arguments in the form of YYYY-MM-DD. If you were interested in all daily flow data from station number "08LA001" for 1981, you would specify all days in 1981 :
@@ -194,7 +198,7 @@ search_stn_name("liard")
  ```

  ```
- ## # A tibble: 9 x 5
+ ## # A tibble: 9 × 5
  ## STATION_NUMBER STATION_NAME PROV_TERR_STATE_LOC LATITUDE LONGITUDE
  ## <chr> <chr> <chr> <dbl> <dbl>
  ## 1 10AA001 LIARD RIVER AT UPPER CROSSING YT 60.1 -129.
@@ -214,20 +218,21 @@ search_stn_number("08MF")
  ```

  ```
- ## # A tibble: 51 x 5
- ## STATION_NUMBER STATION_NAME PROV_TERR_STATE_LOC LATITUDE LONGITUDE
- ## <chr> <chr> <chr> <dbl> <dbl>
- ## 1 08MF005 FRASER RIVER AT HOPE BC 49.4 -121.
- ## 2 08MF035 FRASER RIVER NEAR AGASSIZ BC 49.2 -122.
- ## 3 08MF038 FRASER RIVER AT CANNOR BC 49.1 -122.
- ## 4 08MF040 FRASER RIVER ABOVE TEXAS CREEK BC 50.6 -122.
- ## 5 08MF062 COQUIHALLA RIVER BELOW NEEDLE CREEK BC 49.5 -121.
- ## 6 08MF065 NAHATLATCH RIVER BELOW TACHEWANA CREEK BC 50.0 -122.
- ## 7 08MF068 COQUIHALLA RIVER ABOVE ALEXANDER CREEK BC 49.4 -121.
- ## 8 08MF072 FRASER RIVER AT LAIDLAW BC 49.3 -122.
- ## 9 08MF073 FRASER RIVER AT HARRISON MILLS BC 49.2 -122.
- ## 10 08MF001 ANDERSON RIVER NEAR BOSTON BAR BC 49.8 -121.
- ## # ... with 41 more rows
+ ## # A tibble: 53 × 5
+ ## STATION_NUMBER STATION_NAME PROV_TERR_STATE_LOC LATIT…¹ LONGI…²
+ ## <chr> <chr> <chr> <dbl> <dbl>
+ ## 1 08MF005 FRASER RIVER AT HOPE BC 49.4 -121.
+ ## 2 08MF035 FRASER RIVER NEAR AGASSIZ BC 49.2 -122.
+ ## 3 08MF038 FRASER RIVER AT CANNOR BC 49.1 -122.
+ ## 4 08MF040 FRASER RIVER ABOVE TEXAS CREEK BC 50.6 -122.
+ ## 5 08MF062 COQUIHALLA RIVER BELOW NEEDLE CREEK BC 49.5 -121.
+ ## 6 08MF065 NAHATLATCH RIVER BELOW TACHEWANA CREEK BC 50.0 -122.
+ ## 7 08MF068 COQUIHALLA RIVER ABOVE ALEXANDER CREEK BC 49.4 -121.
+ ## 8 08MF072 FRASER RIVER AT LAIDLAW BC 49.3 -122.
+ ## 9 08MF073 FRASER RIVER AT HARRISON MILLS BC 49.2 -122.
+ ## 10 08MF001 ANDERSON RIVER NEAR BOSTON BAR BC 49.8 -121.
+ ## # … with 43 more rows, and abbreviated variable names ¹LATITUDE, ²LONGITUDE
+ ## # ℹ Use `print(n = ...)` to see more rows
  ```

  ## Using joins

vignettes/tidyhydat_example_analysis.Rmd

Lines changed: 12 additions & 11 deletions
@@ -1,7 +1,7 @@
  ---
  title: "Two examples of using tidyhydat"
  author: "Sam Albers"
- date: "2022-03-17"
+ date: "2022-08-19"
  output:
    html_vignette:
      keep_md: true
@@ -60,25 +60,26 @@ hy_stn_data_range()
  ```

  ```
- ## Queried from version of HYDAT released on 2022-01-17
- ## Observations: 12,055
- ## Station(s) returned: 7,923
+ ## Queried from version of HYDAT released on 2022-07-21
+ ## Observations: 12,076
+ ## Station(s) returned: 7,935
  ## Stations requested but not returned:
  ## All stations returned.
- ## # A tibble: 12,055 x 6
+ ## # A tibble: 12,076 × 6
  ## STATION_NUMBER DATA_TYPE SED_DATA_TYPE Year_from Year_to RECORD_LENGTH
  ## <chr> <chr> <chr> <int> <int> <int>
  ## 1 01AA002 Q <NA> 1967 1977 11
  ## 2 01AD001 Q <NA> 1918 1997 80
- ## 3 01AD002 Q <NA> 1926 2019 94
- ## 4 01AD003 H <NA> 2011 2018 8
- ## 5 01AD003 Q <NA> 1951 2018 68
+ ## 3 01AD002 Q <NA> 1926 2020 95
+ ## 4 01AD003 H <NA> 2011 2020 10
+ ## 5 01AD003 Q <NA> 1951 2020 70
  ## 6 01AD004 H <NA> 1980 2019 35
  ## 7 01AD004 Q <NA> 1968 1979 12
  ## 8 01AD005 H <NA> 1966 1974 9
  ## 9 01AD008 H <NA> 1972 1974 3
  ## 10 01AD009 H <NA> 1973 1982 10
- ## # ... with 12,045 more rows
+ ## # … with 12,066 more rows
+ ## # ℹ Use `print(n = ...)` to see more rows
  ```
  Our objective here is to filter from this data for the station that has the longest record of flow (`DATA_TYPE == "Q"`). You'll also notice this symbol `%>%` which in R is called a [pipe](https://magrittr.tidyverse.org/reference/pipe.html). In code, read it as the word *then*. So for the data_range data we want to grab the data *then* filter it by flow ("Q") in `DATA_TYPE` and then by the maximum value of `RECORD_LENGTH`:

@@ -88,12 +89,12 @@ hy_stn_data_range() %>%
  ```

  ```
- ## Queried from version of HYDAT released on 2022-01-17
+ ## Queried from version of HYDAT released on 2022-07-21
  ## Observations: 1
  ## Station(s) returned: 1
  ## Stations requested but not returned:
  ## All stations returned.
- ## # A tibble: 1 x 6
+ ## # A tibble: 1 × 6
  ## STATION_NUMBER DATA_TYPE SED_DATA_TYPE Year_from Year_to RECORD_LENGTH
  ## <chr> <chr> <chr> <int> <int> <int>
  ## 1 02HA003 Q <NA> 1860 2020 161

vignettes/tidyhydat_hydat_db.Rmd

Lines changed: 18 additions & 14 deletions
@@ -1,7 +1,7 @@
  ---
  title: "Stepping into the HYDAT Database"
  author: "Dewey Dunnington"
- date: "2022-03-17"
+ date: "2022-08-19"
  output: rmarkdown::html_vignette
  vignette: >
    %\VignetteIndexEntry{Stepping into the HYDAT Database}
@@ -38,15 +38,17 @@ To list the tables, use `src_tbls()` from the **dplyr** package.

  ```r
  src_tbls(src)
- #> [1] "AGENCY_LIST" "ANNUAL_INSTANT_PEAKS" "ANNUAL_STATISTICS" "CONCENTRATION_SYMBOLS"
- #> [5] "DATA_SYMBOLS" "DATA_TYPES" "DATUM_LIST" "DLY_FLOWS"
- #> [9] "DLY_LEVELS" "MEASUREMENT_CODES" "OPERATION_CODES" "PEAK_CODES"
- #> [13] "PRECISION_CODES" "REGIONAL_OFFICE_LIST" "SAMPLE_REMARK_CODES" "SED_DATA_TYPES"
- #> [17] "SED_DLY_LOADS" "SED_DLY_SUSCON" "SED_SAMPLES" "SED_SAMPLES_PSD"
- #> [21] "SED_VERTICAL_LOCATION" "SED_VERTICAL_SYMBOLS" "STATIONS" "STN_DATA_COLLECTION"
- #> [25] "STN_DATA_RANGE" "STN_DATUM_CONVERSION" "STN_DATUM_UNRELATED" "STN_OPERATION_SCHEDULE"
- #> [29] "STN_REGULATION" "STN_REMARKS" "STN_REMARK_CODES" "STN_STATUS_CODES"
- #> [33] "VERSION"
+ #> [1] "AGENCY_LIST" "ANNUAL_INSTANT_PEAKS" "ANNUAL_STATISTICS"
+ #> [4] "CONCENTRATION_SYMBOLS" "DATA_SYMBOLS" "DATA_TYPES"
+ #> [7] "DATUM_LIST" "DLY_FLOWS" "DLY_LEVELS"
+ #> [10] "MEASUREMENT_CODES" "OPERATION_CODES" "PEAK_CODES"
+ #> [13] "PRECISION_CODES" "REGIONAL_OFFICE_LIST" "SAMPLE_REMARK_CODES"
+ #> [16] "SED_DATA_TYPES" "SED_DLY_LOADS" "SED_DLY_SUSCON"
+ #> [19] "SED_SAMPLES" "SED_SAMPLES_PSD" "SED_VERTICAL_LOCATION"
+ #> [22] "SED_VERTICAL_SYMBOLS" "STATIONS" "STN_DATA_COLLECTION"
+ #> [25] "STN_DATA_RANGE" "STN_DATUM_CONVERSION" "STN_DATUM_UNRELATED"
+ #> [28] "STN_OPERATION_SCHEDULE" "STN_REGULATION" "STN_REMARKS"
+ #> [31] "STN_REMARK_CODES" "STN_STATUS_CODES" "VERSION"
  ```

  To inspect any particular table, use the `tbl()` function with the `src` and the table name.
@@ -55,7 +57,7 @@ To inspect any particular table, use the `tbl()` function with the `src` and the
  ```r
  tbl(src, "STN_OPERATION_SCHEDULE")
  #> # Source: table<STN_OPERATION_SCHEDULE> [?? x 5]
- #> # Database: sqlite 3.37.2 [C:\work\_dev\GitHub_repos\tidyhydat\inst\test_db\tinyhydat.sqlite3]
+ #> # Database: sqlite 3.39.2 [/Users/samalbers/_dev/gh_repos/tidyhydat/inst/test_db/tinyhydat.sqlite3]
  #> STATION_NUMBER DATA_TYPE YEAR MONTH_FROM MONTH_TO
  #> <chr> <chr> <int> <chr> <chr>
  #> 1 01AP003 H 1923 <NA> <NA>
@@ -68,7 +70,8 @@ tbl(src, "STN_OPERATION_SCHEDULE")
  #> 8 01AP003 H 1930 <NA> <NA>
  #> 9 01AP003 H 1931 <NA> <NA>
  #> 10 01AP003 H 1932 <NA> <NA>
- #> # ... with more rows
+ #> # … with more rows
+ #> # ℹ Use `print(n = ...)` to see more rows
  ```

  Working with SQL tables in dplyr is much like working with regular data frames, except no data is actually read from the database until necessary. Because some of these tables are large (particularly those containing the actual data), you will want to `filter()` the tables before you `collect()` them (the `collect()` operation loads them into memory as a `data.frame`).
@@ -78,7 +81,7 @@ Working with SQL tables in dplyr is much like working with regular data frames,
  tbl(src, "STN_OPERATION_SCHEDULE") %>%
  filter(STATION_NUMBER == "05AA008") %>%
  collect()
- #> # A tibble: 103 x 5
+ #> # A tibble: 103 × 5
  #> STATION_NUMBER DATA_TYPE YEAR MONTH_FROM MONTH_TO
  #> <chr> <chr> <int> <chr> <chr>
  #> 1 05AA008 H 2012 JAN DEC
@@ -91,7 +94,8 @@ tbl(src, "STN_OPERATION_SCHEDULE") %>%
  #> 8 05AA008 H 2019 JAN DEC
  #> 9 05AA008 H 2020 JAN DEC
  #> 10 05AA008 Q 1910 <NA> <NA>
- #> # ... with 93 more rows
+ #> # … with 93 more rows
+ #> # ℹ Use `print(n = ...)` to see more rows
  ```

  When you are finished with the database (i.e., the end of the script), it is good practice to close the connection (you may get a loud red warning if you don't!).

vignettes/vignette-fig-old_rec-1.png

99 KB
163 KB
118 KB

vignettes/vignette-fig-tile_plt-1.png

27.4 KB
84.9 KB
