I was excited to see that the `s3` methods are implemented in the package. However, when I try them out, there is a considerable speed difference (roughly a factor of 10) compared to downloading the file first. Admittedly, I have a reasonably fast internet connection, but I would have expected that reading a 500 × 360 matrix out of a 20 MB file would be faster without downloading the full file. Am I doing something wrong? Should the `s3` option not be used to speed up code? Or might this relate to the specific bucket?
require(rhdf5)
#> Loading required package: rhdf5
url <- "https://s3-eu-west-1.amazonaws.com/fmi-opendata-radar-volume-hdf5/2024/03/03/filuo/202403030100_filuo_PVOL.h5"

# Option 1: download the whole file, then read the dataset locally
system.time({
  download.file(url, t <- tempfile(fileext = ".h5"))
  download <- h5read(file = t, name = "/dataset1/data1")
})
#>    user  system elapsed
#>   0.181   0.135   2.086

# Option 2: read the dataset directly from the bucket via the s3 option
system.time(direct <- h5read(file = url, s3 = TRUE, name = "/dataset1/data1"))
#>    user  system elapsed
#>   0.693   0.108  24.391

# Both approaches return identical data
all.equal(direct, download)
#> [1] TRUE
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.0 (2024-04-24)
#> os Ubuntu 22.04.4 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Amsterdam
#> date 2024-11-06
#> pandoc 3.1.11 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.0)
#> digest 0.6.35 2024-03-11 [1] CRAN (R 4.4.0)
#> evaluate 0.23 2023-11-01 [1] CRAN (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
#> fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0)
#> glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#> knitr 1.46 2024-04-06 [1] CRAN (R 4.4.0)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
#> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.4.0)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.4.0)
#> R.oo 1.26.0 2024-01-24 [1] CRAN (R 4.4.0)
#> R.utils 2.12.3 2023-11-18 [1] CRAN (R 4.4.0)
#> reprex 2.1.0 2024-01-11 [1] CRAN (R 4.4.0)
#> rhdf5 * 2.48.0 2024-04-30 [1] Bioconduc~
#> rhdf5filters 1.16.0 2024-04-30 [1] Bioconduc~
#> Rhdf5lib 1.26.0 2024-04-30 [1] Bioconduc~
#> rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.0)
#> rmarkdown 2.26 2024-03-05 [1] CRAN (R 4.4.0)
#> rstudioapi 0.17.1 2024-10-22 [1] CRAN (R 4.4.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
#> styler 1.10.3 2024-04-07 [1] CRAN (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
#> withr 3.0.2 2024-10-28 [1] CRAN (R 4.4.0)
#> xfun 0.43 2024-03-25 [1] CRAN (R 4.4.0)
#> yaml 2.3.9 2024-07-05 [1] CRAN (R 4.4.0)
#>
#> [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.4
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Created on 2024-11-06 with reprex v2.1.0