Speed of s3 downloads #150

@bart1

I was excited to see that the S3 methods are implemented in the package. However, when I try them out, reading directly from S3 is roughly 10 times slower than first downloading the file and reading it locally (about 24 s versus 2 s elapsed in the example below). Admittedly, I have a reasonably fast internet connection, but I would have expected reading a 500 × 360 matrix out of a 20 MB file to be faster without downloading the full file. Am I doing something wrong? Should the S3 option not be used to speed up code? Or might this relate to the specific bucket?

require(rhdf5)
#> Loading required package: rhdf5
url<-"https://s3-eu-west-1.amazonaws.com/fmi-opendata-radar-volume-hdf5/2024/03/03/filuo/202403030100_filuo_PVOL.h5"
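system.time({
  # Download the full file to a temporary location, then read it locally
  tmp <- tempfile(fileext = ".h5")
  download.file(url, tmp, mode = "wb")
  download <- h5read(file = tmp, name = "/dataset1/data1")
})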
#>    user  system elapsed 
#>   0.181   0.135   2.086
system.time(direct <- h5read(file = url, name = "/dataset1/data1", s3 = TRUE))
#>    user  system elapsed 
#>   0.693   0.108  24.391
all.equal(direct, download)
#> [1] TRUE
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 (2024-04-24)
#>  os       Ubuntu 22.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2024-11-06
#>  pandoc   3.1.11 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version date (UTC) lib source
#>  cli            3.6.3   2024-06-21 [1] CRAN (R 4.4.0)
#>  digest         0.6.35  2024-03-11 [1] CRAN (R 4.4.0)
#>  evaluate       0.23    2023-11-01 [1] CRAN (R 4.4.0)
#>  fastmap        1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
#>  fs             1.6.4   2024-04-25 [1] CRAN (R 4.4.0)
#>  glue           1.8.0   2024-09-30 [1] CRAN (R 4.4.0)
#>  htmltools      0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#>  knitr          1.46    2024-04-06 [1] CRAN (R 4.4.0)
#>  lifecycle      1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
#>  magrittr       2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
#>  purrr          1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
#>  R.cache        0.16.0  2022-07-21 [1] CRAN (R 4.4.0)
#>  R.methodsS3    1.8.2   2022-06-13 [1] CRAN (R 4.4.0)
#>  R.oo           1.26.0  2024-01-24 [1] CRAN (R 4.4.0)
#>  R.utils        2.12.3  2023-11-18 [1] CRAN (R 4.4.0)
#>  reprex         2.1.0   2024-01-11 [1] CRAN (R 4.4.0)
#>  rhdf5        * 2.48.0  2024-04-30 [1] Bioconduc~
#>  rhdf5filters   1.16.0  2024-04-30 [1] Bioconduc~
#>  Rhdf5lib       1.26.0  2024-04-30 [1] Bioconduc~
#>  rlang          1.1.4   2024-06-04 [1] CRAN (R 4.4.0)
#>  rmarkdown      2.26    2024-03-05 [1] CRAN (R 4.4.0)
#>  rstudioapi     0.17.1  2024-10-22 [1] CRAN (R 4.4.0)
#>  sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
#>  styler         1.10.3  2024-04-07 [1] CRAN (R 4.4.0)
#>  vctrs          0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
#>  withr          3.0.2   2024-10-28 [1] CRAN (R 4.4.0)
#>  xfun           0.43    2024-03-25 [1] CRAN (R 4.4.0)
#>  yaml           2.3.9   2024-07-05 [1] CRAN (R 4.4.0)
#> 
#>  [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.4
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2024-11-06 with reprex v2.1.0
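
The timings above come from a single run of each approach, so network jitter could account for part of the gap. A minimal sketch for repeating the comparison and reporting the median elapsed time is shown here. `time_median()` and `compare_reads()` are hypothetical helper names, not part of rhdf5; the only rhdf5 call used is `h5read()` with the same arguments as in the reprex.

```r
# Hypothetical helper (not part of rhdf5): call `f` `n` times and
# return the median elapsed wall-clock time in seconds.
time_median <- function(f, n = 3L) {
  elapsed <- vapply(
    seq_len(n),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
  median(elapsed)
}

# Hypothetical wrapper: nothing touches the network until it is called.
# Requires the rhdf5 package to be installed.
compare_reads <- function(url, name, n = 3L) {
  via_download <- function() {
    tmp <- tempfile(fileext = ".h5")
    on.exit(unlink(tmp))
    download.file(url, tmp, mode = "wb", quiet = TRUE)
    rhdf5::h5read(file = tmp, name = name)
  }
  via_s3 <- function() {
    rhdf5::h5read(file = url, name = name, s3 = TRUE)
  }
  c(
    download = time_median(via_download, n),
    s3 = time_median(via_s3, n)
  )
}
```

Calling `compare_reads(url, "/dataset1/data1")` with the URL from the reprex should return a named vector with one median per approach. If the gap persists across repeats, it is unlikely to be measurement noise.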
