
rhdf5 chunk size warning #219

@ethanplunkett

Description


Preprocessing species in BirdFlowR running under R 4.5 with rhdf5 produces a warning:

You created a large dataset with compression and chunking.
The chunk size is equal to the dataset dimensions.
If you want to read subsets of the dataset, you should test smaller chunk sizes to improve read times.

We should probably heed this. It's unclear to me whether each individual transition matrix is written as its own chunk, which would be fine, or the entire file is a single chunk, which would not be good. Some testing is in order.
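One way to answer that question is to inspect the chunk layout directly with rhdf5's low-level API. The sketch below is untested, and the file path and dataset name are placeholders rather than actual BirdFlowR names; the idea is to compare the reported chunk dimensions against the dataset dimensions from h5ls().

```r
library(rhdf5)

# Placeholder file name -- substitute a real model file.
file <- "amewoo_example.hdf5"

h5ls(file)                                 # lists every dataset and its dimensions

fid <- H5Fopen(file)
did <- H5Dopen(fid, "marginals/M_01-02")   # hypothetical dataset path
pid <- H5Dget_create_plist(did)
H5Pget_chunk(pid)                          # chunk dimensions for this dataset
H5Pclose(pid)
H5Dclose(did)
H5Fclose(fid)
```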

Several functions read or write just part of the file and could be used for comparison testing:
read_geom() - reads just the $geom component of an HDF5 model file.
extend_birdflow() - reads, edits, and overwrites the $geom component of an HDF5 file.
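For example (file names are placeholders, and I'm assuming read_geom() takes a path to the HDF5 file), the comparison could be as simple as timing those functions on two copies of one model written with different chunk settings:

```r
library(BirdFlowR)

# Hypothetical comparison: two copies of the same model, one written with
# whole-dataset chunks and one with smaller chunks (file names are placeholders).
system.time(g1 <- read_geom("amewoo_whole_chunk.hdf5"))
system.time(g2 <- read_geom("amewoo_small_chunks.hdf5"))

# extend_birdflow() (read, edit, overwrite $geom) would be timed the same way.
```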

To test, write the HDF5 file with several different chunk sizes and check whether they affect:

  1. Reading the geometry
  2. Updating the geometry
  3. Reading the entire file

Then either set an appropriate chunk size or suppress the warning. A rough benchmark sketch follows below.
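A self-contained benchmark along these lines, using only rhdf5 (the matrix size and chunk sizes are arbitrary; a real test would time read_geom() and extend_birdflow() on actual model files):

```r
library(rhdf5)

mat <- matrix(rnorm(4000 * 4000), nrow = 4000)

time_reads <- function(chunk) {
  f <- tempfile(fileext = ".h5")
  h5createFile(f)
  # chunk equal to dim(mat) plus compression reproduces the warning
  h5createDataset(f, "x", dims = dim(mat), chunk = chunk, level = 6)
  h5write(mat, f, "x")
  subset <- system.time(h5read(f, "x", index = list(1:100, 1:100)))[["elapsed"]]
  full   <- system.time(h5read(f, "x"))[["elapsed"]]
  unlink(f)
  c(subset = subset, full = full)
}

chunks <- list(whole = c(4000, 4000), medium = c(1000, 1000), small = c(256, 256))
sapply(chunks, time_reads)
```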
