Should there be a chunk iterator for writing datasets using 'create_dataset'? #88

@jbhatch

Description

When writing an HDF5 file to HSDS with H5PYD, it appears that although chunks are created in the final output file, the initial write of the data operates in a contiguous manner. This would sometimes produce interrupts (HTTP request errors) when writing large (~GB-size) HDF5 files with H5PYD to HSDS, despite there being more than enough memory on each of the HSDS data nodes. Writing smaller, ~MB-size files was hit and miss, and ~KB-size files had no issues. The 3D datasets in the HDF5 files used in these tests (~GB, ~MB, and ~KB-size) were filled with random 3D numpy arrays.
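
For reference, a minimal sketch of the kind of write that triggered the errors (the endpoint, domain path, and array shape below are made up for illustration):

    import numpy as np
    import h5pyd

    # Hypothetical HSDS endpoint and domain path
    f = h5pyd.File("/home/jbhatch/test_large.h5", "w",
                   endpoint="http://hsds.example.org:5101")

    # ~1 GB of float64 random data (512 x 1024 x 256)
    data = np.random.rand(512, 1024, 256)

    # create_dataset sends the full array in the initial write; this is the
    # step where the HTTP request errors were observed for ~GB-size arrays
    dset = f.create_dataset("dset3d", data=data)
    f.close()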

In order to use the H5PYD ChunkIterator in create_dataset, the following fix is suggested:

Add the line below to the import statements in the group.py file in h5pyd/_hl:

    from h5pyd._apps.chunkiter import ChunkIterator

In the group.py file under h5pyd/_hl, change lines 334-336 from this:

    if data is not None:
        self.log.info("initialize data")
        dset[...] = data

to this:

    if data is not None:
        self.log.info("initialize data")
        # dset[...] = data
        it = ChunkIterator(dset)
        for chunk in it:
            dset[chunk] = data[chunk]
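
Until a change like this is merged, the same chunk-by-chunk write can also be done from user code without patching group.py. A minimal sketch, assuming the dataset is created empty first and then filled (the domain and dataset names are illustrative, and the endpoint/credentials come from the local .hscfg):

    import numpy as np
    import h5pyd
    from h5pyd._apps.chunkiter import ChunkIterator

    f = h5pyd.File("/home/jbhatch/test_large.h5", "w")
    data = np.random.rand(512, 1024, 256)

    # Create the dataset without passing data=, so nothing is written yet
    dset = f.create_dataset("dset3d", shape=data.shape, dtype=data.dtype)

    # ChunkIterator yields one selection per chunk; each assignment becomes
    # a chunk-sized request to HSDS instead of one large contiguous write
    for chunk in ChunkIterator(dset):
        dset[chunk] = data[chunk]

    f.close()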
