How safe is it to open NWBs without a context manager? #41
-
I notice that in the documentation, files are often opened for reading directly, as opposed to using a context manager. Is this best practice? Specifically, is it possible to end up with a corrupted HDF5 file if the kernel crashes unexpectedly while reading from a file that was opened like this? Or is reading generally less prone to this sort of thing than writing? I tried doing this and restarting the kernel a bunch of times and it was fine, but it's hard to recreate an unexpected crash.
I ask because I'm trying to make it a bit easier to analyze data from many NWB files at once in, say, a Jupyter notebook, by collecting references to all the relevant NWB objects in some container (dict, DataFrame, etc.) and then working with those references directly.
You can also imagine this being useful for subsetting data by various kinds of session- or subject-level metadata. Other solutions to address this issue are also welcome. The other obvious way is to extract all the data from the NWBs into some set of linked dataframes and load it all into memory (a la the Allen SDK), but I'd like to avoid loading all the data into memory! Thanks :)
Replies: 2 comments 2 replies
-
Ok, I wrote a small class to handle opening and closing the files politely. Curious to hear thoughts about whether there's a better way. It's much slower than keeping the file handles open, but on my machine it still only takes ~100 ms to open and read each file. So even if you're reading 5-10 NWB fields and doing that for 5-10 sessions, it will still only take you an extra ~10 seconds at most, which feels tolerable. Of course, if you're literally plotting raw data it will be painfully slow, but in that case, just read in the data you want to work with in little chunks at a time.
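For illustration, here is a minimal sketch of what an open-read-close helper of this kind could look like. It is not the class described above: `PoliteReader`, `opener`, and `extract` are hypothetical names, and the helper is written against any object that works as a context manager, which `NWBHDF5IO(path, 'r')` does.

```python
from typing import Any, Callable, Dict, Iterable


class PoliteReader:
    """Open a file, pull out one value, and close the file again.

    `opener` is a hypothetical callable that takes a path and returns a
    context manager, e.g. ``lambda p: NWBHDF5IO(p, "r")`` with PyNWB.
    """

    def __init__(self, opener: Callable[[str], Any]) -> None:
        self.opener = opener

    def get(self, path: str, extract: Callable[[Any], Any]) -> Any:
        # The handle only lives inside this `with` block, so `extract`
        # must return plain in-memory data rather than a lazy reference
        # into the file.
        with self.opener(path) as handle:
            return extract(handle)

    def get_many(
        self, paths: Iterable[str], extract: Callable[[Any], Any]
    ) -> Dict[str, Any]:
        # One open/close cycle per file: simple, but the cost of opening
        # the file is paid on every call.
        return {path: self.get(path, extract) for path in paths}
```

With PyNWB, `extract` would need to copy the data out before returning, for example by slicing a dataset with `[:]`, because lazily loaded datasets cannot be read once the file is closed.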
-
No, opening a file with mode 'r' enforces that no changes are being made to the file, and as such the file cannot be corrupted.
For reading, skipping the context manager is common because users often need to perform a large number of read operations on the same file, and these do not fit nicely into a context manager. To avoid opening and parsing the file for every read, the file is typically kept open until the end of the analysis. The file should then be closed explicitly (note that files are also closed automatically when the NWBHDF5IO object is deleted, e.g., when the script finishes).
Note that the files need to remain open for the references to work. In particular, datasets are loaded lazily as h5py.Dataset objects in PyNWB, so in order to read data, e.g., from a TimeSeries object, the file needs to be open.
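The keep-open pattern described above can be sketched for many files at once with `contextlib.ExitStack`, which closes every handle in one explicit call. This is only a sketch under assumptions: `open_all` and `opener` are hypothetical names, and `opener` is assumed to return a context manager such as `NWBHDF5IO(path, 'r')`.

```python
from contextlib import ExitStack


def open_all(paths, opener):
    """Open every path and return ``(stack, handles)``.

    `opener` is a hypothetical callable returning a context manager for
    a path, e.g. ``lambda p: NWBHDF5IO(p, "r")`` with PyNWB. The handles
    stay open until ``stack.close()`` is called.
    """
    stack = ExitStack()
    try:
        # enter_context registers each handle so that stack.close()
        # later exits (closes) all of them.
        handles = {path: stack.enter_context(opener(path)) for path in paths}
    except BaseException:
        # If any file fails to open, close the ones opened so far.
        stack.close()
        raise
    return stack, handles


# Typical use (names are placeholders):
#
#     stack, ios = open_all(paths, opener)
#     try:
#         ...  # read lazily from the open files here
#     finally:
#         stack.close()  # explicit close at the end of the analysis
```

Any lazily loaded datasets should be read before `stack.close()` is called, since they depend on the underlying files being open.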