New types of datasets supported for Delmic HDF5 format #328
Conversation
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #328      +/-   ##
==========================================
+ Coverage   88.02%   88.08%   +0.06%
==========================================
  Files          91       91
  Lines       11538    11798     +260
  Branches     2131     2186      +55
==========================================
+ Hits        10156    10392     +236
- Misses        875      890      +15
- Partials      507      516       +9
```
Thanks @noemiebonnet for the substantial progress! I left a few comments from a first pass over the code, but did not yet have time for a closer look. Please include the standard tracking list in the initial comment to give an overview of how far along the PR is.

There are still a number of lines uncovered by tests, as reported by codecov. If lines should explicitly be ignored in the coverage test, one can add the comment `# pragma: no cover` at the end of the line. I see that most uncovered lines concern warnings or errors that might surface during file reading. It might be hard to test for those if one only has proper test files. What is our current best practice in that case @ericpre? (Note that you can also get inspiration from other file readers implemented recently, such as Horiba JobinYvon or Hamamatsu.)
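For illustration, a minimal sketch of such an exclusion (the function name and message are made up, not the code from this PR):

```python
def _check_acquisition(acq):
    # Defensive branch that valid test files cannot reach; exclude it
    # from the coverage statistics with a pragma.
    if acq is None:
        raise TypeError("not a Delmic acquisition group")  # pragma: no cover
    return acq
```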
A single spectrum is still loaded as …
rsciio/delmic/_api.py (Outdated):

```python
        the associated image type.
        """
        if Acq is None:
            raise TypeError(
```
Currently, all the various loading errors give codecov warnings, because they are not covered by the tests. With HDF5, it should be rather straightforward to create a minimal test file for each case, where certain elements are deleted from the file, to trigger each of these errors. Another idea could be to use HDF5 files from other rsciio plugins that do not match the Delmic specification to test some of the errors.
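As a hedged sketch of that idea (file names and the group name are assumptions, not the actual Delmic layout), one could derive a broken file from a valid one with h5py:

```python
import shutil

import h5py

# Copy a known-good test file, then delete one element so that the
# corresponding error branch in the reader is exercised.
shutil.copy("delmic_valid.h5", "delmic_missing_acq.h5")
with h5py.File("delmic_missing_acq.h5", "r+") as f:
    del f["Acquisition0"]  # assumed group name
```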
rsciio/delmic/_api.py (Outdated):

```python
            or Image.shape[3] < 1
            or Image.shape[4] < 1
        ):
            raise ValueError(
```
As an example for testing errors, here you could take the file from a different type of data and change the img_type to create a case that triggers this error.
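A possible sketch of that test, with all HDF5 paths and values purely hypothetical:

```python
import h5py

# Take a copy of a valid file of a *different* acquisition type and
# replace its image-type tag, so the dimension check above is reached
# and raises ValueError.
with h5py.File("delmic_copy.h5", "r+") as f:
    grp = f["Acquisition0/PhysicalData"]  # assumed path
    del grp["ImageType"]                  # assumed dataset name
    grp.create_dataset("ImageType", data="angular")  # mismatching type
```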
rsciio/delmic/_api.py (Outdated):

```python
        Scale = np.array(ImgData.get(scale_key))

        if axis_name in ["C", "T"]:
            scale_value = np.mean(
```
Some other readers have a parameter to choose whether to read such axes as a UniformDataAxis, by calculating the mean scale, or as a non-uniform DataAxis, by taking exactly the vector that is in the file.

See e.g. rosettasciio/rsciio/jobinyvon/_api.py, line 684 in 1274ed5:

```python
    use_uniform_signal_axis : bool, default=False
```
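A hedged sketch of how such an option could look (the helper name and dictionary layout follow the usual RosettaSciIO axis dictionaries, but are not taken from this PR):

```python
import numpy as np

def _build_axis(vector, name, units, use_uniform_signal_axis=False):
    # Hypothetical helper: turn an axis vector from the file into an
    # axis dictionary for the returned signal.
    if use_uniform_signal_axis:
        # UniformDataAxis: approximate the vector by offset + mean step.
        return {
            "name": name,
            "units": units,
            "size": len(vector),
            "offset": float(vector[0]),
            "scale": float(np.mean(np.diff(vector))),
        }
    # Non-uniform DataAxis: keep exactly the values stored in the file.
    return {"name": name, "units": units, "axis": np.asarray(vector)}
```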
@noemiebonnet, is it possible to update the PR with the progress you have made on your local branch? It might be enough to include the current PR in the release planned for the coming days: #381
Force-pushed from 3ab696c to 853f36c.
I'm running into an error message after converting a .h5 into a .hspy and then trying to load the .hspy: …

To reproduce: …

It looks like it might be related to the original metadata - if the initial loading of the .h5 is done without the original metadata, the error is avoided, e.g.: …
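The traceback and repro snippet were not captured above; as a hedged reconstruction of the described steps (file names are placeholders, and this assumes the reader returns a single signal):

```python
import hyperspy.api as hs

# Convert .h5 to .hspy, then reload -- the reload is where the error shows up.
s = hs.load("acquisition.h5")
s.save("acquisition.hspy")
s2 = hs.load("acquisition.hspy")  # fails when original_metadata was kept

# Described workaround: skip the original metadata on the first load.
s = hs.load("acquisition.h5", load_original_metadata=False)
s.save("acquisition.hspy", overwrite=True)
s2 = hs.load("acquisition.hspy")  # loads fine
```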
Indeed, the parsing of … results in: all values are read in as arrays, regardless of their content - none of them actually seem to contain more than one array element. In particular, nested dictionaries such as for the …
Force-pushed from 853f36c to cd5b958.
I will pick up this PR from now on. I went through all the code and the comments of this pull request and attempted to address all the issues raised. Here is a summary of the (major) updates: …

I now see that some CI tests are failing (though all the test cases passed on my computer). I'll dig into these issues tomorrow. Any feedback is already much appreciated!
upcoming_changes/328.new.rst (Outdated):

```
@@ -0,0 +1 @@
:ref:`Delmic <delmic-format>` format: add support for Delmic HDF5 (cathodoluminescence) acquisition files
```
We added basic support in the 0.7 release:
https://rosettasciio.readthedocs.io/en/latest/changes.html#id42
So the changelog should make clear that more complete support for different types of acquisitions is now added.
```rst
of three datasets. It is possible to load each of them separately.

.. Note::
    To load the various types of datasets in the file, use the ``signal`` argument
```
We were discussing what is actually the best default here - reading only CL to return a single signal item, as in most other readers, or the list with all items - but I guess the main point is that it should be well documented.

Also, I am not certain that having the survey first is the best order of the signals in the list. I assume the order is motivated by the odemis files? Generally, I would find it more intuitive to have the actual CL signal first, then concurrent, and survey last (making the survey always the last item, even if there are multiple streams in one file).

Finally, I would say there is not really a point in reading in the concurrent SE when it is dimensionless (for spectra, single transients or streak images).
In particular, having the anchor region data from drift correction included in the default loading seems a bit of an overload - so maybe default to "cl" and have an option "all" to include the other datasets?
Sure. I've now changed it to return only the "cl" datasets by default. Also changed the value to pass to get all the datasets from None to "all".

Concerning the datasets with a single point: although it's probably not useful, I'd rather always return them when that type of dataset (or "all") is requested. This keeps the code simple and makes sure that if for some reason the user actually does care about this single pixel, they can still read it.
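For clarity, the resulting behaviour (a sketch; the file name is a placeholder, and `file_reader` is the standard RosettaSciIO plugin entry point):

```python
from rsciio.delmic import file_reader

# Default: only the cathodoluminescence dataset(s) are returned.
cl_only = file_reader("acquisition.h5")  # same as signal="cl"

# Pass signal="all" to also get the SE, anchor and survey datasets.
everything = file_reader("acquisition.h5", signal="all")
```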
Fair enough for the datasets with a single point. Still, I would tend to reverse the order of the different signals in the list for "all", to have 'cl' first (as it is the main one and now also the one read by default) and then add the additional signals, with 'survey' last.
Thanks Eric for picking up on this. In general it looks good, and some first tests with real data did not surface any problems. We are actually interested in getting this released ASAP, so I would try to resolve anything that needs to be decided in terms of API (see some comments on the …).

In the mid-term, additional features such as metadata parsing and reading in scan areas/points on the survey as markers could be added via separate PRs. Coverage is not optimal, but the missing lines mainly seem to concern errors, which could maybe also be addressed later.
Thanks Jonas for the review. I've tried to correct the code for all your comments. I agree with the way forward: let's try to get this big change merged, and then we can more easily bring in extra features later in "small bites". Let me know if you want me to squash some of the commits together before merging the PR.
I've tried hard to get the packaging test to pass, but I'm now at a loss as to how to handle the current errors. I assume it's related to the way the test data is stored... but I have no idea what should be changed to fix it, or even whether it's a real error. Do you have any hint on how to solve such an error?
I don't remember the exact reason, but this is related to the recently merged PR #417, which added these test files; they are not included in this branch. A rebase should fix it.
Maybe squash the last few commits with only minor corrections into one.
Commits:
- Major overhaul of the code structure, to be more generic and handle all (corner) cases.
- Adjust the behaviour of the "signal" argument: it now defaults to passing all the data, and all the possible options are lower-case only.
- The metadata now also includes the acquisition time and dataset title.
- The original_metadata now also includes the SVIData and acquisition time.
- Ensure that the original_metadata dict only contains basic Python types (i.e., no numpy arrays or JSON-encoded strings).
- Fix offset values in X & Y, which were incorrectly computed.
- Fix incorrect direction of the X axes in 4D angular-resolved datasets.
- If LumiSpy is installed, use the corresponding Signal type for datasets matching the CL_SEM, LumiTransient, and LumiTransientSpectrum types.
- Support angular-resolved data with multiple polarizations.
- Remove some test cases which were only testing the same as another one.
- Add a test case attempting to load an incorrect HDF5 file.
- ruff check
- registry: handle lumispy not present
- By default, only return cathodoluminescence datasets. Also adjust the value to pass to get all the datasets from None to "all".
Force-pushed from c1449f7 to 9c63bc9.
- Don't return the datasets based on the order stored in the HDF5 file, but by type. For the user, the CL data is the most interesting, so it is returned first. Also add a test case for the anchor region (aka drift correction).
- Some of the exception handlers should only be reached with malformed data, which we don't have, so let's not count these paths for now.
Force-pushed from 9c63bc9 to 9bac123.
I've updated the pull request with these changes: …

Talking of coverage, I don't have the tools to measure it locally, so it's a bit hard to check how well it will go. Also, it seems the coverage check is considered a "fail" if it's below the project average, which is currently at 87%. Please be aware that, mathematically, if every merge is above the average, this average will increase, which will make the bar to entry higher and higher for subsequent pull requests, and the work of submitters harder and harder...
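For measuring coverage locally, a hedged sketch using the pytest-cov plugin (the test path is an assumption about the repository layout):

```python
import pytest

# Requires pytest-cov; prints the missed lines per file.
pytest.main([
    "--cov=rsciio.delmic",
    "--cov-report=term-missing",
    "rsciio/tests/",
])
```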
Thanks Éric :-)

My thought was to have the survey last so that it can always be accessed as the [-1] element of the list (the anchor is not always there).
You can see the current report and a markup of missing lines at Codecov, but you have reached the threshold now. The aim should be close to 100% coverage for the relevant parts of new contributions (and most of the rest can usually be ignored by pragmas). The overall value is well below 100% mainly because some of the oldest readers started out without any tests and were never brought close to it, but indeed we have improved over the years. In the end, the check is mainly a guideline for where we are heading, and it is at the discretion of the maintainers to merge PRs even with below-average coverage. From experience, it is good to push for good coverage directly instead of postponing that to separate PRs.
There will be an expected failure on the build labelled "hyperspy_dev", which uses the development branch of hyperspy. This can be ignored in this PR. For the coverage, if it doesn't pass the currently set criteria, it is not a big deal; usually what we do is check online (using the link in the "github checks") where code is missed and correct where needed.
Seems to have solved all but one …
Yes, these are the ones that are expected and are sorted out in #425 and hyperspy/hyperspy#3528.
Ok. Changed the order to CL, SE, anchor, survey.
Thanks @pieleric - that is a great step forward. TODO: …
New supported formats: intensity, hyperspectral, angle-resolved, E-k, time-resolved.
Progress of the PR:
- original_metadata
- upcoming_changes folder (see upcoming_changes/README.rst)
- docs/readthedocs.org:rosettasciio build of this PR (link in github checks)