Conversation

@noemiebonnet (Contributor) commented Oct 28, 2024

New supported formats: intensity, hyperspectral, angle-resolved, E-k, time-resolved.

Progress of the PR

  • Read intensity,
  • Read hyperspectral,
  • Read angle-resolved,
  • Read E-k,
  • Read time-resolved,
  • Read original_metadata,
  • Read EELS,
  • Write EELS,
  • update docstring (if appropriate),
  • update user guide (if appropriate),
  • add a changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
  • Check formatting of the changelog entry (and any user guide changes) in the docs/readthedocs.org:rosettasciio build of this PR (link in the GitHub checks),
  • add tests,
  • increase test coverage for errors,
  • ready for review.

@codecov (bot) commented Oct 28, 2024

Codecov Report

❌ Patch coverage is 91.42857% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.08%. Comparing base (6ea01bd) to head (e6e1985).
⚠️ Report is 73 commits behind head on main.

Files with missing lines    Patch %    Lines
rsciio/delmic/_api.py       91.42%     15 Missing and 9 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #328      +/-   ##
==========================================
+ Coverage   88.02%   88.08%   +0.06%     
==========================================
  Files          91       91              
  Lines       11538    11798     +260     
  Branches     2131     2186      +55     
==========================================
+ Hits        10156    10392     +236     
- Misses        875      890      +15     
- Partials      507      516       +9     


@noemiebonnet (Contributor, Author) commented:

pre-commit.ci autofix

@jlaehne (Member) left a comment:

Thanks @noemiebonnet for the substantial progress! I left a few comments from a first pass over the code, but have not yet had time for a closer look. Please include the standard tracking list in the initial comment to give an overview of how far along the PR is.

There are still a number of lines uncovered by tests, as flagged by codecov. If lines should explicitly be ignored in the coverage test, one can add the comment ``# pragma: no cover`` at the end of the line. I see that most uncovered lines concern warnings or errors that might surface during file reading. It might be hard to test for those if one only has proper test files. What is our current best practice in that case, @ericpre? (Note that you can also get inspiration on such cases from other file readers recently implemented, such as Horiba JobinYvon or Hamamatsu.)
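
For illustration, a minimal hedged sketch of such a pragma in a hypothetical reader helper (not the actual code of this PR):

```python
def _check_acquisition(acq):
    # This branch should only trigger on malformed files, which we cannot
    # currently produce as test data, so exclude it from coverage accounting:
    if acq is None:
        raise TypeError("Acquisition group is missing")  # pragma: no cover
    return acq
```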

@jlaehne (Member) commented Nov 5, 2024

A single spectrum is still loaded as <Signal1D, title: , dimensions: (1, 1|2048)>, but should be <Signal1D, title: , dimensions: (|2048)> (drop the navigation dimensions).
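
A minimal sketch of the expected fix, assuming a (1, 1, 2048)-shaped data array (names illustrative, not the reader's actual code):

```python
import numpy as np

data = np.zeros((1, 1, 2048))
# Drop the singleton navigation dimensions so a single spectrum loads as
# <Signal1D, dimensions: (|2048)> rather than (1, 1|2048):
squeeze_axes = tuple(i for i, n in enumerate(data.shape[:-1]) if n == 1)
data = data.squeeze(axis=squeeze_axes)
assert data.shape == (2048,)
```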

    the associated image type.
    """
    if Acq is None:
        raise TypeError(
Review comment (Member):

Currently, all the various loading errors are giving codecov warnings, because they are not included in the tests. With HDF5, it should be rather straightforward to create minimal test files for each case, in which certain elements are deleted from the file to test each of these error paths. An idea could also be to use HDF5 files from other rsciio plugins that do not match the Delmic specification to test some of the errors.
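
A hedged sketch of that idea (file name and HDF5 path are hypothetical; the actual layout depends on the Delmic specification):

```python
import shutil
import h5py

# Copy a valid test file and delete an element the reader requires,
# so the corresponding error path can be exercised in a test:
shutil.copy("delmic_hyperspectral.h5", "broken.h5")
with h5py.File("broken.h5", "r+") as f:
    del f["Acquisition0/ImageData/Image"]  # hypothetical path
# Loading "broken.h5" should now raise the corresponding error.
```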

        or Image.shape[3] < 1
        or Image.shape[4] < 1
    ):
        raise ValueError(
Review comment (Member):

As an example of testing errors, here you could take the file for a different type of data and change its img_type to get a case that triggers this error.
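
For instance, a hedged test sketch (file name hypothetical):

```python
import pytest
from rsciio.delmic import file_reader

def test_invalid_image_shape_raises():
    # A file whose img_type was altered so it no longer matches the data shape:
    with pytest.raises(ValueError):
        file_reader("delmic_wrong_img_type.h5")
```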

    Scale = np.array(ImgData.get(scale_key))

    if axis_name in ["C", "T"]:
        scale_value = np.mean(
Review comment (Member):

Some other readers have a parameter to choose whether to read such axes as a UniformDataAxis, by calculating the mean scale, or as a non-uniform DataAxis, by taking exactly the vector stored in the file.

see e.g.:

``use_uniform_signal_axis : bool, default=False``
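
A hedged sketch of the two representations (axis values illustrative):

```python
import numpy as np

use_uniform_signal_axis = False  # as in the docstring default above
wavelengths = np.array([400.0, 400.5, 401.1, 401.6])  # vector stored in the file

if use_uniform_signal_axis:
    # UniformDataAxis: approximate the vector by its mean increment
    axis = {
        "name": "Wavelength",
        "units": "nm",
        "offset": float(wavelengths[0]),
        "scale": float(np.mean(np.diff(wavelengths))),
        "size": wavelengths.size,
    }
else:
    # Non-uniform DataAxis: keep the exact vector from the file
    axis = {"name": "Wavelength", "units": "nm", "axis": wavelengths}
```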

@jlaehne mentioned this pull request Dec 12, 2024
@ericpre modified the milestones: v0.7, v0.8 Dec 13, 2024
@ericpre removed this from the v0.8 milestone Mar 22, 2025
@jlaehne (Member) commented Mar 23, 2025

@noemiebonnet is it possible to update the progress you have made on your local branch? It might be enough to include the current PR in the release planned for the coming days: #381

@jlaehne added this to the v0.10 milestone May 28, 2025
@noemiebonnet force-pushed the extend-delmic-format branch from 3ab696c to 853f36c (June 20, 2025 08:58)
@aidanc151 commented:

I'm running into an error message after converting a .h5 into a .hspy, and then trying to load the .hspy:

TypeError: issubclass() arg 1 must be a class

To reproduce:

  1. Load a .h5 hyperspectral map
  2. Use the .save wrapper and save into .hspy format
  3. Load the new .hspy file
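
A hedged sketch of these steps (file name hypothetical, assuming a single signal is returned):

```python
import hyperspy.api as hs

s = hs.load("hyperspectral.h5", reader="delmic")
s.save("hyperspectral.hspy")
s2 = hs.load("hyperspectral.hspy")  # raises TypeError: issubclass() arg 1 must be a class
```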

It looks like it might be related to the original metadata: if the initial loading of the .h5 is done without the original metadata, the error is avoided, e.g.:

hs.load('hyperspectral.h5', reader='delmic', load_original_metadata=False)

@jlaehne (Member) commented Jun 23, 2025

> It looks like it might be related to the original metadata

Indeed, the parsing of original_metadata is still wrong. I inspected the resulting dictionary, e.g.:

from rsciio.delmic import file_reader
d1 = file_reader('delmic_file.h5')
d1[0]['original_metadata']

and it results in:

{'BackprojectedIlluminationPinholeRadius': array([2.8e-07]),
...

All values are read in as arrays, regardless of their content - none of them actually seems to contain more than one element. In particular, nested dictionaries such as the one for 'ExtraSettings' are not correctly parsed by HyperSpy if the dictionary is embedded in an array.
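
A hedged sketch of the kind of conversion needed (helper name hypothetical):

```python
import numpy as np

def _to_python(value):
    # Unwrap single-element arrays, e.g. array([2.8e-07]) -> 2.8e-07, and
    # convert remaining arrays to plain lists, so that nested dictionaries
    # such as 'ExtraSettings' are parsed correctly by HyperSpy:
    if isinstance(value, np.ndarray):
        return value.item() if value.size == 1 else value.tolist()
    return value
```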

@pieleric force-pushed the extend-delmic-format branch from 853f36c to cd5b958 (July 3, 2025 18:47)
@pieleric (Contributor) commented Jul 3, 2025

I will pick up this PR from now on. I went through all the code and comments of this pull request and attempted to address all the issues raised. Here is a summary of the (major) updates:

  • Major overhaul of the code structure, to make it more generic and handle all (corner) cases.
  • Adjust the behaviour of the "signal" argument: it now defaults to passing all the data, and all possible options are lower-case only.
  • The metadata now also includes the acquisition time and dataset title.
  • The original_metadata now also includes the SVIData and acquisition time.
  • Ensure that the original_metadata dict only contains basic Python types (i.e., no numpy arrays or JSON-encoded strings).
  • Fix the offset values of the X & Y axes, which were incorrectly computed.
  • Fix the incorrect direction of the X axes in 4D angular-resolved datasets.
  • If LumiSpy is installed, use the corresponding Signal type for datasets matching the CL_SEM, LumiTransient, and LumiTransientSpectrum types.
  • Support angular-resolved data with multiple polarizations.
  • Remove some test cases which only duplicated other tests.
  • Add a test case attempting to load an incorrect HDF5 file.

I now see that some CI tests are failing (though all the test cases did pass on my computer). I'll dig into these issues tomorrow.

Any feedback is already well appreciated!

@@ -0,0 +1 @@
:ref:`Delmic <delmic-format>` format: add support for Delmic HDF5 (cathodoluminescence) acquisition files
Review comment (Member):

We added basic support in the 0.7 release:
https://rosettasciio.readthedocs.io/en/latest/changes.html#id42

So the changelog should make clear that more complete support for different types of acquisitions is now added.

of three datasets. It is possible to load each of them separately.

.. Note::
    To load the various types of datasets in the file, use the ``signal`` argument
Review comment (Member):

We were discussing what the best default actually is here - reading only the CL to return a single signal item, as in most other readers, or the list with all items - but I guess the main point is that it should be well documented.

Also, I am not certain that having the survey first is the best order of the signals in the list. I assume the order is motivated by the odemis files? Generally, I would find it more intuitive to have the actual CL signal first, then the concurrent, and last the survey (making the survey always the last item, even if there are multiple streams in one file).

Finally, I would say there is not really a point in reading in the concurrent SE when it is dimensionless (for spectra, single transients or streak images).

Review comment (Member):

In particular, having the anchor region data from drift correction included in the default loading seems a bit of an overload - so maybe default to "cl" and have an option "all" to include the other datasets?

@pieleric (Contributor) commented Jul 18, 2025:

Sure. I've now changed it to return only the "cl" datasets by default, and changed the value to get all the datasets from None to "all".

Concerning the datasets with a single point: although they are probably not useful, I'd rather always return them whenever that type of dataset (or "all") is requested. This keeps the code simple and makes sure that if, for some reason, the user actually does care about this single pixel, they can still read it.
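
A hedged usage sketch of the resulting behaviour (file name hypothetical):

```python
from rsciio.delmic import file_reader

cl_only = file_reader("delmic_file.h5")                   # default: signal="cl"
everything = file_reader("delmic_file.h5", signal="all")  # all datasets
```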

Review comment (Member):

Fair enough for the datasets with a single point. Only, I would still tend to reverse the order of the different signals in the list for "all", to have the 'cl' first (as it is the main one, and now also the one read by default) and then add the additional signals, with 'survey' last.

@jlaehne (Member) commented Jul 17, 2025

Thanks Eric for picking up on this. In general it looks good, and some first tests with real data did not surface any problems.

We are actually interested in getting this released ASAP. So I would try to resolve anything that needs to be decided in terms of API (see some comments on the signal parameter) and merge the PR as soon as the tests run through.

In the mid-term, additional features such as metadata parsing and reading in scan areas/points on the survey as markers could be added via separate PRs. Coverage is not optimal, but the missing lines mainly seem to concern errors that could also be addressed later.

@pieleric (Contributor) commented:

> Thanks Eric for picking up on this. In general it looks good, and some first tests with real data did not surface any problems.
>
> We are actually interested in getting this released ASAP. So I would try to resolve anything that needs to be decided in terms of API (see some comments on the signal parameter) and merge the PR as soon as the tests run through.
>
> In the mid-term, additional features such as metadata parsing and reading in scan areas/points on the survey as markers could be added via separate PRs. Coverage is not optimal, but the missing lines mainly seem to concern errors that could also be addressed later.

Thanks Jonas for the review. I've tried to correct the code for all your comments. I agree with the way forward. Let's try to get this big change merged in, and then we can more easily bring extra features later in "small bites".

Let me know if you want me to squash some of the commits together before merging the PR.

@pieleric (Contributor) commented:

I've tried hard to get the packaging test to pass, but I'm now at a loss as to how to handle the current errors. I assume it's related to the way the test data is stored... but I have no idea what should be changed to fix it, or even whether it's a real error. Do you have any hint on how to solve such an error?

ERROR tests/test_import.py::test_import_version - requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://github.com/noemiebonnet/rosettasciio/raw/extend-delmic-format/rsciio/tests/data/zspy/signal1d_10x10-DirectoryStore.zspy/.zattrs

@ericpre (Member) commented Jul 18, 2025

I don't remember the exact reason, but this is related to the recently merged PR #417, which adds these test files; they are not included in this branch. A rebase should fix it.

@jlaehne (Member) commented Jul 18, 2025

> Let me know if you want me to squash some of the commits together before merging the PR.

Maybe squash the last few commits with only minor corrections into one.

noemiebonnet and others added 12 commits July 22, 2025 11:18
Major overhaul of the code structure, to make it more generic and handle all (corner) cases.

Adjust the behaviour of the "signal" argument: it now defaults to passing all the data, and all possible options are lower-case only.

The metadata now also includes the acquisition time and dataset title.

The original_metadata now also includes the SVIData and acquisition time.

Ensure that the original_metadata dict only contains basic Python types (i.e., no numpy arrays or JSON-encoded strings).

Fix the offset values of the X & Y axes, which were incorrectly computed.

Fix the incorrect direction of the X axes in 4D angular-resolved datasets.

If LumiSpy is installed, use the corresponding Signal type for datasets matching the CL_SEM, LumiTransient, and LumiTransientSpectrum types.

Support angular-resolved data with multiple polarizations.

Remove some test cases which only duplicated other tests.

Add a test case attempting to load an incorrect HDF5 file.

ruff check

registry

handle lumispy not present

By default, only return cathodoluminescence datasets. Also adjust the value to pass to get all the datasets from None to "all".
@pieleric force-pushed the extend-delmic-format branch from c1449f7 to 9c63bc9 (July 23, 2025 07:52)
pieleric added 3 commits July 23, 2025 10:05
Don't return the datasets based on the order stored in HDF5, but by type. For the user, the CL data is the most interesting, so it is returned first.

Also add a test case for the anchor region (aka drift correction).

Some of the exception handlers should only be reached with malformed data, which we don't have, so let's not count these paths for now.
@pieleric force-pushed the extend-delmic-format branch from 9c63bc9 to 9bac123 (July 23, 2025 08:05)
@pieleric (Contributor) commented:

I've updated the pull request with these changes:

  • sort the returned data to be in the order CL, SE, survey, anchor
  • fix the documentation according to the comments by Jonas
  • rebased on the latest main (to hopefully avoid the test failures)
  • simplified the test cases, so that each call to the loader is done only once
  • improved coverage by testing a few more cases (data with anchor region, incorrect arguments) and adding some pragmas around the most unlikely exception handlers.

Talking of coverage: I don't have the tools to measure it locally, so it's a bit hard to check how well it will go. Also, it seems the coverage check is considered a "fail" if it's below the project average, which is currently at 87%. Please be aware that, mathematically, if every merge is above the average, this average will increase, which will raise the bar for the next pull requests and make the work of submitters harder and harder...

@jlaehne (Member) commented Jul 23, 2025

Thanks Éric :-)

> sort the returned data to be in the order CL, SE, survey, anchor

My thought was to have survey last so that it can always be accessed as the [-1] element of the list (and anchor is not always there).

@jlaehne (Member) commented Jul 23, 2025

> Talking of coverage: I don't have the tools to measure it locally, so it's a bit hard to check how well it will go. Also, it seems the coverage check is considered a "fail" if it's below the project average, which is currently at 87%. Please be aware that, mathematically, if every merge is above the average, this average will increase, which will raise the bar for the next pull requests and make the work of submitters harder and harder...

You can see the current report and a markup of missing lines at Codecov, but you have reached the threshold now:
https://app.codecov.io/gh/hyperspy/rosettasciio/pull/328

The aim should be close to 100% coverage for the relevant parts of new contributions (and most of the rest can usually be ignored with pragmas). It is well below 100% mainly because some of the oldest readers started out without any tests and were never brought close to this value, but we have indeed improved over the years. In the end, the check is mainly a guideline for where we are heading, and it is at the discretion of the maintainers to merge PRs even with below-average coverage. From experience, it is good to push for good coverage directly instead of postponing it to separate PRs.

@ericpre (Member) commented Jul 23, 2025

There will be an expected failure on the build (the one labelled "hyperspy_dev") with the development branch of hyperspy. This can be ignored in this PR.

For the coverage, if it doesn't pass the currently set criteria, it is not a big deal; usually what we do is check online (using the link in the GitHub checks) where coverage is missed and correct where needed.

@jlaehne (Member) commented Jul 23, 2025

> rebased on the latest main (to hopefully avoid the test failures)

Seems to have solved all but one hspy-related test failure. @ericpre ?

@ericpre (Member) commented Jul 23, 2025

Yes, these are the ones that are expected; they are sorted out in #425 and hyperspy/hyperspy#3528.

@pieleric (Contributor) commented:

> My thought was to have survey last so that it can always be accessed as the [-1] element of the list (and anchor is not always there).

Ok. Changed the order to CL, SE, anchor, survey.
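
A hedged sketch of what this ordering means for users (file name hypothetical):

```python
from rsciio.delmic import file_reader

signals = file_reader("delmic_file.h5", signal="all")
cl = signals[0]       # the main CL dataset comes first
survey = signals[-1]  # the survey is always the last item
```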

@jlaehne (Member) commented Jul 24, 2025

Thanks @pieleric - that is a great step forward.

TODO:

@jlaehne merged commit e84a943 into hyperspy:main Jul 24, 2025
33 of 34 checks passed
@jlaehne mentioned this pull request Jul 25, 2025