What's an effective approach for storing discontinuous signal data? #7673

nicholaslivingstone · 2023-03-24T19:32:29Z

Context

I've been using pandas DataFrames for a while, but I've become frustrated with its lack of support for multi-dimensional datasets, so I'm experimenting with switching to xarray for my use case. For context, I'm working with laser spectroscopy data, which presents itself as a signal. A single sample contains 3 components of the original signal: fundamental, harmonic, and a ratio. Each component has a fixed length N. In addition, there's a length-N index of the wavelengths for each data point, plus some metadata. I've been able to implement xarray nicely for my data so far:

<xarray.DataArray (signal: 3, wavelength: 4000)>
array([[ 0.37265886,  0.34511695,  0.37183573, ...,  0.39760327,
         0.36285142,  0.42337811],
       [33.27452526, 38.74876074, 36.06964534, ..., 28.31278282,
        28.57133764, 28.1024442 ],
       [12.17477097, 13.34164776, 13.34980213, ..., 11.20389523,
        10.29601224, 11.94972044]])
Coordinates:
  * signal      (signal) <U11 'ratio' 'fundamental' 'harmomic'
  * wavelength  (wavelength) float64 7.496 7.496 7.496 ... 7.519 7.519 7.519
Attributes:
    concentration:  1815
    humidity:       20.5
    temp:           20.8
    timestamp:      2022-10-28T18:58:00

Problem

However, sometimes the data will come from multiple lasers spanning different, discontinuous wavelength regions. For example, we might have a laser scanning a sample from 8-8.5µm, and then 9-9.5µm. Here's a screenshot of some data with two lasers in HDF5 format for a better visualization of what I mean:

Each laser will collect the same 3 components as mentioned previously, just over a different wavelength region. How can I best represent this discontinuous signal data with xarray?

Attempts at a solution

The trivial approach would be to just concatenate the two regions, resulting in a 3 x (2 N) array. The problem with this is that I often apply kernel filters to the data, and these would distort the signal where the two regions meet, since the filter treats the signal as continuous when it really isn't.
The alternative would be to add a dimension for the laser (M), making the array 4 x N x M, and store the wavelength as part of the data. But I'd prefer a solution that lets me keep 'wavelength' as a coordinate so I can take advantage of slicing and locating specific regions, and this approach loses that. Additionally, storing the wavelength as part of the data doesn't seem like good practice.
Storing them as separate DataArrays. This doesn't seem like a good use of xarray's multi-dimensionality and would require additional overhead to keep the objects together.
I've also considered using a Dataset, but I'd prefer to reserve that for when I'm looking at many samples at once.
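The distortion described for the first attempt can be seen in a small NumPy sketch. The values here are made up for illustration (two constant regions and a 5-point moving average, none of which come from the real data): after naive concatenation, the filtered points near the join mix samples from both regions.

```python
import numpy as np

# Hypothetical stand-ins for two laser regions with clearly different levels.
region_a = np.full(10, 1.0)
region_b = np.full(10, 10.0)
joined = np.concatenate([region_a, region_b])

# A 5-point moving average applied across the whole concatenated signal.
kernel = np.ones(5) / 5
filtered = np.convolve(joined, kernel, mode="same")

# Points around the join (indices 8..11) blend both regions, so they land
# strictly between the two levels instead of staying at 1.0 or 10.0.
print(filtered[7:13])
```

Note that `mode="same"` also zero-pads the outer edges of the array, which is a separate edge effect from the one at the join.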

Is there another approach I haven't considered, or are the drawbacks I mentioned in my attempts not really drawbacks at all?

Answered by headtr1ck

headtr1ck · 2023-03-26T18:49:44Z

Maintainer

You can stick with your first solution of simply concatenating along the wavelength dimension. If you add a categorical coordinate, maybe called laser_num = [0, 0, ..., 1, 1, ..., 2, 2, ...], you can apply your filters per laser using groupby.
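A minimal sketch of that idea, under some assumptions: the data below is synthetic, the `smooth` helper and its `width` parameter are hypothetical names, and a plain moving average stands in for whatever kernel filter is actually used. The `laser_num` coordinate is attached along `wavelength`, and `groupby(...).map(...)` runs the filter on each laser's segment separately, so the kernel never sees samples from the neighbouring region.

```python
import numpy as np
import xarray as xr

# Synthetic example data: two lasers, n points each, over separate regions.
n = 50
wavelength = np.concatenate([np.linspace(8.0, 8.5, n), np.linspace(9.0, 9.5, n)])
rng = np.random.default_rng(0)

da = xr.DataArray(
    rng.normal(size=(3, 2 * n)),
    dims=("signal", "wavelength"),
    coords={
        "signal": ["ratio", "fundamental", "harmonic"],
        "wavelength": wavelength,
        # Non-index coordinate along wavelength labelling the source laser.
        "laser_num": ("wavelength", np.repeat([0, 1], n)),
    },
)


def smooth(segment, width=5):
    """Moving average along wavelength for a single laser's segment (hypothetical helper)."""
    kernel = np.ones(width) / width
    axis = segment.get_axis_num("wavelength")
    filtered = np.apply_along_axis(
        np.convolve, axis, segment.values, kernel, mode="same"
    )
    return segment.copy(data=filtered)


# Filter each laser region independently, then recombine along wavelength.
smoothed = da.groupby("laser_num").map(smooth)

# wavelength is still an index coordinate, so label-based slicing keeps working.
subset = smoothed.sel(wavelength=slice(8.1, 8.4))
```

Everything stays in one DataArray, and a single laser can still be pulled out with something like `da.where(da.laser_num == 1, drop=True)`.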

1 reply

nicholaslivingstone Mar 28, 2023
Author

Thanks! I hadn't thought of that. I like this idea since it keeps everything in one DataArray.
