
[Bug] Reading tdms data chunks works only for certain chunk sizes #337

@Nikolai-Hlubek

Description


I have a TDMS file from which I'm trying to read a channel in chunks.

For certain chunk sizes this works; for others it does not.

import numpy as np
import nptdms

# fp is the path to the TDMS file, sensor is the channel name (defined elsewhere)
data_read_sliced = []

with nptdms.TdmsFile.open(fp) as tdms_file:
    len_data = len(tdms_file['Messdaten'][sensor])

    dt = tdms_file['Messdaten'][sensor].properties['wf_increment']

    # Only the last assignment takes effect; 2705 and 2725 fail, 4000 works
    len_slice = 2705
    len_slice = 2725
    len_slice = 4000  # Hardcoding 4000 works

    for idx in range(int(np.floor(len_data/len_slice))):
        data_slice = tdms_file['Messdaten'][sensor].read_data(offset=len_slice*idx, length=len_slice)
        data_read_sliced.append(data_slice)

    data_read_sliced = np.concatenate(data_read_sliced)

    data_read_once = tdms_file['Messdaten'][sensor].read_data()

np.sum(data_read_sliced - data_read_once)

len_slice = 2705

----> 1 np.sum(data_read_sliced - data_read_once)
ValueError: operands could not be broadcast together with shapes (16359840,) (16360000,)

(Here the individual reads succeed; the 160-value difference matches the tail the loop never reads, since 16360000 mod 2705 = 160.)

len_slice = 2725

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[19], line 11
      8 len_slice = 2725
     10 for idx in range(int(np.floor(len_data/len_slice))):
---> 11     data_slice = tdms_file['Messdaten'][sensor].read_data(offset=len_slice*idx,length=len_slice)
     12     data_read_sliced.append(data_slice)
     14 data_read_sliced = np.concatenate(data_read_sliced)

File /opt/pyenvs/DSS04/lib/python3.10/site-packages/nptdms/tdms.py:604, in TdmsChannel.read_data(self, offset, length, scaled)
    591 """ Reads data for this channel from the TDMS file and returns it as a numpy array
    592 
    593 Indexing into the channel with a slice should be preferred over using
   (...)
    601     For DAQmx data a dictionary of scaler id to raw scaler data will be returned.
    602 """
    603 if self._raw_data is None:
--> 604     raw_data = self._read_channel_data(offset, length)
    605 else:
    606     raw_data = slice_raw_data(self._raw_data, offset, length)

File /opt/pyenvs/DSS04/lib/python3.10/site-packages/nptdms/tdms.py:810, in TdmsChannel._read_channel_data(self, offset, length)
    808 for chunk in self._reader.read_raw_data_for_channel(self.path, offset, length):
    809     if chunk.data is not None:
--> 810         channel_data.append_data(chunk.data)
    811     if chunk.scaler_data is not None:
    812         for scaler_id, scaler_data in chunk.scaler_data.items():

File /opt/pyenvs/DSS04/lib/python3.10/site-packages/nptdms/channel_data.py:92, in NumpyDataReceiver.append_data(self, new_data)
     90 start_pos = self._data_insert_position
     91 end_pos = self._data_insert_position + len(new_data)
---> 92 self.data[start_pos:end_pos] = new_data
     93 self._data_insert_position += len(new_data)

ValueError: could not broadcast input array from shape (200,) into shape (0,)

len_slice = 4000

Works and gives a sum of 0.

Reading the data in one go always works.
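
A possible workaround might be slice indexing into the channel, which the read_data docstring in the traceback above says should be preferred anyway. A minimal sketch (I haven't verified that it avoids this bug):

# Possible workaround (untested against the failing file): slice indexing
# may take a different code path than read_data(offset, length).
with nptdms.TdmsFile.open(fp) as tdms_file:
    channel = tdms_file['Messdaten'][sensor]
    n_slices = len(channel) // len_slice
    # channel[start:stop] reads only the requested range from disk
    data_read_sliced = np.concatenate(
        [channel[i * len_slice:(i + 1) * len_slice] for i in range(n_slices)]
    )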

For len_slice = 2725 (the one I actually want), the error shown means that new_data should be written into self.data at positions 2725:2925, but self.data only has 2725 elements. In reader.py, num_chunk somehow ends up as 2 for the last chunk of 200 values, so it is read twice. Also, end_segment is too large, so the trimming code for the segment doesn't trigger.
So much for my debugging attempts. I tried changing a few things, but it didn't get any better, as I don't know anything about the internals of the TDMS format.
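
To illustrate what I'd expect: assuming a raw chunk size of 4000 values per channel (my guess, since len_slice = 4000 works), a read of (offset, length) should touch the following file chunks and keep only the overlapping part of each. This is a sketch of the arithmetic only, not npTDMS code:

CHUNK_SIZE = 4000  # assumed raw chunk size; the real value comes from the file

def chunks_for_read(offset, length, chunk_size=CHUNK_SIZE):
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    for chunk in range(first, last + 1):
        chunk_start = chunk * chunk_size
        # part of this chunk that falls inside [offset, offset + length)
        keep_from = max(offset, chunk_start) - chunk_start
        keep_to = min(offset + length, chunk_start + chunk_size) - chunk_start
        yield chunk, keep_from, keep_to

# The kept parts must sum to exactly `length`; with the bug, the last read
# apparently yields 200 values too many.
assert sum(b - a for _, a, b in chunks_for_read(2725, 2725)) == 2725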

I could provide you with the file in question if required, but it is >2 GB, so I can't just upload it here.

I tried with nptdms 1.7.1 and 1.9.0.
