Slow performance when adding gappy data to ASDF #57

@chad-earthscope

Adding gappy data to an ASDF volume with add_waveforms() incurs a significant performance penalty compared to non-gappy data.

Attached are a read timing test script and test data. For each specified miniSEED file, the script times obspy.read(), for reference, followed by pyasdf's add_waveforms(). The test data are one day of time series in two versions: gappy (2200+ gaps) and non-gappy.
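For readers without the attachment, the measurement described above can be approximated with the sketch below. It is not the attached script: the helper names `time_call` and `time_file`, the use of `time.perf_counter()`, and the `raw_recording` tag are all assumptions made for illustration. The trace count is reported only because ObsPy normally returns one trace per contiguous segment, so it may be a useful number to compare against the timings.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def time_file(path, dataset, tag="raw_recording"):
    """Time obspy.read() and add_waveforms() for one miniSEED file.

    `dataset` is an already opened pyasdf.ASDFDataSet.
    `tag` is a hypothetical choice, not taken from the attached script.
    """
    import obspy  # imported lazily so time_call() works without ObsPy

    stream, read_seconds = time_call(obspy.read, path)
    _, add_seconds = time_call(dataset.add_waveforms, stream, tag=tag)
    return {
        "traces": len(stream),  # number of Trace objects in the Stream
        "read_seconds": read_seconds,
        "add_seconds": add_seconds,
    }


# Possible usage, assuming pyasdf and ObsPy are installed:
#
#     import pyasdf
#     dataset = pyasdf.ASDFDataSet("output.h5")
#     for path in ("clean-day.mseed", "gappy-day.mseed"):
#         print(path, time_file(path, dataset))
```

Timing each call separately keeps the miniSEED parsing cost apart from the cost of writing into the ASDF volume, which is the comparison this issue relies on.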

On my machine:

$ ./read-timing-test.py -o output.h5 clean-day.mseed gappy-day.mseed 
Opening output ASDF volume: output.h5
Processing clean-day.mseed
ObsPy read(): 0.05147713300000012 seconds
ASDF add_waveforms(): 0.1556968969999999 seconds
Processing gappy-day.mseed
ObsPy read(): 0.49582375 seconds
ASDF add_waveforms(): 7.62076154 seconds

For the gappy file, add_waveforms() takes 7.6 seconds, about 15 times longer than obspy.read() of the same data at 0.5 seconds. For the clean file the same ratio is only about 3 (0.16 versus 0.05 seconds).
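The ratios behind that comparison can be worked out directly from the timings printed above. The snippet below only does arithmetic on those four numbers; the dictionary layout and key names are illustrative.

```python
# Timings copied from the run above, in seconds.
timings = {
    "clean": {"read": 0.05147713300000012, "add": 0.1556968969999999},
    "gappy": {"read": 0.49582375, "add": 7.62076154},
}

ratios = {
    # add_waveforms() relative to obspy.read() for the same file
    "clean add/read": timings["clean"]["add"] / timings["clean"]["read"],
    "gappy add/read": timings["gappy"]["add"] / timings["gappy"]["read"],
    # gappy file relative to clean file for the same call
    "read gappy/clean": timings["gappy"]["read"] / timings["clean"]["read"],
    "add gappy/clean": timings["gappy"]["add"] / timings["clean"]["add"],
}

for label, value in ratios.items():
    print(f"{label}: {value:.1f}x")
```

In this single run, obspy.read() slows down by roughly a factor of 10 on the gappy file, while add_waveforms() slows down by roughly a factor of 49, so the gaps cost the ASDF write step far more than they cost the miniSEED read.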

Obviously it would be nice if this were faster. As ASDF gains popularity it will be used with a likewise-broadening set of input data.

read-timing-test.zip
