ParseError with single-line files stored as io.BytesIO #370

@william-watson-swri

Version
pymzml: 2.5.10
Python: 3.11.7

Description
I'm receiving mzML files as bytes, wrapping these in io.BytesIO, and then passing that to pymzml.run.Reader:

reader = pymzml.run.Reader(io.BytesIO(mzml_bytes))

This sometimes raises the following exception:

ParseError: no element found: line 1, column 0

Why
Some of the mzML files I'm using do not have line breaks, i.e. the whole document is on a single line, and the _guess_encoding function fails on these. Looking at the pymzml source, the io.BytesIO objects travel through this line, which in turn calls the culprit, _guess_encoding:

match = regex_patterns.FILE_ENCODING_PATTERN.search(mzml_file.readline())

If the file has no line breaks, the .readline() call consumes the entire BytesIO and leaves the stream position at the end, so the later XML parsing has no data left to read and fails.
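
The behaviour can be reproduced with the standard library alone. The snippet below is a minimal sketch, not pymzml's code: ENCODING_PATTERN is a hypothetical stand-in for regex_patterns.FILE_ENCODING_PATTERN, and xml.etree.ElementTree is used in place of whatever parser pymzml calls later.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for pymzml's FILE_ENCODING_PATTERN (assumption).
ENCODING_PATTERN = re.compile(rb'encoding="(?P<encoding>[A-Za-z0-9\-]+)"')

# An XML document with no line breaks at all.
single_line = b'<?xml version="1.0" encoding="utf-8"?><mzML><run/></mzML>'
buf = io.BytesIO(single_line)

# readline() returns everything up to the first b"\n"; with no line break
# present, that is the whole document, and the position ends up at EOF.
match = ENCODING_PATTERN.search(buf.readline())
position_after_readline = buf.tell()

# Parsing from the current position now sees an empty stream.
try:
    ET.parse(buf)
    error = None
except ET.ParseError as exc:
    error = str(exc)
```

Here position_after_readline equals len(single_line), and the parser raises a ParseError because it receives no bytes.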

Workaround/fix
I'm currently inserting a line break after the XML declaration at the start of the data before passing it to pymzml:

data = re.sub(br'(<\?xml[^>]+>)', br'\1\n', mzml_bytes, count=1)
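
A short sketch of why this helps, using only the standard library (the sample bytes are made up for illustration): once the declaration sits on its own line, a readline() call consumes only the declaration and the rest of the document stays readable.

```python
import io
import re

# Made-up single-line document for illustration.
mzml_bytes = b'<?xml version="1.0" encoding="utf-8"?><mzML><run/></mzML>'

# Same substitution as in the workaround: append b"\n" to the XML declaration.
data = re.sub(br'(<\?xml[^>]+>)', br'\1\n', mzml_bytes, count=1)

buf = io.BytesIO(data)
first_line = buf.readline()   # only the declaration plus the new line break
remaining = buf.read()        # the document body is still available
```

Note that this only applies when the data starts with an XML declaration; if there is none, the substitution matches nothing and the data is passed through unchanged.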

I believe this could also be fixed by just adding mzml_file.seek(0) after the offending line.
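
Roughly what I have in mind, as a sketch only: guess_encoding and ENCODING_PATTERN below are hypothetical stand-ins, not the actual pymzml implementation, and the default of "utf-8" is my assumption.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for pymzml's FILE_ENCODING_PATTERN (assumption).
ENCODING_PATTERN = re.compile(rb'encoding="(?P<encoding>[A-Za-z0-9\-]+)"')


def guess_encoding(mzml_file, default="utf-8"):
    """Read the encoding from the first line, then rewind the stream."""
    first_line = mzml_file.readline()
    # Rewind so that later consumers see the document from the start,
    # even when the whole file was a single line.
    mzml_file.seek(0)
    match = ENCODING_PATTERN.search(first_line)
    if match is None:
        return default
    return match.group("encoding").decode("ascii")


buf = io.BytesIO(b'<?xml version="1.0" encoding="ISO-8859-1"?><mzML><run/></mzML>')
encoding = guess_encoding(buf)
root = ET.parse(buf).getroot()  # parses fine because the stream was rewound
```

seek(0) assumes the stream is seekable and that the document starts at offset 0; saving mzml_file.tell() before the readline() and seeking back to that value would cover streams that start elsewhere.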
