-
Notifications
You must be signed in to change notification settings - Fork 92
Description
Version
pymzml: 2.5.10
Python: 3.11.7
Description
I'm receiving mzML files as bytes
, wrapping these in io.BytesIO
, and then passing that to pymzml.run.Reader
:
reader = pymzml.run.Reader(io.BytesIO(mzml_bytes))
This sometimes raises the following exception:
ParseError: no element found: line 1, column 0
Why
Some of the mzML files I'm using do not have line breaks - i.e. they are all on a single line, and the _guess_encoding
function breaks these. Looking at the pymzml
source, the io.BytesIO
objects travel through this line, which in turn calls the culprit, _guess_encoding:
match = regex_patterns.FILE_ENCODING_PATTERN.search(mzml_file.readline())
After the .readline()
, there's no data left in the BytesIO
if the file has no line breaks, and thus the later XML parsing fails.
Workaround/fix
I'm current inserting a line break at the start of the XML data before passing it to pymzml
:
data = re.sub(br'(<\?xml[^>]+>)', br'\1\n', mzml_bytes, count=1)
I believe this could also be fixed by just adding mzml_file.seek(0)
after the offending line.