Description
Is your feature request related to a problem? Please describe.
Whenever I use pymzML, my terminal and/or log file prints 5 SyntaxWarnings for an invalid escape sequence "\s". Example:
.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:384
/myapp/.venv/python3.12/site-packages/pymzml/file_classes/standardMzml.py:384: SyntaxWarning: invalid escape sequence '\s'
chromexp = re.compile(b'<\s*chromatogram[^>]*id="([^"]*)"')
.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:385
/myapp/.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:385: SyntaxWarning: invalid escape sequence '\s'
chromcntexp = re.compile(b'<\s*chromatogramList\s*count="([^"]*)"')
.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:386
/myapp/.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:386: SyntaxWarning: invalid escape sequence '\s'
specexp = re.compile(b'<\s*spectrum[^>]*id="([^"]*)"')
.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:387
/myapp/.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:387: SyntaxWarning: invalid escape sequence '\s'
speccntexp = re.compile(b'<\s*spectrumList\s*count="([^"]*)"')
.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:735
/myapp/.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:735: SyntaxWarning: invalid escape sequence '\s'
'<\s*spectrum[^>]*index="[0-9]+"\sid="({0})"\sdefaultArrayLength="[0-9]+">'.format(
The problem is that \s is not a valid escape sequence in Python (unlike \n, which is substituted with a newline, or \t, which is substituted with a tab). The intention in these regular expressions is to pass a backslash followed by the "s" character, as in \s, which is the regular expression code for "whitespace". However, Python's string-literal parser encounters the \s first and tries to interpret it as an escape sequence before the regex engine ever sees the string.
In older versions of Python, this was silently ignored, and any invalid escape sequence such as '\s' was passed along as a literal backslash followed by an 's'. Python 3.6 deprecated this behavior with a DeprecationWarning, which is hidden by default in most settings. Starting in Python 3.12, it raises a visible SyntaxWarning instead, to alert developers that this could become a SyntaxError and break in a future version of Python.
Essentially, the code is relying on an old, deprecated behavior. The warning is telling pymzML to please be more explicit about the intent in the regular expression.
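A minimal sketch to reproduce the warning without installing pymzML, assuming Python 3.6 or later. The helper name warnings_for and the shortened pattern are illustrative only and are not part of pymzML. If a future Python version turns the warning into a SyntaxError, the first compile call would raise instead of warn.

```python
import warnings


def warnings_for(source):
    """Compile a snippet and return the warning categories it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<demo>", "exec")
    return [w.category for w in caught]


# The doubled backslash here is only needed because the snippet is itself
# stored in a string; the compiled source text contains b'<\s*spectrum'.
plain_source = "pattern = b'<\\s*spectrum'"
raw_source = "pattern = rb'<\\s*spectrum'"

plain_warnings = warnings_for(plain_source)
raw_warnings = warnings_for(raw_source)

print(plain_warnings)  # SyntaxWarning on 3.12+, DeprecationWarning on 3.6-3.11
print(raw_warnings)    # no warnings for the raw literal
```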
Describe the solution you'd like
To correctly specify these expressions without raising a SyntaxWarning, there are two options.
- Option 1 - Use \\s, meaning a literal backslash followed by "s".
- Option 2 - Convert these regular expression strings from bytestrings b'...' to raw bytestrings rb'...'. The "raw" in raw bytestrings means the escape sequence substitution engine in Python is disabled for that string, which eliminates the ambiguity about what is intended by \s.
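As a quick check that the two options are interchangeable, the sketch below builds the spectrum pattern from line 386 both ways and compares the resulting bytes. The sample input is made up for illustration. The pattern on line 735 is a plain string rather than a bytestring, so the raw form there would presumably be r'...' rather than rb'...'.

```python
import re

# Option 1: escape the backslash explicitly.
escaped = b'<\\s*spectrum[^>]*id="([^"]*)"'

# Option 2: use a raw bytestring so the backslash is left alone.
raw = rb'<\s*spectrum[^>]*id="([^"]*)"'

# Both literals produce exactly the same bytes, so the regex is unchanged.
same_bytes = escaped == raw

specexp = re.compile(raw)
match = specexp.search(b'< spectrum id="scan=1">')  # made-up sample input
spectrum_id = match.group(1)

print(same_bytes)
print(spectrum_id)
```

Either form compiles without a warning; the raw bytestring keeps the pattern text identical to what the regex engine sees, which tends to be easier to read.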
Describe alternatives you've considered
It's difficult but possible to silence these warnings within applications relying on pymzML using the Python warnings module and/or the PYTHONWARNINGS environment variable (e.g. PYTHONWARNINGS=ignore::SyntaxWarning). This is what I'm doing now wherever possible.
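For reference, a rough sketch of the in-application workaround. The helper name import_without_syntax_warnings is hypothetical, not a pymzML or standard library function. It assumes the affected module gets compiled during this import; submodules that are first imported later would fall outside the filter, which is part of why this approach is fragile.

```python
import importlib
import warnings


def import_without_syntax_warnings(module_name):
    """Import a module while ignoring SyntaxWarnings raised at compile time."""
    with warnings.catch_warnings():
        # The filter only applies inside this block and is restored afterwards.
        warnings.simplefilter("ignore", category=SyntaxWarning)
        return importlib.import_module(module_name)


# Intended usage (requires pymzML to be installed):
# pymzml = import_without_syntax_warnings("pymzml")

# Demonstrated here with a standard library module instead.
json_module = import_without_syntax_warnings("json")
```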
Additional context
I would be happy to submit a pull request to resolve this issue if the maintenance team is interested in:
- Indicating your preference for escaped backslashes (\\s) vs. converting to raw bytestrings rb'...'.
- Releasing a new version of pymzML with the fix after I submit a PR and you accept it.