Clean up SyntaxWarnings at import time #384

@rileymcdowell

Is your feature request related to a problem? Please describe.
Whenever I use pymzML, five SyntaxWarnings for the invalid escape sequence "\s" are printed to my terminal and/or log file. Example:

.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:384
  /myapp/.venv/python3.12/site-packages/pymzml/file_classes/standardMzml.py:384: SyntaxWarning: invalid escape sequence '\s'
    chromexp = re.compile(b'<\s*chromatogram[^>]*id="([^"]*)"')

.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:385
  /myapp/.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:385: SyntaxWarning: invalid escape sequence '\s'
    chromcntexp = re.compile(b'<\s*chromatogramList\s*count="([^"]*)"')

.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:386
  /myapp/.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:386: SyntaxWarning: invalid escape sequence '\s'
    specexp = re.compile(b'<\s*spectrum[^>]*id="([^"]*)"')

.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:387
  /myapp/.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:387: SyntaxWarning: invalid escape sequence '\s'
    speccntexp = re.compile(b'<\s*spectrumList\s*count="([^"]*)"')

.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:735
  /myapp/.venv/lib/python3.12/site-packages/pymzml/file_classes/standardMzml.py:735: SyntaxWarning: invalid escape sequence '\s'
    '<\s*spectrum[^>]*index="[0-9]+"\sid="({0})"\sdefaultArrayLength="[0-9]+">'.format(

The problem is that \s is not a valid escape sequence in Python, unlike \n (which is replaced with a newline) or \t (which is replaced with a tab). The intention in these regular expressions is to pass a backslash followed by the character "s", as in \s, which is the regular expression code for whitespace. However, Python's string-literal parser encounters the \s first and tries to interpret it as an escape sequence before the regex engine ever sees the string.

In older versions of Python, this was silently ignored, and any invalid escape sequence such as '\s' was passed along as a literal backslash followed by an 's'. Python 3.6 deprecated this behavior with a DeprecationWarning, which is hidden by default, and Python 3.12 upgraded it to a visible SyntaxWarning to alert developers that this is expected to become a SyntaxError in a future version of Python.

Essentially, the code is relying on an old, deprecated behavior. The warning is telling pymzML to please be more explicit about the intent in the regular expression.
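The behavior described above can be reproduced without pymzML. This is a minimal sketch: the helper name `escape_warnings` and the sample patterns are made up for illustration, and `compile()` stands in for what happens when the module is imported.

```python
import warnings


def escape_warnings(source):
    """Compile a snippet of Python source and return the names of the warning categories raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [w.category.__name__ for w in caught]


# The outer strings are raw so that this example does not itself trigger the warning.
bad = escape_warnings(r"pattern = b'<\s*spectrum'")    # non-raw literal, as in pymzML today
good = escape_warnings(r"pattern = rb'<\s*spectrum'")  # raw literal

print(bad)   # SyntaxWarning on Python 3.12+, DeprecationWarning on 3.6 to 3.11
print(good)  # no warnings expected
```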

Describe the solution you'd like

To correctly specify these expressions without raising a SyntaxWarning, there are two options.

  • Option 1 - Use \\s, meaning a literal backslash followed by "s".
  • Option 2 - Convert these regular expression literals to raw strings: bytestrings b'...' become raw bytestrings rb'...', and the plain string at line 735 becomes r'...'. The "raw" means Python's escape sequence processing is disabled for that literal, which eliminates the ambiguity about what is intended by \s.
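Both options should produce byte-for-byte identical patterns, so the choice is purely stylistic. A quick sketch using the `specexp` pattern from the warnings above; the sample input and the `"scan=1"` value are made up for illustration:

```python
import re

# Option 1: escaped backslash. Option 2: raw bytestring.
escaped = b'<\\s*spectrum[^>]*id="([^"]*)"'
raw = rb'<\s*spectrum[^>]*id="([^"]*)"'
patterns_equal = escaped == raw

# The compiled pattern behaves as before on a hypothetical spectrum tag.
specexp = re.compile(raw)
match = specexp.search(b'<  spectrum index="0" id="scan=1">')
spectrum_id = match.group(1)

# Raw str literals still work with .format(), as used at line 735.
formatted = r'\sid="({0})"'.format("scan=1")
```

Because raw literals leave the regular expressions visually unchanged apart from the prefix, Option 2 is probably the smaller diff.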

Describe alternatives you've considered

It's difficult but possible to silence these warnings within applications relying on pymzML using Python's warnings module and/or the PYTHONWARNINGS environment variable (e.g. PYTHONWARNINGS=ignore::SyntaxWarning). This is what I'm doing now wherever possible.
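For reference, the in-application workaround looks roughly like the sketch below. The helper name `import_quietly` is hypothetical. The filter has to be active at the moment the module is compiled, so any submodule that is first imported outside the helper would still warn, which is part of why this is awkward.

```python
import importlib
import warnings


def import_quietly(module_name):
    """Import a module while ignoring SyntaxWarning raised when it is compiled."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", SyntaxWarning)
        return importlib.import_module(module_name)


# Hypothetical usage in an application (requires pymzML to be installed):
# pymzml = import_quietly("pymzml")
```

This also hides SyntaxWarnings from anything else imported inside the helper, so it is a stopgap rather than a fix.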

Additional context

I would be happy to submit a pull request to resolve this issue if the maintenance team is willing to:

  1. Indicate your preference for escaped backslashes (\\s) vs. converting to raw bytestrings rb'...'.
  2. Release a new version of pymzML with the fix after I submit a PR and you accept it.
