Skip to content

feat: Support Windows PE metadata parsing for Python binaries #5159

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

alex-cheng-techman
Copy link

This PR improves the Python checker to extract version info directly from PE metadata (ProductVersion, FileVersion) using pefile. This helps resolve fallback version detection issues on Windows binaries such as python*.dll and .exe files.

@terriko terriko changed the title Support Windows PE metadata parsing for Python binaries feat: Support Windows PE metadata parsing for Python binaries Jun 25, 2025
Copy link
Contributor

@terriko terriko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks promising! I've got comments below. In the meantime, I'm going to approve the CI workflows to run so you can get whatever you need out of the linters, but the main tests will fail right now due to our ongoing cache/data issue (github can't talk to our mirror and I hit another set of bugs when I tried to implement a workaround).

@@ -18,6 +18,9 @@ class OpensslChecker(Checker):
CONTAINS_PATTERNS = [r"part of OpenSSL", r"openssl.cnf", r"-DOPENSSL_"]
FILENAME_PATTERNS = [r"libssl.so.", r"libcrypto.so"]
VERSION_PATTERNS = [
# for general format: OpenSSL 1.0.2u�BOpenSSL 3.0.0�BOpenSSL 1.1.1k
r"OpenSSL\s+([0-9]+\.[0-9]+\.[0-9]+[a-z]*)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this will cause false positives (lots of things mention specific versions of openssl in error messages and stuff) so you'll probably need to add whatever string is before or after in the windows binaries to disambiguate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinging @ffontaine who's spent a lot of time tightening our signatures: any recommendations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I confirm that this pattern will raise false positives for example with OpenSSL 1.1.1 or newer which is embedded in most binaries linking with openssl (see commit 0f7a834).

By the way, I'm a bit unconfortable with the current approach.
I see that extract_version_from_pe is constructing a fake line from 3 PE fields: ProductName ProductVersion CompanyName
Then this fake line is passed to binary checkers which are updated to handle this fake line.
This is definitely "hacking" the binary checker logic.

IMHO, making a language parser to extract PE information could be a better approach.
In this PE language parser, specific PE handling could be done such as adding a regex to extract ([0-9]+\.[0-9]+\.[0-9]+) from ProductVersion.
The user could also easily disable this new handling if they are not happy with the results.

An other option would be to just update the python and openssl patterns so that they work with your windows file.

If you can provide the pe windows file that fails to be detected, I could probably help you.

@@ -19,6 +19,9 @@ class PythonChecker(Checker):
]
FILENAME_PATTERNS = [r"python"]
VERSION_PATTERNS = [
# to match the data from PE file
r"[Pp]ython ([0-9]+\.[0-9]+\.[0-9]+)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one may also cause false positives.

@alex-cheng-techman
Copy link
Author

sample.zip
_hashlib.pyd: Python 3.10.11, OpenSSL 3.0.0
NxLib64.dll: OpenSSL 1.0.2u
both of these two files are identified by the other commercial application.
Actually, I am a user of cve-bin-tool only, not familar with python even github.
It will be appreciated if there is any other PR can solve this issue.

ffontaine added a commit to ffontaine/cve-bin-tool that referenced this pull request Jun 26, 2025
Add a Windows-specific pattern to detect openssl in NxLib64.dll

See intel#5159

Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
@ffontaine
Copy link
Contributor

Thanks for your samples, this was really helpful.
I used Dependency Walker and Linux strings to better understand what was inside those files.

Here is my understanding:

  • _hashlib.pyd depends on libcrypto-1_1.dll. It contains the following string For OpenSSL 3.0.0 and newer it returns the state of the default provider. So returning that _haslib.pyd is openssl 3.0.0 is definitely a false positive. It only links to libcrypto-1_1.dll which is probably openssl 1.1.x given the name of this file.
  • _hashlib.pyd also depends on python3.10.dll. This Python Dynamic Module has probably been compiled with python 3.10.11. However, this doesn't mean this Python module is python 3.10.11 and is affected by all python CVEs. The "real" python interpreter is provided by python3.10.dll which could be 3.10.11 but also 3.10.12 or python 3.12, etc.
  • NxLib64.dll is different, it really seems to embed openssl 1.0.2u. I sent PR fix: add openssl Windows-specific pattern #5174 to add a Windows-specific pattern. I expect that this pattern will not affect Linux binaries.

To conclude, from my personal experience, commercial applications shall not be trusted blindly. Sometimes cve-bin-tool is better. However, getting this kind of feedback is really helpful so do not hesitate to send more.

By the way, can you share what are the commercial applications that detected openssl and python inside those binaries (Synopsys Black Duck Binary Analysis, other)?

@alex-cheng-techman
Copy link
Author

_hashlib.pyd is downloaded from https://www.python.org/ftp/python/3.10.11/python-3.10.11-embed-amd64.zip
and it is recognized as 3.10.11 by windows too. Suppose there is no argument on version info.
1

@ffontaine
Copy link
Contributor

Thanks for this new input.

If I'm running cve-bin-tool on binaries extracted from python-3.10.11-embed-amd64.zip, python 3.10.11 is correctly detected from python310.dll and openssl 1.1.1t is correctly detected from libcrypto-1_1.dll:

windows-cve-bin-tool

That's the expected behavior, hacking the binary checker logic to detect python 3.10.11 in _hashlib.pyd seems wrong (and unneeded as this information is already extracted from python310.dll). I'll let @terriko give additonal feedback.

@alex-cheng-techman
Copy link
Author

actually, there are lot of python 3.10.11 files on zip. And python310.dll are python.exe most identifiable.
result.txt

ffontaine added a commit to ffontaine/cve-bin-tool that referenced this pull request Jun 27, 2025
Add a Windows-specific pattern to detect openssl in NxLib64.dll

See intel#5159

Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
update python.py checker, improve VERSION_PAEETERNS format for PE info.
fix balck format issue.
update extract_version_from_pe to output valid format info.
ffontaine added a commit to ffontaine/cve-bin-tool that referenced this pull request Jun 27, 2025
Built from intel#5159, this alternative has the following advantages:
 - it doesn't create a fake "line" that is passed to binary checker
 - python binary checker doesn't have to be updated
 - it allows the end-user to disable this new behavior through "-s pe"
 - any specific PE handling such as setting the product to lower case or
   converting "Python Software Foundation" to "python" is done in pe.py

For now, pe parser runs only on pyd files but this could be updated.

Credits shall be given to @alex-cheng-techman which created most of the
original code

Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
ffontaine added a commit to ffontaine/cve-bin-tool that referenced this pull request Jun 27, 2025
Built from intel#5159, this alternative has the following advantages:
 - it doesn't create a fake "line" that is passed to binary checker
 - python binary checker doesn't have to be updated
 - it allows the end-user to disable this new behavior through "-s pe"
 - any specific PE handling such as setting the product to lower case or
   converting "Python Software Foundation" to "python" is done in pe.py

For now, pe parser runs only on pyd files but this could be updated.

Credits shall be given to @alex-cheng-techman which created most of the
original code

Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
ffontaine added a commit to ffontaine/cve-bin-tool that referenced this pull request Jun 27, 2025
Built from intel#5159, this alternative has the following advantages:
 - it doesn't create a fake "line" that is passed to binary checker
 - python binary checker doesn't have to be updated
 - it allows the end-user to disable this new behavior through "-s pe"
 - any specific PE handling such as setting the product to lower case or
   converting "Python Software Foundation" to "python" is done in pe.py

For now, pe parser runs only on pyd files but this could be updated.

Credits shall be given to @alex-cheng-techman which created most of the
original code

Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
@ffontaine
Copy link
Contributor

ffontaine commented Jun 27, 2025

I created an alternative PR that moves your code inside a dedicated Windows PE parser: #5180. I think this approach is better, tell me what you think about it.

ffontaine added a commit to ffontaine/cve-bin-tool that referenced this pull request Jun 27, 2025
Built from intel#5159, this alternative has the following advantages:
 - it doesn't create a fake "line" that is passed to binary checker
 - python binary checker doesn't have to be updated
 - it allows the end-user to disable this new behavior through "-s pe"
 - any specific PE handling such as setting the product to lower case or
   converting "Python Software Foundation" to "python" is done in pe.py

For now, pe parser runs only on pyd files but this could be updated.

Credits shall be given to @alex-cheng-techman which created most of the
original code

Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
ffontaine added a commit to ffontaine/cve-bin-tool that referenced this pull request Jul 15, 2025
Built from intel#5159, this alternative has the following advantages:
 - it doesn't create a fake "line" that is passed to binary checker
 - python binary checker doesn't have to be updated
 - it allows the end-user to disable this new behavior through "-s pe"
 - any specific PE handling such as setting the product to lower case or
   converting "Python Software Foundation" to "python" is done in pe.py

For now, pe parser runs only on pyd files but this could be updated.

Credits shall be given to @alex-cheng-techman which created most of the
original code

Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
ffontaine added a commit to ffontaine/cve-bin-tool that referenced this pull request Jul 15, 2025
Add a Windows-specific pattern to detect openssl in NxLib64.dll

See intel#5159

Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
ffontaine added a commit that referenced this pull request Jul 15, 2025
Add a Windows-specific pattern to detect openssl in NxLib64.dll

See #5159

Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants