-
Notifications
You must be signed in to change notification settings - Fork 550
feat: Support Windows PE metadata parsing for Python binaries #5159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support Windows PE metadata parsing for Python binaries #5159
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks promising! I've got comments below. In the meantime, I'm going to approve the CI workflows to run so you can get whatever you need out of the linters, but the main tests will fail right now due to our ongoing cache/data issue (github can't talk to our mirror and I hit another set of bugs when I tried to implement a workaround).
cve_bin_tool/checkers/openssl.py
Outdated
@@ -18,6 +18,9 @@ class OpensslChecker(Checker): | |||
CONTAINS_PATTERNS = [r"part of OpenSSL", r"openssl.cnf", r"-DOPENSSL_"] | |||
FILENAME_PATTERNS = [r"libssl.so.", r"libcrypto.so"] | |||
VERSION_PATTERNS = [ | |||
# for general format: OpenSSL 1.0.2u�BOpenSSL 3.0.0�BOpenSSL 1.1.1k | |||
r"OpenSSL\s+([0-9]+\.[0-9]+\.[0-9]+[a-z]*)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this will cause false positives (lots of things mention specific versions of openssl in error messages and stuff) so you'll probably need to add whatever string is before or after in the windows binaries to disambiguate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pinging @ffontaine who's spent a lot of time tightening our signatures: any recommendations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I confirm that this pattern will raise false positives for example with OpenSSL 1.1.1 or newer
which is embedded in most binaries linking with openssl (see commit 0f7a834).
By the way, I'm a bit unconfortable with the current approach.
I see that extract_version_from_pe
is constructing a fake line from 3 PE fields: ProductName ProductVersion CompanyName
Then this fake line is passed to binary checkers which are updated to handle this fake line.
This is definitely "hacking" the binary checker logic.
IMHO, making a language parser to extract PE information could be a better approach.
In this PE language parser, specific PE handling could be done such as adding a regex to extract ([0-9]+\.[0-9]+\.[0-9]+)
from ProductVersion.
The user could also easily disable this new handling if they are not happy with the results.
An other option would be to just update the python and openssl patterns so that they work with your windows file.
If you can provide the pe windows file that fails to be detected, I could probably help you.
cve_bin_tool/checkers/python.py
Outdated
@@ -19,6 +19,9 @@ class PythonChecker(Checker): | |||
] | |||
FILENAME_PATTERNS = [r"python"] | |||
VERSION_PATTERNS = [ | |||
# to match the data from PE file | |||
r"[Pp]ython ([0-9]+\.[0-9]+\.[0-9]+)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one may also cause false positives.
sample.zip |
Add a Windows-specific pattern to detect openssl in NxLib64.dll See intel#5159 Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
Thanks for your samples, this was really helpful. Here is my understanding:
To conclude, from my personal experience, commercial applications shall not be trusted blindly. Sometimes cve-bin-tool is better. However, getting this kind of feedback is really helpful so do not hesitate to send more. By the way, can you share what are the commercial applications that detected openssl and python inside those binaries (Synopsys Black Duck Binary Analysis, other)? |
_hashlib.pyd is downloaded from https://www.python.org/ftp/python/3.10.11/python-3.10.11-embed-amd64.zip |
Thanks for this new input. If I'm running cve-bin-tool on binaries extracted from python-3.10.11-embed-amd64.zip, That's the expected behavior, hacking the binary checker logic to detect |
actually, there are lot of python 3.10.11 files on zip. And python310.dll are python.exe most identifiable. |
Add a Windows-specific pattern to detect openssl in NxLib64.dll See intel#5159 Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
update python.py checker, improve VERSION_PAEETERNS format for PE info. fix balck format issue. update extract_version_from_pe to output valid format info.
Built from intel#5159, this alternative has the following advantages: - it doesn't create a fake "line" that is passed to binary checker - python binary checker doesn't have to be updated - it allows the end-user to disable this new behavior through "-s pe" - any specific PE handling such as setting the product to lower case or converting "Python Software Foundation" to "python" is done in pe.py For now, pe parser runs only on pyd files but this could be updated. Credits shall be given to @alex-cheng-techman which created most of the original code Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
Built from intel#5159, this alternative has the following advantages: - it doesn't create a fake "line" that is passed to binary checker - python binary checker doesn't have to be updated - it allows the end-user to disable this new behavior through "-s pe" - any specific PE handling such as setting the product to lower case or converting "Python Software Foundation" to "python" is done in pe.py For now, pe parser runs only on pyd files but this could be updated. Credits shall be given to @alex-cheng-techman which created most of the original code Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
Built from intel#5159, this alternative has the following advantages: - it doesn't create a fake "line" that is passed to binary checker - python binary checker doesn't have to be updated - it allows the end-user to disable this new behavior through "-s pe" - any specific PE handling such as setting the product to lower case or converting "Python Software Foundation" to "python" is done in pe.py For now, pe parser runs only on pyd files but this could be updated. Credits shall be given to @alex-cheng-techman which created most of the original code Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
I created an alternative PR that moves your code inside a dedicated Windows PE parser: #5180. I think this approach is better, tell me what you think about it. |
Built from intel#5159, this alternative has the following advantages: - it doesn't create a fake "line" that is passed to binary checker - python binary checker doesn't have to be updated - it allows the end-user to disable this new behavior through "-s pe" - any specific PE handling such as setting the product to lower case or converting "Python Software Foundation" to "python" is done in pe.py For now, pe parser runs only on pyd files but this could be updated. Credits shall be given to @alex-cheng-techman which created most of the original code Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
Built from intel#5159, this alternative has the following advantages: - it doesn't create a fake "line" that is passed to binary checker - python binary checker doesn't have to be updated - it allows the end-user to disable this new behavior through "-s pe" - any specific PE handling such as setting the product to lower case or converting "Python Software Foundation" to "python" is done in pe.py For now, pe parser runs only on pyd files but this could be updated. Credits shall be given to @alex-cheng-techman which created most of the original code Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
Add a Windows-specific pattern to detect openssl in NxLib64.dll See intel#5159 Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
Add a Windows-specific pattern to detect openssl in NxLib64.dll See #5159 Signed-off-by: Fabrice Fontaine <fabrice.fontaine@orange.com>
This PR improves the Python checker to extract version info directly from PE metadata (ProductVersion, FileVersion) using pefile. This helps resolve fallback version detection issues on Windows binaries such as python*.dll and .exe files.