Skip to content

chrome: check for binary in known locations #29

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

thseiler
Copy link
Contributor

@thseiler thseiler commented Apr 28, 2025

This PR tries to address #28 by:

  • add entrypoint.sh to .gitattributes so that it keeps the line-endings (Windows/Docker)
  • On Windows, chrome.exe is not in $PATH, remove it from the CI environment
  • CI: don't remove $HOME or %PROGRAMW6432% in environment sanitation (lit) as Chrome / Selenium depend on these
  • make mypy happy

@thseiler thseiler force-pushed the bugfix/chrome-detection-on-windows branch 2 times, most recently from 01f4c0b to 0fe6104 Compare April 30, 2025 15:18
Copy link
Collaborator

@stanislaw stanislaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this and improving the Windows experience. It is very appreciated.

I welcome this change in general but we need to be careful with all possible platforms. My comments are not ordered by importance, so please check them all.

The major issue besides the more minor comments is that I don't see the library to work on my macOS just yet - it doesn't find Any browser. I could debug this and possibly suggest a PR to roniemartinez/browsers but need some time to focus on just this separately.

Another important comment is whether we can write a failing test using GitHub CI Windows such that that test would reproduce the development machine error of the issue that you opened?

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.core.os_manager import ChromeType, OperationSystemManager
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general historical note:

Until quite recently, html2print was using the webdriver_manager to not only get the path to Chrome but also actually download the Chrome driver. Over time I encountered some issues with that library and found that it had not been maintained very well (the README says Because of the War in Ukraine the project is on hold😔). The main issue why I dropped the downloading part was because it every time did several HTTP requests to download JSON files instead of just one which seems to be enough. When you run html2print many times in tests, those separate JSON files were adding to the overall processing time, so I thought I'd keep this library for the browser detection part and re-implement the downloading with just 1! JSON file.

Now that you are adding another library just for the paths, I am ready to accept the change. Hopefully this other library will not cause us any other problems in the future.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having in mind that we should be careful with this change, I have tried this library on my default Python, and I am getting this:

Stanislaw@home:html2pdf_python (bugfix/chrome-detection-on-windows)$ python
Python 3.9.20 (main, Oct 21 2024, 20:44:12)
[Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import browsers
>>> print(browsers.get("chrome"))
None
>>> print(browsers.get("Firefox"))
None
>>> print(list(browsers.browsers()))
[]

I am pretty sure I have Chrome, Firefox and Safari on the PC, so something is probably missing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked the implementation and I am not sure if we want to adopt it actually: it does not walk over the known browser paths. Instead it is trying to read the OSX bundle list which may not be the most precise of finding a browser. This explains well a possible issue on my PC where this bundle list is simply not working for some reason (maybe macOS version...)

https://github.com/roniemartinez/browsers/blob/master/browsers/osx.py#L9

Copy link
Contributor Author

@thseiler thseiler May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit concerning. Their Windows and Linux implementation looked clean, but I must admit that I didn't understand what the OSX code was doing. The more I think about it, the more I believe that switching to pybrowser to set the binary_location was fixing a symptom, and not the root cause.

Copy link
Collaborator

@stanislaw stanislaw May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the issue with the webdriver_manager: it has a nice list of browsers but it uses it inside the function for getting a version. This makes the library unusable unless we fork it.

Bildschirmfoto 2025-05-01 um 11 43 45

I see the following options:

  1. I roll up my sleeves and send a patch to pybrowsers to enable path-based lookups for macOS just like the webdriver_manager does it.

  2. We fork a webdriver_manager under strictdoc-project and make html2print always build from GitHub, not from Pip.

  3. We reimplement the browser finding inside html2print and have it do exactly what we need. We could almost steal/borrow the code of the webdriver_manager, just rework its structure because it is somewhat overengineered.

What do you think?

Copy link
Collaborator

@stanislaw stanislaw May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I am de-facto already working around webdriver_manager by doing this custom branch for finding Google Chrome for Testing which the webdriver_manager does not do:

        # Special case: GitHub Actions macOS CI machines have both
        # Google Chrome for Testing and normal Google Chrome installed, and
        # sometimes their versions are of different major version families.
        # The solution is to check if the Google Chrome for Testing is available,
        # and use its version instead of the normal one.
        if platform.system() == "Darwin":
            chrome_path = "/Applications/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"
            try:
                print(  # noqa: T201
                    "html2print: "
                    "checking if there is Google Chrome for Testing instead of "
                    "a normal Chrome available."
                )
...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that @MartyLake reported yesterday another issue where webdriver_manager does not discover Google Chrome on Linux. I am waiting for more details about the custom browser location in that case.

@thseiler thseiler force-pushed the bugfix/chrome-detection-on-windows branch from 0fe6104 to dfefdff Compare May 1, 2025 11:16
Copy link
Collaborator

@stanislaw stanislaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you nailed it with the LIT variable nicely 👍

@stanislaw stanislaw merged commit 4b33f5f into strictdoc-project:main May 1, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants