Add support for autoclick #2313

Merged
merged 7 commits into from
Jan 16, 2025

Conversation

tw4l (Member) commented Jan 15, 2025

Fixes #2259

This PR brings backend and frontend support for the new autoclick behavior in Browsertrix.

On the backend, we introduce min_autoclick_crawler_image to values.yaml, with a default value of "docker.io/webrecorder/browsertrix-crawler:1.5.0". If this is set and the crawler version for a new crawl is less than this value, the autoclick behavior is removed from the behaviors list in the configmap created for the crawl.

The one caveat is that a crawler image tag like "latest" will always be parsed as greater than min_autoclick_crawler_image, so the crawler could run into issues if a non-numeric image tag points to an older version of the crawler. For production we use hardcoded specific versions of the crawler except for the dev channel, which from here on out will include autoclick support, so I think this should be okay (the same is also true of the existing implementation for checking min_qa_crawler_image).
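To make the version gate and its caveat concrete, here is a minimal Python sketch. It is not the code from this PR: the helper names (`parse_image_version`, `supports_autoclick`, `filter_behaviors`) and the tag-parsing rules are assumptions for illustration only. It drops autoclick when the crawler image tag is numerically older than the minimum, and treats any non-numeric tag such as "latest" as new enough, which reproduces the caveat described above.

```python
import re
from typing import List, Optional, Tuple

# Default taken from the PR description; everything else here is hypothetical.
MIN_AUTOCLICK_CRAWLER_IMAGE = "docker.io/webrecorder/browsertrix-crawler:1.5.0"


def parse_image_version(image: str) -> Optional[Tuple[int, ...]]:
    """Return the numeric version in an image tag, or None for non-numeric tags."""
    # Look at the last path component only, so a registry port is not
    # mistaken for a tag (e.g. "localhost:5000/crawler:1.5.0").
    name = image.rsplit("/", 1)[-1]
    if ":" not in name:
        return None
    tag = name.split(":", 1)[1]
    if not re.fullmatch(r"\d+(\.\d+)*", tag):
        return None
    parts = tuple(int(part) for part in tag.split("."))
    # Pad to three components so "1.5" compares equal to "1.5.0".
    return parts + (0,) * (3 - len(parts))


def supports_autoclick(
    crawler_image: str, min_image: Optional[str] = MIN_AUTOCLICK_CRAWLER_IMAGE
) -> bool:
    """Decide whether the given crawler image may keep the autoclick behavior."""
    if not min_image:
        return True  # no minimum configured, so nothing to check
    minimum = parse_image_version(min_image)
    current = parse_image_version(crawler_image)
    if minimum is None or current is None:
        # Caveat: tags such as "latest" cannot be compared numerically
        # and are treated as newer than the minimum.
        return True
    return current >= minimum


def filter_behaviors(
    behaviors: List[str],
    crawler_image: str,
    min_image: Optional[str] = MIN_AUTOCLICK_CRAWLER_IMAGE,
) -> List[str]:
    """Remove "autoclick" from the behaviors list for crawler images that are too old."""
    if supports_autoclick(crawler_image, min_image):
        return list(behaviors)
    return [name for name in behaviors if name != "autoclick"]
```

Comparing integer tuples rather than raw strings means a tag like 1.10.0 is correctly ordered after 1.5.0; whether the real backend compares this way is not stated in this PR.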

On the frontend, I've added a checkbox (unchecked by default) in the "Limits" section just below the current checkbox for autoscroll. We might want to move these to a different section eventually - I'm not sure Limits is the right place for them - but I wanted to be consistent with things as they are.

@tw4l tw4l requested review from ikreymer and SuaYoo January 15, 2025 18:17
@ikreymer ikreymer merged commit 5684e89 into main Jan 16, 2025
27 checks passed
@ikreymer ikreymer deleted the issue-2259-autoclick branch January 16, 2025 20:44
ikreymer added a commit that referenced this pull request Jan 29, 2025
Fixes #2259 

This PR brings backend and frontend support for the new autoclick
behavior in Browsertrix, introduced in Browsertrix 1.5.0+.

On the backend, we introduce `min_autoclick_crawler_image` to
`values.yaml`, with a default value of
`"docker.io/webrecorder/browsertrix-crawler:1.5.0"`. If this is set and
the crawler version for a new crawl is less than this value, the
autoclick behavior is removed from the behaviors list in the configmap
created for the crawl.

The one caveat for this is that a crawler image tag like "latest" will
always be parsed as greater than `min_autoclick_crawler_image`, so there
is the potential for the crawler to run into issues if using a
non-numeric image tag with an older version of the crawler. For
production we use hardcoded specific versions of the crawler except for
the dev channel, which from here on out will include autoclick
support, so I think this should be okay (and is also true of the
existing implementation for checking `min_qa_crawler_image`).

On the frontend, I've added a checkbox (unchecked by default) in the
"Limits" section just below the current checkbox for autoscroll. We
might want to move these to a different section eventually - I'm not
sure Limits is the right place for them - but I wanted to be consistent
with things as they are.

---------

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Klindten commented Feb 10, 2025

One thing about the summary in Crawl Settings:

  • "Autoclick Behaviour" should be added to the Crawl Settings summary (which lists all the choices)

image

I'm trying to find examples where the behaviour works, and will try to find some good cases :-)

No luck so far with these use cases:
https://www.zetland.dk/historie/s8D34NZJ-aOZj67pz-be97a (the play button is not clicked, and the content is not crawled, with this new feature)
https://soundcloud.com/soulstrutdotcom/steen-rock-rock-science-2006?in=funky4/sets/psych-fuzz (it seems to play without a click but only gets 14MB; it may need to be logged in)

tw4l (Member, Author) commented Feb 10, 2025

@Klindten At this point, the autoclick selector will only click on anchor tags with hrefs. If you want to try those pages with the crawler directly, you can use the --clickSelector argument to specify which elements it should click on, e.g. --clickSelector button.
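As a rough illustration of that suggestion, the Python sketch below assembles a crawler command line that enables autoclick and passes a custom click selector. Only `--clickSelector` comes from this thread; the helper name `build_crawl_command`, the Docker invocation, the `--url` and `--behaviors autoclick` arguments, and the image tag are assumptions about a typical browsertrix-crawler 1.5.0 run and should be checked against the crawler's own documentation. No volume mount is included, so crawl output would stay inside the container.

```python
import shlex
from typing import List


def build_crawl_command(
    url: str,
    click_selector: str,
    image: str = "webrecorder/browsertrix-crawler:1.5.0",  # assumed image tag
) -> List[str]:
    """Build an argv list for a single-page autoclick test crawl (hypothetical helper)."""
    return [
        "docker", "run", image,
        "crawl",
        "--url", url,
        "--behaviors", "autoclick",          # assumed way to enable the behavior
        "--clickSelector", click_selector,   # argument mentioned in this thread
    ]


if __name__ == "__main__":
    # Print a shell-safe version of the command for copy and paste.
    print(shlex.join(build_crawl_command("https://example.com/", "button")))
```

Building the command as a list and quoting it with `shlex.join` keeps selectors that contain spaces or brackets, such as `button[aria-label="Play"]`, intact when pasted into a shell.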

We're adding support for this in Browsertrix too, it's just a bit behind the backend work.

I'm also going to create a new issue for the Crawl Settings miss, thanks for that!

@Klindten
Thanks for the clarification:-)

Development

Successfully merging this pull request may close these issues.

Add checkbox in workflow UI for Autoclick behavior
3 participants