Optimize single-page crawl workflows #2656

Merged
merged 6 commits into from
Jun 10, 2025
ikreymer
Member

For single page crawls:

  • Always force 1 browser to be used, ignoring the browser windows/scale setting
  • Don't use custom PVC volumes in crawler / redis, just use emptyDir: there is no chance of the crawler being interrupted and restarted on a different machine for a single page.
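
As a rough illustration of the two rules above, the sketch below picks a browser count and a volume type from a single-page flag. The function name `plan_crawler_resources` and the returned dictionary keys are hypothetical and are not taken from the actual operator code; only the behavior (one browser, `emptyDir` instead of a PVC) comes from this description.

```python
def plan_crawler_resources(is_single_page: bool, browser_windows: int) -> dict:
    """Hypothetical helper: choose browser count and volume type for a crawl."""
    if is_single_page:
        # Single-page crawl: force one browser and use ephemeral storage,
        # ignoring the configured browser windows / scale setting.
        return {"browsers": 1, "volume": "emptyDir"}
    # Multi-page crawl: honor the configured setting (at least one browser)
    # and keep a persistent volume (assumed here to be a PVC).
    return {"browsers": max(1, browser_windows), "volume": "pvc"}
```

With this sketch, a single-page workflow configured with 4 browser windows would still get 1 browser and an `emptyDir` volume.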

Adds an 'is_single_page' check to CrawlConfig, based on either the limit or the scopeType with no extra hops.
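
A minimal sketch of what such a check might look like, assuming the config exposes `limit`, `scopeType`, and `extraHops` fields. The `CrawlConfigSketch` class, the `extraHops` field name, the `"page"` scope value, and the exact precedence of the conditions are assumptions for illustration, not confirmed by this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlConfigSketch:
    """Hypothetical stand-in for the relevant CrawlConfig fields."""
    limit: Optional[int] = None       # max pages to crawl, None = unlimited
    scopeType: Optional[str] = None   # e.g. "page", "prefix", "host"
    extraHops: int = 0                # extra link hops beyond the scope


def is_single_page(config: CrawlConfigSketch) -> bool:
    """Return True if the crawl can only ever visit one page."""
    # A page limit of 1 caps the crawl at a single page regardless of scope.
    if config.limit == 1:
        return True
    # Otherwise, require page scope with no extra hops.
    return config.scopeType == "page" and not config.extraHops
```

Under these assumptions, a page-scoped config with one extra hop is not treated as single-page, since the crawler may follow links off the seed page.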

Also: use qa_browser_instances instead of qa_scale to set the number of browsers for QA, for consistency.

Fixes #2655

ikreymer added 3 commits June 9, 2025 17:32
- always set to 1 browser window for single page crawl
- don't use pvcs for crawler or redis for single page crawl
set 'isSinglePage' flag on CrawlJob when creating/updating crawl
also support 'qa_browser_windows' to be used instead of 'qa_scale'
@ikreymer ikreymer requested a review from tw4l June 10, 2025 06:23
Member

@tw4l tw4l left a comment

Working well! For the best user experience, we may also want to disable browser window options > 1 in the frontend workflow editor when these conditions are true, but that can be done separately and is less pressing, I think.

@ikreymer ikreymer merged commit 8ea1639 into main Jun 10, 2025
22 checks passed
@ikreymer ikreymer deleted the opt-single-page branch June 10, 2025 19:13
Development

Successfully merging this pull request may close these issues.

[Task]: Optimize for single page crawl use case.
2 participants