Draft doc for Pausing and Resuming Crawl section #2639

Merged 10 commits on Jun 11, 2025
4 changes: 4 additions & 0 deletions frontend/docs/docs/overrides/.icons/bootstrap/play-circle.svg
13 changes: 8 additions & 5 deletions frontend/docs/docs/stylesheets/extra.css
@@ -1,4 +1,4 @@
@import './theme.css';
@import "./theme.css";
/* Font style definitions */

@font-face {
@@ -8,9 +8,9 @@
font-display: swap;
src: url("https://cdn.webrecorder.net/fonts/recursive/recursive-latin.woff2")
format("woff2");
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,
U+02DA, U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC,
U+2122, U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6, U+02DA,
U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191,
U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
}

@font-face {
@@ -141,7 +141,10 @@ h3 {
}

.md-typeset {
font-feature-settings: "ss04" off,"ss07" on,"ss08" on;
font-feature-settings:
"ss04" off,
"ss07" on,
"ss08" on;
}

/* Custom badge classes, applies custom overrides to inline-code blocks */
34 changes: 25 additions & 9 deletions frontend/docs/docs/user-guide/running-crawl.md
@@ -1,4 +1,4 @@
# Modifying Running Crawls
# Running Crawls

Running crawls can be modified from the crawl workflow **Latest Crawl** tab. You may want to modify a running crawl if you find that the workflow is crawling pages that you didn't intend to archive, or if you want a boost of speed.

@@ -8,17 +8,20 @@ A crawl workflow that is in progress can be in one of the following states:

| Status | Description |
| ---- | ---- |
| <span class="status-waiting">:bootstrap-hourglass-split: Waiting</span> | The workflow can't start running yet but it is queued to run when resources are available. |
| <span class="status-waiting">:btrix-status-dot: Starting</span> | New resources are starting up. Crawling should begin shortly.|
| <span class="status-success">:btrix-status-dot: Running</span> | The crawler is finding and capturing pages! |
| <span class="status-waiting">:btrix-status-dot: Stopping</span> | A user has instructed this workflow to stop. Finishing capture of the current pages.|
| <span class="status-waiting">:btrix-status-dot: Finishing Downloads</span> | The workflow has finished crawling and is finalizing downloads.|
| <span class="status-waiting">:btrix-status-dot: Generating WACZ</span> | Data is being packaged into WACZ files.|
| <span class="status-waiting">:btrix-status-dot: Uploading WACZ</span> | WACZ files have been created and are being transferred to storage.|
| <span class="status-violet-600">:bootstrap-hourglass-split: Waiting</span> | The workflow can't start running yet but it is queued to run when resources are available. |
| <span class="status-violet-600">:btrix-status-dot: Starting</span> | New resources are starting up. Crawling should begin shortly.|
| <span class="status-green-600">:btrix-status-dot: Running</span> | The crawler is finding and capturing pages! |
| <span class="status-violet-600">:bootstrap-pause-circle: Pausing</span> | The workflow is in the process of being paused. |
| <span class="status-neutral-500">:bootstrap-pause-circle: Paused</span> | The workflow is currently paused. |
| <span class="status-violet-600">:bootstrap-play-circle: Resuming</span> | The workflow is in the process of resuming after being paused. |
| <span class="status-violet-600">:btrix-status-dot: Stopping</span> | A user has instructed this workflow to stop. Finishing capture of the current pages.|
| <span class="status-violet-600">:btrix-status-dot: Finishing Downloads</span> | The workflow has finished crawling and is finalizing downloads.|
| <span class="status-violet-600">:btrix-status-dot: Generating WACZ</span> | Data is being packaged into WACZ files.|
| <span class="status-violet-600">:btrix-status-dot: Uploading WACZ</span> | WACZ files have been created and are being transferred to storage.|

## Watch Crawl

You can watch the current state of the browser windows as the crawler visit pages in the **Watch** tab of **Latest Crawl**. A list of queued URLs are displayed below in the **Upcoming Pages** section.
You can watch the current state of the browser windows as the crawler visits pages in the **Watch** tab of **Latest Crawl**. A list of queued URLs is displayed below in the **Upcoming Pages** section.

## Live Exclusion Editing

@@ -34,6 +37,19 @@ Like exclusions, the number of [browser windows](workflow-setup.md#browser-windo

Unlike exclusions, this change will not be applied to future workflow runs.

## Pausing and Resuming Crawls

If you need to reassess or rescope your crawl at any point after it has started, you can pause the running crawl.

To pause a running crawl, click the *Pause* button. The crawl status will change from *Running* to *Pausing* as in-progress pages are completed, and then to *Paused* once the crawler is successfully paused. Paused crawls do not continue to accrue execution time.

While a crawl is paused, it is possible to replay the pages crawled up to that point and to download the WACZ files from the *Latest Crawl* tab.

To resume a paused crawl, simply click the *Resume* button. The crawl status will update from *Resuming* to *Running* to indicate that the crawler has started crawling again. Any changes to the workflow settings will be applied in the resumed crawl.
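
The status changes described above can be read as a small state machine. The following is a minimal sketch only: the status names come from the status table in this guide, while `CrawlStatus`, `TRANSITIONS`, `next_status`, and the event names are hypothetical and are not taken from the Browsertrix codebase.

```python
from enum import Enum


class CrawlStatus(Enum):
    """Statuses involved in pausing and resuming (names from the status table)."""

    RUNNING = "Running"
    PAUSING = "Pausing"
    PAUSED = "Paused"
    RESUMING = "Resuming"


# Hypothetical event names; each entry maps (current status, event) to the next status.
TRANSITIONS = {
    (CrawlStatus.RUNNING, "pause_clicked"): CrawlStatus.PAUSING,
    (CrawlStatus.PAUSING, "in_progress_pages_done"): CrawlStatus.PAUSED,
    (CrawlStatus.PAUSED, "resume_clicked"): CrawlStatus.RESUMING,
    (CrawlStatus.RESUMING, "crawler_started"): CrawlStatus.RUNNING,
}


def next_status(status: CrawlStatus, event: str) -> CrawlStatus:
    """Return the status that follows `status` when `event` occurs."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid while {status.value}") from None
```

In this sketch, *Pausing* and *Resuming* are intermediate statuses: the user triggers them, and the crawler itself completes the move to *Paused* or *Running*.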

???+ Note
    Paused crawls that are not resumed within 7 days of being paused are automatically updated to *Stopped*. Once stopped, the crawl is finished and can no longer be resumed.
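
As an illustration of the 7-day rule in the note above, here is a minimal sketch of the check. The names `PAUSE_LIMIT` and `status_after_pause` are hypothetical, and treating exactly 7 days as still resumable is an assumption; how Browsertrix evaluates the limit internally is not described here.

```python
from datetime import datetime, timedelta, timezone

# The 7-day limit comes from the note above; the constant name is hypothetical.
PAUSE_LIMIT = timedelta(days=7)


def status_after_pause(paused_at: datetime, now: datetime) -> str:
    """Return "Stopped" once a pause has lasted longer than the limit, else "Paused"."""
    # Assumption: a crawl paused for exactly 7 days can still be resumed.
    return "Stopped" if now - paused_at > PAUSE_LIMIT else "Paused"


paused_at = datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc)
print(status_after_pause(paused_at, paused_at + timedelta(days=3)))
print(status_after_pause(paused_at, paused_at + timedelta(days=8)))
```

The first call falls inside the limit, so the crawl could still be resumed; the second falls outside it, so the crawl would be treated as finished.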

## End a Crawl

If a crawl workflow is not crawling websites as intended, it may be preferable to end crawling operations and update the crawl workflow's settings before trying again. There are two operations to end crawls, available both on the workflow's details page and in the actions menu in the workflow list.