From eb7ee649ef397d6d8e0624243938a2463e208cc5 Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Wed, 5 Mar 2025 17:48:56 +1300 Subject: [PATCH 01/10] Move troubleshooting to new section Signed-off-by: Christopher Hakkaart --- docs/aws.md | 37 +---- docs/cache-and-resume.md | 155 +------------------ docs/google.md | 2 +- docs/index.md | 9 ++ docs/process.md | 2 +- docs/reference/cli.md | 2 +- docs/reference/operator.md | 2 +- docs/script.md | 2 +- docs/troubleshooting/cache-failures.md | 198 ++++++++++++++++++++++++ docs/troubleshooting/compute-storage.md | 44 ++++++ docs/troubleshooting/language-server.md | 52 +++++++ docs/vscode.md | 14 +- 12 files changed, 313 insertions(+), 206 deletions(-) create mode 100644 docs/troubleshooting/cache-failures.md create mode 100644 docs/troubleshooting/compute-storage.md create mode 100644 docs/troubleshooting/language-server.md diff --git a/docs/aws.md b/docs/aws.md index 3de628e3cc..fe597cb658 100644 --- a/docs/aws.md +++ b/docs/aws.md @@ -476,42 +476,7 @@ The above snippet defines two volume mounts for the jobs executed in your pipeli ### Troubleshooting -**Problem**: The Pipeline execution terminates with an AWS error message similar to the one shown below: - -``` -JobQueue not found -``` - -Make sure you have defined a AWS region in the Nextflow configuration file and it matches the region in which your Batch environment has been created. - -**Problem**: A process execution fails reporting the following error message: - -``` -Process terminated for an unknown reason -- Likely it has been terminated by the external system -``` - -This may happen when Batch is unable to execute the process script. A common cause of this problem is that the Docker container image you have specified uses a non standard [entrypoint](https://docs.docker.com/engine/reference/builder/#entrypoint) which does not allow the execution of the Bash launcher script required by Nextflow to run the job. - -This may also happen if the AWS CLI doesn't run correctly. - -Other places to check for error information: - -- The `.nextflow.log` file. -- The Job execution log in the AWS Batch dashboard. -- The [CloudWatch](https://aws.amazon.com/cloudwatch/) logs found in the `/aws/batch/job` log group. - -**Problem**: A process execution is stalled in the `RUNNABLE` status and the pipeline output is similar to the one below: - -``` -executor > awsbatch (1) -process > (1) [ 0%] 0 of .... -``` - -It may happen that the pipeline execution hangs indefinitely because one of the jobs is held in the queue and never gets executed. In AWS Console, the queue reports the job as `RUNNABLE` but it never moves from there. - -There are multiple reasons why this can happen. They are mainly related to the Compute Environment workload/configuration, the docker service or container configuration, network status, etc. - -This [AWS page](https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-stuck-runnable-status/) provides several resolutions and tips to investigate and work around the issue. +See {ref}`aws-compute-storage` for common AWS compute and storage errors and strategies to resolve them. 
## AWS Fargate diff --git a/docs/cache-and-resume.md b/docs/cache-and-resume.md index 84c09f326d..56ad384dae 100644 --- a/docs/cache-and-resume.md +++ b/docs/cache-and-resume.md @@ -65,122 +65,7 @@ When a previous task is retrieved from the task cache on a resumed run, Nextflow For this reason, it is important to preserve both the task cache (`.nextflow/cache`) and work directories in order to resume runs successfully. You can use the {ref}`cli-clean` command to delete specific runs from the cache. -## Troubleshooting - -Cache failures happen when either (1) a task that was supposed to be cached was re-executed, or (2) a task that was supposed to be re-executed was cached. - -When this happens, consider the following questions: - -- Is resume enabled via `-resume`? -- Is the {ref}`process-cache` directive set to a non-default value? -- Is the task still present in the task cache and work directory? -- Were any of the task inputs changed? - -Changing any of the inputs included in the [task hash](#task-hash) will invalidate the cache, for example: - -- Resuming from a different session ID -- Changing the process name -- Changing the task container image or Conda environment -- Changing the task script -- Changing an input file or bundled script used by the task - -While the following examples would not invalidate the cache: - -- Changing the value of a directive (other than {ref}`process-ext`), even if that directive is used in the task script - -In many cases, cache failures happen because of a change to the pipeline script or configuration, or because the pipeline itself has some non-deterministic behavior. - -Here are some common reasons for cache failures: - -### Modified input files - -Make sure that your input files have not been changed. Keep in mind that the default caching mode uses the complete file path, the last modified timestamp, and the file size. If any of these attributes change, the task will be re-executed, even if the file content is unchanged. - -### Process that modifies its inputs - -If a process modifies its own input files, it cannot be resumed for the reasons described in the previous point. As a result, processes that modify their own input files are considered an anti-pattern and should be avoided. - -### Inconsistent file attributes - -Some shared file systems, such as NFS, may report inconsistent file timestamps, which can invalidate the cache. If you encounter this problem, you can avoid it by using the `'lenient'` {ref}`caching mode `, which ignores the last modified timestamp and uses only the file path and size. - -(cache-global-var-race-condition)= - -### Race condition on a global variable - -While Nextflow tries to make it easy to write safe concurrent code, it is still possible to create race conditions, which can in turn impact the caching behavior of your pipeline. - -Consider the following example: - -```nextflow -Channel.of(1,2,3) | map { v -> X=v; X+=2 } | view { v -> "ch1 = $v" } -Channel.of(1,2,3) | map { v -> X=v; X*=2 } | view { v -> "ch2 = $v" } -``` - -The problem here is that `X` is declared in each `map` closure without the `def` keyword (or other type qualifier). Using the `def` keyword makes the variable local to the enclosing scope; omitting the `def` keyword makes the variable global to the entire script. - -Because `X` is global, and operators are executed concurrently, there is a *race condition* on `X`, which means that the emitted values will vary depending on the particular order of the concurrent operations. 
If the values were passed as inputs into a process, the process would execute different tasks on each run due to the race condition. - -The solution is to not use a global variable where a local variable is enough (or in this simple example, avoid the variable altogether): - -```nextflow -// local variable -Channel.of(1,2,3) | map { v -> def X=v; X+=2 } | view { v -> "ch1 = $v" } - -// no variable -Channel.of(1,2,3) | map { v -> v * 2 } | view { v -> "ch2 = $v" } -``` - -(cache-nondeterministic-inputs)= - -### Non-deterministic process inputs - -Sometimes a process needs to merge inputs from different sources. Consider the following example: - -```nextflow -workflow { - ch_foo = Channel.of( ['1', '1.foo'], ['2', '2.foo'] ) - ch_bar = Channel.of( ['2', '2.bar'], ['1', '1.bar'] ) - gather(ch_foo, ch_bar) -} - -process gather { - input: - tuple val(id), file(foo) - tuple val(id), file(bar) - - script: - """ - merge_command $foo $bar - """ -} -``` - -It is tempting to assume that the process inputs will be matched by `id` like the {ref}`operator-join` operator. But in reality, they are simply merged like the {ref}`operator-merge` operator. As a result, not only will the process inputs be incorrect, they will also be non-deterministic, thus invalidating the cache. - -The solution is to explicitly join the two channels before the process invocation: - -```nextflow -workflow { - ch_foo = Channel.of( ['1', '1.foo'], ['2', '2.foo'] ) - ch_bar = Channel.of( ['2', '2.bar'], ['1', '1.bar'] ) - gather(ch_foo.join(ch_bar)) -} - -process gather { - input: - tuple val(id), file(foo), file(bar) - - script: - """ - merge_command $foo $bar - """ -} -``` - -## Tips - -### Resuming from a specific run +## Resume from a specific run Nextflow resumes from the previous run by default. If you want to resume from an earlier run, simply specify the session ID for that run with the `-resume` option: @@ -189,41 +74,3 @@ nextflow run rnaseq-nf -resume 4dc656d2-c410-44c8-bc32-7dd0ea87bebf ``` You can use the {ref}`cli-log` command to view all previous runs as well as the task executions for each run. - -(cache-compare-hashes)= - -### Comparing the hashes of two runs - -One way to debug a resumed run is to compare the task hashes of each run using the `-dump-hashes` option. - -1. Perform an initial run: `nextflow -log run_initial.log run -dump-hashes` -2. Perform a resumed run: `nextflow -log run_resumed.log run -dump-hashes -resume` -3. Extract the task hash lines from each log (search for `cache hash:`) -4. Compare the runs with a diff viewer - -While some manual effort is required, the final diff can often reveal the exact change that caused a task to be re-executed. - -:::{versionadded} 23.10.0 -::: - -When using `-dump-hashes json`, the task hashes can be more easily extracted into a diff. Here is an example Bash script to perform two runs and produce a diff: - -```bash -nextflow -log run_1.log run $pipeline -dump-hashes json -nextflow -log run_2.log run $pipeline -dump-hashes json -resume - -get_hashes() { - cat $1 \ - | grep 'cache hash:' \ - | cut -d ' ' -f 10- \ - | sort \ - | awk '{ print; print ""; }' -} - -get_hashes run_1.log > run_1.tasks.log -get_hashes run_2.log > run_2.tasks.log - -diff run_1.tasks.log run_2.tasks.log -``` - -You can then view the `diff` output or use a graphical diff viewer to compare `run_1.tasks.log` and `run_2.tasks.log`. 
diff --git a/docs/google.md b/docs/google.md index 8095d9d392..4daee09aa7 100644 --- a/docs/google.md +++ b/docs/google.md @@ -427,7 +427,7 @@ Nextflow will automatically manage the transfer of input and output files betwee - Currently, it's not possible to specify a disk type different from the default one assigned by the service depending on the chosen instance type. -### Troubleshooting +### Configuration tips - Make sure to enable the Compute Engine API, Life Sciences API and Cloud Storage API in the [APIs & Services Dashboard](https://console.cloud.google.com/apis/dashboard) page. diff --git a/docs/index.md b/docs/index.md index 88494a1d5f..2252f01f57 100644 --- a/docs/index.md +++ b/docs/index.md @@ -144,6 +144,15 @@ developer/packages developer/plugins ``` +```{toctree} +:hidden: +:caption: Troubleshooting +:maxdepth: 1 +troubleshooting/cache-failures.md +troubleshooting/compute-storage.md +troubleshooting/language-server.md +``` + ```{toctree} :hidden: :caption: Tutorials diff --git a/docs/process.md b/docs/process.md index 4ed236fc79..7851f295a8 100644 --- a/docs/process.md +++ b/docs/process.md @@ -808,7 +808,7 @@ The above example executes the `bar` process three times because `x` is a value ``` :::{note} -In general, multiple input channels should be used to process *combinations* of different inputs, using the `each` qualifier or value channels. Having multiple queue channels as inputs is equivalent to using the {ref}`operator-merge` operator, which is not recommended as it may lead to {ref}`non-deterministic process inputs `. +In general, multiple input channels should be used to process *combinations* of different inputs, using the `each` qualifier or value channels. Having multiple queue channels as inputs is equivalent to using the {ref}`operator-merge` operator, which is not recommended as it may lead to {ref}`non-deterministic process inputs `. ::: See also: {ref}`channel-types`. diff --git a/docs/reference/cli.md b/docs/reference/cli.md index dd9bd9c32d..3dd97852b1 100644 --- a/docs/reference/cli.md +++ b/docs/reference/cli.md @@ -980,7 +980,7 @@ The `run` command is used to execute a local pipeline script or remote pipeline `-dump-hashes` : Dump task hash keys for debugging purposes. : :::{versionadded} 23.10.0 - You can use `-dump-hashes json` to dump the task hash keys as JSON for easier post-processing. See the {ref}`caching and resuming tips ` for more details. + You can use `-dump-hashes json` to dump the task hash keys as JSON for easier post-processing. See the {ref}`cache-failure-compare` for more details. ::: `-e.=` diff --git a/docs/reference/operator.md b/docs/reference/operator.md index ba7b78837e..242dee40c2 100644 --- a/docs/reference/operator.md +++ b/docs/reference/operator.md @@ -867,7 +867,7 @@ The `merge` operator may return a queue channel or value channel depending on th - If the first argument is a value channel, the `merge` operator will return a value channel merging the first value from each input, regardless of whether there are queue channel inputs with additional values. :::{danger} -In general, the use of the `merge` operator is discouraged. Processes and channel operators are not guaranteed to emit items in the order that they were received, as they are executed concurrently. Therefore, if you try to merge output channels from different processes, the resulting channel may be different on each run, which will cause resumed runs to {ref}`not work properly `. +In general, the use of the `merge` operator is discouraged. 
Processes and channel operators are not guaranteed to emit items in the order that they were received, as they are executed concurrently. Therefore, if you try to merge output channels from different processes, the resulting channel may be different on each run, which will cause resumed runs to {ref}`not work properly `. You should always use a matching key (e.g. sample ID) to merge multiple channels, so that they are combined in a deterministic way. For this purpose, you can use the [join](#join) operator. ::: diff --git a/docs/script.md b/docs/script.md index eec9320ffe..48b611dd66 100644 --- a/docs/script.md +++ b/docs/script.md @@ -45,7 +45,7 @@ println str ``` :::{warning} -Variables can also be declared without `def` in some cases. However, this practice is discouraged outside of simple code snippets because it can lead to a {ref}`race condition `. +Variables can also be declared without `def` in some cases. However, this practice is discouraged outside of simple code snippets because it can lead to a {ref}`race condition `. ::: ## Lists diff --git a/docs/troubleshooting/cache-failures.md b/docs/troubleshooting/cache-failures.md new file mode 100644 index 0000000000..5a1d9d8e4f --- /dev/null +++ b/docs/troubleshooting/cache-failures.md @@ -0,0 +1,198 @@ +(cache-failure-page)= + +# Cache failures + +Cache failures occur when a task that was supposed to be cached was re-executed or a task that was supposed to be re-executed was cached. This page provides an overview of common causes for cache failures and strategies to identify them. + +(cache-failure-common)= + +## Common causes + +Common causes of cache failures include: + +- {ref}`Resume not being enabled ` +- {ref}`Non-default cache directives ` +- {ref}`Modified inputs ` +- {ref}`Inconsistent file attributes ` +- {ref}`Race condition on a global variable ` +- {ref}`Non-deterministic process inputs ` + +The causes of these cache failure and solutions to resolve them are described in detail below. + +(cache-failure-resume)= + +### Resume not enabled + +The `-resume` option is required to resume a pipeline. Ensure `-resume` has been enabled in your run command or your nextflow configuration file. + +(cache-failure-directives)= + +### Non-default cache directives + +The `cache` directive is enabled by default. However, you can disable or modify its behavior for a specific process. For example: + +```nextflow +process FOO { + cache false + // ... +} +``` + +Ensure that the cache has not been set to a non-default value. See {ref}`process-cache` for more information about the `cache` directive. + +(cache-failure-modified)= + +### Modified inputs + +Modifying inputs that are used in the task hash will invalidate the cache. Common causes of modified inputs include: + +- Changing input files +- Resuming from a different session ID +- Changing the process name +- Changing the task container image or Conda environment +- Changing the task script +- Changing a bundled script used by the task + +:::{note} +Changing the value of any directive, except {ref}`process-ext`, will not inactivate the task cache. +::: + +A hash for an input file is calculated from the complete file path, the last modified timestamp, and the file size to calculate. If any of these attributes change the task will be re-executed. If a process modifies its input files it cannot be resumed. Processes that modify their own input files are considered to be an anti-pattern and should be avoided. 
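As an illustration of the input-modification anti-pattern above, the following minimal sketch (the `fix_headers.py` helper is hypothetical) contrasts a process that rewrites its input in place with a resumable alternative that writes a new file and leaves the input untouched:

```nextflow
// Anti-pattern: the task rewrites its own input, so the file's timestamp
// and size change and the cached task can never be reused on -resume.
// fix_headers.py is a hypothetical helper used only for illustration.
process fix_headers_in_place {
    input:
    path samplesheet

    output:
    path samplesheet

    script:
    """
    fix_headers.py --in-place $samplesheet
    """
}

// Resumable alternative: leave the input untouched and emit a new file.
process fix_headers {
    input:
    path samplesheet

    output:
    path 'samplesheet.fixed.csv'

    script:
    """
    fix_headers.py $samplesheet > samplesheet.fixed.csv
    """
}
```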
+ +(cache-failure-inconsistent)= + +### Inconsistent file attributes + +Some shared file systems, such as NFS, may report inconsistent file timestamps. If you encounter this problem, use the `'lenient'` {ref}`caching mode ` to ignore the last modified timestamp and only use the file path. + +(cache-failure-race-condition)= + +### Race condition on a global variable + +Race conditions can in disrupt caching behavior of your pipeline. For example: + +```nextflow +Channel.of(1,2,3) | map { v -> X=v; X+=2 } | view { v -> "ch1 = $v" } +Channel.of(1,2,3) | map { v -> X=v; X*=2 } | view { v -> "ch2 = $v" } +``` + +In the above example, `X` is declared in each `map` closure. Without the `def` keyword, or other type qualifier, the variable `X` is global to the entire script. Operators and executed concurrently and, as `X` is global, there is a *race condition* that causes the emitted values to vary depending on the order of the concurrent operations. If these values were passed to a process as inputs the process would execute different tasks during each run due to the race condition. + +To resolve this failure type, ensure the variable is not global by using a local variable: + +```nextflow +Channel.of(1,2,3) | map { v -> def X=v; X+=2 } | view { v -> "ch1 = $v" } +``` + +Alternatively, remove the variable: + +```nextflow +Channel.of(1,2,3) | map { v -> v * 2 } | view { v -> "ch2 = $v" } +``` + +(cache-failure-nondeterministic)= + +### Non-deterministic process inputs + +A process that merges inputs from different sources non-deterministically may invalidate the cache. For example: + +```nextflow +workflow { + ch_foo = Channel.of( ['1', '1.foo'], ['2', '2.foo'] ) + ch_bar = Channel.of( ['2', '2.bar'], ['1', '1.bar'] ) + gather(ch_foo, ch_bar) +} + +process gather { + input: + tuple val(id), file(foo) + tuple val(id), file(bar) + + script: + """ + merge_command $foo $bar + """ +} +``` + +In the above example, the inputs will be merged without matching. This is the same way method used by the {ref}`operator-merge` operator. When merged, the inputs are incorrect, non-deterministic, and invalidate the cache. + +To resolve this failure type, ensure channels are deterministic by joining them before invoking the process: + +```nextflow +workflow { + ch_foo = Channel.of( ['1', '1.foo'], ['2', '2.foo'] ) + ch_bar = Channel.of( ['2', '2.bar'], ['1', '1.bar'] ) + gather(ch_foo.join(ch_bar)) +} + +process gather { + input: + tuple val(id), file(foo), file(bar) + + script: + """ + merge_command $foo $bar + """ +} +``` + +(cache-failure-compare)= + +## Compare task hashes + +By identifying differences between hashes you can detect changes that may be causing cache failures. + +To compare the task hashes for a resumed run: + +1. Run your pipeline with the `-log` and `-dump-hashes` options: + + ```bash + nextflow -log run_initial.log run -dump-hashes + ``` + +2. Run your pipeline with the `-log`, `-dump-hashes`, and `-resume` options: + + ```bash + nextflow -log run_resumed.log run -dump-hashes -resume + ``` + +3. Extract the task hash lines from each log: + + ```bash + cat run_initial.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d ' ' -f 10- | sort | awk '{ print; print ""; }' > run_initial.tasks.log + cat run_resumed.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d ' ' -f 10- | sort | awk '{ print; print ""; }' > run_resumed.tasks.log + ``` + +4. 
Compare the runs: + + ```bash + diff run_initial.tasks.log run_resumed.tasks.log + ``` + + :::{tip} + You can also compare the hash lines using a graphical diff viewer. + ::: + +:::{versionadded} 23.10.0 +::: + +Task hashes can also be extracted into a diff using `-dump-hashes json`. The following is an example Bash script to compare two runs and produce a diff: + +```bash +nextflow -log run_1.log run $pipeline -dump-hashes json +nextflow -log run_2.log run $pipeline -dump-hashes json -resume + +get_hashes() { + cat $1 \ + | grep 'cache hash:' \ + | cut -d ' ' -f 10- \ + | sort \ + | awk '{ print; print ""; }' +} + +get_hashes run_1.log > run_1.tasks.log +get_hashes run_2.log > run_2.tasks.log + +diff run_1.tasks.log run_2.tasks.log +``` diff --git a/docs/troubleshooting/compute-storage.md b/docs/troubleshooting/compute-storage.md new file mode 100644 index 0000000000..4aa4f50df8 --- /dev/null +++ b/docs/troubleshooting/compute-storage.md @@ -0,0 +1,44 @@ +(compute-storage-page)= + +# Compute and storage + +This page describes common compute and storage errors and strategies to resolve them. + +(aws-compute-storage)= + +## Amazon Web Services + +### Job queue not found + +**`JobQueue not found`** + +This error occurs when Nextflow cannot locate the specified AWS Batch job queue. It usually happens when the job queue does not exist, is not enabled, or there is a region mismatch between the configuration and the AWS Batch environment. + +To resolve this error, ensure you have defined an AWS region in your `nextflow.config` file and that it matches your Batch environment region. + +### Process terminated for an unknown reason + +**`Process terminated for an unknown reason -- Likely it has been terminated by the external system`** + +This error typically occurs when AWS Batch is unable to execute the process script. The most common reason is that the specified Docker container image has a non-standard entrypoint that prevents the execution of the Bash launcher script required by Nextflow to run the job. Another possible cause is an issue with the AWS CLI failing to run correctly within the job environment. + +To resolve this error, ensure the Docker container image used for the job does not have a custom entrypoint overriding or preventing Bash from launching and that the AWS CLI is properly installed. + +Check the following logs for more detailed error information: + +- The `.nextflow.log` file +- The Job execution log in the AWS Batch dashboard +- The CloudWatch logs found in the `/aws/batch/job` log group + +### Process stalled in RUNNABLE status + +If a process execution is stalled in the RUNNABLE status you may see an output similar to the following: + +``` +executor > awsbatch (1) +process > (1) [ 0%] 0 of .... +``` + +This error occurs when a job remains stuck in the RUNNABLE state in AWS Batch and never progresses to execution. In the AWS Console, the job will be listed as RUNNABLE indefinitely, indicating that it’s waiting to be scheduled but cannot proceed. The root cause is often related to issues with the Compute Environment, Docker configuration, or network settings. + +See [Why is my AWS Batch job stuck in RUNNABLE status?](https://repost.aws/knowledge-center/batch-job-stuck-runnable-status) for several resolutions and tips to investigate this error. 
diff --git a/docs/troubleshooting/language-server.md b/docs/troubleshooting/language-server.md new file mode 100644 index 0000000000..5b2a1ea7a2 --- /dev/null +++ b/docs/troubleshooting/language-server.md @@ -0,0 +1,52 @@ +(language-server-errors-page)= + +# Language server errors + +This page describes common language server errors and strategies to resolve them. + +## Common errors + +### Filesystem changes + +The language server does not detect certain filesystem changes. For example, changing the current Git branch. + +To resolve this issue, restart the language server from the command palette to sync it with your workspace. See [Stop and restart](#stop-and-restart) for more information. + +### Third-party plugins + +The language server does not recognize configuration options from third-party plugins and will report unrecognized config option warnings. There is currently no solution to suppress them. + +### Groovy scripts + +The language server provides limited support for Groovy scripts in the lib directory. Errors in Groovy scripts are not reported as diagnostics, and changing a Groovy script does not automatically re-compile the Nextflow scripts that reference it. + +To resolve this issue, edit or close and re-open the Nextflow script to refresh the diagnostics. + +## Stop and restart + +In the event of an error, stop or restart the language server from the Command Palette. The following stop and restart commands are available: + +- `Nextflow: Stop language server` +- `Nextflow: Restart language server` + +See {ref}`vscode-commands` for a fill list of Nextflow VS Code extension commands. + +## View logs + +Error logs can be useful for troubleshooting errors. + +To view logs in VS Code: + +1. Open the **Output** tab in your console. +2. Select **Nextflow Language Server** from the dropdown. + +To show additional log messages in VS Code: + +1. Open the **Extensions** view in the left-hand menu. +2. Select the **Nextflow** extension. +3. Select the **Manage** icon. +3. Enable **Nextflow > Debug** in the extension settings. + +## Report an issue + +Report issues at [`nextflow-io/vscode-language-nextflow`](https://github.com/nextflow-io/vscode-language-nextflow) or [`nextflow-io/language-server`](https://github.com/nextflow-io/language-server). When reporting issues, include a minimal code snippet that reproduces the issue and any error logs from the server. diff --git a/docs/vscode.md b/docs/vscode.md index 76576e8002..a15b3fc5bd 100644 --- a/docs/vscode.md +++ b/docs/vscode.md @@ -62,21 +62,13 @@ The **Preview DAG** CodeLens is only available when the script does not contain ## Troubleshooting -In the event of an error, you can stop or restart the language server from the command palette. See [Commands](#commands) for the set of available commands. +See {ref}`language-server-errors-page` for common language server limitations, errors, and strategies to resolve them. -Report issues at [nextflow-io/vscode-language-nextflow](https://github.com/nextflow-io/vscode-language-nextflow) or [nextflow-io/language-server](https://github.com/nextflow-io/language-server). When reporting, include a minimal code snippet that reproduces the issue and any error logs from the server. To view logs, open the **Output** tab and select **Nextflow Language Server** from the dropdown. Enable **Nextflow > Debug** in the [extension settings](#settings) to show additional log messages while debugging. 
- -## Limitations - -- The language server does not detect certain filesystem changes, such as changing the current Git branch. Restart the language server from the command palette to sync it with your workspace. - -- The language server does not recognize configuration options from third-party plugins and will report "Unrecognized config option" warnings for them. - -- The language server provides limited support for Groovy scripts in the `lib` directory. Errors in Groovy scripts are not reported as diagnostics, and changing a Groovy script does not automatically re-compile the Nextflow scripts that reference it. Edit the Nextflow script or close and re-open it to refresh the diagnostics. +(vscode-commands)= ## Commands -The following commands are available from the command palette: +The following commands are available from the Command Palette: - Restart language server - Stop language server From f7c391a4d48f5873af5ddb07dc620bf75796f6fe Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Thu, 6 Mar 2025 05:41:50 +1300 Subject: [PATCH 02/10] Language improvements Signed-off-by: Christopher Hakkaart --- docs/troubleshooting/cache-failures.md | 14 +++++++------- docs/troubleshooting/compute-storage.md | 10 +++++----- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/troubleshooting/cache-failures.md b/docs/troubleshooting/cache-failures.md index 5a1d9d8e4f..8735059700 100644 --- a/docs/troubleshooting/cache-failures.md +++ b/docs/troubleshooting/cache-failures.md @@ -17,7 +17,7 @@ Common causes of cache failures include: - {ref}`Race condition on a global variable ` - {ref}`Non-deterministic process inputs ` -The causes of these cache failure and solutions to resolve them are described in detail below. +Causes of these cache failures and solutions to resolve them are described in detail below. (cache-failure-resume)= @@ -57,7 +57,7 @@ Modifying inputs that are used in the task hash will invalidate the cache. Commo Changing the value of any directive, except {ref}`process-ext`, will not inactivate the task cache. ::: -A hash for an input file is calculated from the complete file path, the last modified timestamp, and the file size to calculate. If any of these attributes change the task will be re-executed. If a process modifies its input files it cannot be resumed. Processes that modify their own input files are considered to be an anti-pattern and should be avoided. +Hashes for input files are calculated from the complete file path, the last modified timestamp, and the file size to calculate. If any of these attributes change, tasks will be re-executed. If a process modifies its input files it cannot be resumed. Processes that modify their own input files are considered to be an anti-pattern and should be avoided. (cache-failure-inconsistent)= @@ -76,15 +76,15 @@ Channel.of(1,2,3) | map { v -> X=v; X+=2 } | view { v -> "ch1 = $v" } Channel.of(1,2,3) | map { v -> X=v; X*=2 } | view { v -> "ch2 = $v" } ``` -In the above example, `X` is declared in each `map` closure. Without the `def` keyword, or other type qualifier, the variable `X` is global to the entire script. Operators and executed concurrently and, as `X` is global, there is a *race condition* that causes the emitted values to vary depending on the order of the concurrent operations. If these values were passed to a process as inputs the process would execute different tasks during each run due to the race condition. +In the above example, `X` is declared in each `map` closure. 
Without the `def` keyword, or other type qualifier, the variable `X` is global to the entire script. Operators and executed concurrently and, as `X` is global, there is a *race condition* that causes emitted values to vary depending on the order of concurrent operations. If these values were passed to a process as inputs the process would execute different tasks during each run due to race conditions. -To resolve this failure type, ensure the variable is not global by using a local variable: +To resolve this failure type, ensure variables are not global by using local variables: ```nextflow Channel.of(1,2,3) | map { v -> def X=v; X+=2 } | view { v -> "ch1 = $v" } ``` -Alternatively, remove the variable: +Alternatively, remove the variables: ```nextflow Channel.of(1,2,3) | map { v -> v * 2 } | view { v -> "ch2 = $v" } @@ -94,7 +94,7 @@ Channel.of(1,2,3) | map { v -> v * 2 } | view { v -> "ch2 = $v" } ### Non-deterministic process inputs -A process that merges inputs from different sources non-deterministically may invalidate the cache. For example: +Processes that merge inputs from different sources non-deterministically may invalidate the cache. For example: ```nextflow workflow { @@ -115,7 +115,7 @@ process gather { } ``` -In the above example, the inputs will be merged without matching. This is the same way method used by the {ref}`operator-merge` operator. When merged, the inputs are incorrect, non-deterministic, and invalidate the cache. +In the above example, inputs will be merged without matching. This is the same way method used by the {ref}`operator-merge` operator. When merged, the inputs are incorrect, non-deterministic, and invalidate the cache. To resolve this failure type, ensure channels are deterministic by joining them before invoking the process: diff --git a/docs/troubleshooting/compute-storage.md b/docs/troubleshooting/compute-storage.md index 4aa4f50df8..72e4549c05 100644 --- a/docs/troubleshooting/compute-storage.md +++ b/docs/troubleshooting/compute-storage.md @@ -10,9 +10,9 @@ This page describes common compute and storage errors and strategies to resolve ### Job queue not found -**`JobQueue not found`** +**`JobQueue not found`** -This error occurs when Nextflow cannot locate the specified AWS Batch job queue. It usually happens when the job queue does not exist, is not enabled, or there is a region mismatch between the configuration and the AWS Batch environment. +This error occurs when Nextflow cannot locate the specified AWS Batch job queue. It usually happens when job queues do not exist, are not enabled, or there is a region mismatch between the configuration and the AWS Batch environment. To resolve this error, ensure you have defined an AWS region in your `nextflow.config` file and that it matches your Batch environment region. 
@@ -26,9 +26,9 @@ To resolve this error, ensure the Docker container image used for the job does n Check the following logs for more detailed error information: -- The `.nextflow.log` file -- The Job execution log in the AWS Batch dashboard -- The CloudWatch logs found in the `/aws/batch/job` log group +- `.nextflow.log` file +- Job execution log in the AWS Batch dashboard +- CloudWatch logs found in the `/aws/batch/job` log group ### Process stalled in RUNNABLE status From d3b54bac6e842b98e0f728231e1ae68f50737f48 Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Tue, 11 Mar 2025 03:52:04 +1300 Subject: [PATCH 03/10] Revert sections Signed-off-by: Christopher Hakkaart --- docs/aws.md | 35 ++++- docs/cache-and-resume.md | 186 ++++++++++++++++++++++ docs/index.md | 9 -- docs/process.md | 2 +- docs/reference/operator.md | 2 +- docs/script.md | 2 +- docs/troubleshooting/cache-failures.md | 198 ------------------------ docs/troubleshooting/compute-storage.md | 44 ------ docs/troubleshooting/language-server.md | 52 ------- docs/vscode.md | 47 +++++- 10 files changed, 269 insertions(+), 308 deletions(-) delete mode 100644 docs/troubleshooting/cache-failures.md delete mode 100644 docs/troubleshooting/compute-storage.md delete mode 100644 docs/troubleshooting/language-server.md diff --git a/docs/aws.md b/docs/aws.md index fe597cb658..9a6b658a44 100644 --- a/docs/aws.md +++ b/docs/aws.md @@ -476,7 +476,40 @@ The above snippet defines two volume mounts for the jobs executed in your pipeli ### Troubleshooting -See {ref}`aws-compute-storage` for common AWS compute and storage errors and strategies to resolve them. +

+#### Job queue not found

+ +**`JobQueue not found`** + +This error occurs when Nextflow cannot locate the specified AWS Batch job queue. It usually happens when the job queue does not exist, is not enabled, or there is a region mismatch between the configuration and the AWS Batch environment. + +To resolve this error, ensure you have defined an AWS region in your `nextflow.config` file and that it matches your Batch environment region. + +
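A minimal configuration sketch along these lines (the queue name and region are placeholder values) keeps the region aligned with the Batch environment that owns the queue:

```groovy
// nextflow.config -- placeholder values; the region must match the region
// in which the Batch job queue and compute environment were created.
process {
    executor = 'awsbatch'
    queue    = 'my-batch-queue'   // must exist and be enabled in this region
}

aws {
    region = 'eu-west-1'
}
```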

+#### Process terminated for an unknown reason

+ +**`Process terminated for an unknown reason -- Likely it has been terminated by the external system`** + +This error typically occurs when AWS Batch is unable to execute the process script. The most common reason is that the specified Docker container image has a non-standard entrypoint that prevents the execution of the Bash launcher script required by Nextflow to run the job. Another possible cause is an issue with the AWS CLI failing to run correctly within the job environment. + +To resolve this error, ensure the Docker container image used for the job does not have a custom entrypoint overriding or preventing Bash from launching and that the AWS CLI is properly installed. + +Check the following logs for more detailed error information: + +- The `.nextflow.log` file +- The Job execution log in the AWS Batch dashboard +- The CloudWatch logs found in the `/aws/batch/job` log group + +
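For the entrypoint and AWS CLI causes above, typical fixes are to rebuild the image with a plain shell entrypoint (or none at all) and to point Nextflow at an AWS CLI installed on the host image. The snippet below is a sketch only; the CLI path is a placeholder and depends on how your custom AMI was built:

```groovy
// nextflow.config -- use an AWS CLI provided by the host AMI instead of
// relying on one inside the container (placeholder path shown).
aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'
```

With this setting, the CLI must be available on the host instance (for example, installed on a custom AMI), since Nextflow mounts that path into the task containers at runtime.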

+#### Process stalled in RUNNABLE status

+ +If a process execution is stalled in the RUNNABLE status you may see an output similar to the following: + +``` +executor > awsbatch (1) +process > (1) [ 0%] 0 of .... +``` + +This error occurs when a job remains stuck in the RUNNABLE state in AWS Batch and never progresses to execution. In the AWS Console, the job will be listed as RUNNABLE indefinitely, indicating that it’s waiting to be scheduled but cannot proceed. The root cause is often related to issues with the Compute Environment, Docker configuration, or network settings. + +See [Why is my AWS Batch job stuck in RUNNABLE status?](https://repost.aws/knowledge-center/batch-job-stuck-runnable-status) for several resolutions and tips to investigate this error. ## AWS Fargate diff --git a/docs/cache-and-resume.md b/docs/cache-and-resume.md index 56ad384dae..213da6b15c 100644 --- a/docs/cache-and-resume.md +++ b/docs/cache-and-resume.md @@ -65,6 +65,192 @@ When a previous task is retrieved from the task cache on a resumed run, Nextflow For this reason, it is important to preserve both the task cache (`.nextflow/cache`) and work directories in order to resume runs successfully. You can use the {ref}`cli-clean` command to delete specific runs from the cache. +## Troubleshooting + +Cache failures occur when a task that was supposed to be cached was re-executed or a task that was supposed to be re-executed was cached. This page provides an overview of common causes for cache failures and strategies to identify them. + +Common causes of cache failures include: + +- {ref}`Resume not being enabled ` +- {ref}`Non-default cache directives ` +- {ref}`Modified inputs ` +- {ref}`Inconsistent file attributes ` +- {ref}`Race condition on a global variable ` +- {ref}`Non-deterministic process inputs ` + +The causes of these cache failure and solutions to resolve them are described in detail below. + +(cache-failure-resume)= + +### Resume not enabled + +The `-resume` option is required to resume a pipeline. Ensure `-resume` has been enabled in your run command or your nextflow configuration file. + +(cache-failure-directives)= + +### Non-default cache directives + +The `cache` directive is enabled by default. However, you can disable or modify its behavior for a specific process. For example: + +```nextflow +process FOO { + cache false + // ... +} +``` + +Ensure that the cache has not been set to a non-default value. See {ref}`process-cache` for more information about the `cache` directive. + +(cache-failure-modified)= + +### Modified inputs + +Modifying inputs that are used in the task hash will invalidate the cache. Common causes of modified inputs include: + +- Changing input files +- Resuming from a different session ID +- Changing the process name +- Changing the task container image or Conda environment +- Changing the task script +- Changing a bundled script used by the task + +:::{note} +Changing the value of any directive, except {ref}`process-ext`, will not inactivate the task cache. +::: + +A hash for an input file is calculated from the complete file path, the last modified timestamp, and the file size to calculate. If any of these attributes change the task will be re-executed. If a process modifies its input files it cannot be resumed. Processes that modify their own input files are considered to be an anti-pattern and should be avoided. + +(cache-failure-inconsistent)= + +### Inconsistent file attributes + +Some shared file systems, such as NFS, may report inconsistent file timestamps. 
If you encounter this problem, use the `'lenient'` {ref}`caching mode ` to ignore the last modified timestamp and only use the file path. + +(cache-global-var-race-condition)= + +### Race condition on a global variable + +Race conditions can in disrupt caching behavior of your pipeline. For example: + +```nextflow +Channel.of(1,2,3) | map { v -> X=v; X+=2 } | view { v -> "ch1 = $v" } +Channel.of(1,2,3) | map { v -> X=v; X*=2 } | view { v -> "ch2 = $v" } +``` + +In the above example, `X` is declared in each `map` closure. Without the `def` keyword, or other type qualifier, the variable `X` is global to the entire script. Operators and executed concurrently and, as `X` is global, there is a *race condition* that causes the emitted values to vary depending on the order of the concurrent operations. If these values were passed to a process as inputs the process would execute different tasks during each run due to the race condition. + +To resolve this failure type, ensure the variable is not global by using a local variable: + +```nextflow +Channel.of(1,2,3) | map { v -> def X=v; X+=2 } | view { v -> "ch1 = $v" } +``` + +Alternatively, remove the variable: + +```nextflow +Channel.of(1,2,3) | map { v -> v * 2 } | view { v -> "ch2 = $v" } +``` + +(cache-nondeterministic-inputs)= + +### Non-deterministic process inputs + +A process that merges inputs from different sources non-deterministically may invalidate the cache. For example: + +```nextflow +workflow { + ch_foo = Channel.of( ['1', '1.foo'], ['2', '2.foo'] ) + ch_bar = Channel.of( ['2', '2.bar'], ['1', '1.bar'] ) + gather(ch_foo, ch_bar) +} +process gather { + input: + tuple val(id), file(foo) + tuple val(id), file(bar) + script: + """ + merge_command $foo $bar + """ +} +``` + +In the above example, the inputs will be merged without matching. This is the same way method used by the {ref}`operator-merge` operator. When merged, the inputs are incorrect, non-deterministic, and invalidate the cache. + +To resolve this failure type, ensure channels are deterministic by joining them before invoking the process: + +```nextflow +workflow { + ch_foo = Channel.of( ['1', '1.foo'], ['2', '2.foo'] ) + ch_bar = Channel.of( ['2', '2.bar'], ['1', '1.bar'] ) + gather(ch_foo.join(ch_bar)) +} +process gather { + input: + tuple val(id), file(foo), file(bar) + script: + """ + merge_command $foo $bar + """ +} +``` + +(cache-failure-compare)= + +## Compare task hashes + +By identifying differences between hashes you can detect changes that may be causing cache failures. + +To compare the task hashes for a resumed run: + +1. Run your pipeline with the `-log` and `-dump-hashes` options: + + ```bash + nextflow -log run_initial.log run -dump-hashes + ``` + +2. Run your pipeline with the `-log`, `-dump-hashes`, and `-resume` options: + + ```bash + nextflow -log run_resumed.log run -dump-hashes -resume + ``` + +3. Extract the task hash lines from each log: + + ```bash + cat run_initial.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d ' ' -f 10- | sort | awk '{ print; print ""; }' > run_initial.tasks.log + cat run_resumed.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d ' ' -f 10- | sort | awk '{ print; print ""; }' > run_resumed.tasks.log + ``` + +4. Compare the runs: + + ```bash + diff run_initial.tasks.log run_resumed.tasks.log + ``` + + :::{tip} + You can also compare the hash lines using a graphical diff viewer. + ::: + +:::{versionadded} 23.10.0 +::: + +Task hashes can also be extracted into a diff using `-dump-hashes json`. 
The following is an example Bash script to compare two runs and produce a diff: + +```bash +nextflow -log run_1.log run $pipeline -dump-hashes json +nextflow -log run_2.log run $pipeline -dump-hashes json -resume +get_hashes() { + cat $1 \ + | grep 'cache hash:' \ + | cut -d ' ' -f 10- \ + | sort \ + | awk '{ print; print ""; }' +} +get_hashes run_1.log > run_1.tasks.log +get_hashes run_2.log > run_2.tasks.log +diff run_1.tasks.log run_2.tasks.log +``` + ## Resume from a specific run Nextflow resumes from the previous run by default. If you want to resume from an earlier run, simply specify the session ID for that run with the `-resume` option: diff --git a/docs/index.md b/docs/index.md index 2252f01f57..88494a1d5f 100644 --- a/docs/index.md +++ b/docs/index.md @@ -144,15 +144,6 @@ developer/packages developer/plugins ``` -```{toctree} -:hidden: -:caption: Troubleshooting -:maxdepth: 1 -troubleshooting/cache-failures.md -troubleshooting/compute-storage.md -troubleshooting/language-server.md -``` - ```{toctree} :hidden: :caption: Tutorials diff --git a/docs/process.md b/docs/process.md index 7851f295a8..4ed236fc79 100644 --- a/docs/process.md +++ b/docs/process.md @@ -808,7 +808,7 @@ The above example executes the `bar` process three times because `x` is a value ``` :::{note} -In general, multiple input channels should be used to process *combinations* of different inputs, using the `each` qualifier or value channels. Having multiple queue channels as inputs is equivalent to using the {ref}`operator-merge` operator, which is not recommended as it may lead to {ref}`non-deterministic process inputs `. +In general, multiple input channels should be used to process *combinations* of different inputs, using the `each` qualifier or value channels. Having multiple queue channels as inputs is equivalent to using the {ref}`operator-merge` operator, which is not recommended as it may lead to {ref}`non-deterministic process inputs `. ::: See also: {ref}`channel-types`. diff --git a/docs/reference/operator.md b/docs/reference/operator.md index 242dee40c2..ba7b78837e 100644 --- a/docs/reference/operator.md +++ b/docs/reference/operator.md @@ -867,7 +867,7 @@ The `merge` operator may return a queue channel or value channel depending on th - If the first argument is a value channel, the `merge` operator will return a value channel merging the first value from each input, regardless of whether there are queue channel inputs with additional values. :::{danger} -In general, the use of the `merge` operator is discouraged. Processes and channel operators are not guaranteed to emit items in the order that they were received, as they are executed concurrently. Therefore, if you try to merge output channels from different processes, the resulting channel may be different on each run, which will cause resumed runs to {ref}`not work properly `. +In general, the use of the `merge` operator is discouraged. Processes and channel operators are not guaranteed to emit items in the order that they were received, as they are executed concurrently. Therefore, if you try to merge output channels from different processes, the resulting channel may be different on each run, which will cause resumed runs to {ref}`not work properly `. You should always use a matching key (e.g. sample ID) to merge multiple channels, so that they are combined in a deterministic way. For this purpose, you can use the [join](#join) operator. 
::: diff --git a/docs/script.md b/docs/script.md index 48b611dd66..eec9320ffe 100644 --- a/docs/script.md +++ b/docs/script.md @@ -45,7 +45,7 @@ println str ``` :::{warning} -Variables can also be declared without `def` in some cases. However, this practice is discouraged outside of simple code snippets because it can lead to a {ref}`race condition `. +Variables can also be declared without `def` in some cases. However, this practice is discouraged outside of simple code snippets because it can lead to a {ref}`race condition `. ::: ## Lists diff --git a/docs/troubleshooting/cache-failures.md b/docs/troubleshooting/cache-failures.md deleted file mode 100644 index 8735059700..0000000000 --- a/docs/troubleshooting/cache-failures.md +++ /dev/null @@ -1,198 +0,0 @@ -(cache-failure-page)= - -# Cache failures - -Cache failures occur when a task that was supposed to be cached was re-executed or a task that was supposed to be re-executed was cached. This page provides an overview of common causes for cache failures and strategies to identify them. - -(cache-failure-common)= - -## Common causes - -Common causes of cache failures include: - -- {ref}`Resume not being enabled ` -- {ref}`Non-default cache directives ` -- {ref}`Modified inputs ` -- {ref}`Inconsistent file attributes ` -- {ref}`Race condition on a global variable ` -- {ref}`Non-deterministic process inputs ` - -Causes of these cache failures and solutions to resolve them are described in detail below. - -(cache-failure-resume)= - -### Resume not enabled - -The `-resume` option is required to resume a pipeline. Ensure `-resume` has been enabled in your run command or your nextflow configuration file. - -(cache-failure-directives)= - -### Non-default cache directives - -The `cache` directive is enabled by default. However, you can disable or modify its behavior for a specific process. For example: - -```nextflow -process FOO { - cache false - // ... -} -``` - -Ensure that the cache has not been set to a non-default value. See {ref}`process-cache` for more information about the `cache` directive. - -(cache-failure-modified)= - -### Modified inputs - -Modifying inputs that are used in the task hash will invalidate the cache. Common causes of modified inputs include: - -- Changing input files -- Resuming from a different session ID -- Changing the process name -- Changing the task container image or Conda environment -- Changing the task script -- Changing a bundled script used by the task - -:::{note} -Changing the value of any directive, except {ref}`process-ext`, will not inactivate the task cache. -::: - -Hashes for input files are calculated from the complete file path, the last modified timestamp, and the file size to calculate. If any of these attributes change, tasks will be re-executed. If a process modifies its input files it cannot be resumed. Processes that modify their own input files are considered to be an anti-pattern and should be avoided. - -(cache-failure-inconsistent)= - -### Inconsistent file attributes - -Some shared file systems, such as NFS, may report inconsistent file timestamps. If you encounter this problem, use the `'lenient'` {ref}`caching mode ` to ignore the last modified timestamp and only use the file path. - -(cache-failure-race-condition)= - -### Race condition on a global variable - -Race conditions can in disrupt caching behavior of your pipeline. 
For example: - -```nextflow -Channel.of(1,2,3) | map { v -> X=v; X+=2 } | view { v -> "ch1 = $v" } -Channel.of(1,2,3) | map { v -> X=v; X*=2 } | view { v -> "ch2 = $v" } -``` - -In the above example, `X` is declared in each `map` closure. Without the `def` keyword, or other type qualifier, the variable `X` is global to the entire script. Operators and executed concurrently and, as `X` is global, there is a *race condition* that causes emitted values to vary depending on the order of concurrent operations. If these values were passed to a process as inputs the process would execute different tasks during each run due to race conditions. - -To resolve this failure type, ensure variables are not global by using local variables: - -```nextflow -Channel.of(1,2,3) | map { v -> def X=v; X+=2 } | view { v -> "ch1 = $v" } -``` - -Alternatively, remove the variables: - -```nextflow -Channel.of(1,2,3) | map { v -> v * 2 } | view { v -> "ch2 = $v" } -``` - -(cache-failure-nondeterministic)= - -### Non-deterministic process inputs - -Processes that merge inputs from different sources non-deterministically may invalidate the cache. For example: - -```nextflow -workflow { - ch_foo = Channel.of( ['1', '1.foo'], ['2', '2.foo'] ) - ch_bar = Channel.of( ['2', '2.bar'], ['1', '1.bar'] ) - gather(ch_foo, ch_bar) -} - -process gather { - input: - tuple val(id), file(foo) - tuple val(id), file(bar) - - script: - """ - merge_command $foo $bar - """ -} -``` - -In the above example, inputs will be merged without matching. This is the same way method used by the {ref}`operator-merge` operator. When merged, the inputs are incorrect, non-deterministic, and invalidate the cache. - -To resolve this failure type, ensure channels are deterministic by joining them before invoking the process: - -```nextflow -workflow { - ch_foo = Channel.of( ['1', '1.foo'], ['2', '2.foo'] ) - ch_bar = Channel.of( ['2', '2.bar'], ['1', '1.bar'] ) - gather(ch_foo.join(ch_bar)) -} - -process gather { - input: - tuple val(id), file(foo), file(bar) - - script: - """ - merge_command $foo $bar - """ -} -``` - -(cache-failure-compare)= - -## Compare task hashes - -By identifying differences between hashes you can detect changes that may be causing cache failures. - -To compare the task hashes for a resumed run: - -1. Run your pipeline with the `-log` and `-dump-hashes` options: - - ```bash - nextflow -log run_initial.log run -dump-hashes - ``` - -2. Run your pipeline with the `-log`, `-dump-hashes`, and `-resume` options: - - ```bash - nextflow -log run_resumed.log run -dump-hashes -resume - ``` - -3. Extract the task hash lines from each log: - - ```bash - cat run_initial.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d ' ' -f 10- | sort | awk '{ print; print ""; }' > run_initial.tasks.log - cat run_resumed.log | grep 'INFO.*TaskProcessor.*cache hash' | cut -d ' ' -f 10- | sort | awk '{ print; print ""; }' > run_resumed.tasks.log - ``` - -4. Compare the runs: - - ```bash - diff run_initial.tasks.log run_resumed.tasks.log - ``` - - :::{tip} - You can also compare the hash lines using a graphical diff viewer. - ::: - -:::{versionadded} 23.10.0 -::: - -Task hashes can also be extracted into a diff using `-dump-hashes json`. 
The following is an example Bash script to compare two runs and produce a diff: - -```bash -nextflow -log run_1.log run $pipeline -dump-hashes json -nextflow -log run_2.log run $pipeline -dump-hashes json -resume - -get_hashes() { - cat $1 \ - | grep 'cache hash:' \ - | cut -d ' ' -f 10- \ - | sort \ - | awk '{ print; print ""; }' -} - -get_hashes run_1.log > run_1.tasks.log -get_hashes run_2.log > run_2.tasks.log - -diff run_1.tasks.log run_2.tasks.log -``` diff --git a/docs/troubleshooting/compute-storage.md b/docs/troubleshooting/compute-storage.md deleted file mode 100644 index 72e4549c05..0000000000 --- a/docs/troubleshooting/compute-storage.md +++ /dev/null @@ -1,44 +0,0 @@ -(compute-storage-page)= - -# Compute and storage - -This page describes common compute and storage errors and strategies to resolve them. - -(aws-compute-storage)= - -## Amazon Web Services - -### Job queue not found - -**`JobQueue not found`** - -This error occurs when Nextflow cannot locate the specified AWS Batch job queue. It usually happens when job queues do not exist, are not enabled, or there is a region mismatch between the configuration and the AWS Batch environment. - -To resolve this error, ensure you have defined an AWS region in your `nextflow.config` file and that it matches your Batch environment region. - -### Process terminated for an unknown reason - -**`Process terminated for an unknown reason -- Likely it has been terminated by the external system`** - -This error typically occurs when AWS Batch is unable to execute the process script. The most common reason is that the specified Docker container image has a non-standard entrypoint that prevents the execution of the Bash launcher script required by Nextflow to run the job. Another possible cause is an issue with the AWS CLI failing to run correctly within the job environment. - -To resolve this error, ensure the Docker container image used for the job does not have a custom entrypoint overriding or preventing Bash from launching and that the AWS CLI is properly installed. - -Check the following logs for more detailed error information: - -- `.nextflow.log` file -- Job execution log in the AWS Batch dashboard -- CloudWatch logs found in the `/aws/batch/job` log group - -### Process stalled in RUNNABLE status - -If a process execution is stalled in the RUNNABLE status you may see an output similar to the following: - -``` -executor > awsbatch (1) -process > (1) [ 0%] 0 of .... -``` - -This error occurs when a job remains stuck in the RUNNABLE state in AWS Batch and never progresses to execution. In the AWS Console, the job will be listed as RUNNABLE indefinitely, indicating that it’s waiting to be scheduled but cannot proceed. The root cause is often related to issues with the Compute Environment, Docker configuration, or network settings. - -See [Why is my AWS Batch job stuck in RUNNABLE status?](https://repost.aws/knowledge-center/batch-job-stuck-runnable-status) for several resolutions and tips to investigate this error. diff --git a/docs/troubleshooting/language-server.md b/docs/troubleshooting/language-server.md deleted file mode 100644 index 5b2a1ea7a2..0000000000 --- a/docs/troubleshooting/language-server.md +++ /dev/null @@ -1,52 +0,0 @@ -(language-server-errors-page)= - -# Language server errors - -This page describes common language server errors and strategies to resolve them. - -## Common errors - -### Filesystem changes - -The language server does not detect certain filesystem changes. For example, changing the current Git branch. 
- -To resolve this issue, restart the language server from the command palette to sync it with your workspace. See [Stop and restart](#stop-and-restart) for more information. - -### Third-party plugins - -The language server does not recognize configuration options from third-party plugins and will report unrecognized config option warnings. There is currently no solution to suppress them. - -### Groovy scripts - -The language server provides limited support for Groovy scripts in the lib directory. Errors in Groovy scripts are not reported as diagnostics, and changing a Groovy script does not automatically re-compile the Nextflow scripts that reference it. - -To resolve this issue, edit or close and re-open the Nextflow script to refresh the diagnostics. - -## Stop and restart - -In the event of an error, stop or restart the language server from the Command Palette. The following stop and restart commands are available: - -- `Nextflow: Stop language server` -- `Nextflow: Restart language server` - -See {ref}`vscode-commands` for a fill list of Nextflow VS Code extension commands. - -## View logs - -Error logs can be useful for troubleshooting errors. - -To view logs in VS Code: - -1. Open the **Output** tab in your console. -2. Select **Nextflow Language Server** from the dropdown. - -To show additional log messages in VS Code: - -1. Open the **Extensions** view in the left-hand menu. -2. Select the **Nextflow** extension. -3. Select the **Manage** icon. -3. Enable **Nextflow > Debug** in the extension settings. - -## Report an issue - -Report issues at [`nextflow-io/vscode-language-nextflow`](https://github.com/nextflow-io/vscode-language-nextflow) or [`nextflow-io/language-server`](https://github.com/nextflow-io/language-server). When reporting issues, include a minimal code snippet that reproduces the issue and any error logs from the server. diff --git a/docs/vscode.md b/docs/vscode.md index a15b3fc5bd..1981123b48 100644 --- a/docs/vscode.md +++ b/docs/vscode.md @@ -62,7 +62,52 @@ The **Preview DAG** CodeLens is only available when the script does not contain ## Troubleshooting -See {ref}`language-server-errors-page` for common language server limitations, errors, and strategies to resolve them. +### Common issues + +

#### Filesystem changes

+ +The language server does not detect certain filesystem changes. For example, changing the current Git branch. + +To resolve this issue, restart the language server from the command palette to sync it with your workspace. See [Stop and restart](#stop-and-restart) for more information. + +

#### Third-party plugins

+ +The language server does not recognize configuration options from third-party plugins and will report unrecognized config option warnings. There is currently no solution to suppress them. + +

#### Groovy scripts

+ +The language server provides limited support for Groovy scripts in the lib directory. Errors in Groovy scripts are not reported as diagnostics, and changing a Groovy script does not automatically re-compile the Nextflow scripts that reference it. + +To resolve this issue, edit or close and re-open the Nextflow script to refresh the diagnostics. + +## Stop and restart + +In the event of an error, stop or restart the language server from the Command Palette. The following stop and restart commands are available: + +- `Nextflow: Stop language server` +- `Nextflow: Restart language server` + +See {ref}`vscode-commands` for a fill list of Nextflow VS Code extension commands. + +## View logs + +Error logs can be useful for troubleshooting errors. + +To view logs in VS Code: + +1. Open the **Output** tab in your console. +2. Select **Nextflow Language Server** from the dropdown. + +To show additional log messages in VS Code: + +1. Open the **Extensions** view in the left-hand menu. +2. Select the **Nextflow** extension. +3. Select the **Manage** icon. +3. Enable **Nextflow > Debug** in the extension settings. + +## Report an issue + +Report issues at [`nextflow-io/vscode-language-nextflow`](https://github.com/nextflow-io/vscode-language-nextflow) or [`nextflow-io/language-server`](https://github.com/nextflow-io/language-server). When reporting issues, include a minimal code snippet that reproduces the issue and any error logs from the server. (vscode-commands)= From 951fabf621b99b65bfb9048139cd78b43ae96962 Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Tue, 11 Mar 2025 04:01:19 +1300 Subject: [PATCH 04/10] Fix placeholder Signed-off-by: Christopher Hakkaart --- docs/aws.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/aws.md b/docs/aws.md index 9a6b658a44..99cb280063 100644 --- a/docs/aws.md +++ b/docs/aws.md @@ -478,7 +478,7 @@ The above snippet defines two volume mounts for the jobs executed in your pipeli

#### Job queue not found

-**`JobQueue not found`** +**`JobQueue not found`** This error occurs when Nextflow cannot locate the specified AWS Batch job queue. It usually happens when the job queue does not exist, is not enabled, or there is a region mismatch between the configuration and the AWS Batch environment. From 929cb56e3ab8be11e79cc61aeac63defe569eba3 Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Tue, 11 Mar 2025 04:15:43 +1300 Subject: [PATCH 05/10] Update link Signed-off-by: Christopher Hakkaart --- docs/cache-and-resume.md | 2 +- docs/reference/cli.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/cache-and-resume.md b/docs/cache-and-resume.md index 213da6b15c..2d8f80165a 100644 --- a/docs/cache-and-resume.md +++ b/docs/cache-and-resume.md @@ -194,7 +194,7 @@ process gather { } ``` -(cache-failure-compare)= +(cache-compare-hashes)= ## Compare task hashes diff --git a/docs/reference/cli.md b/docs/reference/cli.md index 3dd97852b1..4f94d17571 100644 --- a/docs/reference/cli.md +++ b/docs/reference/cli.md @@ -980,7 +980,7 @@ The `run` command is used to execute a local pipeline script or remote pipeline `-dump-hashes` : Dump task hash keys for debugging purposes. : :::{versionadded} 23.10.0 - You can use `-dump-hashes json` to dump the task hash keys as JSON for easier post-processing. See the {ref}`cache-failure-compare` for more details. + You can use `-dump-hashes json` to dump the task hash keys as JSON for easier post-processing. See the {ref}`cache-compare-hashes` for more details. ::: `-e.=` From 3dc2b9b1e39e98d91cf2b25530619426437818db Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Wed, 12 Mar 2025 09:19:16 +1300 Subject: [PATCH 06/10] Add tip section Signed-off-by: Christopher Hakkaart --- docs/cache-and-resume.md | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/docs/cache-and-resume.md b/docs/cache-and-resume.md index 2d8f80165a..bb0257520d 100644 --- a/docs/cache-and-resume.md +++ b/docs/cache-and-resume.md @@ -196,7 +196,19 @@ process gather { (cache-compare-hashes)= -## Compare task hashes +## Tips + +## Resume from a specific run + +Nextflow resumes from the previous run by default. If you want to resume from an earlier run, simply specify the session ID for that run with the `-resume` option: + +```bash +nextflow run rnaseq-nf -resume 4dc656d2-c410-44c8-bc32-7dd0ea87bebf +``` + +You can use the {ref}`cli-log` command to view all previous runs as well as the task executions for each run. + +### Compare task hashes By identifying differences between hashes you can detect changes that may be causing cache failures. @@ -250,13 +262,3 @@ get_hashes run_1.log > run_1.tasks.log get_hashes run_2.log > run_2.tasks.log diff run_1.tasks.log run_2.tasks.log ``` - -## Resume from a specific run - -Nextflow resumes from the previous run by default. If you want to resume from an earlier run, simply specify the session ID for that run with the `-resume` option: - -```bash -nextflow run rnaseq-nf -resume 4dc656d2-c410-44c8-bc32-7dd0ea87bebf -``` - -You can use the {ref}`cli-log` command to view all previous runs as well as the task executions for each run. 
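For reference, a minimal sketch of how the session ID used above might be looked up before resuming (the `rnaseq-nf` pipeline name and session ID are the illustrative values from the section being moved):

```bash
# List previous runs; the output includes each run's name, status, and session ID
nextflow log

# Resume a specific earlier run by passing its session ID to -resume
nextflow run rnaseq-nf -resume 4dc656d2-c410-44c8-bc32-7dd0ea87bebf
```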
From 45842dc475c36e2d8209b251665eab05980a8a64 Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Wed, 12 Mar 2025 11:02:09 +1300 Subject: [PATCH 07/10] Move headings down a level Signed-off-by: Christopher Hakkaart --- docs/vscode.md | 82 +++++++++++++++++++++++++------------------------- 1 file changed, 41 insertions(+), 41 deletions(-) diff --git a/docs/vscode.md b/docs/vscode.md index 1981123b48..d758c9ae8f 100644 --- a/docs/vscode.md +++ b/docs/vscode.md @@ -60,27 +60,45 @@ To preview the DAG of a workflow, select the **Preview DAG** CodeLens above the The **Preview DAG** CodeLens is only available when the script does not contain any errors. ::: -## Troubleshooting +(vscode-commands)= -### Common issues +## Commands -

#### Filesystem changes

+The following commands are available from the Command Palette: -The language server does not detect certain filesystem changes. For example, changing the current Git branch. +- Restart language server +- Stop language server -To resolve this issue, restart the language server from the command palette to sync it with your workspace. See [Stop and restart](#stop-and-restart) for more information. +(vscode-settings)= -

#### Third-party plugins

+## Settings -The language server does not recognize configuration options from third-party plugins and will report unrecognized config option warnings. There is currently no solution to suppress them. +The following settings are available: -

#### Groovy scripts

+`nextflow.debug` +: Enable debug logging and debug information in hover hints. -The language server provides limited support for Groovy scripts in the lib directory. Errors in Groovy scripts are not reported as diagnostics, and changing a Groovy script does not automatically re-compile the Nextflow scripts that reference it. +`nextflow.files.exclude` +: Configure glob patterns for excluding folders from being searched for Nextflow scripts and configuration files. -To resolve this issue, edit or close and re-open the Nextflow script to refresh the diagnostics. +`nextflow.formatting.harshilAlignment` +: Use the [Harshil Alignment™️](https://nf-co.re/docs/contributing/code_editors_and_styling/harshil_alignment) when formatting Nextflow scripts and config files. + +`nextflow.java.home` +: Specifies the folder path to the JDK. Use this setting if the extension cannot find Java automatically. + +`nextflow.paranoidWarnings` +: Enable additional warnings for future deprecations, potential problems, and other discouraged patterns. + +## Language server + +Most of the functionality of the VS Code extension is provided by the [Nextflow language server](https://github.com/nextflow-io/language-server), which implements the [Language Server Protocol (LSP)](https://microsoft.github.io/language-server-protocol/) for Nextflow scripts and config files. + +The language server is distributed as a standalone Java application. It can be integrated with any editor that functions as an LSP client. Currently, only the VS Code integration is officially supported, but community contributions for other editors are welcome. + +## Troubleshooting -## Stop and restart +### Stop and restart In the event of an error, stop or restart the language server from the Command Palette. The following stop and restart commands are available: @@ -89,7 +107,7 @@ In the event of an error, stop or restart the language server from the Command P See {ref}`vscode-commands` for a fill list of Nextflow VS Code extension commands. -## View logs +### View logs Error logs can be useful for troubleshooting errors. @@ -105,42 +123,24 @@ To show additional log messages in VS Code: 3. Select the **Manage** icon. 3. Enable **Nextflow > Debug** in the extension settings. -## Report an issue +### Common errors -Report issues at [`nextflow-io/vscode-language-nextflow`](https://github.com/nextflow-io/vscode-language-nextflow) or [`nextflow-io/language-server`](https://github.com/nextflow-io/language-server). When reporting issues, include a minimal code snippet that reproduces the issue and any error logs from the server. - -(vscode-commands)= - -## Commands - -The following commands are available from the Command Palette: - -- Restart language server -- Stop language server - -(vscode-settings)= - -## Settings +

#### Filesystem changes

-The following settings are available: +The language server does not detect certain filesystem changes. For example, changing the current Git branch. -`nextflow.debug` -: Enable debug logging and debug information in hover hints. +To resolve this issue, restart the language server from the command palette to sync it with your workspace. See [Stop and restart](#stop-and-restart) for more information. -`nextflow.files.exclude` -: Configure glob patterns for excluding folders from being searched for Nextflow scripts and configuration files. +

#### Third-party plugins

-`nextflow.formatting.harshilAlignment` -: Use the [Harshil Alignment™️](https://nf-co.re/docs/contributing/code_editors_and_styling/harshil_alignment) when formatting Nextflow scripts and config files. +The language server does not recognize configuration options from third-party plugins and will report unrecognized config option warnings. There is currently no solution to suppress them. -`nextflow.java.home` -: Specifies the folder path to the JDK. Use this setting if the extension cannot find Java automatically. +

#### Groovy scripts

-`nextflow.paranoidWarnings` -: Enable additional warnings for future deprecations, potential problems, and other discouraged patterns. +The language server provides limited support for Groovy scripts in the lib directory. Errors in Groovy scripts are not reported as diagnostics, and changing a Groovy script does not automatically re-compile the Nextflow scripts that reference it. -## Language server +To resolve this issue, edit or close and re-open the Nextflow script to refresh the diagnostics. -Most of the functionality of the VS Code extension is provided by the [Nextflow language server](https://github.com/nextflow-io/language-server), which implements the [Language Server Protocol (LSP)](https://microsoft.github.io/language-server-protocol/) for Nextflow scripts and config files. +### Report an issue -The language server is distributed as a standalone Java application. It can be integrated with any editor that functions as an LSP client. Currently, only the VS Code integration is officially supported, but community contributions for other editors are welcome. +Report issues at [`nextflow-io/vscode-language-nextflow`](https://github.com/nextflow-io/vscode-language-nextflow) or [`nextflow-io/language-server`](https://github.com/nextflow-io/language-server). When reporting issues, include a minimal code snippet that reproduces the issue and any error logs from the server. From e32c39c39646c4a28dc22d2d4dd520d796394c08 Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Wed, 12 Mar 2025 12:38:18 +1300 Subject: [PATCH 08/10] Fix missed heading Signed-off-by: Christopher Hakkaart --- docs/cache-and-resume.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/cache-and-resume.md b/docs/cache-and-resume.md index bb0257520d..c16d7fab5d 100644 --- a/docs/cache-and-resume.md +++ b/docs/cache-and-resume.md @@ -198,7 +198,7 @@ process gather { ## Tips -## Resume from a specific run +### Resume from a specific run Nextflow resumes from the previous run by default. If you want to resume from an earlier run, simply specify the session ID for that run with the `-resume` option: From dcae1812baa0dad24baf2dcc64ad830f0d5d2cd3 Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Wed, 12 Mar 2025 12:47:46 +1300 Subject: [PATCH 09/10] Prepare for review Signed-off-by: Christopher Hakkaart --- docs/vscode.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/vscode.md b/docs/vscode.md index d758c9ae8f..1a814d9801 100644 --- a/docs/vscode.md +++ b/docs/vscode.md @@ -141,6 +141,6 @@ The language server provides limited support for Groovy scripts in the lib direc To resolve this issue, edit or close and re-open the Nextflow script to refresh the diagnostics. -### Report an issue +### Reporting issues -Report issues at [`nextflow-io/vscode-language-nextflow`](https://github.com/nextflow-io/vscode-language-nextflow) or [`nextflow-io/language-server`](https://github.com/nextflow-io/language-server). When reporting issues, include a minimal code snippet that reproduces the issue and any error logs from the server. +Report issues at [nextflow-io/vscode-language-nextflow](https://github.com/nextflow-io/vscode-language-nextflow) or [nextflow-io/language-server](https://github.com/nextflow-io/language-server). When reporting issues, include a minimal code snippet that reproduces the issue and any error logs from the server. 
From aafa33a6433a9e56872f63cdbe68091f3d28ce3a Mon Sep 17 00:00:00 2001 From: Christopher Hakkaart Date: Wed, 21 May 2025 16:54:21 +1200 Subject: [PATCH 10/10] Remove link to removed section Signed-off-by: Christopher Hakkaart --- docs/vscode.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/docs/vscode.md b/docs/vscode.md index 24278ad19e..83a107a0cb 100644 --- a/docs/vscode.md +++ b/docs/vscode.md @@ -71,8 +71,6 @@ In the event of an error, stop or restart the language server from the Command P - `Nextflow: Stop language server` - `Nextflow: Restart language server` -See {ref}`vscode-commands` for a fill list of Nextflow VS Code extension commands. - ### View logs Error logs can be useful for troubleshooting errors.