Commit 7439ce2

Update your first script (#5664)
Signed-off-by: Christopher Hakkaart <chris.hakkaart@seqera.io>
Co-authored-by: Ben Sherman <bentshermann@gmail.com>
1 parent dc6cc41 commit 7439ce2

1 file changed: +161 -69 lines changed

docs/your-first-script.md

Lines changed: 161 additions & 69 deletions

````diff
@@ -2,115 +2,207 @@
 
 # Your first script
 
+This guide details fundamental skills to run a basic Nextflow pipeline. It includes:
+
+- Running a pipeline
+- Modifying and resuming a pipeline
+- Configuring a pipeline parameter
+
+<h3>Prerequisites</h3>
+
+You will need the following to get started:
+
+- Nextflow. See {ref}`install-page` for instructions to install or update your version of Nextflow.
+
 ## Run a pipeline
 
-This script defines two processes. The first splits a string into 6-character chunks, writing each one to a file with the prefix `chunk_`, and the second receives these files and transforms their contents to uppercase letters. The resulting strings are emitted on the `result` channel and the final output is printed by the `view` operator. Copy the following example into your favorite text editor and save it to a file named `tutorial.nf`:
+You will run a basic Nextflow pipeline that splits a string of text into two files and then converts lowercase letters to uppercase letters. You can see the pipeline here:
+
+```{code-block} groovy
+:class: copyable
+// Default parameter input
+params.str = "Hello world!"
+
+// splitString process
+process splitString {
+    publishDir "results/lower"
+
+    input:
+    val x
+
+    output:
+    path 'chunk_*'
+
+    script:
+    """
+    printf '${x}' | split -b 6 - chunk_
+    """
+}
 
-```{literalinclude} snippets/your-first-script.nf
-:language: nextflow
+// convertToUpper process
+process convertToUpper {
+    publishDir "results/upper"
+    tag "$y"
+
+    input:
+    path y
+
+    output:
+    path 'upper_*'
+
+    script:
+    """
+    cat $y | tr '[a-z]' '[A-Z]' > upper_${y}
+    """
+}
+
+// Workflow block
+workflow {
+    ch_str = Channel.of(params.str) // Create a channel using parameter input
+    ch_chunks = splitString(ch_str) // Split string into chunks and create a named channel
+    convertToUpper(ch_chunks.flatten()) // Convert lowercase letters to uppercase letters
+}
 ```
````
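
Because the `script:` sections of `splitString` and `convertToUpper` are plain shell, their effect on the default input can be reproduced without Nextflow. The sketch below is an illustration under stated assumptions, not part of the commit: it assumes a POSIX shell with coreutils `split` and `tr`, works in a scratch directory from `mktemp -d`, and uses `printf '%s'` rather than placing the string in the format argument, so a `%` in the input is not treated as a format directive.

```bash
set -eu

str="Hello world!"        # same value as the default params.str
workdir=$(mktemp -d)      # scratch directory for this sketch (assumption)
cd "$workdir"

# splitString: write the string in 6-byte pieces to chunk_aa, chunk_ab, ...
printf '%s' "$str" | split -b 6 - chunk_

# convertToUpper: one run per chunk file, as each file arrives separately after flatten
for y in chunk_*; do
    cat "$y" | tr '[a-z]' '[A-Z]' > "upper_${y}"
done

# Show each result in brackets so trailing spaces stay visible
for f in upper_chunk_*; do
    printf '%s: [%s]\n' "$f" "$(cat "$f")"
done
```

Running it prints `upper_chunk_aa: [HELLO ]` and `upper_chunk_ab: [WORLD!]`: the 12-byte default string becomes two 6-byte chunks, which is why `convertToUpper` runs as two tasks.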

````diff
 
-Execute the script by entering the following command in your terminal:
+This script defines two processes:
+
+- `splitString`: takes a string input, splits it into 6-character chunks, and writes the chunks to files with the prefix `chunk_`
+- `convertToUpper`: takes files as input, transforms their contents to uppercase letters, and writes the uppercase strings to files with the prefix `upper_`
+
+The `splitString` output is emitted as a single element. The `flatten` operator splits this combined element so that each file is treated as a sole element.
+
+The outputs from both processes are published in subdirectories, that is, `lower` and `upper`, in the `results` directory.
+
+To run your pipeline:
+
+1. Create a new file named `main.nf` in your current directory
+2. Copy and save the above pipeline to your new file
+3. Run your pipeline using the following command:
+
+```{code-block}
+:class: copyable
+nextflow run main.nf
+```
+
+You will see output similar to the following:
 
 ```console
-$ nextflow run tutorial.nf
+N E X T F L O W ~ version 24.10.3
+
+Launching `main.nf` [big_wegener] DSL2 - revision: 13a41a8946
 
-N E X T F L O W ~ version 23.10.0
 executor > local (3)
-[69/c8ea4a] process > splitLetters [100%] 1 of 1 ✔
-[84/c8b7f1] process > convertToUpper [100%] 2 of 2 ✔
-HELLO
-WORLD!
+[82/457482] splitString (1) | 1 of 1 ✔
+[2f/056a98] convertToUpper (chunk_aa) | 2 of 2 ✔
 ```
 
-:::{note}
-For versions of Nextflow prior to `22.10.0`, you must explicitly enable DSL2 by adding `nextflow.enable.dsl=2` to the top of the script or by using the `-dsl2` command-line option.
-:::
-
-You can see that the first process is executed once, and the second twice. Finally the result string is printed.
+Nextflow creates a `work` directory to store files used during a pipeline run. Each execution of a process is run as a separate task. The `splitString` process is run as one task and the `convertToUpper` process is run as two tasks. The hexadecimal string, for example, `82/457482`, is the beginning of a unique hash. It is a prefix used to identify the task directory where the script was executed.
 
-It's worth noting that the process `convertToUpper` is executed in parallel, so there's no guarantee that the instance processing the first split (the chunk `Hello`) will be executed before the one processing the second split (the chunk `world!`). Thus, you may very likely see the final result printed in a different order:
+:::{tip}
+Run your pipeline with `-ansi-log false` to see each task printed on a separate line:
 
+```{code-block} bash
+:class: copyable
+nextflow run main.nf -ansi-log false
 ```
-WORLD!
-HELLO
+
+You will see output similar to the following:
+
+```console
+N E X T F L O W ~ version 24.10.3
+Launching `main.nf` [peaceful_watson] DSL2 - revision: 13a41a8946
+[43/f1f8b5] Submitted process > splitString (1)
+[a2/5aa4b1] Submitted process > convertToUpper (chunk_ab)
+[30/ba7de0] Submitted process > convertToUpper (chunk_aa)
 ```
 
-:::{tip}
-The hexadecimal string, e.g. `22/7548fa`, is the unique hash of a task, and the prefix of the directory where the task is executed. You can inspect a task's files by changing to the directory `$PWD/work` and using this string to find the specific task directory.
-:::
+:::
````
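
The run summary shows only the first characters of each task hash, such as `82/457482`. A task directory under `work` can be located from that prefix with a shell glob. This is a minimal sketch: `find_task_dir` is a hypothetical helper written for this example, and it assumes the default `work` directory inside the directory where the pipeline was launched.

```bash
# Hypothetical helper: print the task directory whose path starts with a hash prefix.
# Usage: find_task_dir 82/457482
find_task_dir() {
    prefix=$1
    for dir in work/"$prefix"*; do
        # With no match the glob stays unexpanded, so the -d test fails.
        # Returns the first match; a short prefix is normally unique.
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    echo "no task directory matches '$prefix'" >&2
    return 1
}
```

For example, `ls -a "$(find_task_dir 82/457482)"` would list that task's files, including the hidden `.command.sh` file holding the script Nextflow ran. Hash prefixes differ on every run, so use one from your own output.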

````diff
 
 (getstarted-resume)=
 
 ## Modify and resume
 
-Nextflow keeps track of all the processes executed in your pipeline. If you modify some parts of your script, only the processes that are actually changed will be re-executed. The execution of the processes that are not changed will be skipped and the cached result used instead. This helps a lot when testing or modifying part of your pipeline without having to re-execute it from scratch.
+Nextflow tracks task executions in a task cache, a key-value store of previously executed tasks. The task cache is used in conjunction with the work directory to recover cached tasks. If you modify and resume your pipeline, only the processes that are changed will be re-executed. The cached results will be used for tasks that don't change.
 
-For the sake of this tutorial, modify the `convertToUpper` process in the previous example, replacing the process script with the string `rev $x`, like so:
+You can enable resumability using the `-resume` flag when running a pipeline. To modify and resume your pipeline:
 
-```nextflow
-process convertToUpper {
-    input:
-    path x
-    output:
-    stdout
-
-    script:
-    """
-    rev $x
-    """
-}
-```
+1. Open `main.nf`
+2. Replace the `convertToUpper` process with the following:
 
-Then save the file with the same name, and execute it by adding the `-resume` option to the command line:
+```{code-block} groovy
+:class: copyable
+process convertToUpper {
+    publishDir "results/upper"
+    tag "$y"
 
-```bash
-nextflow run tutorial.nf -resume
-```
+    input:
+    path y
 
-It will print output similar to this:
+    output:
+    path 'upper_*'
+
+    script:
+    """
+    rev $y > upper_${y}
+    """
+}
+```
+
+3. Save your changes
+4. Run your updated pipeline using the following command:
+
+```{code-block} bash
+:class: copyable
+nextflow run main.nf -resume
+```
+
+You will see output similar to the following:
+
+```console
+N E X T F L O W ~ version 24.10.3
+
+Launching `main.nf` [furious_curie] DSL2 - revision: 5490f13c43
 
-```
-N E X T F L O W ~ version 23.10.0
 executor > local (2)
-[69/c8ea4a] process > splitLetters [100%] 1 of 1, cached: 1 ✔
-[d0/e94f07] process > convertToUpper [100%] 2 of 2 ✔
-olleH
-!dlrow
+[82/457482] splitString (1) | 1 of 1, cached: 1 ✔
+[02/9db40b] convertToUpper (chunk_aa) | 2 of 2 ✔
 ```
 
-You will see that the execution of the process `splitLetters` is actually skipped (the process ID is the same), and its results are retrieved from the cache. The second process is executed as expected, printing the reversed strings.
-
-:::{tip}
-The pipeline results are cached by default in the directory `$PWD/work`. Depending on your script, this folder can take up a lot of disk space. It's a good idea to clean this folder periodically, as long as you know you won't need to resume any pipeline runs.
-:::
+Nextflow skips the execution of the `splitString` process and retrieves the results from the cache. The `convertToUpper` process is executed twice.
 
-For more information, see the {ref}`cache-resume-page` page.
+See {ref}`cache-resume-page` for more information about Nextflow cache and resume functionality.
````
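
In the modified process, `rev` reverses the characters on each line of a chunk, and the result is still written to `upper_${y}`. Replaying the new script by hand on the default chunks shows what the resumed run produces. This is a minimal sketch that assumes `split` and `rev` are available (on Linux, `rev` is part of util-linux) and works in a scratch directory from `mktemp -d`.

```bash
set -eu
cd "$(mktemp -d)"   # scratch directory for this sketch

# Same chunks as before: chunk_aa holds "Hello ", chunk_ab holds "world!"
printf '%s' "Hello world!" | split -b 6 - chunk_

# Modified convertToUpper script: reverse each chunk instead of uppercasing it
for y in chunk_*; do
    rev "$y" > "upper_${y}"
done

for f in upper_chunk_*; do
    # $(...) drops any trailing newline that rev may add
    printf '%s: [%s]\n' "$f" "$(cat "$f")"
done
```

It prints `upper_chunk_aa: [ olleH]` and `upper_chunk_ab: [!dlrow]`. Only this script changed, so with `-resume` Nextflow reuses the cached `splitString` task and reruns `convertToUpper` for both chunks.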

````diff
 
 (getstarted-params)=
 
 ## Pipeline parameters
 
-Pipeline parameters are simply declared by prepending to a variable name the prefix `params`, separated by dot character. Their value can be specified on the command line by prefixing the parameter name with a double dash character, i.e. `--paramName`
+Parameters are used to control the inputs to a pipeline. They are declared by prepending a variable name to the prefix `params`, separated by dot character. Parameters can be specified on the command line by prefixing the parameter name with a double dash character, for example, `--paramName`. Parameters specified on the command line override parameters specified in a main script.
 
-For the sake of this tutorial, you can try to execute the previous example specifying a different input string parameter, as shown below:
+You can configure the `str` parameter in your pipeline. To modify your `str` parameter:
 
-```bash
-nextflow run tutorial.nf --str 'Bonjour le monde'
-```
+1. Run your pipeline using the following command:
 
-The string specified on the command line will override the default value of the parameter. The output will look like this:
+```{code-block} bash
+:class: copyable
+nextflow run main.nf --str 'Bonjour le monde'
+```
+
+You will see output similar to the following:
+
+```console
+N E X T F L O W ~ version 24.10.3
+
+Launching `main.nf` [distracted_kalam] DSL2 - revision: 082867d4d6
 
-```
-N E X T F L O W ~ version 23.10.0
 executor > local (4)
-[8b/16e7d7] process > splitLetters [100%] 1 of 1 ✔
-[eb/729772] process > convertToUpper [100%] 3 of 3 ✔
-m el r
-edno
-uojnoB
+[55/a3a700] process > splitString (1) [100%] 1 of 1 ✔
+[f4/af5ddd] process > convertToUpper (chunk_ac) [100%] 3 of 3 ✔
 ```
 
-:::{versionchanged} 20.11.0-edge
-Any `.` (dot) character in a parameter name is interpreted as the delimiter of a nested scope. For example, `--foo.bar Hello` will be interpreted as `params.foo.bar`. If you want to have a parameter name that contains a `.` (dot) character, escape it using the back-slash character, e.g. `--foo\.bar Hello`.
-:::
+The input string is now longer and the `splitString` process splits it into three chunks. The `convertToUpper` process is run three times.
+
+See {ref}`cli-params` for more information about modifying pipeline parameters.
+
+<h2>Next steps</h2>
+
+Your first script is a brief introduction to running pipelines, modifying and resuming pipelines, and pipeline parameters. See [training.nextflow.io](https://training.nextflow.io/) for further Nextflow training modules.
````
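
The number of `convertToUpper` tasks follows from the 6-byte split: `'Bonjour le monde'` is 16 bytes, so `split -b 6` writes two full chunks and a 4-byte remainder, while the 12-byte default string gives exactly two. The helper `count_chunks` below is hypothetical, written only to check this arithmetic by applying the same `split -b 6` rule to any string; it assumes coreutils `split` and `mktemp` and is not part of the pipeline.

```bash
# Hypothetical helper: how many chunk_* files would splitString create for a string?
count_chunks() {
    dir=$(mktemp -d)
    # Same splitting rule as the splitString script: 6-byte pieces
    printf '%s' "$1" | (cd "$dir" && split -b 6 - chunk_)
    set -- "$dir"/chunk_*
    # With no matches the glob stays literal, so check that the first entry exists
    if [ -e "$1" ]; then n=$#; else n=0; fi
    rm -rf "$dir"
    echo "$n"
}

count_chunks 'Hello world!'       # default params.str; prints 2
count_chunks 'Bonjour le monde'   # value passed with --str; prints 3
```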
