Skip to content

seqeralabs/nf-training-intro

Repository files navigation

Go with the (Next)flow!

Welcome to our introductory session on Nextflow!

Nextflow is a powerful workflow language designed to streamline complex computational workflows, often used in fields like bioinformatics. The goal of this training workshop is not to transform you overnight into a coding expert or bioinformatics scientist. Instead, we aim to highlight the key features and capabilities offered by using Nextflow with a set of very simple, relatable examples that should be digestible by anyone!

By the end of this workshop, you will grasp the essentials as to why Nextflow is a leading solution for managing large-scale data analysis and how it empowers users to achieve remarkable scientific breakthroughs with efficiency and flexibility.

More specifically, this session will cover the following key capabilities of Nextflow:

  • Scalability
  • Parallelism
  • Reproducibility
  • Resumability
  • Reporting
  • Flexibility

If you walk away from this workshop being able to understand and communicate these advantages to others then we will consider the workshop a success!

Getting started

For this workshop, we are going to be using a tool called GitPod, which provides us with a fully managed environment to deliver the training. You will need a GitHub account, so if you don't have one, go to GitHub and create one for yourself.

Open the GitPod environment

You can begin a new Gitpod environment by following the link below:

Open in GitPod

This will bring you to the Gitpod launch screen where you can click "Continue":

Gitpod launch

Solving critter confusion

The background

In the different sections of this tutorial, we will explore how to run an existing classification model using OpenAI's Contrastive Language–Image Pre-training (CLIP). It is a versatile tool that can understand and classify images based on natural language descriptions. Again don't worry, the previous sentence is a mouthful of random words to most of us - all will become clear as we progress through the tutorial!

The challenge

We have intentionally picked a use case quite different from the usual scientific workloads, to make the transferability of the concepts more relatable as we progress through the workshop.

You have been given a set of pictures of your colleagues' animals, and you want to be able to classify them, so you can make attractive collages of them to present at your company retreat. Unfortunately, you are an alien recently arrived on Earth and you don't know the difference between cats, dogs and spiders. Fortunately for you, you can enlist the help of AI.

The aim

To make this workshop palatable for everyone we have already:

  • Dumped a bunch of animal pictures in the data/ folder.
  • Written a simple Python script called classify.py, which uses CLIP to classify the bloomin' critters.

Your mission should you choose to accept is going to be to:

  • Use CLIP to assign critters to classes based on a list of labels you provide.
  • Make a collage of the critters in each class.
  • Combine the collages to create a single, glorious critter cornucopia.

nf-training-intro metro map

Exercises

This workshop has been split up into 4 sections with decreasing manual intervention to highlight the key strengths of Nextflow. The last section will help to contextualize how we can leverage Nextflow in combination with the Seqera Platform to solve challenges like scalability and parallelism.

Once you have successfully provisioned a GitPod environment (see Getting Started) please complete the following critter classification sections in order:

Summary

Congratulations on completing Go with the (Next)flow! 🎉 You've now gained a base-level understanding of Nextflow, its integration within the Platform, and why Seqera is the preferred choice for our customers.

In summary, you should have a concrete understanding of the key capabilities of Nextflow:

  • Scalability: Nextflow allows for scalability from launching in a GitPod environment to cloud environments.
  • Parallelism: Tasks like classifying pictures can be run at the same time.
  • Reproducibility: Nextflow pipelines can be used over and over, even when the input changes (for example, adding different pictures to classify, or adding new prompts).
  • Resumability: If the pipeline fails, you can easily pick back up and don't need to start from scratch - for example, you can run Collage without having to run Resize again.
  • Reporting: Nextflow provides logs that you can examine to troubleshoot and Seqera Platform allows you to view reports quickly.
  • Flexibility: You can make selections based on what you need done - adding "best_dog" to the prompt.

Resources

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Packages

No packages published

Contributors 12