EBI Genome Bioinformatics: Scaling Things Up

This is the code repository for the "Scaling Things Up" section of the EBI course Genome Bioinformatics, known in previous years as "NGS Bioinformatics".

This section follows the first 3 days of the course, which cover command line tools and the basic bioinformatics commands for indexing files and aligning fastqs to a reference genome. Here we reuse those commands and run them at scale using parallelisation and job scheduling.

The following README is a copy of the 2021 Google Docs walkthrough of the interactive part of the session.

Parallelisation

Run git clone on this repository
Go into the folder you just cloned, and then into the “Parallelisation” folder
Open the align_all_extra_fqs.sh script. What do you think the script will do?
Do you think the script will take a long time to run? What command could we use to time how long a script takes?

Modify the script so that, instead of running each alignment, it echoes the align command to a file we will call align_commands.sh

If you get stuck, you can use the correction in the align_all_extra_fqs__correction.sh file, from the same repository
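If you want to check your attempt, here is a minimal sketch of what the modified loop might look like. It is not the contents of the correction file: the function name write_align_commands, the sample file names, and the _1.fastq / _2.fastq naming pattern are all assumptions made for illustration. Only the bwa mem pipeline itself comes from this walkthrough.

```shell
# Sketch only: the real loop in align_all_extra_fqs.sh may look different.
# Assumption: paired reads are named <sample>_1.fastq and <sample>_2.fastq.

write_align_commands() {
    # Print one alignment command for each R1 file given as an argument.
    for R1_read in "$@"; do
        # Derive the R2 file name by swapping the _1.fastq suffix for _2.fastq
        R2_read="${R1_read%_1.fastq}_2.fastq"
        # echo prints the command as text instead of running it
        echo "bwa mem rCRS.fa $R1_read $R2_read | samtools view -b - | samtools sort - -o $R1_read.bam"
    done
}

# Hypothetical file names, used here only to show the output format
write_align_commands sampleA_1.fastq sampleB_1.fastq > align_commands.sh
cat align_commands.sh
```

With real data you would pass the actual fastq files, for example `write_align_commands *_1.fastq > align_commands.sh`, which gives one command per line ready to hand to parallel.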

Run the commands in align_commands.sh using the parallel command. You can prefix it with the time command to measure how long it takes to run

parallel < align_commands.sh

How long did it take when using parallel to run the command?
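To get a feel for the difference before running the real alignments, the comparison can be sketched with stand-in commands. This example uses sleep in place of bwa mem so it runs anywhere; the file name demo_commands.sh is hypothetical, and the parallel step is skipped if GNU parallel is not installed.

```shell
#!/usr/bin/env bash
# Stand-in for align_commands.sh: three commands that each take about 1 second
printf 'sleep 1\nsleep 1\nsleep 1\n' > demo_commands.sh

# Serial run: the commands execute one after another (about 3 seconds in total)
time sh demo_commands.sh

# Parallel run: GNU parallel reads one command per line from stdin and,
# by default, runs up to one job per CPU core at the same time
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    time parallel < demo_commands.sh
else
    echo "GNU parallel not found; skipping the parallel run"
fi
```

The "real" figure reported by time is the wall-clock duration, which is the number to compare between the two runs.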

Job Schedulers

Remove the echo we added to align_all_extra_fqs.sh so that it runs every alignment in a for loop again
Do you remember how to submit a job with Slurm? (hint: it's the sbatch command followed by what you want to run)

sbatch align_all_extra_fqs.sh

Run squeue to see your job running. It should be listed along with its JOBID.
We will now kill our job using the scancel command followed by the JOBID. For me, this is scancel 8. Find your JOBID with squeue and cancel the job.
Remove the BAM files generated so far

rm -f *.bam
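The submit, inspect, and cancel steps above can also be chained in a short script. This is a sketch under the assumption that you are in the Parallelisation folder on a machine with Slurm installed; the variable names job_id and slurm_status are illustrative, and the script does nothing but print a message when sbatch is not available.

```shell
if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print only the job ID
    # (followed by ";cluster" on multi-cluster setups, which we strip off)
    job_id=$(sbatch --parsable align_all_extra_fqs.sh)
    job_id=${job_id%%;*}

    squeue -j "$job_id"    # show only this job in the queue
    scancel "$job_id"      # cancel it
    slurm_status="submitted and cancelled job $job_id"
else
    slurm_status="Slurm not found; nothing submitted"
fi
echo "$slurm_status"
```

Capturing the job ID at submission time avoids having to read it off the squeue output by hand.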

Edit the align_all_extra_fqs.sh file to submit each bwa mem command to Slurm as its own job

This means you wrap the bwa mem line in quotes, and prefix it with sbatch --wrap

sbatch --wrap "bwa mem rCRS.fa $R1_read $R2_read | samtools view -b - | samtools sort - -o $R1_read.bam"

Run squeue again to see all the jobs running at once
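Put together, the per-sample submission loop might look like the sketch below. The loop structure, the sample file names, the submit helper, and the DRY_RUN switch are assumptions added for illustration; only the sbatch --wrap line comes from this walkthrough. By default the sketch prints the sbatch calls rather than submitting them, so it can be tried on a machine without Slurm.

```shell
# Dry run by default: print the sbatch calls instead of submitting them.
# Set DRY_RUN=0 on a machine where Slurm is available to submit for real.
DRY_RUN=${DRY_RUN:-1}

submit() {
    # $1 is the full command line to hand to Slurm as a single job
    if [ "$DRY_RUN" = 1 ]; then
        echo "sbatch --wrap \"$1\""
    else
        sbatch --wrap "$1"
    fi
}

# Hypothetical sample names; assumes pairs are named <sample>_1.fastq / <sample>_2.fastq
for R1_read in sampleA_1.fastq sampleB_1.fastq; do
    R2_read="${R1_read%_1.fastq}_2.fastq"
    submit "bwa mem rCRS.fa $R1_read $R2_read | samtools view -b - | samtools sort - -o $R1_read.bam"
done
```

Because the command is wrapped in double quotes, the shell fills in $R1_read and $R2_read before sbatch sees it, so each job receives its own file names.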

PKatarina/EBI-Bioinformatics-course-Scaling_things_up
