Data analysis for the manuscript, “Nociceptor clock genes control excitability and pain perception in a sex and time-dependent manner”
Important:
- Consider reading the `README.html` file, which has a floating table of contents.
- This project assumes you are using resources from the Centre for Advanced Computing, which uses a SLURM job scheduler.
  - It is highly recommended that you use a cloud computing system. You may need to edit scripts to load dependencies in a manner compatible with your system.
- Ensure all scripts and data are stored in an R project folder.
- Script names are numbered so the order of execution is more obvious.
- Set the R current working directory to the project working directory. Most scripts assume that the project directory is the current working directory.
- Caution! Some scripts use absolute paths (especially bash scripts).
  - Run the following commands in the terminal to replace the `absolutePath` placeholder found in scripts with the absolute path to your project directory:

        find . -type f -name "*.sh" -exec sed -i'' -e 's#absolutePath#/my/custom/path#g' {} +
        find . -type f -name "*.R" -exec sed -i'' -e 's#absolutePath#/my/custom/path#g' {} +
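
After running the replacement, it can help to confirm that no script still contains the literal placeholder. The following is a minimal sketch, not part of the project: the function name `list_unreplaced` is hypothetical, and it assumes a `grep` that supports `-r` and `--include` (GNU or BSD).

```shell
# Hypothetical helper (not part of the project): list .sh and .R files
# that still contain the literal placeholder string "absolutePath".
list_unreplaced() {
  # $1: directory to search (defaults to the current directory)
  # "|| true" keeps the exit status at 0 when grep finds no matches.
  grep -rl --include='*.sh' --include='*.R' 'absolutePath' "${1:-.}" || true
}
```

Calling `list_unreplaced .` from the project directory should print nothing once every placeholder has been replaced.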
Primary session info:
- R version 3.6.0 (2019-04-26)
- Platform: x86_64-redhat-linux-gnu (64-bit)
- Running under: CentOS Linux 7 (Core)
- Matrix products: default
- BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
Packages:
R version 3.6.0
Package | Version |
---|---|
AnnotationDbi | 1.48.0 |
arrayQualityMetrics | 3.42.0 |
Biobase | 2.46.0 |
cividis | 0.2.0 |
DESeq2 | 1.26.0 |
dplyr | 1.1.0 |
ggplot2 | 3.4.1 |
ggrepel | 0.9.3 |
gprofiler2 | 0.2.1 |
IsoformSwitchAnalyzeR | 1.8.0 |
knitr | 1.42 |
optparse | 1.7.3 |
pheatmap | 1.0.12 |
renv | 0.17.3 |
rmarkdown | 2.20 |
stringr | 1.5.0 |
tibble | 3.1.8 |
tidyr | 1.3.0 |
R version 4.2.1
Package | Version |
---|---|
dplyr | 1.0.9 |
IsoformSwitchAnalyzeR | 1.17.04 |
Notice the `../0_helpers` folder. This directory contains many R functions that minimize repetition of code and are generally helpful.
- Navigate to `./1_stringtie`
- Run `1_writePass1Scripts.R` to write individual scripts for pass 1. Execute scripts in the `pass1IndivScripts` directory. Use the `2_checkSuccess.R` and `jobsToRun.sh` scripts to monitor progress.

      # 1 cpu, 5 GB memory
      # REF_GTF is the full GTF file from Gencode
      module load StdEnv/2020 stringtie/2.1.5
      stringtie $INPUT -p 5 -G $REF_GTF -o $OUT_GTF
- Run `3_writeGtfLists.R` to prepare the merging of individual GTFs from pass 1.
- Run the `*.sh` files in the `3_merge` folder to execute the merging of GTF files.

      # 5 cpu, 3 GB memory
      module load StdEnv/2020 stringtie/2.1.5
      stringtie --merge -p 20 -o $OUTPUT -G $REF_GTF $GTFS_LIST
- Evaluate StringTie performance with `4.1_writeGffCompareScripts.R` and the `*.sh` scripts in the `4_gffCompare` folder.
- Run `5_writePass2Scripts.R` to write individual scripts for pass 2. Execute scripts in the `pass2IndivScripts` directory. Use the `2_checkSuccess.R` and `jobsToRun.sh` scripts to monitor progress.

      # 1 cpu, 5 GB memory
      # REF_GTF is the merged gtf that corresponds to this sample's tissue
      module load StdEnv/2020 stringtie/2.1.5
      stringtie $INPUT -b $BALL -e -p 5 -G $REF_GTF -o $OUT_GTF
- To get the transcript ID to gene name mapping, run `6_isoformAnalyzeR/isoformAnalyzeR.R`.
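
The bookkeeping around the StringTie passes (collecting pass 1 GTFs for merging, and finding jobs that still need to run) can be sketched in shell. This is only an illustration under stated assumptions; the project performs these steps with `3_writeGtfLists.R`, `2_checkSuccess.R`, and `jobsToRun.sh`, whose internals are not shown here. The function names, the one-GTF-per-script naming scheme (`<sample>.sh` produces `<sample>.gtf`), and the directory arguments are all hypothetical.

```shell
# Hypothetical sketch: write the paths of all GTFs in a directory to a list
# file, one per line, in the form accepted by `stringtie --merge`.
write_gtf_list() {
  gtf_dir=$1      # directory holding the pass 1 GTF files (assumed layout)
  list_file=$2    # text file to write
  find "$gtf_dir" -maxdepth 1 -type f -name '*.gtf' | sort > "$list_file"
}

# Hypothetical sketch: print every script whose expected output GTF is
# missing or empty, i.e. the jobs that still need to be (re)run.
# Assumes <sample>.sh writes <sample>.gtf, which may not match the project.
list_jobs_to_run() {
  script_dir=$1
  gtf_dir=$2
  for script in "$script_dir"/*.sh; do
    [ -e "$script" ] || continue          # no scripts matched the glob
    sample=$(basename "$script" .sh)
    # -s is true only when the file exists and is non-empty
    [ -s "$gtf_dir/$sample.gtf" ] || echo "$script"
  done
}
```

If merging is done per tissue, `write_gtf_list` would be called once per tissue directory; that layout is also an assumption.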
- Navigate to `2_dataPrep`
- Clean the count matrix
- Run `0_id2name.R` to get a dataframe with Ensembl ID to gene name/symbol conversion information.
- Run `1_outlierRemoval.R` to … a. Perform outlier detection with arrayQualityMetrics. A sample is considered an outlier if
  - it is marked as an outlier before and after normalization by the same outlier detection metrics, and/or
  - it is marked as an outlier by multiple outlier detection metrics after normalization.
  - Note: No samples met these criteria, so none were removed.
- Normalize counts with the median of ratios method
- Run `2_filtering.R` to perform non-specific filtering to remove lowly expressed features.
- This step and the previous one are performed mainly to optimize the filtering threshold that will be used for DESeq2, and to prepare counts for future analyses.
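
For reference, the median of ratios normalization mentioned above can be written out as follows. This is the standard definition used by DESeq/DESeq2, given here as a reminder rather than as a description of the project's scripts; the symbols are assumptions introduced for this note: $k_{ij}$ is the raw count of gene $i$ in sample $j$, $m$ is the number of samples, $s_j$ is the size factor of sample $j$, and $\tilde{k}_{ij}$ is the normalized count.

```latex
\bar{k}_i = \left( \prod_{v=1}^{m} k_{iv} \right)^{1/m},
\qquad
s_j = \operatorname*{median}_{i \,:\, \bar{k}_i > 0} \frac{k_{ij}}{\bar{k}_i},
\qquad
\tilde{k}_{ij} = \frac{k_{ij}}{s_j}
```

Genes with a zero count in any sample have a geometric mean $\bar{k}_i$ of zero and therefore do not contribute to the median.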
- Navigate to `./3_deseqCandidate`
- Prepare candidate gene lists. Download gene lists from KEGG with `1.0_getKeggGenes.R`. Run `1.1_prepareCanddiates.R` to clean candidate lists.
- Run `2_writeScripts.R` to write a bash script for each analysis.
  - Candidate genes are removed after the lowly expressed genes are removed.
- Run the `*.sh` scripts in the `bash` folder to execute analyses.
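
Running many generated scripts by hand is tedious, so a small loop can submit them all. The sketch below is an assumption about how one might do this on a SLURM system, not the project's own method: `submit_all` is a hypothetical name, and `sbatch` is used as the default submit command because the project assumes a SLURM scheduler.

```shell
# Hypothetical sketch: submit every *.sh script in a folder.
submit_all() {
  script_dir=$1
  submit_cmd=${2:-sbatch}   # pass "echo" as the second argument for a dry run
  for script in "$script_dir"/*.sh; do
    [ -e "$script" ] || continue    # no scripts matched the glob
    "$submit_cmd" "$script"
  done
}
```

For example, `submit_all bash echo` prints the scripts that would be submitted, and `submit_all bash` submits them with `sbatch`.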