Dear Shen Wei,
First, thank you for developing this nice tool.
To keep my story short: as the title says, I am trying to determine the presence/absence and abundance of a small bacterial gene cluster (an operon of ~15 kb, stored in NisinA.fasta) in three metagenomic datasets of FASTQ reads from the SRA database, each from a different publication, and to compare the three.
I downloaded the three datasets from the SRA website, for example using:
#fastq-dl --accession PRJNA542703 --provider SRA
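Comparing abundance between samples also requires each sample's sequencing depth. Below is a minimal Python sketch for counting reads in a downloaded FASTQ file. It assumes standard four-line records (header, sequence, `+`, qualities). The function name `count_fastq_reads` is hypothetical and not part of fastq-dl or KMCP.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file, gzipped (.gz) or plain.

    Assumes the standard 4-line FASTQ layout (header, sequence, '+', qualities);
    multi-line FASTQ records would need a real parser instead.
    """
    # Pick a text-mode opener based on the file extension.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Both mates go into the same BLAST database, so the total for a paired-end sample would be `count_fastq_reads("read_R1.fastq.gz") + count_fastq_reads("read_R2.fastq.gz")`.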
I wanted to search NisinA against the raw reads of each sample in the study.
So, for each sample:
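# convert each mate file from FASTQ to FASTA
seqtk seq -a read_R1.fastq.gz > reads_R1.fasta
seqtk seq -a read_R2.fastq.gz > reads_R2.fasta
# combine both mates into one FASTA file
cat reads_R1.fasta reads_R2.fasta > reads_combined.fasta
# build a BLAST nucleotide database from this sample's reads
makeblastdb -in reads_combined.fasta -dbtype nucl -out reads_db
# search the operon against the sample's reads (tabular output)
blastn -query NisinA.fasta -db reads_db -out results.txt -evalue 1e-5 -outfmt 6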
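To turn `results.txt` into presence/absence and abundance numbers, the hits can be summarized per sample. The minimal sketch below assumes the default `-outfmt 6` column order (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). Under that order, qstart/qend are positions on the operon and sseqid is a read. The function name and the identity/length cut-offs are hypothetical. Also note that blastn reports at most 500 subject sequences per query by default (`-max_target_seqs`). With a single operon query against millions of reads, the hit count is therefore likely capped unless that limit is raised.

```python
import csv


def summarize_hits(blast_tsv, operon_len, min_pident=90.0, min_aln_len=50):
    """Summarize blastn -outfmt 6 hits of the operon (query) against reads (subjects).

    Returns (distinct_reads, breadth, mean_depth):
      breadth    = fraction of operon bases covered by at least one hit
      mean_depth = total hit span on the operon divided by operon length
    The identity and length cut-offs are hypothetical defaults, not recommendations.
    """
    reads = set()
    intervals = []
    aligned_bases = 0
    with open(blast_tsv, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue
            pident, aln_len = float(row[2]), int(row[3])
            if pident < min_pident or aln_len < min_aln_len:
                continue
            # Columns 7 and 8 (qstart, qend) are coordinates on the operon.
            qstart, qend = sorted((int(row[6]), int(row[7])))
            reads.add(row[1])
            intervals.append((qstart, qend))
            aligned_bases += qend - qstart + 1
    # Merge overlapping intervals to count each covered operon base once.
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return len(reads), covered / operon_len, aligned_bases / operon_len
```

A breadth close to 1 would suggest the whole operon is present. A handful of hits with low breadth may reflect only a shared gene or domain. If R1 and R2 mates keep the same ID after the seqtk conversion, the distinct IDs count read pairs rather than individual reads.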
However, this approach is slow and computationally expensive. I was wondering whether I could answer my question with KMCP in the same fashion, using its compute, index, and search steps.
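The k-mer approach asked about here can be illustrated with a toy Python sketch. It splits the reference (here the ~15 kb operon) into overlapping chunks and collects each chunk's k-mers. For each read, it then reports the chunk containing the largest fraction of the read's k-mers (its query coverage). This illustrates k-mer containment search only, under assumed parameters (k = 21, a 0.5 coverage cut-off) and hypothetical function names. It is not KMCP's implementation, which uses its own compact index and scoring. It does show one design difference from the BLAST workflow above: the small operon is indexed once and each sample's reads are streamed against it, instead of building a database from every sample's reads.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Set of canonical k-mers: the smaller of each k-mer and its reverse complement."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):  # skip k-mers containing N or other ambiguity codes
            continue
        rev_comp = kmer.translate(COMPLEMENT)[::-1]
        kmers.add(min(kmer, rev_comp))
    return kmers


def split_into_chunks(seq, n_chunks, overlap):
    """Split a reference into n_chunks pieces; every piece after the first starts
    `overlap` bases early so reads spanning a chunk boundary are not lost."""
    size = -(-len(seq) // n_chunks)  # ceiling division
    return [seq[max(0, i * size - overlap):(i + 1) * size] for i in range(n_chunks)]


def best_chunk_hit(read, chunk_kmer_sets, k, min_qcov=0.5):
    """Return (chunk_index, qcov) for the chunk sharing the largest fraction of the
    read's k-mers, or None if that fraction (query coverage) is below min_qcov."""
    read_kmers = canonical_kmers(read, k)
    if not read_kmers:
        return None
    shared = [len(read_kmers & chunk) for chunk in chunk_kmer_sets]
    best = max(range(len(shared)), key=shared.__getitem__)
    qcov = shared[best] / len(read_kmers)
    return (best, qcov) if qcov >= min_qcov else None
```

For example, build `index = [canonical_kmers(c, 21) for c in split_into_chunks(operon_seq, 10, 150)]` once, then call `best_chunk_hit(read, index, 21)` for every read. Here `operon_seq` is a placeholder for the sequence in NisinA.fasta. Canonical k-mers let reads from either strand match.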
If so, could it also give me some indication of the abundance of my operon of interest in each sample?
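Raw hit counts are not directly comparable between samples or studies sequenced to different depths. One common normalization is reads per kilobase of target per million sample reads (RPKM). The minimal sketch below assumes that the per-sample count of reads matching the operon (from BLAST, KMCP, or any other search) and the sample's total read count are already known. The function names and the sample numbers in the usage note are hypothetical.

```python
def rpkm(hit_reads, target_len_bp, total_reads):
    """Reads per kilobase of target per million reads in the sample."""
    if target_len_bp <= 0 or total_reads <= 0:
        raise ValueError("target length and total read count must be positive")
    return hit_reads / (target_len_bp / 1e3) / (total_reads / 1e6)


def abundance_table(samples, target_len_bp):
    """Map sample name -> RPKM, given sample name -> (hit_reads, total_reads)."""
    return {
        name: rpkm(hits, target_len_bp, total)
        for name, (hits, total) in samples.items()
    }
```

With made-up numbers, `abundance_table({"sample_A": (300, 20_000_000), "sample_B": (600, 10_000_000)}, 15_000)` gives 1.0 for sample_A and 4.0 for sample_B. Although sample_B has only twice as many hits, it has four times the normalized abundance, because it was sequenced to half the depth.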
PS: I work on an HPC cluster.
Apologies for any inconvenience.
Best,
Ahmed