
Search for a gene cluster against a database built from metagenomics FASTQ files. #53

@AhmedElsherbini


Dear Shen Wei,

First, thank you for developing this nice tool.

To keep my story short:

As the title says, I am trying to determine the presence/absence and abundance of a small bacterial gene cluster (roughly an operon) of 15 kb, stored in NisinA.fasta, in three sets of metagenomics FASTQ reads. The three datasets come from the SRA database and belong to three different publications, and I would like to see how they differ.

So I downloaded the three datasets from the SRA website, for example using:

#fastq-dl --accession PRJNA542703 --provider SRA

I want to search NisinA against the raw reads of each sample in the study.

So, for each sample:

seqtk seq -a read_R1.fastq.gz > reads_R1.fasta
seqtk seq -a read_R2.fastq.gz > reads_R2.fasta

cat reads_R1.fasta reads_R2.fasta > reads_combined.fasta

#index each sample database
makeblastdb -in reads_combined.fasta -dbtype nucl -out reads_db

blastn -query NisinA.fasta -db reads_db -out results.txt -evalue 1e-5 -outfmt 6
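
As a rough illustration of what "abundance" could mean with the BLAST approach above, here is a minimal sketch that counts the distinct reads with at least one hit in the tabular output. It assumes results.txt is in the default -outfmt 6 layout, where column 2 (sseqid) is the read ID because the reads are the database. The function name count_hit_reads is hypothetical, and the count is only a crude proxy: it is not normalized by sequencing depth, and R1/R2 mates that share an ID are counted once.

```shell
# Count distinct reads (column 2, sseqid) that have at least one hit
# in a BLAST tabular (-outfmt 6) file. Prints 0 for an empty file.
count_hit_reads() {
    awk -F'\t' '!seen[$2]++ { n++ } END { print n + 0 }' "$1"
}

# Hypothetical usage, per sample:
# count_hit_reads results.txt
```

Dividing this count by the total number of reads in the sample would make the values more comparable across samples of different sequencing depth.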

However, this method can take a long time and is computationally expensive. I was wondering whether I can answer my question with kmcp in the same fashion, using compute, index, and search.

If yes, could it give me more hints about the abundance of my operon of interest in each sample?

PS: I work on an HPC cluster.

Apologies for any inconvenience.

Best,
Ahmed
