Search of a gene cluster against a database from metagenomics FASTQ files #53

Dear Shen Wei, 

First, thank you for developing this nice tool.

To keep my story short:

As the title says, I am trying to determine the presence/absence and abundance of a small gene cluster (roughly an operon) of about 15 kb from a bacterium, stored in *NisinA.fasta*. I want to search it against **three** sets of metagenomic FASTQ reads from the SRA database, which come from three different publications, and compare the results between them.

So I downloaded the three datasets from the SRA website, for example using:

`fastq-dl --accession PRJNA542703 --provider SRA`


I want to search NisinA against the raw reads of each sample in each study.

So, for each sample:

```bash
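# convert the paired FASTQ files to FASTA
seqtk seq -a read_R1.fastq.gz > reads_R1.fasta
seqtk seq -a read_R2.fastq.gz > reads_R2.fasta

# merge both mates into one FASTA file
cat reads_R1.fasta reads_R2.fasta > reads_combined.fasta

# build a nucleotide BLAST database from the reads of this sample
makeblastdb -in reads_combined.fasta -dbtype nucl -out reads_db

# search the gene cluster against the reads (tabular output)
blastn -query NisinA.fasta -db reads_db -out results.txt -evalue 1e-5 -outfmt 6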
```
However, this method can take a long time and is computationally expensive. I was wondering whether I can answer my question with kmcp in the same fashion, using compute, index, and search.

If yes, could it give me more information on the abundance of my operon of interest in each sample?
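
To make clear what I mean by abundance, here is a minimal sketch of a proxy that could be derived from the BLAST output above: the number of distinct reads with at least one hit, scaled to hits per million reads so that samples of different sequencing depth can be compared. The function names are hypothetical helpers, not part of BLAST or kmcp, and the file names are the ones from the commands above. It assumes that `results.txt` is in `-outfmt 6` format, where column 2 is the subject ID (here the read ID, because the reads are the database). Note also that `blastn` limits the number of reported subject sequences per query by default (`-max_target_seqs`), so the count may be truncated unless that limit is raised.

```bash
# Hypothetical helper: count distinct reads with at least one hit in a
# BLAST tabular (-outfmt 6) file. Column 2 is the subject ID, which is
# the read ID here because the reads are the database.
count_hit_reads() {
    cut -f2 "$1" | sort -u | wc -l | tr -d ' '
}

# Hypothetical helper: count sequences in a FASTA file by counting headers.
count_fasta_reads() {
    grep -c '^>' "$1"
}

# Hypothetical helper: hits per million reads, given hit count and total.
hits_per_million() {
    awk -v h="$1" -v t="$2" 'BEGIN { printf "%.2f\n", (t > 0) ? h * 1e6 / t : 0 }'
}

# Summarize one sample if the files from the BLAST step are present.
if [ -f results.txt ] && [ -f reads_combined.fasta ]; then
    hits=$(count_hit_reads results.txt)
    total=$(count_fasta_reads reads_combined.fasta)
    echo "hit_reads=${hits} total_reads=${total} hits_per_million=$(hits_per_million "$hits" "$total")"
fi
```

This is only a rough read-count proxy; it does not account for coverage along the 15 kb cluster or for read length.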

PS: I work on an HPC cluster.

Apologies for any inconvenience.


Best,
Ahmed
