Plasmid Database
pedroscampoy edited this page May 31, 2018 · 15 revisions
Follow these steps to download a reliable and complete plasmid database. The process takes several hours but only needs to be done once.
First, download plasmids.txt from the NCBI FTP site:
ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/plasmids.txt
The following command reads plasmids.txt, fetches each listed sequence from NCBI, and writes a raw FASTA file (plasmids.fna) with about 12,000 sequences:
for i in $(cat plasmids.txt | awk 'BEGIN{FS="\t"} (NR>2) {if ($6 ~ "N") {print $6;} else {print $7}}'); do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=$i&retmode=text&rettype=fasta"; done > plasmids.fna
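The accession selection in the loop above can be tried offline before starting the multi-hour download. The sketch below runs the same awk program on a tiny, made-up table. The sample rows and accession values are invented for illustration, and the reading that column 6 holds a RefSeq accession while column 7 holds an INSDC accession is an assumption based on how the command treats them.

```shell
#!/usr/bin/env bash
# Hypothetical sample table: two leading lines (skipped by NR>2),
# then two tab-separated data rows. All values are invented.
sample=$(mktemp)
printf '#header line 1\n#header line 2\n' > "$sample"
printf 'orgA\tk\tg\ts\tp1\tNC_000001.1\tAB000001.1\n' >> "$sample"
printf 'orgB\tk\tg\ts\tp2\t-\tAB000002.1\n' >> "$sample"

# Same awk program as in the download loop, without the curl call:
# print column 6 when it contains an "N", otherwise fall back to column 7.
ids=$(awk 'BEGIN{FS="\t"} (NR>2) {if ($6 ~ "N") {print $6} else {print $7}}' "$sample")
echo "$ids"

rm -f "$sample"
```

For the first row the value in column 6 is printed; for the second row column 6 is "-", so the value in column 7 is used instead.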
From the PlasmidID folder, execute:
NOTE:
- The -i argument is the path to plasmids.fna
- The output will be written to the same location as the input
- Memory (-M) and number of threads (-T) can vary depending on the computer that executes this command
lib/cdhit_cluster.sh -i FILE_TO/plasmids.fna -p -c 100 -M 20000 -T 8
NOTE2:
This step is optional: PlasmidID works with any DNA database. Removing redundancy is useful because it reduces execution time. Any other clustering software can be used instead.
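To see how much the redundancy removal shrinks the database, one option is to count the FASTA records before and after clustering. This is a minimal sketch: count_seqs is a hypothetical helper name, and the demo file is invented so the block runs on its own. In practice, pass the real plasmids.fna and whatever clustered file the script produced.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTA records by counting header lines,
# that is, lines starting with ">".
count_seqs() {
  grep -c '^>' "$1"
}

# Invented demo file with two records (the second spans two sequence lines).
demo=$(mktemp)
printf '>seq1\nACGT\n>seq2\nGGCC\nTTAA\n' > "$demo"

n=$(count_seqs "$demo")
echo "$n"

rm -f "$demo"
```

Running count_seqs on plasmids.fna right after the download also gives a quick check against the expected figure of about 12,000 sequences.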