
Plasmid Database

pedroscampoy edited this page May 31, 2018 · 15 revisions

Follow these steps to download a reliable and complete plasmid database. The process takes several hours but only needs to be done once.

1. Download the plasmid database info file:

ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/plasmids.txt
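The file can also be fetched from the command line. A minimal sketch, assuming `curl` is installed; the function name `fetch_report` and its two optional arguments are hypothetical and only keep the download step reusable:

```bash
# Hypothetical helper: download the plasmid report to a local file.
# The default URL is the one listed in step 1; the default output name is
# the file name that step 2 reads.
fetch_report() {
    local url="${1:-ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/plasmids.txt}"
    local out="${2:-plasmids.txt}"
    # -sS: no progress meter, but still print errors
    # --fail: non-zero exit code on HTTP error responses (harmless for FTP)
    curl -sS --fail -o "$out" "$url"
}
```

Running `fetch_report` with no arguments should leave `plasmids.txt` in the current directory.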

2. Fetch the sequence for every accession number into a single FASTA file using eutils:

This command outputs a raw FASTA file with about 12000 sequences.

for i in $(cat plasmids.txt | awk 'BEGIN{FS="\t"} (NR>2) {if ($6 ~ "N") {print $6;} else {print $7}}'); do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=$i&retmode=text&rettype=fasta"; done > plasmids.fna
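Before starting the multi-hour download, it can help to preview which accessions the loop will request, and afterwards to confirm how many records arrived. The sketch below reuses the column rule from the command above (skip the first two lines, take column 6 when it contains an `N`, otherwise column 7). The function names `list_accessions` and `count_sequences` are hypothetical.

```bash
# Print one accession per data row, using the same awk rule as the loop above.
list_accessions() {
    awk 'BEGIN { FS = "\t" } NR > 2 { if ($6 ~ "N") print $6; else print $7 }' "$1"
}

# Count FASTA records: every record starts with a ">" header line.
count_sequences() {
    grep -c '^>' "$1"
}
```

For example, `list_accessions plasmids.txt | wc -l` gives the number of requests the loop will make, and `count_sequences plasmids.fna` should report a similar number once the download finishes. A much lower count may indicate that some requests failed.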

3. Remove redundancy:

From the PlasmidID folder, execute the command below. NOTE:

* The -i argument is the path to plasmids.fna
* The output is written to the same location as the input
* Memory (-M) and number of threads (-T) can vary depending on the computer that executes this command

```bash
bash lib/cdhit_cluster.sh -i FILE_TO/plasmids.fna -p -c 100 -M 20000 -T 8
```
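To pick values for -T and -M on a given machine, something like the following sketch may help. It assumes that -M is expressed in megabytes, as it is in CD-HIT itself (this page does not state the unit), and that leaving roughly 20% of RAM free is acceptable. Both function names are hypothetical, and the memory helper only works on Linux.

```bash
# Number of online CPU cores: a candidate value for -T.
suggest_threads() {
    getconf _NPROCESSORS_ONLN
}

# About 80% of total RAM in MB: a candidate upper bound for -M.
# Linux only; prints nothing where /proc/meminfo is not available.
suggest_memory_mb() {
    [ -r /proc/meminfo ] || return 0
    # MemTotal is reported in kB, so divide by 1024 to get MB.
    awk '/^MemTotal:/ { print int($2 / 1024 * 0.8) }' /proc/meminfo
}
```

The printed values can then be substituted for `8` and `20000` in the command above.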
