-
Notifications
You must be signed in to change notification settings - Fork 1
multicom-toolbox/DNSS
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
################################################################################################################# # # # Software : DNSS (A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction) # # Release : 1.1 (October 2016) # # # # Author(s) : Matt Spencer, Jesse Eickholt, and Jianlin Cheng # # Maintainance : Jie Hou(jh7x3@mail.missouri.edu ), Badri Adhikari (bap54@mail.missouri.edu) # # Copyright : Bioinformatics, Data Mining, and Machine Learning Lab (BDM) # # Department of Computer Science # # University of Missouri, Columbia # # # ################################################################################################################# Note that probably every run will result in a warning from the blast software. This is not a problem, and SS predictions should work fine despite this error. ---------------------------------------------------------------------------- 1. Before installing DNSS, the following tools must be downloaded and installed 1). Python2: DNSS was fully developed and tested under Python 2.7.6 * Note: Python is recommended to install in system, in our program, will call 'python2 *py' directly. 2). ncbi-blast-2.2.25+: blast programs should be in folder './programs', for example, the binary file should be in 'programs/ncbi-blast-2.2.25+/bin/' * Note: We had hard code of 'ncbi-blast-2.2.25+' in 'scripts/gen-pssm-less-stringent.sh' and 'scripts/generate-pssm.sh' So if other version is used, please make sure to change the version in two scripts. Executable blast can be downloaed from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.25/ For example: $ cd ./programs $ wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.25/ncbi-blast-2.2.25+-x64-linux.tar.gz $ tar -zxvf ncbi-blast-2.2.25+-x64-linux.tar.gz 3). Non-redundency database(nr90): Please put nr90 files that were formated by blast into folder './nr_database' *Note: Please make sure the prefix of nr database is 'nr90', for example, nr90.00.phr, nr90.00.pin, etc If other version of nr database is used, please change the nr version in 'scripts/gen-pssm-less-stringent.sh' and 'scripts/generate-pssm.sh' Our lab provided nr70, nr90, and nr for users to download, please try $ cd ./nr_database a) nr90: $ wget http://sysbio.rnet.missouri.edu/bdm_download/nr_database/nr90.tar.gz $ tar -zxvf nr90.tar.gz $ mv nr90/* ./ $ rm -rf nr90 nr90.tar.gz b) nr70(optional): $ wget http://sysbio.rnet.missouri.edu/bdm_download/nr_database/nr70.tar.gz $ tar -zxvf nr70.tar.gz $ mv nr70/* ./ $ rm -rf nr70 nr70.tar.gz c) nr(optional): $ wget http://sysbio.rnet.missouri.edu/bdm_download/nr_database/nr.tar.gz $ tar -zxvf nr.tar.gz $ mv nr/* ./ $ rm -rf nr nr.tar.gz ---------------------------------------------------------------------------- 2. Configuation Usage: $ cd DNSS $ perl configure.pl <DNSS folder> Example: $ perl configure.pl /home/jh7x3/DNSS/ ---------------------------------------------------------------------------- 3. Instructions for predicting secondary structure: * Enter the scripts/ directory. * Use the PredictSS.pl script to predict SS. * This tool also support both GPU and CPU to run prediction, the default is running on CPU. Before running CPU, please check if GPU can be called for cudamat $ cd scripts/ $ perl GPU_detect.pl * There are two ways to indicate the protein to predict: ---------------------------------------------------------------------------- (1) Directly give a sequence: You can indicate a residue sequence in the command line directly, and name the sequence to make resulting files easier to find. To use this procedure, it is reccomended that the -name tag is used in addition to the -seq tag. -name is not required, but if it is not used, the program will automatically assign a less useful name to the protein (such as 0000). Usage: $ perl PredictSS.pl -seq <AA sequence> -name <Protein Name> -device <GPU/CPU> -out -out <output folder> Example: a). CPU: perl /home/jh7x3/DNSS/scripts/PredictSS.pl -seq GNVVIEVDMANGWRGNASGSTSHSGITYSADGVTFAALGDGVGAVFDIARPTTLEDAVIAMVVNVSAEFKASEANLQIFAQLKEDWSKGEWDALAGSSELTADTDLTLTATIDEDDDKFNQTARDVQVGIQAKGTPAGTITIKSVTITLAQEA -name Prot1 -out ./output/ b). GPU: perl /home/jh7x3/DNSS/scripts/PredictSS.pl -device GPU -seq GNVVIEVDMANGWRGNASGSTSHSGITYSADGVTFAALGDGVGAVFDIARPTTLEDAVIAMVVNVSAEFKASEANLQIFAQLKEDWSKGEWDALAGSSELTADTDLTLTATIDEDDDKFNQTARDVQVGIQAKGTPAGTITIKSVTITLAQEA -name Prot1 -out ./output/ ---------------------------------------------------------------------------- (2) Predict from protein file: This is the suggested method for predicting single proteins, as it is less confusing. Create a fasta file containing the residue sequence and name the file <prot-name>.fasta. The program will use the name of the file to indicate the protein that was predicted, and name output files accordingly. Usage: $ ./PredictSS.pl -seq <file name>.fasta -file -device <GPU/CPU> -out <output folder> Example: a) CPU: perl /home/jh7x3/DNSS/scripts/PredictSS.pl -seq test/1GNY-A.fasta -file -device CPU -out output/ b) Gpu: perl /home/jh7x3/DNSS/scripts/PredictSS.pl -seq test/1GNY-A.fasta -file -device GPU -out output/ * Note that the 1GNY-A.fasta file is available at that location to use as a test protein. * Note that using the -out tag to designate an output directory is recommended, or else the scripts folder will get cluttered by predictions. (In fact, I recommend changing the default output directory to something else to avoid this fate.) * More details about the parameter, can simply type $ perl scripts/PredictSS.pl ---------------------------------------------------------------------------- 4. Predicting multiple proteins: It is quite easy to predict multiple proteins in sequence using the -indir tag. Just save fasta files of all the proteins you want to predict in a directory, use the -indir tag, and all fasta files in that directory will be predicted. Usage: $ ./PredictSS.pl -indir <input directory> -out <output directory> -device <GPU/CPU> Example: a) CPU: perl /home/jh7x3/DNSS/scripts/PredictSS.pl -indir ./test/ -out ./output/ -device CPU b) GPU: perl /home/jh7x3/DNSS/scripts/PredictSS.pl -indir ./test/ -out ./output/ -device GPU ---------------------------------------------------------------------------- 5. Recommended improvements: As stated above, I recommend changing the default directory for output to avoid cluttering the scripts/ directory. Currently the program saves all intermediate files in the temp/ directory and its subdirectories. I recommend revising the script to delete these intermediate files, as they will build up to be quite large if left unattended. ---------------------------------------------------------------------------- If you have questions or suggestions, please contact: Jie Hou(jh7x3@mail.missouri.edu), Jianlin Cheng, PhD Bioinformatics, Data Mining, and Machine Learning Lab (BDM) Department of Computer Science University of Missouri, Columbia Email: chengji@missouri.edu ---------------------------------------------------------------------------- Citation: Spencer, Matt, Jesse Eickholt, and Jianlin Cheng. "A deep learning network approach to ab initio protein secondary structure prediction." IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 12.1 (2015): 103-112.
About
Deep learning networks for protein secondary structure prediction
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published