mldlproject/2021-iNSP-CTDC

iNSP-GCAAP: Identifying Non-classical Secreted Proteins using Global Composition of Amino Acid Properties

H. T. Pham, T-H Nguyen-Vo, Q. H. Trinh, T. T. T. Do*, and B. P. Nguyen

Motivation

Non-classical secreted proteins are proteins released into the extracellular environment through biological transport pathways other than the Sec/Tat system. Because experimental identification of non-classical secreted proteins is often costly and requires skilled handling, computational approaches are necessary. In this study, we introduce iNSP-GCAAP, a computational prediction framework for identifying non-classical secreted proteins. We encode sequence data using the global composition of a customized set of amino acid properties and classify the encoded sequences with the random forest algorithm. We developed our model on the training dataset introduced by Zhang et al. (Bioinformatics, 36(3), 704–712, 2020) and tested it on the independent test set from the same study.
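To illustrate what a global composition encoding might look like, here is a minimal sketch in Python. The property groupings below are commonly used textbook classes chosen only for illustration; they are an assumption and are not the customized set used by iNSP-GCAAP. The names `PROPERTY_GROUPS` and `global_composition` are likewise hypothetical. The resulting vector is the kind of input that could then be passed to a random forest classifier, a step omitted here.

```python
from collections import Counter

# Illustrative property groupings only (assumption: NOT the customized
# set of amino acid properties used by iNSP-GCAAP).
PROPERTY_GROUPS = {
    "hydrophobicity": {
        "polar": "RKEDQN",
        "neutral": "GASTPHY",
        "hydrophobic": "CLVIMFW",
    },
    "charge": {
        "positive": "KR",
        "negative": "DE",
        "neutral": "ACFGHILMNPQSTVWY",
    },
}


def global_composition(sequence):
    """Return, for every property class, the fraction of residues in that class."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    features = []
    for classes in PROPERTY_GROUPS.values():
        for residues in classes.values():
            in_class = sum(counts[r] for r in residues)
            features.append(in_class / len(seq))
    return features


# Toy peptide (made up for this example), encoded as a 6-dimensional vector.
vector = global_composition("MKRLLDEAG")
```

Each property contributes one fraction per class, so the fractions for a single property sum to 1 when every residue is one of the 20 standard amino acids.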

Results

The area under the receiver operating characteristic curve (AUC) on the independent test set was 0.9256, which outperformed other state-of-the-art methods evaluated on the same datasets. Our framework is also deployed as a user-friendly web-based application to help the research community predict non-classical secreted proteins.
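For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python. The function name `roc_auc` and the toy labels and scores are made up for illustration; they are not data or code from this study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 3 of the 4 positive/negative pairs are ranked correctly.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

The pairwise form is quadratic in the number of examples, which is fine for a small test set; library implementations typically use a rank-based computation instead.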

Availability and Implementation

Source code and data are available upon request.

Web-based Application

Click here

Contact

Go to contact information
