CysDBase: A comprehensive database of Cysteine Post-Translational Modifications (PTMs) across protein sequence, class, cellular localization, biological pathway, structure, and taxonomy
A database of cysteine post-translational modifications: Disulphide, S-glutathionylation, S-nitrosylation, S-sulphenylation, S-palmitoylation, Thioether, and Metal-binding. It is the first database to store information on Thioether and S-sulphenylation modifications, and on Metal-binding across various metals.
The cysteine thiol group undergoes various post-translational modifications that contribute to a large number of biochemical, physiological, and cellular processes. Consolidated information about these modifications would be a valuable ready reference for the scientific community. A few published datasets cover specific cysteine modifications, but they are less diverse in nature. The aim of this work is to develop a large repository of multiple cysteine post-translational modifications from all available species, encompassing the maximum number of features.
CysDBase reports seven cysteine modifications from 1,14,56,639 cysteine residues extracted from proteins belonging to various taxa across Viruses, Eukaryotes, Bacteria, and Archaea. The seven modifications are Disulphide, Metal-binding, S-sulphenylation, Thioether, S-glutathionylation, S-palmitoylation, and S-nitrosylation. For each cysteine residue, the database reports the following features: post-translational modifications, protein sequence (within a window size of 7), cellular location, pathway, PubMed ID, PDB ID, buried fraction, and protein microenvironment (rHpy). Note that a residue may carry multiple modifications. Some features may not be returned for certain entries because the experimental data are unavailable; for example, if no PDB ID is available, the buried fraction and protein microenvironment are not reported. This database reports, for the first time, Thioether and S-sulphenylation modifications of cysteine residues, and Metal-binding across various metals.
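The sequence feature is the stretch of residues around each cysteine. The sketch below shows one way such a window could be extracted. It assumes the window size counts residues on each side of the cysteine; the text above does not say whether 7 is per side or in total. The function name `cysteine_windows` and the example sequence are hypothetical and are not part of CysDBase.

```python
def cysteine_windows(sequence, window=7):
    """Return (1-based position, flanking segment) for every cysteine.

    Assumption: `window` residues are taken on each side of the cysteine,
    and the segment is truncated at the ends of the sequence.
    """
    results = []
    for index, residue in enumerate(sequence):
        if residue == "C":
            start = max(0, index - window)
            end = min(len(sequence), index + window + 1)
            results.append((index + 1, sequence[start:end]))
    return results


# Made-up sequence, used only to illustrate the call.
example = cysteine_windows("MKTCAAGHC", window=2)
```

Segments near the termini come out shorter than the full window; whether CysDBase pads or truncates them is not stated here, so the truncation is also an assumption.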
- CysDBase is a repository of seven post-translational modifications from 1,14,56,639 cysteine residues extracted from proteins of various taxa across Viruses, Eukaryotes, Bacteria, and Archaea. The features reported for each cysteine entry are post-translational modification, protein sequence (within a given window size), location, pathway, PubMed ID, PDB ID, buried fraction, and protein microenvironment (subject to availability of the experimental data).
- For the first time, the database reports the protein microenvironments for various cysteine post-translational modifications.
- This is the first database to report Thioether and S-sulphenylation modifications, and Metal-binding across various metals.
Post-translational modifications of cysteine residues have been scattered throughout the literature. A comprehensive report is presented here. The search results return diverse features, such as the location of the modifications, pathways, and protein microenvironment, which can be applied to address various biological questions.
CysDBase curates experimentally determined cysteine modifications from the maximum number of species and reports the highest number of features, all accessible through search queries. The database reports the protein microenvironment feature for the first time.
CysDBase is a comprehensive repository of seven cysteine post-translational modifications and their related features, covering the maximum number of species and diverse features.
CysDBase Web Server Link - https://cysdbase.bits-hyderabad.ac.in/
Dataset files for CysDBase can be accessed only by sending an email request to banerjee.debi@hyderabad.bits-pilani.ac.in with the following details: Name, Email, Name of the Institute, and Place of the Institute.
Pandas==2.3.2
- Send an email request to banerjee.debi@hyderabad.bits-pilani.ac.in to download the CSV files for the three Python scripts.
- Three Python scripts are available to retrieve the data for a given query.
- The first Python script is for general queries: the user enters a UniProt_ID, Organism, Cell organelle, or Biological pathway, and the data matching the query are saved to a CSV file.
- The second Python script is for downloading FASTA sequences: the user enters a UniProt_ID, Organism, Biological pathway, or Cell organelle, and the data matching the query are saved to a CSV file.
- The third Python script is for the protein structural microenvironment: the user enters a UniProt_ID or PDB_ID, which must be in capital letters, and the results are saved to a CSV file.
- Tutorial for the general query
a. Download and save the dataset for the query.
b. Run the Python script cysdbase_query.py.
c. The script prompts for a query: UniProt_ID, Organism, Biological pathway, or Cell organelle. Enter the one that suits your requirement.
d. The output is written to the query_output.csv file.
- Tutorial for downloading FASTA sequences
a. Download and save the dataset for the query.
b. Run the Python script fasta_sequences.py.
c. The script prompts for a query: UniProt_ID, Organism, Biological pathway, or Cell organelle. Enter the one that suits your requirement.
d. The output is written to the query_output.csv file.
- Tutorial for the protein structural microenvironment
a. Download and save the dataset for the query.
b. Run the Python script cysdbase_menv.py.
c. The script prompts for a query: UniProt_ID or PDB_ID. The query must be in capital letters.
d. The output is written to the cysdbase_menv_output.csv file.
If your query returns no results, the query is not present in the database.
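To give a rough idea of what a query over the dataset CSV involves, here is a minimal sketch. It is not the code in cysdbase_query.py. The column names, the helper names `filter_rows` and `write_csv`, and the sample rows are all hypothetical, and it uses the standard-library `csv` module rather than pandas so that it runs without extra dependencies.

```python
import csv
import io

# Hypothetical column names; the real CysDBase CSV headers may differ.
SEARCH_COLUMNS = ("UniProt_ID", "Organism", "Cell_organelle", "Biological_pathway")


def filter_rows(rows, query, columns=SEARCH_COLUMNS):
    """Return rows in which any searched column equals the query, ignoring case."""
    needle = query.strip().lower()
    return [
        row for row in rows
        if any((row.get(col) or "").strip().lower() == needle for col in columns)
    ]


def write_csv(rows, fieldnames, handle):
    """Write a header and the matching rows to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Made-up rows standing in for the dataset file obtained by email request.
SAMPLE = """UniProt_ID,Organism,Cell_organelle,Biological_pathway,Modification
X00001,Organism A,Cytoplasm,Pathway 1,Disulphide
X00002,Organism B,Nucleus,Pathway 2,S-nitrosylation
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
hits = filter_rows(rows, "organism a")  # the comparison ignores case
out = io.StringIO()
write_csv(hits, reader.fieldnames, out)
```

With a real dataset file, the same functions could read from `open(path, newline="")` and write to query_output.csv. An empty `hits` list corresponds to a query that is not in the database. This sketch compares values case-insensitively, whereas cysdbase_menv.py expects the query in capital letters.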
HD acknowledges the financial support from the Indian Council of Medical Research (ICMR)- Senior Research Fellow (SRF), File No: BMI/11(99)/2022; DB acknowledges the financial support from the Department of Science and Technology (DST), Science and Engineering Research Board (SERB), India, File No: EMR/2017/002953
Please contact:
Principal Investigator:-
Dr. Debashree Bandyopadhyay,
Associate Professor,
BITS Pilani, Hyderabad Campus,
INDIA
Developer:-
Devarakonda Himaja,
PhD Student in Bioinformatics,
BITS Pilani, Hyderabad Campus,
INDIA