[GSoC 2025 project 5] discussion about HBond interactions from implicit hydrogens in ProLIF #4962

yuyuan871111 · 2025-03-12T10:46:59Z

I am Stuart, and I'm interested in contributing to project 5 (calculating HBond interactions from implicit hydrogens in ProLIF). While working out a possible solution, I ran into several questions, and I would like to ask the mentors for clarification.

In the project details, you mentioned that the expected outcome is to compare the complexes without using other protonation tools. However, placing hydrogens without relaxation or optimization can lead to inaccurate hydrogen bond predictions. This is particularly true for residues whose donor hydrogens sit on rotatable groups (e.g., CYS, LYS, SER, THR, and TYR), even when all heavy-atom positions are given. For these non-deterministic implicit hydrogens, it will be difficult to evaluate hydrogen bonds from angles and distances. (In contrast, the implicit hydrogens of some residues, like ARG, ASN, GLN, and TRP, are deterministic because of local resonance effects, so adding them should be fine.)
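To illustrate the point about rotatable groups, here is a small pure-Python sketch that places the hydroxyl hydrogen (HG) of a serine from its heavy atoms CA, CB and OG using a standard internal-coordinate (NeRF-style) construction. The O-H bond length (0.96 Å), the C-O-H angle (109.5°) and the toy coordinates are illustrative assumptions. Every value of the torsion gives an equally valid geometry, which is why the heavy atoms alone do not pin down where HG sits.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a: Vec) -> Vec:
    length = math.sqrt(dot(a, a))
    return (a[0] / length, a[1] / length, a[2] / length)


def dist(a: Vec, b: Vec) -> float:
    d = sub(a, b)
    return math.sqrt(dot(d, d))


def angle(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c in degrees."""
    u, v = sub(a, b), sub(c, b)
    cos_t = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def place_atom(a: Vec, b: Vec, c: Vec, bond: float,
               angle_deg: float, torsion_deg: float) -> Vec:
    """Place atom d bonded to c, with angle b-c-d and torsion a-b-c-d."""
    theta, phi = math.radians(angle_deg), math.radians(torsion_deg)
    # Orthonormal frame built from the three reference atoms
    bc = norm(sub(c, b))
    n = norm(cross(sub(b, a), bc))
    m = cross(n, bc)
    # Displacement of d from c, expressed in the (bc, m, n) frame
    d_bc = -bond * math.cos(theta)
    d_m = bond * math.sin(theta) * math.cos(phi)
    d_n = bond * math.sin(theta) * math.sin(phi)
    return tuple(c[i] + d_bc * bc[i] + d_m * m[i] + d_n * n[i] for i in range(3))


# Toy serine heavy atoms (Å); every torsion yields an equally "ideal" HG
CA, CB, OG = (0.0, 0.0, 0.0), (1.53, 0.0, 0.0), (2.0, 1.35, 0.0)
for chi in (60, 180, 300):
    hg = place_atom(CA, CB, OG, bond=0.96, angle_deg=109.5, torsion_deg=chi)
    print(chi, [round(x, 2) for x in hg], round(dist(OG, hg), 2), round(angle(CB, OG, hg), 1))
```

Choosing one torsion (or sampling several and scoring them) is essentially the relaxation step that a protonation tool performs.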

I assume the main reason for not using a protonation tool is computational cost. However, relaxing implicit hydrogens to get accurate predictions would itself cost computational efficiency, potentially making the approach as expensive as existing protonation tools.

Also, protonation tools usually consider the protein environment, such as the global pH and local residue/ligand-induced effects (as PROPKA does). These factors can also affect hydrogen bond calculations. For example, pH affects the protonation states of several residues, including ARG, ASP, GLU, HIS, TYR, and LYS, and can therefore lead to different hydrogen bond interactions.
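As a rough illustration of the pH effect, the Henderson-Hasselbalch equation gives the protonated fraction of an ionizable group from its pKa. The pKa values below are approximate textbook model values for free side chains (an assumption). Tools like PROPKA shift these values according to the local environment, which this sketch ignores entirely.

```python
# Approximate model side-chain pKa values (textbook-style; environment ignored)
MODEL_PKA = {"ASP": 3.9, "GLU": 4.1, "HIS": 6.0, "CYS": 8.3,
             "TYR": 10.1, "LYS": 10.5, "ARG": 12.5}


def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of the group carrying the extra proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


for res, pka in MODEL_PKA.items():
    print(f"{res}: {fraction_protonated(pka, 7.0):.3f} protonated at pH 7")
```

With these model values, histidine sits closest to the boundary at pH 7, which is one reason it is the usual ambiguous case.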

Is there any other reason, besides computational cost, for placing hydrogens at ideal positions without using a protonation tool? And, if we don't use an external protonation tool, should hydrogen relaxation be added to the code?

What would be an acceptable assumption for hydrogen bond calculations with implicit hydrogens (e.g., a pH 7 environment, ignoring local residue/ligand-induced effects)?

cbouysset · 2025-03-13T11:56:28Z

Hi Stuart and thank you for considering participating in a GSoC project with us!

Your understanding of why we don't want to use a protonation tool is correct: the goal of this project is to provide a faster (but possibly less accurate) way of estimating interactions from PDB files that don't contain explicit hydrogen atoms.

The exact implementation is left up to the candidate, but I'd like to emphasise that, because we aren't relying on a protonation tool, it is probably best to assume that there will be no explicit hydrogens to relax and that we won't use any local environment effects to infer protonation states. Essentially, we want to use only the heavy-atom positions to infer their ability to participate in hydrogen bonding. In other words, if this implicit-hydrogen method predicts that two residues interact through a hydrogen bond, then adding hydrogens with a protonation tool and optimising the HBond network should also show that interaction between both residues.
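One possible way to check the criterion described here (implicit-H predictions should be confirmed once hydrogens are added and the HBond network is optimised) is to treat each detected hydrogen bond as a pair of interacting residues and compare the two sets. This is a minimal sketch under that assumption, not a planned benchmark; `compare_interactions` is a hypothetical name and the residue labels are made-up toy data.

```python
def compare_interactions(implicit: set[tuple[str, str]],
                         reference: set[tuple[str, str]]) -> dict[str, float]:
    """Compare implicit-H HBond pairs against an explicit-H reference."""
    confirmed = implicit & reference
    # precision: share of implicit-H predictions confirmed by the reference
    precision = len(confirmed) / len(implicit) if implicit else 1.0
    # recall: share of reference HBonds that the implicit-H method also finds
    recall = len(confirmed) / len(reference) if reference else 1.0
    return {"precision": precision, "recall": recall}


# Toy (ligand, residue) pairs, purely illustrative
implicit_pairs = {("LIG1", "ASP25"), ("LIG1", "HIS41"), ("LIG1", "SER139")}
reference_pairs = {("LIG1", "ASP25"), ("LIG1", "HIS41")}
print(compare_interactions(implicit_pairs, reference_pairs))
```

In these terms, the requirement stated above corresponds to keeping precision close to 1.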

As an example, if you take a look at the 3D viewer used by the PDB (molstar), it assumes that all nitrogen atoms in a histidine can act as donors. The goal of this project would be something similar.

We can also respect different residue naming conventions; this is where the "helper" function plays a role, e.g. making sure HIE only considers the epsilon hydrogen as a donor, whereas for HIS and HIP you would consider both. The same goes for carboxylic acid side chains: you would consider both O atoms as donors unless the residue naming tells you otherwise. Also, RDKit (which would be the basis for implementing this helper function) doesn't understand non-standard residue names, so that's another role for it: making sure HIE and other non-standard residues have proper bond orders assigned.
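Here is a minimal sketch of the naming part of such a helper. It is written in pure Python and independent of RDKit. The function name `sidechain_donors` and the mapping are hypothetical. The histidine entries follow the AMBER names (HID: proton on the delta nitrogen, HIE: on the epsilon nitrogen, HIP: both) and their CHARMM equivalents (HSD, HSE, HSP). An ambiguous HIS allows both nitrogens, as molstar does. For the generic ASP/GLU names, both carboxylate oxygens are listed as potential donors.

```python
# Hypothetical helper: side-chain heavy atoms allowed to act as HBond donors,
# judged from the residue name alone (no explicit hydrogens needed).
CHARMM_TO_AMBER = {"HSD": "HID", "HSE": "HIE", "HSP": "HIP"}

SIDECHAIN_DONORS: dict[str, frozenset[str]] = {
    # Histidine: ambiguous or doubly protonated names allow both ring nitrogens
    "HIS": frozenset({"ND1", "NE2"}),
    "HIP": frozenset({"ND1", "NE2"}),
    "HID": frozenset({"ND1"}),  # proton on the delta nitrogen
    "HIE": frozenset({"NE2"}),  # proton on the epsilon nitrogen
    # Carboxylates: both oxygens treated as potential donors by default
    "ASP": frozenset({"OD1", "OD2"}),
    "GLU": frozenset({"OE1", "OE2"}),
    # Other common side-chain donors
    "SER": frozenset({"OG"}),
    "THR": frozenset({"OG1"}),
    "TYR": frozenset({"OH"}),
    "CYS": frozenset({"SG"}),
    "CYX": frozenset(),  # disulfide-bonded cysteine (AMBER name): no thiol H
    "LYS": frozenset({"NZ"}),
    "ARG": frozenset({"NE", "NH1", "NH2"}),
    "ASN": frozenset({"ND2"}),
    "GLN": frozenset({"NE2"}),
    "TRP": frozenset({"NE1"}),
}


def sidechain_donors(resname: str) -> frozenset[str]:
    """Return side-chain donor atom names for a residue name (empty if unknown)."""
    name = resname.strip().upper()
    name = CHARMM_TO_AMBER.get(name, name)  # map CHARMM histidine names to AMBER
    return SIDECHAIN_DONORS.get(name, frozenset())


print(sidechain_donors("HIE"), sidechain_donors("HSD"), sidechain_donors("HIS"))
```

Keeping this table separate from the SMARTS/bond-order step would make it easy to extend to other force-field conventions later.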

Hopefully this answers some of your questions!

yuyuan871111 Mar 13, 2025
Author

@cbouysset, thanks a lot!

I submitted my pre-proposal a couple of minutes before your reply, so I'm not sure whether there is a chance to modify it. XD

In my pre-proposal, I tried to include some solutions for the protonation states (e.g. pH and HIS states). However, the idea here seems to focus more on (1) the SMARTS pattern design for finding the hydrogen donor and acceptor and (2) the naming convention and potential issues across the packages.

Thanks again for the clarification. I will also try to get a PR merged (if invited) in the MDAnalysis community or related projects to meet the requirement.

cbouysset Mar 13, 2025

Don't worry about the pre-proposal, both myself and @talagayev are aware now so it shouldn't cause any problem, we'll try to review it soon.

> the idea here seems to focus more on (1) the SMARTS pattern design for finding the hydrogen donor and acceptor and (2) the naming convention and potential issues across the packages.

I'd say the project requires:

- modifying the SMARTS for HBond donors (for the implicit version),
- dealing with naming conventions in PDB files (at least the most common cases).

Regarding the implementation: because we no longer look at the position of an explicit H atom to judge whether groups of atoms are interacting, we'll need an alternative approach. molstar, which I linked to in my previous message, does it one way, and I know some docking algorithms also score hydrogen bonds using heavy atoms only, so there are approximate solutions out there already that we can take inspiration from.
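A heavy-atom-only criterion of the kind mentioned here could look like the sketch below. The hydrogen must sit about 1 Å from the donor D, away from D's heavy-atom neighbour X. The sketch therefore requires a short D···A distance and an X-D···A angle that keeps the acceptor A away from X. The function name and both thresholds (3.5 Å, 90°) are illustrative assumptions, not ProLIF defaults and not the rules used by molstar.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def heavy_atom_hbond(donor: Vec, donor_neighbor: Vec, acceptor: Vec,
                     max_dist: float = 3.5, min_angle: float = 90.0) -> bool:
    """Hypothetical HBond test using only D...A distance and X-D...A angle."""
    da = _sub(acceptor, donor)
    dx = _sub(donor_neighbor, donor)
    d_dist = math.sqrt(_dot(da, da))
    # Reject acceptors that are too far away (or overlapping the donor)
    if d_dist == 0.0 or d_dist > max_dist:
        return False
    cos_t = _dot(da, dx) / (d_dist * math.sqrt(_dot(dx, dx)))
    x_d_a = math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
    # The acceptor must not sit on the same side as the donor's heavy neighbour
    return x_d_a >= min_angle


# Toy serine OG (donor) bonded to CB, with an acceptor 2.8 Å away along CB->OG
donor_og, neighbor_cb = (1.43, 0.0, 0.0), (0.0, 0.0, 0.0)
print(heavy_atom_hbond(donor_og, neighbor_cb, acceptor=(4.23, 0.0, 0.0)))
```

Tightening or loosening the two thresholds trades false positives against missed interactions. A comparison against protonated, optimised structures would be one way to calibrate them.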

H-EKE · 2025-03-16T12:17:12Z

Hi @cbouysset @talagayev , I'm Hocine, and I’d love to contribute to Project 5. I’ve put together a pre-proposal where I address histidine protonation, but I didn’t go into detail on nomenclature differences in CHARMM/AMBER since I wasn’t sure how relevant that was or how long the pre-proposal should be. I also tried to emphasize avoiding explicit protonation tools, but I’m not sure if it’s clear enough.

If we’ve already submitted the pre-proposal, can I still modify it, or would you not recommend doing so?

Would love to hear your thoughts!

yuyuan871111 Mar 18, 2025
Author

Thanks @H-EKE, I also have the same questions.

talagayev Mar 18, 2025
Collaborator

Hey @H-EKE and @yuyuan871111,

Based on our internal discussion, it is enough to have a PR merged in either of those, so yes, merging a PR in ProLIF, for example, is enough.
Last year, when I applied, I created a Google Doc and shared it with the mentors (I think I sent them the link in the MDAnalysis Discord chat), so that they could look at the final proposal and give feedback before I submitted it on the GSoC website.

Hope this helps :)

H-EKE Mar 18, 2025

Hi @talagayev ,

Thanks for your answer, it was very helpful! :)

H-EKE Mar 23, 2025

Hi @cbouy @talagayev

I have put my pre-proposal in this Google Doc (https://docs.google.com/document/d/1TanfDv1VUsrBv6h3pP51P4D9B6L5xshZNasFmHpYP6Q/edit?usp=sharing). I would appreciate your feedback and suggestions.

Just to clarify before I start panicking: I had never worked with mypy before, but I did my best.
I wanted to check whether the PR will be evaluated as part of the GSoC selection process. And if the evaluation isn't positive, will we be automatically excluded from consideration?

cbouy Mar 26, 2025
Collaborator

Just realised I forgot to reply here: you will need a PR merged in MDAnalysis (or ProLIF for this project) to be selected. And just in case it's not clear, having changes requested on your PR is not a fail; just iterate on it until it gets accepted before the deadline.

yuyuan871111 · 2025-03-27T08:55:39Z

Author

A trivial thing for @cbouy and @talagayev:

The size of this project is between medium and large, but the GSoC application system only has three options: small, medium, and large. Which one would you suggest selecting when filling in the application form?

cbouysset Mar 27, 2025

I'd say it depends on:

- how much content you wish to cover in your project proposal. E.g. if you only want to develop the code with tests and docs, or if you also want to add a comparative benchmark or explore different implementations, which will extend the time required to complete the project.
- how familiar you are with best coding practices. Some people will have an easy time with git, adding tests, writing docs, etc., and others will have to learn all this pretty much from scratch and will require more time.

yuyuan871111 Mar 28, 2025
Author

Thanks!!

Jules-Cesar9 · 2025-03-27T11:56:58Z

Hi @cbouy and @talagayev,
I am Jules Cesar. I submitted my pre-proposal (Project 6), but I haven't heard back yet about whether it has been reviewed.

cbouysset Mar 27, 2025

Hi @Jules-Cesar9,

We have received your pre-proposal and will try to give an answer within a week of your submission date, as stated in the wiki. It may take Valerij and me slightly longer, as the ProLIF projects have received quite a lot of interest, so don't panic if you don't see an answer tomorrow; it will arrive over the weekend.

H-EKE · 2025-04-05T00:46:41Z

Hi @cbouy @talagayev ,

Given the difficulties I had with the PR, I have changed my timeline from a medium to a large project. Would that be okay?
If I shared the Google Doc with you, could you please give me feedback if you have time?

cbouy Apr 5, 2025
Collaborator

Hi @H-EKE, yes that's fine!

For the proposal, please submit it directly through the GSoC portal, with a link to the Google doc in the PDF directly. This way only mentors can see it and they can all give feedback. You can then resubmit the improved proposal through the GSoC portal as often as you want.

H-EKE Apr 6, 2025

Thanks for your reply @cbouy. I just submitted it.

H-EKE · 2025-05-08T19:15:10Z

Congrats @yuyuan871111 ! ☺️

yuyuan871111 May 9, 2025
Author

Thank you @H-EKE
