Replies: 1 comment 11 replies
Hi, there is no simple recipe for this, I think. With only 500K compounds you may struggle to create a prior that generates a sufficiently high percentage of valid SMILES. This assumes that you would be using the same network size as the ChEMBL prior; we have not tested what the LSTM hyperparameters should be for "small" training sets. What you could do is use the current ChEMBL prior and apply TL with your own dataset.

Your reported memory footprint does not make sense to me. As you are running on a GPU, the main determining factor is GPU memory. You should be able to train a prior of your size with less than 10 GB of GPU memory.

Cheers,
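For reference, a TL run of this kind is driven by the same kind of TOML input file mentioned below. The following is only a minimal sketch of such an input, assuming the transfer-learning layout used by the example configs shipped with REINVENT 4; the key names, values, and file paths here are assumptions and should be checked against the distributed example configs before use.

```toml
# Minimal sketch of a transfer-learning input (assumed layout; verify the
# key names against the example configs shipped with REINVENT 4).
run_type = "transfer_learning"
device = "cuda:0"                 # GPU memory, not host RAM, is the main constraint

[parameters]
num_epochs = 10                   # assumed/illustrative values
save_every_n_epochs = 2
batch_size = 100                  # reduce if GPU memory becomes a bottleneck

input_model_file = "priors/reinvent.prior"   # start TL from the shipped ChEMBL prior
smiles_file = "my_dataset.smi"               # placeholder: the ~500K-compound set
output_model_file = "TL_reinvent.model"      # placeholder output name
```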
Hello,
Since REINVENT was trained on the ChEMBL dataset, I would like to retrain the REINVENT model with a new dataset (~500K SMILES) that covers a different chemical space than ChEMBL. Is transfer learning the right way to do this? For now I am testing TL from the provided reinvent.prior model, and I plan to do TL from an empty model as discussed in #87.
However, I am facing an issue where a huge amount of memory is needed, causing my job to be killed by the system. The job report from the qacct command shows:
Is this normal behavior for my dataset? Since ChEMBL is much bigger than my dataset, how much memory was needed for the original REINVENT training? Below is my input TOML file: