Implementation of CellPLM #19

HelloWorldLTY · 2024-12-14T15:32:13Z

Describe your changes

Checklist before requesting a review

I have performed a self-review of my code
Check the correct box. Does this PR contain:
- Breaking changes
- New functionality
- Major changes
- Minor changes
- Bug fixes
Proposed changes are described in the CHANGELOG.md
CI Tests succeed and look good!

Hi, I have some questions about the CI test.

For CellPLM, it is only runnable in gpu, not in cpu. Even if they provide the device keyword, but their pre-trained models are trained based on gpu and we may need to modify their packages.

For scFoundation, I keep on meeting the problem:

File "/tmp/viash-run-scfoundation-5vhnk1.py", line 60, in
sys.path.append(meta["resources_dir"])
KeyError: 'resources_dir'

It is strange and in cellplm or scgpt I did not meet such an error. It will be great if someone can take a look and help me resolve this problem.

Also, scFoundation does not allow us to upload the model weights elsewhere publicly, could we store the model weights internally and load it if possible? Thanks.

lazappi · 2024-12-17T08:50:25Z

Thanks for getting started with this 🎉!

For CellPLM, it is only runnable in gpu, not in cpu. Even if they provide the device keyword, but their pre-trained models are trained based on gpu and we may need to modify their packages.

Running on a GPU is ok. When this is the case the config file for the component should inherit from base_method.yaml instead of comp_method.yaml. This will skip some of the tests on the CI where we only have a CPU.

For scFoundation, I keep on meeting the problem:

File "/tmp/viash-run-scfoundation-5vhnk1.py", line 60, in sys.path.append(meta["resources_dir"]) KeyError: 'resources_dir'

It is strange and in cellplm or scgpt I did not meet such an error. It will be great if someone can take a look and help me resolve this problem.

I think this might be because you have deleted the ## VIASH END tag. This is important for the component to work properly and the section between ## VIASH START and ## VIASH END should only contain dictionaries as in the components for other methods.

Also, scFoundation does not allow us to upload the model weights elsewhere publicly, could we store the model weights internally and load it if possible? Thanks.

Can you please point us to where this is documented? We will take a look and see what is possible.

I noticed you have added a lot of extra files, some of which have copyright notices. The added code should be the minimum needed to run each method and if most cases should be stored in the directory for that component. Can you please remove anything that isn't needed and move it to the appropriate directory?

HelloWorldLTY · 2024-12-17T13:43:47Z

Hi, I got this information after directly asking the authors of scFoundation. Do you think if it will be helpful to reach out these authors to see if they allow us to create a google drive link for download? Thanks.

Update: I have made corrections based on your recommendations. For the copyright file, I do not think we can easily remove tham as they are depended files used in scFoundation infernece (check the load.py and ger_embedding.py). Do you think I need to clean them and finally have a file without risk, or just remove this method from our list?

lazappi · 2024-12-18T07:20:32Z

Hi, I got this information after directly asking the authors of scFoundation. Do you think if it will be helpful to reach out these authors to see if they allow us to create a google drive link for download? Thanks.

How do they currently distribute the model? We don't need Google Drive specifically, but we do need some public location where the model is available.

Update: I have made corrections based on your recommendations. For the copyright file, I do not think we can easily remove tham as they are depended files used in scFoundation infernece (check the load.py and ger_embedding.py). Do you think I need to clean them and finally have a file without risk, or just remove this method from our list?

If the files have been copied directly then we should definitely not remove any copyright notices. It sounds like there are some licensing issues around scFoundation that we would need to work out. For now I think it would be better to remove it and just add CellPLM until we are able to look into this more (or open a new PR with just CellPLM if that's easier).

@rcannood Do you have any other thoughts?

HelloWorldLTY · 2024-12-18T14:03:30Z

Hi, all options sound good to me.

scFoundation distributs the weights based on onedrive bussiness version, and I tried multiple approach to download the weights via link or in the codes but they all fail. It seems that onedrive has more strict useage.

lazappi · 2024-12-19T06:31:04Z

scFoundation distributs the weights based on onedrive bussiness version, and I tried multiple approach to download the weights via link or in the codes but they all fail. It seems that onedrive has more strict useage.

Ok. Let's change this PR to just have CellPLM or open a new one (whichever is easier) until we can look more into how to handle scFoundation.

HelloWorldLTY · 2024-12-19T17:39:52Z

Hi, sound sounds good to me. I have removed the content related to scFoundation.

lazappi

I've had a closer look at the code and made some comments. Can you please try to address these? Let me know if anything isn't clear.

CHANGELOG.md

src/methods/cellplm/config.vsh.yaml

src/methods/cellplm/script.py

HelloWorldLTY · 2024-12-26T20:56:34Z

I have resolved these comments. Please let mw know if you have more questions. Thanks a lot.

CHANGELOG.md

src/methods/cellplm/config.vsh.yaml

src/methods/cellplm/script.py

HelloWorldLTY · 2025-01-07T18:41:53Z

Hi, I have made necessary adjustment of the codes. For the instructions of file reading, if you prefer having a local file, I can adjust it, and maybe in this case I cannot test the codes in my local computer.

src/methods/cellplm/config.vsh.yaml

src/methods/cellplm/script.py

lazappi · 2025-01-08T11:29:03Z

Hi, I have made necessary adjustment of the codes. For the instructions of file reading, if you prefer having a local file, I can adjust it, and maybe in this case I cannot test the codes in my local computer.

I think it still downloads from Google Drive? Are there maybe some updates missing.

For using a local file you could look at the SCimilarity implementation as a example. It should be pretty similar for testing locally, you would just need to download and save the model somewhere rather than having it done by the script.

HelloWorldLTY · 2025-01-08T22:25:04Z

Hi, I have updated the codes with local file checking. Does this look good? Maybe you can manually download models with different versions and test them in cloud.

lazappi · 2025-01-09T06:34:13Z

Thanks! I will do some more testing and make some suggestions/changes.

Add code for extracting model files

lazappi · 2025-01-10T10:53:53Z

@HelloWorldLTY I have tested this and it seems to be working 🎉! I made a few changes but it was mostly to tidy things up and make it more consistent with other methods.

I have one questions before we finalise it. You used normalised data as input but the example data in the tutorial seems to be counts. Did you find any documentation for what the input should be?

HelloWorldLTY · 2025-01-10T13:49:46Z

Hi @lazappi happy to hear that, and your question is really great and we need to clarify it. Actually they do not have any document-level instructions in the original code, and their shared data have different format (some are count-based, like gse155468, others are normalized-based, like demo_train). I search their codes and found that they have mentioned the log-norm function in the code base, which is used to evaluate reconstruct and denoise effect. Therefore, I think they may expect log-norm data as inputs. See: https://github.com/search?q=repo%3AOmicsML%2FCellPLM%20normalize&type=code. Maybe we can provide codes for both implementation.

lazappi · 2025-01-13T08:38:57Z

Thanks for checking! I had a closer look at the example datasets and it looks like they have counts except for demo_train.h5ad. I asked here OmicsML/CellPLM#22 but if they don't get back to us soon I think we should switch to counts.

HelloWorldLTY · 2025-01-13T13:02:48Z

Make sense. Thanks for checking!

lazappi · 2025-01-24T07:24:14Z

The authors replied and said they expect counts as input so we should make that change. It looks like there are some minor conflicts now so I can take care of that as well.

HelloWorldLTY · 2025-01-24T20:13:27Z

Thanks a lot. That is fantastic

* origin/main: (28 commits) Adjust pyliger dependencies (openproblems-bio#43) Minor run adjustments (openproblems-bio#44) Fix reference in task description (openproblems-bio#45) Remove grouping bottleneck (openproblems-bio#42) Retry with single core in geneformer (openproblems-bio#41) Adjustments based on benchmark run (openproblems-bio#39) Update compute environment (openproblems-bio#35) Replace "+" in batch names for liger (openproblems-bio#37) Update common submodule to commit 62268aa (openproblems-bio#38) Adjust features filter in scPRINT (openproblems-bio#36) minor tweaks to the cell cycle component (openproblems-bio#34) Update common submodule to commit a2e845 (openproblems-bio#33) Increase time resources (openproblems-bio#27) GPU adjustments (openproblems-bio#32) Add batch_hvg var column to solution (openproblems-bio#31) Add fine-tuned scGPT (openproblems-bio#17) Bump scIB to v1.1.7 (openproblems-bio#30) Pin scPRINT version (openproblems-bio#25) Increase UCE memory tag (openproblems-bio#24) Add exit codes helper file (openproblems-bio#26) ...

lazappi · 2025-01-28T06:39:50Z

@rcannood I think this is ready for you to take a look at

HelloWorldLTY added 2 commits December 8, 2024 10:29

update cellplm and scfoundation

5170369

update cellplm

6156a77

update cellplm and scf

f479b85

only cellplm

62bbea8

HelloWorldLTY changed the title ~~Implementation of CellPLM and scFoundation~~ Implementation of CellPLM Dec 19, 2024

lazappi requested changes Dec 20, 2024

View reviewed changes

update cellplm all

85a21a2

update doi

ff325ac

lazappi reviewed Jan 7, 2025

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

src/methods/cellplm/config.vsh.yaml Outdated Show resolved Hide resolved

lazappi reviewed Jan 7, 2025

View reviewed changes

src/methods/cellplm/script.py Outdated Show resolved Hide resolved

update all:

c3c03a6

lazappi reviewed Jan 8, 2025

View reviewed changes

src/methods/cellplm/config.vsh.yaml Show resolved Hide resolved

src/methods/cellplm/script.py Outdated Show resolved Hide resolved

lazappi requested changes Jan 8, 2025

View reviewed changes

src/methods/cellplm/script.py Outdated Show resolved Hide resolved

src/methods/cellplm/script.py Outdated Show resolved Hide resolved

update correct label

6a98ccb

lazappi added 5 commits January 10, 2025 08:33

Remove scgpt from cellplm config

28f3ce0

Add model_name argument to cellplm

f35db1d

Add code for extracting model files

Warn if GPU is not available for cellplm

b96e12c

Fix model temp dir cleanup

e6a15cd

Add cellplm to benchmark workflow

4454035

Tidy and style cellplm script

3951a7d

lazappi added 3 commits January 27, 2025 16:06

Adjust cellplm to use counts

7b4f4a9

Add non-applicable exit code to CellPLM

938d673

lazappi requested a review from rcannood January 28, 2025 06:40

Implementation of CellPLM #19

Are you sure you want to change the base?

Implementation of CellPLM #19

Uh oh!

Conversation

HelloWorldLTY commented Dec 14, 2024 • edited by lazappi Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe your changes

Checklist before requesting a review

Uh oh!

lazappi commented Dec 17, 2024

Uh oh!

HelloWorldLTY commented Dec 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lazappi commented Dec 18, 2024

Uh oh!

HelloWorldLTY commented Dec 18, 2024

Uh oh!

lazappi commented Dec 19, 2024

Uh oh!

HelloWorldLTY commented Dec 19, 2024

Uh oh!

lazappi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

HelloWorldLTY commented Dec 26, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

HelloWorldLTY commented Jan 7, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lazappi commented Jan 8, 2025

Uh oh!

HelloWorldLTY commented Jan 8, 2025

Uh oh!

lazappi commented Jan 9, 2025

Uh oh!

lazappi commented Jan 10, 2025

Uh oh!

HelloWorldLTY commented Jan 10, 2025

Uh oh!

lazappi commented Jan 13, 2025

Uh oh!

HelloWorldLTY commented Jan 13, 2025

Uh oh!

lazappi commented Jan 24, 2025

Uh oh!

HelloWorldLTY commented Jan 24, 2025

Uh oh!

lazappi commented Jan 28, 2025

Uh oh!

Uh oh!

HelloWorldLTY commented Dec 14, 2024 •

edited by lazappi

Loading

HelloWorldLTY commented Dec 17, 2024 •

edited

Loading