arashabadi/cm4ai_codefest2025

2025 CM4AI Hackathon at UAB

This repo documents our team's participation in the 2025 CM4AI Hackathon at UAB.


Day 1 (9AM-5PM)

Some suggested projects:

  1. Embedding
  2. Building community networks and hierarchies (classic Louvain algorithm)
  3. Visible Neural Networks (VNN)

Provided compute power (TACC by The University of Texas at Austin):

$ ssh username@frontera.tacc.utexas.edu

Our Team (Embedding Mafia)

  • Arash Abadi (UAB)
  • Morgan Smith (UT Health San Antonio)
  • Rebecca Bernal (UT Health San Antonio)
  • Jedediah Smith (UAB)
  • Mona Shabana (UAB)

Selected Project:

We are going to implement SubCell as an alternative image embedding method for immunofluorescence (IF) images.

Let's run the IF tutorial first and then SubCell.

  1. First try cm4ai-tutorial-immunofluorescence/. This requires downloading 11 GB of IF images in RO-Crate format.

We downloaded the data with python src/download.py

  2. SubCell requires segmented images of cells, so we are going to perform cell segmentation with the same tool that was used to prepare the data SubCell was trained on: HPA Cell Segmentation.
  • Try HPA Cell Segmentation > requires the CUDA toolkit (NVIDIA GPU) >> run on TACC or Cheaha

  • From the HPA Cell Segmentation GitHub repository: clone > conda env create -f environment.yml > sh install.sh

hpacellseg should be run as a Python script.


Day 2 (9AM-5PM)

To get the prepared code onto TACC, I have three options: connect my GitHub account to TACC, switch to Cheaha, or use wget or curl to download the code from GitHub. I will use wget to fetch the raw file (the Python script that runs hpacellseg) from GitHub.
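
As a Python alternative to wget, the raw file can also be fetched with the standard library. This is only a minimal sketch: the owner, repository, and branch values below are placeholders rather than the real location of our script, and the helper names are made up for illustration.

```python
from urllib.parse import quote
from urllib.request import urlretrieve


def raw_github_url(owner: str, repo: str, branch: str, path: str) -> str:
    """Build the raw.githubusercontent.com URL for one file in a public repo."""
    # quote() escapes spaces and other unsafe characters but keeps "/" intact.
    return (
        "https://raw.githubusercontent.com/"
        f"{quote(owner)}/{quote(repo)}/{quote(branch)}/{quote(path.lstrip('/'))}"
    )


def download_script(url: str, destination: str) -> str:
    """Download url to destination and return the local file path."""
    local_path, _headers = urlretrieve(url, destination)
    return local_path


if __name__ == "__main__":
    # Placeholder values: replace them with the real repository details.
    url = raw_github_url("OWNER", "REPO", "main", "run_hpa_segmentation.py")
    print(url)
    # download_script(url, "run_hpa_segmentation.py")  # needs network access
```

The download call is left commented out so the sketch can be run without network access; on TACC it would be run from a node that can reach GitHub.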

Prepare testing data to run hpacellseg

  1. connect to TACC via ssh
  2. idev > 3 (default)
  3. cd $WORK
  4. cd ./analysis
  5. bash data_transfer.sh to copy 10 images from "cm4ai-tutorial-immunofluorescence-main/data/raw/paclitaxel/blue" to "./data/"
  6. conda activate hpacellseg
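
The contents of data_transfer.sh are not shown in these notes. The Python sketch below does an equivalent copy, assuming that "10 images" means the first 10 files in sorted filename order; the function name and its arguments are illustrative only.

```python
import shutil
from pathlib import Path
from typing import List


def copy_first_images(src_dir: str, dst_dir: str, n: int = 10) -> List[Path]:
    """Copy the first n files (sorted by name) from src_dir into dst_dir."""
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    # Sort so the selection is reproducible across runs and machines.
    files = sorted(p for p in src.iterdir() if p.is_file())

    copied = []
    for path in files[:n]:
        # copy2 also preserves timestamps and other file metadata.
        copied.append(Path(shutil.copy2(path, dst / path.name)))
    return copied


if __name__ == "__main__":
    source = "cm4ai-tutorial-immunofluorescence-main/data/raw/paclitaxel/blue"
    if Path(source).is_dir():
        for copied_path in copy_first_images(source, "./data/"):
            print(copied_path)
```

Sorting before slicing is a deliberate choice: directory listing order is not guaranteed, so an unsorted selection could pick different images on different machines.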

Run hpacellseg > crop images > run subcell

  1. python ./run_hpa_segmentation.py

This generates a directory called segmentation_results inside the analysis directory.

  2. Transfer the results to my local machine (MacBook) with scp:

# Run these commands on the local machine, not on TACC
hostname  # should print the local machine's name
cd ~
scp -r USERNAME@frontera.tacc.utexas.edu:/work2/10900/USERNAME/frontera/analysis/segmentation_results ~/tacc  # tacc is a test directory on my local machine

  1. Now let's prepare the data for SubCell input

    We selected the first 10 images from the paclitaxel channels (from cm4ai-tutorial-immunofluorescence/) as hpacellseg input.

  2. Run hpacellseg (Rebecca)

    https://github.com/Bayes-Student1/CM4AI-Group-Project-

  3. Prepare cropped images for SubCell input (Morgan)

    https://github.com/morgansmith27

  4. Run subcell (Jedediah)

    https://github.com/OriginalBrick/cm4ai-codefest
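
For step 3, one way to derive per-cell crops is to compute a bounding box for each labeled cell in the segmentation mask. The sketch below assumes the mask is a 2D grid of integers in which 0 is background and every cell has its own positive label. It uses plain Python lists to stay dependency-free, whereas real masks would typically be NumPy arrays. It is not the cropping code from the repository linked above, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (row_min, col_min, row_max, col_max), all inclusive
Box = Tuple[int, int, int, int]


def cell_bounding_boxes(mask: List[List[int]]) -> Dict[int, Box]:
    """Return one inclusive bounding box per positive label in the mask."""
    boxes: Dict[int, Box] = {}
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            if label <= 0:
                continue  # background pixel
            if label not in boxes:
                boxes[label] = (r, c, r, c)
            else:
                r0, c0, r1, c1 = boxes[label]
                boxes[label] = (min(r0, r), min(c0, c), max(r1, r), max(c1, c))
    return boxes


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the boxed region out of a 2D image stored as nested lists."""
    r0, c0, r1, c1 = box
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]


if __name__ == "__main__":
    toy_mask = [
        [0, 1, 1, 0],
        [0, 1, 0, 2],
        [0, 0, 2, 2],
    ]
    for label, box in sorted(cell_bounding_boxes(toy_mask).items()):
        print(label, box, crop(toy_mask, box))
```

The same box can be applied to every channel of an image so that the crops stay aligned across channels.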

What we’ll be working on for the next few days.

  • Morgan: Cropping and subcellular visualization on the stacked images (all colors on new dataset) via Google Colab (possibly Visual Studio Code)
  • Jedediah: Working through the SubCell tutorial and adapting it to our data
  • Rebecca: Currently rerunning the segmentation and renaming the files in VS Code
  • Arash: Project management and GitHub maintenance
  • Mona: Background/Significance for PowerPoint
  • Everyone: Editing the Google Slides/PowerPoint

Acknowledgments

We would like to thank the following people who provided significant assistance and support throughout these two days:


My extra notes:

Project Theme - Data Embedding

Data embedding involves transforming high-dimensional biological data (e.g., imaging, proteomics, gene expression) into lower-dimensional representations that preserve meaningful patterns or relationships. The Cell Map pipeline starts by generating embeddings for biological entities such as proteins or genes from each input data source (IF image, AP-MS, and/or perturb-seq). After source-specific embedding, a joint embedding is created and used to generate the protein-protein interaction (PPI) network, which is then used to create hierarchical cell maps. In the current pipeline, a DenseNet model pre-trained on images from the Human Protein Atlas is used to generate image embeddings and node2vec is used to generate embeddings for AP-MS data. Joint/co-embeddings have been implemented using muse and proteingps in the current pipeline. Additional background is provided in Schaffer et al. (2025) and Lenkiewicz et al. (2025).

  • CM4AI preprint:

    Input data streams are integrated via the multi-scale integrated cell (MuSIC) software pipeline employing deep learning models and community detection algorithm.

  • In MuSIC paper:

    For image embedding we used DenseNet, a convolutional neural network with superior performance in capturing protein locations relative to counter-stained cellular landmarks

  • In U2OS Multi-Modal Cell Map paper:

    For the IF data, we applied DenseNet-121

  • U2OS Cell Map data to visualize via cytoscape: https://musicmaps.ai/u2os-cellmap/


Cell Mapping Publications & Background Reading

CM4AI Pipeline and Tools

The official CM4AI Cell Map Pipeline code and documentation are available at:

In addition to these repositories, development forks and environment setup instructions that may be more easily adapted to CodeFest projects are available at:

This development environment can be used to easily make changes to individual steps in the cell map AI/ML pipeline and log training parameters/metrics to MLFlow to assess the impact of new methods or pipeline configurations on generated cell maps.


Perturbation Correlation Network

  1. For each perturbation, compute the mean of all cells (perturbation mean)
  2. Compute the pairwise Pearson correlation matrix of perturbation means
  3. Use UMAP on the correlation matrix to visualize which perturbations correlate similarly

Perturbations that cluster together have similar perturbation means, which suggests that they produce similar cell phenotypes.
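
Steps 1 and 2 can be sketched in plain Python as follows. The input layout (a list of (perturbation, feature vector) pairs, one per cell) and the function names are assumptions made for illustration. With real data, NumPy or pandas would be the usual choice, and step 3 would pass the resulting matrix to a UMAP implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def perturbation_means(
    cells: Sequence[Tuple[str, Sequence[float]]]
) -> Dict[str, List[float]]:
    """Step 1: average the feature vectors of all cells per perturbation."""
    grouped = defaultdict(list)
    for name, vector in cells:
        grouped[name].append(vector)
    # zip(*vectors) iterates feature by feature across the grouped cells.
    return {
        name: [sum(column) / len(vectors) for column in zip(*vectors)]
        for name, vectors in grouped.items()
    }


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (norm_x * norm_y)


def correlation_matrix(
    means: Dict[str, List[float]]
) -> Tuple[List[str], List[List[float]]]:
    """Step 2: pairwise Pearson correlations between perturbation means."""
    names = sorted(means)
    matrix = [[pearson(means[a], means[b]) for b in names] for a in names]
    return names, matrix


if __name__ == "__main__":
    # Toy data: two cells for perturbation A, one each for B and C.
    toy_cells = [
        ("A", [1.0, 2.0, 3.0]),
        ("A", [3.0, 4.0, 5.0]),
        ("B", [2.0, 4.0, 6.0]),
        ("C", [3.0, 2.0, 1.0]),
    ]
    names, matrix = correlation_matrix(perturbation_means(toy_cells))
    for name, row in zip(names, matrix):
        print(name, [round(value, 3) for value in row])
```

In the toy data, A and B change in the same direction across features while C changes in the opposite direction, so A and B would land close together in the embedding and C far away.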

To install CellMaps Pipeline:

conda create -n cm4ai python=3.8
conda activate cm4ai
pip install cellmaps_pipeline
