scPoli dimensionality and other parameters #260

@officialprofile

Hello everyone,

I am trying to integrate around 500,000 cells from about a dozen GSE datasets. However, I'm not sure how to choose good values for the parameters; the results I get are more or less OK, but not great. I would be very grateful if you could clarify several things and suggest what to do and what to avoid.

  1. I assume we should rely only on HVGs (around 2000-5000, I guess) and not on all genes? Should the count matrix be normalized?
  2. Does the number of total and pre-training epochs significantly affect the final embedding? For example, will changing the default 100 total and 70 pre-training epochs to 200 and 150 noticeably improve the result?
  3. Is there any rule of thumb for choosing the number of embedding and latent dimensions? Is 50 and 20 a reasonable choice? Or maybe 50 and 50? I have an intuition for choosing the dimensionality of regular PCA or Harmony, but your neural network is fundamentally different in nature; the latent dimensionality in particular is a mystery to me.
  4. I don't want to transfer any labels, so I removed the cell_type_keys parameter from the scPoli call and set labeled_indices to []. This is not the correct approach, right? Do we need to set labeled_indices anyway (see "scPoli Model for Unsupervised Use", #224)?

My code looks like this:

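# Import path as used in the scArches tutorials
from scarches.models.scpoli import scPoli

# new_adata (AnnData, subset to HVGs) and early_stopping_kwargs (dict)
# are defined earlier and not shown here.

# Reference model, conditioned on the dataset of origin only (no cell type labels)
scpoli_model = scPoli(
    adata=new_adata,
    condition_keys=['GSE'],
    # cell_type_keys=cell_type_key,
    embedding_dims=50,
    latent_dim=20,
    recon_loss='nb',
)

scpoli_model.train(
    n_epochs=100,
    pretraining_epochs=70,
    early_stopping_kwargs=early_stopping_kwargs,
    eta=5,
)

# Query model built on the same data, with no labeled cells
scpoli_query = scPoli.load_query_data(
    adata=new_adata,
    reference_model=scpoli_model,
    labeled_indices=[],
)

# Latent representation (posterior mean) of all cells
data_latent = scpoli_query.get_latent(new_adata, mean=True)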
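
Related to question 1: since the model above uses recon_loss='nb', which is a count likelihood, one thing I can check up front is whether the matrix I pass in still holds raw integer counts. Below is a minimal sketch of such a check, assuming the matrix is a NumPy array or a SciPy sparse matrix; the helper looks_like_raw_counts and the toy matrices are hypothetical and only illustrate the idea, they are not part of scPoli.

```python
import numpy as np


def looks_like_raw_counts(X, n_check=1000):
    """Heuristic: True if the first n_check rows hold only non-negative integers."""
    X = X[:n_check]
    # SciPy sparse matrices expose toarray(); dense arrays are used as-is.
    if hasattr(X, "toarray"):
        X = X.toarray()
    X = np.asarray(X)
    return bool(np.all(X >= 0) and np.all(np.mod(X, 1) == 0))


# Toy matrices standing in for adata.X (hypothetical data).
raw = np.array([[0, 3, 1], [2, 0, 7]])
# Library-size normalization to 10,000 counts per cell followed by log1p.
normalized = np.log1p(raw / raw.sum(axis=1, keepdims=True) * 1e4)

print(looks_like_raw_counts(raw))
print(looks_like_raw_counts(normalized))
```

The check is only a heuristic on a sample of rows; it cannot tell whether integer-valued data has been altered in some other way.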
