Hello everyone,
I am trying to integrate around 500,000 cells from around a dozen GSE datasets. However, I'm not sure how to assess the optimal parameter values; the results I get are more or less okay, but not great. I would be very grateful for clarification on several things and suggestions on what to do and what to avoid.
- I assume we should rely only on HVGs, around 2,000-5,000 I guess, rather than all genes? Should the count matrix be normalized? (My current preprocessing is sketched after this list.)
- Does the number of total and pretraining epochs significantly affect the final embedding? For example, will changing the defaults of 100 total and 70 pretraining epochs to 200 and 150 noticeably improve the result?
- Is there any rule of thumb for choosing the number of embedding and latent dimensions? Is 50 and 20 a reasonable choice, or maybe 50 and 50? I have some intuition for regular PCA or Harmony dimensionality selection, but your neural network has a fundamentally different nature; the latent dimensionality in particular is a mystery to me.
- I don't want to transfer any labels, so I removed the cell_type_keys parameter from the scPoli constructor and set labeled_indices to []. This is not the correct approach, right? Do we need to set labeled_indices anyway (scPoli Model for Unsupervised Use #224)?
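For context, my current HVG selection looks roughly like this (a minimal sketch; I'm assuming scanpy's seurat_v3 flavor, which operates on raw counts, with batch_key matching my GSE column; the n_top_genes value is just a placeholder):

```python
import scanpy as sc

# new_adata holds raw counts for the concatenated GSE datasets.
# The seurat_v3 flavor expects raw counts, so nothing is normalized before this step.
sc.pp.highly_variable_genes(
    new_adata,
    n_top_genes=3000,   # somewhere in the 2,000-5,000 range
    flavor='seurat_v3',
    batch_key='GSE',    # select HVGs per dataset, then combine
    subset=True,        # keep only the selected HVGs
)
```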
My code looks like this:
```python
# Assuming scArches is installed; scPoli lives under scarches.models.scpoli
from scarches.models.scpoli import scPoli

# Build the reference model on the concatenated AnnData object,
# conditioning on the GSE dataset of origin.
scpoli_model = scPoli(
    adata=new_adata,
    condition_keys=['GSE'],
    # cell_type_keys=cell_type_key,  # removed, since I don't want label transfer
    embedding_dims=50,
    latent_dim=20,
    recon_loss='nb',
)

# early_stopping_kwargs is defined earlier in my script (standard tutorial settings).
scpoli_model.train(
    n_epochs=100,
    pretraining_epochs=70,
    early_stopping_kwargs=early_stopping_kwargs,
    eta=5,
)

# Map the same data as an unlabeled query to obtain the integrated latent space.
scpoli_query = scPoli.load_query_data(
    adata=new_adata,
    reference_model=scpoli_model,
    labeled_indices=[],
)

data_latent = scpoli_query.get_latent(new_adata, mean=True)
```
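Downstream, I simply store the latent representation and inspect the integration with a standard scanpy neighbors/UMAP run (sketch below; the obsm key name is arbitrary):

```python
import scanpy as sc

# Use the scPoli latent space as the representation for neighbors/UMAP.
new_adata.obsm['X_scpoli'] = data_latent
sc.pp.neighbors(new_adata, use_rep='X_scpoli')
sc.tl.umap(new_adata)
sc.pl.umap(new_adata, color=['GSE'])  # check how well the source datasets mix
```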