Commit 0a092ee

Merge pull request #1 from KrishnaswamyLab/dev
Changed fit and transform functions in multiscale_phate.py
2 parents fa82023 + 3e2daa6

11 files changed (+1669, -115 lines)

.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ script:
 - nose2
 deploy:
   provider: pypi
-  user: scottgigante
+  user: mkuchroo
   password: ${PYPI_PASSWORD}
   distributions: sdist bdist_wheel
   skip_existing: true

README.md

Lines changed: 15 additions & 16 deletions
@@ -1,4 +1,4 @@
-Multiscale_PHATE
+Multiscale PHATE
 ================
 
 [![Latest PyPi version](https://img.shields.io/pypi/v/multiscale_phate.svg)](https://pypi.org/project/multiscale_phate/)
@@ -8,36 +8,35 @@ Multiscale_PHATE
 [![GitHub stars](https://img.shields.io/github/stars/KrishnaswamyLab/Multiscale_PHATE.svg?style=social&label=Stars)](https://github.com/KrishnaswamyLab/Multiscale_PHATE/)
 [![Code style: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 
-This is a short description of the package.
+Multiscale PHATE is a Python package for multiresolution analysis of high dimensional data. For an in-depth explanation of the algorithm and applications, please read our manuscript on [BioRxiv](https://www.biorxiv.org/content/10.1101/2020.11.15.383661v1.article-info).
+
+The biomedical community is producing increasingly high dimensional datasets integrated from hundreds of patient samples that current computational techniques are unable to explore. Current tools for dimensionality reduction, such as tSNE, UMAP, and PCA, and clustering, such as Louvain and Leiden, only show a single salient level of granularity in biomedical data. When applied to cellular datasets currently being produced, these techniques are able to visualize and cluster major cell types such as B cells, T cells and myeloid cells. Differences between patient disease states, however, may not be found at the granularity of cell type alone. In fact, appreciation of a finer resolution of the manifold would reveal subsets that may be predictive of outcome. This phenomenon is found across biomedical data science, as the cellular state space is known to form a collection of sub-manifolds that disease status can differentially affect.
+
+The goal of Multiscale PHATE is to learn and visualize abstract cellular features and groupings of the data at all levels of granularity in an efficient manner to identify meaningful resolutions. Our approach learns a tree of data granularities which can be cut at coarse levels for high-level summarizations of data as well as at fine levels for detailed representations on subsets. Our algorithm is based on a dynamic process we have developed called diffusion condensation, which computes a manifold-intrinsic diffusion space on the original data before slowly condensing data points towards local centers of gravity to form natural, data-driven groupings across multiple granularities. While this may sound computationally inefficient, we show that we are able to perform these calculations, as well as visualize and cluster the data, significantly faster than “single-scale” visualization techniques like tSNE, UMAP or PHATE, allowing the analysis of millions of cells within minutes. When combined with other computational algorithms for high dimensional data analysis, such as MELD, DREMI and TrajectoryNet, Multiscale PHATE is able to provide deep and detailed insights into biological processes.
 
 Installation
 ------------
 
-Multiscale_PHATE is available on `pip`. Install by running the following in a terminal:
+Multiscale PHATE is available on `pip`. Install by running the following in a terminal:
 
 ```
 pip install --user git+https://github.com/KrishnaswamyLab/Multiscale_PHATE
 ```
 
-Quick start
+Quick Start
 -----------
 
 ```
-import numpy as np
-X = np.random.normal(0, 1, (100, 10))
-
 import multiscale_phate
 mp_op = multiscale_phate.Multiscale_PHATE()
-hp_embedding, cluster_viz, sizes_viz, tree = mp_op.fit_transform(X)
+mp_embedding, mp_clusters, mp_sizes, tree = mp_op.fit_transform(X)
 
 # Plot optimal visualization
-scprep.plot.scatter2d(hp_embedding, s = sizes_viz, c = cluster_viz,
-                      fontsize=16, ticks=False,label_prefix="Multiscale-PHATE", figsize=(16,12))
+scprep.plot.scatter2d(mp_embedding, s = mp_sizes, c = mp_clusters,
+                      fontsize=16, ticks=False,label_prefix="Multiscale PHATE", figsize=(16,12))
+```
 
-# Plot condensation tree
-scprep.plot.scatter3d(tree, c=tree[:,2],fontsize=16, ticks=False, label_prefix="C-PHATE", figsize=(16,12), s=20)
+Guided Tutorial
+-----------
 
-# Embed online data
-Y = np.random.normal(0.5, 1, (50, 10))
-hp_embedding, cluster_viz, sizes_viz, tree = mp_op.transform(Y)
-```
+For more details on using Multiscale PHATE, see our [guided tutorial](tutorial/10X_pbmc.ipynb) using 10X's public PBMC4k dataset.
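
Note that the revised Quick Start references a data matrix `X` and the `scprep` plotting package without defining or importing them (the old snippet's `numpy` setup lines were removed in this hunk). A minimal self-contained version, reusing the removed random-Gaussian stand-in data in place of a real dataset:

```python
# Self-contained Quick Start: supplies the X and scprep import the snippet assumes.
import numpy as np
import scprep
import multiscale_phate

# Stand-in data, 100 observations x 10 features (replace with real data).
X = np.random.normal(0, 1, (100, 10))

mp_op = multiscale_phate.Multiscale_PHATE()
mp_embedding, mp_clusters, mp_sizes, tree = mp_op.fit_transform(X)

# Plot the visualization at the automatically selected resolution.
scprep.plot.scatter2d(
    mp_embedding, s=mp_sizes, c=mp_clusters,
    fontsize=16, ticks=False,
    label_prefix="Multiscale PHATE", figsize=(16, 12),
)
```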

multiscale_phate/compress.py

Lines changed: 15 additions & 3 deletions
@@ -33,9 +33,10 @@ def get_compression_features(N, features, n_pca, partitions, landmarks):
     if n_pca > 100:
         n_pca = 100
 
+    n_pca = 100
+
     # if N<100000:
     #     partitions=None
-
     if partitions != None and partitions >= N:
         partitions = None
 
@@ -47,7 +48,7 @@
     return n_pca, partitions
 
 
-def cluster_components(data_subset, num_cluster, size):
+def cluster_components(data_subset, num_cluster, size, random_state=None):
     """Short summary.
 
     Parameters
@@ -58,6 +59,10 @@ def cluster_components(data_subset, num_cluster, size):
         Description of parameter `num_cluster`.
     size : type
         Description of parameter `size`.
+    random_state : integer or numpy.RandomState, optional, default: None
+        The generator used to initialize MiniBatchKMeans.
+        If an integer is given, it fixes the seed.
+        Defaults to the global `numpy` random number generator
 
     Returns
     -------
@@ -80,11 +85,12 @@
         n_init=10,
         max_no_improvement=10,
         verbose=0,
+        random_state=random_state,
     ).fit(data_subset)
     return mbk.labels_
 
 
-def subset_data(data, desired_num_clusters, n_jobs, num_cluster=100):
+def subset_data(data, desired_num_clusters, n_jobs, num_cluster=100, random_state=None):
     """Short summary.
 
     Parameters
@@ -97,6 +103,10 @@ def subset_data(data, desired_num_clusters, n_jobs, num_cluster=100):
         Description of parameter `n_jobs`.
     num_cluster : type
         Description of parameter `num_cluster`.
+    random_state : integer or numpy.RandomState, optional, default: None
+        The generator used to initialize MiniBatchKMeans.
+        If an integer is given, it fixes the seed.
+        Defaults to the global `numpy` random number generator
 
     Returns
     -------
@@ -115,6 +125,7 @@ def subset_data(data, desired_num_clusters, n_jobs, num_cluster=100):
         n_init=10,
         max_no_improvement=10,
         verbose=0,
+        random_state=random_state,
     ).fit(data)
 
     clusters = mbk.labels_
@@ -128,6 +139,7 @@
             data[np.where(clusters == clusters_unique[i])[0], :],
             num_cluster,
             size,
+            random_state=random_state,
         )
         for i in range(len(clusters_unique))
     )
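
Threading `random_state` through `cluster_components` and `subset_data` exists to make the MiniBatchKMeans partitioning reproducible. A minimal sketch of the effect, using only the keyword arguments visible in the hunks above plus an assumed `n_clusters` (the full call sites are not shown in this diff):

```python
# Sketch: with a fixed random_state, MiniBatchKMeans partitions identically
# across runs; with None it follows the global numpy random generator.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

data = np.random.normal(0, 1, (1000, 25))

def partition(data, random_state=None):
    mbk = MiniBatchKMeans(
        n_clusters=100,  # assumed value; not visible in the hunks above
        n_init=10,
        max_no_improvement=10,
        verbose=0,
        random_state=random_state,  # the argument this commit threads through
    ).fit(data)
    return mbk.labels_

# Same seed, same partition -- the reproducibility the commit is after.
assert np.array_equal(partition(data, 42), partition(data, 42))
```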

multiscale_phate/condense.py

Lines changed: 10 additions & 2 deletions
@@ -85,7 +85,7 @@ def compute_condensation_param(X, granularity):
     return epsilon, merge_threshold
 
 
-def condense(X, clusters, scale, epsilon, merge_threshold, n_jobs):
+def condense(X, clusters, scale, epsilon, merge_threshold, n_jobs, random_state=None):
     """Short summary.
 
     Parameters
@@ -102,6 +102,10 @@ def condense(X, clusters, scale, epsilon, merge_threshold, n_jobs):
         Description of parameter `merge_threshold`.
     n_jobs : type
         Description of parameter `n_jobs`.
+    random_state : integer or numpy.RandomState, optional, default: None
+        The generator used to initialize graphtools.
+        If an integer is given, it fixes the seed.
+        Defaults to the global `numpy` random number generator
 
     Returns
     -------
@@ -141,7 +145,11 @@ def condense(X, clusters, scale, epsilon, merge_threshold, n_jobs):
     while len(merge_pairs) == 0:
         epsilon = scale * epsilon
         G = graphtools.Graph(
-            X_1, knn=min(X_1.shape[0] - 2, 5), bandwidth=epsilon, n_jobs=n_jobs
+            X_1,
+            knn=min(X_1.shape[0] - 2, 5),
+            bandwidth=epsilon,
+            n_jobs=n_jobs,
+            random_state=random_state,
         )
 
         P_s = G.P.toarray()
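
For context on what `condense` is doing when it builds the now-seeded `graphtools.Graph`: diffusion condensation repeatedly averages each point with its neighbors under a bandwidth-`epsilon` diffusion operator, growing `epsilon` whenever no pair of points falls within `merge_threshold`. A numpy-only illustration of one such step; the package's actual kernel is the kNN-limited graphtools construction shown above, not this dense Gaussian one:

```python
# Illustrative diffusion-condensation step (not the package's exact kernel).
import numpy as np

def condense_step(X, epsilon):
    # Dense Gaussian affinities at bandwidth epsilon.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon**2)
    # Row-stochastic diffusion operator, analogous to G.P in the diff.
    P = K / K.sum(axis=1, keepdims=True)
    # Each point moves toward its local center of gravity.
    return P @ X

X = np.random.normal(0, 1, (50, 3))
for _ in range(10):
    X = condense_step(X, epsilon=0.5)
# condense() then merges point pairs closer than merge_threshold; if none
# exist, it grows the bandwidth (epsilon = scale * epsilon) and retries.
```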

multiscale_phate/diffuse.py

Lines changed: 10 additions & 3 deletions
@@ -6,7 +6,9 @@
 from . import compress
 
 
-def compute_diffusion_potential(data, N, decay, gamma, knn, landmarks=2000, n_jobs=10):
+def compute_diffusion_potential(
+    data, N, decay, gamma, knn, landmarks=2000, n_jobs=10, random_state=None
+):
     """Short summary.
 
     Parameters
@@ -25,6 +27,10 @@ def compute_diffusion_potential(data, N, decay, gamma, knn, landmarks=2000, n_jo
         Description of parameter `landmarks`.
     n_jobs : type
         Description of parameter `n_jobs`.
+    random_state : integer or numpy.RandomState, optional, default: None
+        The generator used to initialize PHATE and PCA.
+        If an integer is given, it fixes the seed.
+        Defaults to the global `numpy` random number generator
 
     Returns
     -------
@@ -40,15 +46,16 @@ def compute_diffusion_potential(data, N, decay, gamma, knn, landmarks=2000, n_jo
     diff_op = phate.PHATE(
         verbose=False,
         n_landmark=landmarks,
-        n_pca=None,
         decay=decay,
         gamma=gamma,
+        n_pca=None,
         knn=knn,
         n_jobs=n_jobs,
+        random_state=random_state,
     )
     diff_op.fit(data)
 
-    pca = sklearn.decomposition.PCA(n_components=25)
+    pca = sklearn.decomposition.PCA(n_components=25, random_state=random_state)
     diff_potential_pca = pca.fit_transform(diff_op.diff_potential)
 
     return (
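
Taken together, these hunks make the whole diffusion-potential pipeline seedable: the same `random_state` now reaches both PHATE and the follow-up PCA. A sketch of an equivalent seeded run on synthetic data; the `decay`, `gamma`, and `knn` values are assumptions, since they arrive as function arguments not shown in this diff:

```python
import numpy as np
import phate
import sklearn.decomposition

data = np.random.normal(0, 1, (500, 50))
random_state = 42

# PHATE computes the diffusion potential; random_state now seeds it.
diff_op = phate.PHATE(
    verbose=False,
    n_landmark=2000,
    decay=40,   # assumed value
    gamma=1,    # assumed value
    n_pca=None,
    knn=5,      # assumed value
    n_jobs=1,
    random_state=random_state,
)
diff_op.fit(data)

# The potential is then compressed to 25 dimensions with a seeded PCA,
# mirroring the last hunk above.
pca = sklearn.decomposition.PCA(n_components=25, random_state=random_state)
diff_potential_pca = pca.fit_transform(diff_op.diff_potential)
```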

multiscale_phate/embed.py

Lines changed: 72 additions & 4 deletions
@@ -1,5 +1,6 @@
 import numpy as np
 import phate
+import tasklogger
 
 
 def repulsion(temp):
@@ -68,6 +69,7 @@ def compute_gradient(Xs, merges):
         Description of returned object.
 
     """
+    tasklogger.log_info("Computing gradient...")
     gradient = []
     m = 0
     X = Xs[0]
@@ -86,6 +88,65 @@
     return np.array(gradient)
 
 
+def get_levels(grad):
+    """Short summary.
+
+    Parameters
+    ----------
+    grad : type
+        Description of parameter `grad`.
+
+    Returns
+    -------
+    type
+        Description of returned object.
+
+
+    """
+    tasklogger.log_info("Identifying salient levels of resolution...")
+    minimum = np.max(grad)
+    levels = []
+    levels.append(0)
+
+    for i in range(1, len(grad) - 1):
+        if grad[i] <= minimum and grad[i] < grad[i + 1]:
+            levels.append(i)
+            minimum = grad[i]
+    return levels
+
+
+def get_zoom_visualization(
+    Xs,
+    NxTs,
+    zoom_visualization_level,
+    zoom_cluster_level,
+    coarse_cluster_level,
+    coarse_cluster,
+    n_jobs,
+    random_state=None,
+):
+    """Short summary.
+
+    Parameters
+    ----------
+    random_state : integer or numpy.RandomState, optional, default: None
+        The generator used to initialize MDS.
+        If an integer is given, it fixes the seed.
+        Defaults to the global `numpy` random number generator
+    """
+    unique = np.unique(
+        NxTs[zoom_visualization_level], return_index=True, return_counts=True
+    )
+    extract = NxTs[coarse_cluster_level][unique[1]] == coarse_cluster
+
+    subset_X = Xs[zoom_visualization_level]
+    embedding = phate.mds.embed_MDS(subset_X[extract], n_jobs=n_jobs, seed=random_state)
+
+    return embedding, NxTs[zoom_cluster_level][unique[1]][extract], unique[2][extract]
+
+
 def compute_ideal_visualization_layer(gradient, Xs, min_cells=100):
     """Short summary.
 
@@ -117,9 +178,12 @@ def compute_ideal_visualization_layer(gradient, Xs, min_cells=100):
     return min_layer
 
 
-def get_clusters_sizes_2(clusters_full, layer, NxT, X, repulse=False, n_jobs=10):
+def get_clusters_sizes_2(
+    clusters_full, layer, NxT, X, repulse=False, n_jobs=10, random_state=None
+):
     """Short summary.
 
+    Parameters
     Parameters
     ----------
     clusters_full : type
@@ -134,6 +198,10 @@ def get_clusters_sizes_2(clusters_full, layer, NxT, X, repulse=False, n_jobs=10)
         Description of parameter `repulse`.
     n_jobs : type
         Description of parameter `n_jobs`.
+    random_state : integer or numpy.RandomState, optional, default: None
+        The generator used to initialize MDS.
+        If an integer is given, it fixes the seed.
+        Defaults to the global `numpy` random number generator
 
     Returns
     -------
@@ -149,7 +217,7 @@ def get_clusters_sizes_2(clusters_full, layer, NxT, X, repulse=False, n_jobs=10)
     subset_X = X[layer]
 
     if repulse:
-        embedding = phate.mds.embed_MDS(repulsion(subset_X.copy()), n_jobs=n_jobs)
-    else:
-        embedding = phate.mds.embed_MDS(subset_X, n_jobs=n_jobs)
+        subset_X = repulsion(subset_X.copy())
+
+    embedding = phate.mds.embed_MDS(subset_X, n_jobs=n_jobs, seed=random_state)
     return embedding, clusters_full[unique[1]], unique[2]
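
The new `get_levels` walks the condensation gradient and keeps each index that both sets a new running minimum and sits at a local dip, treating those indices as salient resolutions of the condensation tree. A standalone copy of its logic on a toy gradient:

```python
import numpy as np

def get_levels(grad):
    # Keep indices that set a new running minimum and precede an increase.
    minimum = np.max(grad)
    levels = [0]
    for i in range(1, len(grad) - 1):
        if grad[i] <= minimum and grad[i] < grad[i + 1]:
            levels.append(i)
            minimum = grad[i]
    return levels

# Toy gradient with dips at indices 2 and 5.
grad = np.array([5.0, 4.0, 1.0, 3.0, 2.0, 0.5, 2.0, 1.0])
print(get_levels(grad))  # [0, 2, 5]
```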
