Segmentation fault with `DiffusionMap`

Hi,

I am getting a segmentation fault with `DiffusionMap`. I am not changing any of the defaults.

1) When inputting a data matrix as `data`, I get the following error:

> *** caught segfault ***
>     address 0x7f7675a30cb0, cause 'memory not mapped'
> Error: Could not call find_knn. Consider specifying `knn_params = list(M = <larger number>)`. Original error:
>     long vectors not supported yet: ../../src/include/Rinlinedfuns.h:537

2) When inputting a SingleCellExperiment object as `data`, I get the following error:

>  *** caught segfault ***
> address 0x7f9dc6fb2cb0, cause 'memory not mapped'
> 
> Traceback:
>  1: knn_asym(data, k, distance)
>  2: knn.covertree::find_knn(data, k, query = query, distance = distance,     sym = sym)
>  3: (function (data, k, ..., query = NULL, distance = c("euclidean",     "cosine", "rankcor", "l2"), method = c("covertree", "hnsw"),     sym = TRUE, verbose = FALSE) {    p <- utils::modifyList(formals(RcppHNSW::hnsw_knn), list(...))    method <- match.arg(method)    distance <- match.arg(distance)    if (!is.double(data)) {        warning("find_knn does not yet support sparse matrices, converting data to a dense matrix.")        data <- as.matrix(data)    }    if (method == "covertree") {        return(knn.covertree::find_knn(data, k, query = query,             distance = distance, sym = sym))    }    if (distance == "rankcor") {        distance <- "cosine"        data <- rank_mat(data)        if (!is.null(query))             query <- rank_mat(query)    }    if (is.null(query)) {        knn <- hnsw_knn(data, k + 1L, distance, M = p$M, ef_construction = p$ef_construction,             ef = p$ef, verbose = verbose)        knn$idx <- knn$idx[, -1, drop = FALSE]        knn$dist <- knn$dist[, -1, drop = FALSE]    }    else {        index <- hnsw_build(data, distance, M = p$M, ef = p$ef_construction,             verbose = verbose)        knn <- hnsw_search(query, index, k, ef = p$ef, verbose = verbose)    }    names(knn)[[1L]] <- "index"    knn$dist_mat <- sparseMatrix(rep(seq_len(nrow(knn$index)),         k), as.vector(knn$index), x = as.vector(knn$dist), dims = c(nrow(if (is.null(query)) data else query),         nrow(data)))    if (is.null(query)) {        if (sym)             knn$dist_mat <- symmetricise(knn$dist_mat)        nms <- rownames(data)    }    else {        nms <- rownames(query)    }    rownames(knn$dist_mat) <- rownames(knn$index) <- rownames(knn$dist) <- nms    colnames(knn$dist_mat) <- rownames(data)    knn})(new("dgCMatrix", i = c(11854L, 32418L, 46422L, 42L, 100L, 173L, 285L, 293L, 419L, 504L, 629L, 694L, 743L, 777L, 835L, 1122L, 1183L, 1214L, 1259L, 1318L, 1382L, 1389L, 1402L, 1407L, 1655L, 1738L, 1779L, 1997L, 2008L, 2018L, 2023L, 2060L, 2204L, 
2241L, 2416L, 2500L, 2558L, 2635L, 2690L, 2701L, 2715L, 2738L, 2742L, 2908L, 2982L, 3118L, 3119L, 3153L, 3311L, 3420L, 3566L, 3605L, 3691L, 3695L, 3715L, 3759L, 4015L, 4108L, 4164L, 4209L, 4260L, 4307L, 4319L, 4373L, 4649L, 4672L, 4702L, 4860L, 5361L, 5426L, 5593L, 5595L, 5638L, 5643L, 5675L, 5791L, 5934L, 5937L, 5942L, 6441L, 6442L, 6604L, 6714L, 6731L, 6740L, 6800L, 6844L, 6881L, 6906L, 6954L, 6984L, 7027L, 7033L, 7099L, 7177L, 7196L, 7260L, 7343L, 7356L, 7376L, 7569L, 7688L, 7831L, 7952L, 8024L, 8071L, 8097L, 8128L, 8131L, 8179L, 8207L, 8216L, 8444L, 8503L, 8527L, 8698L, 8718L, 8776L, 8820L, 8856L, 8987L, 8994L, 9116L, 9362L, 9363L, 9383L, 9449L, 9631L, 9686L, 9714L, 9750L, 9826L, 9873L, 10063L, 10079L, 10392L, 10400L, 10469L, 10504L, 10579L, 10600L, 10646L, 10866L, 10961L, 11055L, 11501L, 11511L, 11671L, 11780L, 11823L, 12115L, 12134L, 12242L, 12290L, 12353L, 12411L, 12544L, 12571L, 12890L, 12982L, 13013L, 13019L, 13029L, 13193L, 13259L, 13497L, 13548L, 13646L, 13704L, 13820L, 13896L, 13922L, 14016L, 14026L, 14045L, 14135L, 14158L, 14213L, 14221L, 14280L, 14368L, 14376L, 14390L, 14527L, 14598L, 14776L, 14850L, 14910L, 14942L, 15176L, 15356L, 15496L, 15505L, 15507L, 15566L, 15792L, 15824L, 15842L, 15951L, 16007L, 16331L, 16340L, 16345L, 16352L, 16406L, 16416L, 16471L, 16595L, 16656L, 16785L, 16869L, 16880L, 17217L, 17392L, 17461L, 17579L, 17582L, 17897L, 17948L, 18031L, 18195L, 18331L, 18378L, 18456L, 18459L, 18560L, 18590L, 18657L, 18820L, 18851L, 19034L, 19073L, 19181L, 19403L, 19689L, 19800L, 19851L, 19866L, 19918L, 19967L, 20026L, 20101L, 20104L, 20180L, 20225L, 20262L, 20549L, 20666L, 20737L, 20900L, 21116L, 21412L, 21725L, 21749L

I assume both errors are down to the size of my data (~100,000 cells x ~20,000 genes), and that the best approach would be to input PCA scores rather than the normalised expression values. Or is there another way around this?
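To make the question concrete, here is a minimal sketch of the PCA route. It rests on several assumptions: the dimensions are the rounded figures quoted above, `expr` is a small simulated stand-in for the real expression matrix (cells in rows), and `n_pcs` is an arbitrary illustrative choice. The size estimate only covers the dense double matrix that the "converting data to a dense matrix" step in the traceback would create. With the rounded figures the element count sits just below R's 2^31 - 1 vector length limit, so the real matrix is presumably slightly larger than these numbers suggest.

```r
# Rough size of the dense matrix built from the sparse input.
# Dimensions are the approximate ones from the report, not exact values.
n_cells <- 1e5
n_genes <- 2e4
n_elements <- n_cells * n_genes             # about 2e9 values
dense_gib <- n_elements * 8 / 1024^3        # doubles take 8 bytes each, roughly 15 GiB
long_vector_limit <- .Machine$integer.max   # 2^31 - 1 elements

# Toy stand-in for the real data (hypothetical): cells in rows, genes in columns.
set.seed(1)
expr <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)

# Reduce to a handful of principal components with base R.
n_pcs <- 10                                 # illustrative choice, not a recommendation
pca <- prcomp(expr, center = TRUE, scale. = FALSE, rank. = n_pcs)
pcs <- pca$x                                # cells x PCs, small dense double matrix

# Run the diffusion map on the PC scores, only if destiny is installed.
if (requireNamespace("destiny", quietly = TRUE)) {
  dm <- destiny::DiffusionMap(pcs)
}
```

On the real data the matrix passed to `DiffusionMap` would then be about 100,000 x `n_pcs` instead of 100,000 x 20,000. The PCA itself would still have to be computed without densifying the full matrix, which this sketch does not address.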

Best wishes,
Lucy
