Segmentation fault with DiffusionMap #63

@lucygarner

Hi,

I am getting a segmentation fault with DiffusionMap. I am not changing any of the defaults.

  1. When inputting a data matrix as data, I get the following error:

*** caught segfault ***
address 0x7f7675a30cb0, cause 'memory not mapped'
Error: Could not call find_knn. Consider specifying knn_params = list(M = <larger number>). Original error:
long vectors not supported yet: ../../src/include/Rinlinedfuns.h:537

  2. When inputting a SingleCellExperiment object as data, I get the following error:

*** caught segfault ***
address 0x7f9dc6fb2cb0, cause 'memory not mapped'

Traceback:
1: knn_asym(data, k, distance)
2: knn.covertree::find_knn(data, k, query = query, distance = distance, sym = sym)
3: (function (data, k, ..., query = NULL, distance = c("euclidean", "cosine", "rankcor", "l2"), method = c("covertree", "hnsw"), sym = TRUE, verbose = FALSE) { p <- utils::modifyList(formals(RcppHNSW::hnsw_knn), list(...)) method <- match.arg(method) distance <- match.arg(distance) if (!is.double(data)) { warning("find_knn does not yet support sparse matrices, converting data to a dense matrix.") data <- as.matrix(data) } if (method == "covertree") { return(knn.covertree::find_knn(data, k, query = query, distance = distance, sym = sym)) } if (distance == "rankcor") { distance <- "cosine" data <- rank_mat(data) if (!is.null(query)) query <- rank_mat(query) } if (is.null(query)) { knn <- hnsw_knn(data, k + 1L, distance, M = p$M, ef_construction = p$ef_construction, ef = p$ef, verbose = verbose) knn$idx <- knn$idx[, -1, drop = FALSE] knn$dist <- knn$dist[, -1, drop = FALSE] } else { index <- hnsw_build(data, distance, M = p$M, ef = p$ef_construction, verbose = verbose) knn <- hnsw_search(query, index, k, ef = p$ef, verbose = verbose) } names(knn)[[1L]] <- "index" knn$dist_mat <- sparseMatrix(rep(seq_len(nrow(knn$index)), k), as.vector(knn$index), x = as.vector(knn$dist), dims = c(nrow(if (is.null(query)) data else query), nrow(data))) if (is.null(query)) { if (sym) knn$dist_mat <- symmetricise(knn$dist_mat) nms <- rownames(data) } else { nms <- rownames(query) } rownames(knn$dist_mat) <- rownames(knn$index) <- rownames(knn$dist) <- nms colnames(knn$dist_mat) <- rownames(data) knn})(new("dgCMatrix", i = c(11854L, 32418L, 46422L, 42L, 100L, 173L, 285L, 293L, 419L, 504L, 629L, 694L, 743L, 777L, 835L, 1122L, 1183L, 1214L, 1259L, 1318L, 1382L, 1389L, 1402L, 1407L, 1655L, 1738L, 1779L, 1997L, 2008L, 2018L, 2023L, 2060L, 2204L, 2241L, 2416L, 2500L, 2558L, 2635L, 2690L, 2701L, 2715L, 2738L, 2742L, 2908L, 2982L, 3118L, 3119L, 3153L, 3311L, 3420L, 3566L, 3605L, 3691L, 3695L, 3715L, 3759L, 4015L, 4108L, 4164L, 4209L, 4260L, 4307L, 4319L, 4373L, 4649L, 4672L, 4702L, 4860L, 5361L, 
5426L, 5593L, 5595L, 5638L, 5643L, 5675L, 5791L, 5934L, 5937L, 5942L, 6441L, 6442L, 6604L, 6714L, 6731L, 6740L, 6800L, 6844L, 6881L, 6906L, 6954L, 6984L, 7027L, 7033L, 7099L, 7177L, 7196L, 7260L, 7343L, 7356L, 7376L, 7569L, 7688L, 7831L, 7952L, 8024L, 8071L, 8097L, 8128L, 8131L, 8179L, 8207L, 8216L, 8444L, 8503L, 8527L, 8698L, 8718L, 8776L, 8820L, 8856L, 8987L, 8994L, 9116L, 9362L, 9363L, 9383L, 9449L, 9631L, 9686L, 9714L, 9750L, 9826L, 9873L, 10063L, 10079L, 10392L, 10400L, 10469L, 10504L, 10579L, 10600L, 10646L, 10866L, 10961L, 11055L, 11501L, 11511L, 11671L, 11780L, 11823L, 12115L, 12134L, 12242L, 12290L, 12353L, 12411L, 12544L, 12571L, 12890L, 12982L, 13013L, 13019L, 13029L, 13193L, 13259L, 13497L, 13548L, 13646L, 13704L, 13820L, 13896L, 13922L, 14016L, 14026L, 14045L, 14135L, 14158L, 14213L, 14221L, 14280L, 14368L, 14376L, 14390L, 14527L, 14598L, 14776L, 14850L, 14910L, 14942L, 15176L, 15356L, 15496L, 15505L, 15507L, 15566L, 15792L, 15824L, 15842L, 15951L, 16007L, 16331L, 16340L, 16345L, 16352L, 16406L, 16416L, 16471L, 16595L, 16656L, 16785L, 16869L, 16880L, 17217L, 17392L, 17461L, 17579L, 17582L, 17897L, 17948L, 18031L, 18195L, 18331L, 18378L, 18456L, 18459L, 18560L, 18590L, 18657L, 18820L, 18851L, 19034L, 19073L, 19181L, 19403L, 19689L, 19800L, 19851L, 19866L, 19918L, 19967L, 20026L, 20101L, 20104L, 20180L, 20225L, 20262L, 20549L, 20666L, 20737L, 20900L, 21116L, 21412L, 21725L, 21749L

I assume both errors are due to the large size of my data (~100,000 cells x ~20,000 genes). Would the best approach be to input PCA scores rather than the normalised expression values, or is there another way around this?

Best wishes,
Lucy
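
For reference, here is a rough sketch of the size arithmetic and the PCA workaround asked about above. The find_knn source in the traceback converts non-double input to a dense matrix with as.matrix(). At exactly 100,000 x 20,000 that is 2e9 elements (roughly 15 GiB of doubles), just under R's 2^31 - 1 element limit for ordinary vectors. The dimensions in the post are approximate, so the real matrix may well cross that limit, which would be consistent with the "long vectors not supported yet" message. The toy matrix, the choice of 10 components, and the use of base R's prcomp are assumptions for illustration only. Whether DiffusionMap behaves well on PC scores for this dataset is not confirmed here.

```r
# Size of the dense matrix that as.matrix() would have to create.
n_cells <- 100000   # approximate, taken from the post
n_genes <- 20000    # approximate, taken from the post
n_elements <- n_cells * n_genes           # computed as doubles, so no integer overflow
dense_gib <- n_elements * 8 / 1024^3      # 8 bytes per double
headroom <- .Machine$integer.max / n_elements
cat("elements:", n_elements, " GiB:", round(dense_gib, 1),
    " limit/elements:", round(headroom, 3), "\n")

# Toy stand-in for the expression matrix (cells in rows, genes in columns).
# ASSUMPTION: the real data would be the normalised expression values.
set.seed(1)
expr <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)

# Reduce to a small number of principal components with base R.
# ASSUMPTION: 10 components is an arbitrary choice for this toy example.
n_pcs <- 10
pca <- prcomp(expr, center = TRUE, scale. = FALSE, rank. = n_pcs)
pc_scores <- pca$x    # dense double matrix, 200 cells x 10 PCs

# Pass the PC scores instead of the expression values, if destiny is installed.
if (requireNamespace("destiny", quietly = TRUE)) {
  dm <- destiny::DiffusionMap(pc_scores)
}
```

With 50 PCs for 100,000 cells the input would be only 5e6 doubles (about 40 MB), far below the vector limit. For a matrix of the real size, prcomp itself may be too slow or memory-hungry, so a truncated or sparse-aware PCA would probably be needed. That choice is left open here.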
