Availability of Spatial Indexes #17

@rexdouglass

I'm experimenting with geospark and find the spatial joins slower than expected.

I've set geospark.join.gridtype to "kdbtree" in my configuration below.

Is there something else I need to do to enable or use spatial indexes when creating, saving, or joining Parquet files that have a geom column?

library(tidyverse)
library(sparklyr)
library(geospark)
conf <- spark_config()

conf$`sparklyr.cores.local` <- 48
conf$spark.serializer <- "org.apache.spark.serializer.KryoSerializer"
conf$spark.kryo.registrator <- "org.datasyslab.geospark.serde.GeoSparkKryoRegistrator"
# Must stay below 2048 MB; a larger value failed with:
# Caused by: java.lang.IllegalArgumentException: spark.kryoserializer.buffer.max must be less than 2048 mb, got: + 10240 mb.
conf$spark.kryoserializer.buffer.max <- "2047MB"
conf$geospark.join.gridtype <- "kdbtree"

conf$spark.driver.maxResultSize <- 0 # 0 is unlimited (this replaced an earlier "30G" setting)
conf$spark.memory.fraction <- 0.9
conf$spark.executor.heartbeatInterval <- "6000000s" # "10000000s"
conf$spark.network.timeout <- "6000001s"
conf$spark.local.dir <- "/media/skynet2/905884f0-7546-4273-9061-12a790830beb/spark_temp/"
conf$spark.worker.cleanup.enabled <- "true"
conf$`sparklyr.shell.driver-memory` <- "300G"
sc <- spark_connect(master = "local", config = conf,
                    version = "2.3.3" #for geospark
) 
sc <- register_gis(sc)
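
For reference, the GeoSpark SQL parameter list describes index-related keys alongside geospark.join.gridtype. The fragment below is a minimal sketch of setting them in the same style as the configuration above. It assumes these keys are read from the Spark session configuration when passed through spark_config(). Whether they change the join plan in this setup is not confirmed.

```r
library(sparklyr)

conf <- spark_config()

# Assumption: GeoSpark SQL reads these keys from the session config,
# in the same way as geospark.join.gridtype.
conf$geospark.global.index <- "true"          # use a spatial index in SQL range and distance joins
conf$geospark.global.indextype <- "quadtree"  # documented alternatives: "rtree"
conf$geospark.join.gridtype <- "kdbtree"      # partitioning grid used for the spatial join
```

These settings only affect queries that go through GeoSpark SQL. They do not describe anything stored inside the Parquet files themselves.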
