Description
I'm experimenting with geospark and find the spatial joins slower than expected.
I've set geospark.join.gridtype to "kdbtree" in my configuration below.
Is there something else I need to do to enable or use spatial indexes when creating, saving, or joining Parquet files with a geom column?
library(tidyverse)
library(sparklyr)
library(geospark)
conf <- spark_config()
conf$`sparklyr.cores.local` <- 48
conf$spark.serializer <- "org.apache.spark.serializer.KryoSerializer"
conf$spark.kryo.registrator <- "org.datasyslab.geospark.serde.GeoSparkKryoRegistrator"
conf$spark.kryoserializer.buffer.max <- "2047MB" # 10240 MB fails with java.lang.IllegalArgumentException: spark.kryoserializer.buffer.max must be less than 2048 mb
conf$geospark.join.gridtype <- "kdbtree"
conf$spark.driver.maxResultSize <- "30G" # overridden by the later assignment of 0
conf$spark.memory.fraction <- 0.9
conf$spark.executor.heartbeatInterval <- "6000000s" # "10000000s"
conf$spark.network.timeout <- "6000001s"
conf$spark.local.dir <- "/media/skynet2/905884f0-7546-4273-9061-12a790830beb/spark_temp/"
conf$spark.worker.cleanup.enabled <- "true"
conf$`sparklyr.shell.driver-memory` <- "300G"
conf$spark.driver.maxResultSize <- 0 # 0 is unlimited
sc <- spark_connect(master = "local", config = conf,
                    version = "2.3.3") # for geospark
sc <- register_gis(sc)
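For reference, here is a minimal sketch of the other index-related settings that, as far as I can tell, the GeoSparkSQL parameter documentation lists alongside geospark.join.gridtype. The key names and values are assumptions taken from that documentation. I haven't verified that they take effect through sparklyr, and they would need to be set on conf before spark_connect() is called.

```r
# Assumed GeoSparkSQL settings; names follow the GeoSparkSQL parameter
# documentation and are unverified in this sparklyr setup.
conf$geospark.global.index <- "true"          # use a spatial index in SQL range/distance joins
conf$geospark.global.indextype <- "quadtree"  # index type; "rtree" is the documented alternative
conf$geospark.join.gridtype <- "kdbtree"      # partitioning grid, same value as in the config above
conf$geospark.join.indexbuildside <- "left"   # which side of the join the index is built on
```

If those defaults are already in effect, setting them explicitly should change nothing, which would suggest the slowness comes from somewhere other than a missing index.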