Hi,
I'm currently testing the integration of Cromwell with the TESK backend for running workflows over Kubernetes. After reviewing the documentation and deploying the TESK API with a custom configuration, I've started testing Cromwell workflows.
Below is the values.yaml snippet from the TESK Helm deployment for the transfer PVC configuration, including the Host Base Path and Container Base Path:
transfer:
  # If you want local file systems support (i.e. 'file:' urls in inputs and outputs),
  # you have to define these 2 properties.
  active: true                        # Allow mounting the PVC to the filer.
  wes_base_path: '/data'              # Defines TESK_API_TASKMASTER_ENVIRONMENT_HOST_BASE_PATH (Source PVC)
  tes_base_path: '/data'              # Defines TESK_API_TASKMASTER_ENVIRONMENT_CONTAINER_BASE_PATH (Container PATH)
  pvc_name: 'azurefile-cromwell-pvc'  # Defines TESK_API_TASKMASTER_ENVIRONMENT_TRANSFER_PVC_NAME
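To make my understanding of the two base paths concrete, here is a minimal Python sketch of how I assume a 'file:' URL under the host base path maps to a path inside the container. The helper name to_container_path is hypothetical and this is not TESK's actual code; it only assumes the mapping is a simple swap of the base prefix.

```python
import posixpath
from urllib.parse import urlparse

# Values taken from the values.yaml snippet above.
HOST_BASE_PATH = "/data"       # wes_base_path
CONTAINER_BASE_PATH = "/data"  # tes_base_path


def to_container_path(url, host_base=HOST_BASE_PATH,
                      container_base=CONTAINER_BASE_PATH):
    """Hypothetical helper: map a 'file:' URL to a path under container_base.

    Assumes the mapping is a plain prefix swap; this is an illustration,
    not the filer's real implementation.
    """
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError("only 'file:' URLs are handled in this sketch")

    host_path = posixpath.normpath(parsed.path)
    prefix = host_base.rstrip("/") + "/"
    # Reject anything that is not the base itself or located below it.
    if host_path != host_base.rstrip("/") and not host_path.startswith(prefix):
        raise ValueError(f"{host_path} is not under {host_base}")

    relative = posixpath.relpath(host_path, host_base)
    return posixpath.normpath(posixpath.join(container_base, relative))


if __name__ == "__main__":
    print(to_container_path("file:///data/hello_wdl_workflow/inputs/a.txt"))
```

With both base paths set to '/data', as in my deployment, the path comes out unchanged, which is why I expected tasks to be able to read the data in place.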
Here's the backend configuration I'm using in Cromwell:
TESK {
  actor-factory = "cromwell.backend.impl.tes.TesBackendLifecycleActorFactory"
  config {
    # Base for workflow executions
    root: "/data/hello_wdl_workflow"
    dockerRoot: "/data/hello_wdl_workflow"
    endpoint = "http://10.199.140.239:8080/ga4gh/tes/v1/tasks"
    glob-link-command = "ls -L GLOB_PATTERN 2> /dev/null | xargs -I ? ln -s ? GLOB_DIRECTORY"
    # Shared filesystem configuration
    filesystems {
      # This enables Cromwell to treat the local filesystem as a shared file system.
      # All paths referenced in WDLs should be accessible both to Cromwell and TESK containers.
      local {
        localization: [
          "hard-link",
          "soft-link",
          "copy"
        ]
        caching {
          duplication-strategy: ["hard-link", "soft-link", "copy"]
          hashing-strategy: "file"
        }
      }
    }
  }
}
I've noticed that the workflow execution consistently deploys a temporary task PVC and a filer Pod to copy the necessary input and output data for each task.
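For context, this is roughly the shape of TES task I believe the filer is acting on. It is a minimal Python sketch that only builds and prints a payload using field names from the GA4GH TES task schema (inputs, outputs, executors); the task name, image, command, and file paths are hypothetical, and nothing is sent to the endpoint.

```python
import json


def build_tes_task(name, image, command, input_path, output_path):
    """Build a minimal TES task dict with 'file:' URLs for inputs and outputs.

    All argument values are illustrative; only the field names follow the
    GA4GH TES task schema.
    """
    return {
        "name": name,
        "inputs": [
            # 'url' is where the data lives; 'path' is where the executor sees it.
            {"url": "file://" + input_path, "path": input_path, "type": "FILE"}
        ],
        "outputs": [
            {"url": "file://" + output_path, "path": output_path, "type": "FILE"}
        ],
        "executors": [
            {"image": image, "command": command}
        ],
    }


task = build_tes_task(
    name="hello",  # hypothetical task name
    image="ubuntu:22.04",  # hypothetical image
    command=["/bin/bash", "/data/hello_wdl_workflow/script"],  # hypothetical script path
    input_path="/data/hello_wdl_workflow/inputs/names.txt",  # hypothetical input
    output_path="/data/hello_wdl_workflow/outputs/stdout",  # hypothetical output
)

if __name__ == "__main__":
    print(json.dumps(task, indent=2))
```

Every input and output here already lives on the transfer PVC, yet each entry still appears to trigger a copy into the temporary task PVC.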
My question is: Is it possible to avoid deploying a temporary task PVC and filer pod every time, and instead use the main transfer PVC as the primary storage for the entire workflow (all tasks)? The reason for this is that we have a significant amount of data on the main transfer PVC that will be reused, and we would prefer to work directly within the PV filesystem structure.
The documentation mentions Shared Filesystem (SFS) support for Cromwell + TESK, but I'm unsure if this means the Filer function must still be invoked for every task. Is there a way to operate directly on the main PVC, especially since it's mounting a volume backed by NFS?
Regards,
Jose E Torres