[Question] Understanding Filer + PVC mounts #211

@jtorrex

Description

Hi,

I'm currently testing the integration of Cromwell with the TESK backend for running workflows over Kubernetes. After reviewing the documentation and deploying the TESK API with a custom configuration, I've started testing Cromwell workflows.

I've included the values.yaml snippet for the transfer PVC configuration in the TESK Helm deployment, including the host base path and container base path:

transfer:
    # If you want local file system support (i.e. 'file:' URLs in inputs and outputs),
    # you have to define these two properties.
    active: true                       # Allow mounting the PVC to the filer.
    wes_base_path: '/data'             # Defines TESK_API_TASKMASTER_ENVIRONMENT_HOST_BASE_PATH (source PVC)
    tes_base_path: '/data'             # Defines TESK_API_TASKMASTER_ENVIRONMENT_CONTAINER_BASE_PATH (container path)
    pvc_name: 'azurefile-cromwell-pvc' # Defines TESK_API_TASKMASTER_ENVIRONMENT_TRANSFER_PVC_NAME
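For context, the transfer PVC named above is a standard ReadWriteMany claim along these lines (a sketch only; the storage class and size are assumptions, not the actual manifest):

    # Hypothetical manifest for the PVC referenced by pvc_name above.
    # Storage class and size are assumptions; any ReadWriteMany-capable
    # backend (Azure Files, NFS, ...) would look the same.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: azurefile-cromwell-pvc
    spec:
      accessModes:
        - ReadWriteMany          # several pods (filer + executors) must mount it concurrently
      storageClassName: azurefile
      resources:
        requests:
          storage: 100Gi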

Here's the backend configuration I'm using in Cromwell's config file:

    TESK {
      actor-factory = "cromwell.backend.impl.tes.TesBackendLifecycleActorFactory"

      config {
        # Base for workflow executions
        root: "/data/hello_wdl_workflow"
        dockerRoot: "/data/hello_wdl_workflow"
        endpoint = "http://10.199.140.239:8080/ga4gh/tes/v1/tasks"
        glob-link-command = "ls -L GLOB_PATTERN 2> /dev/null | xargs -I ? ln -s ? GLOB_DIRECTORY"

        # Shared filesystem configuration
        filesystems {
          # This enables Cromwell to treat the local filesystem as a shared file system.
          # All paths referenced in WDLs should be accessible both to Cromwell and TESK containers.
          local {
            localization: [
              "hard-link",
              "soft-link",
              "copy"
            ]
            caching {
              duplication-strategy: ["hard-link", "soft-link", "copy"]
              hashing-strategy: "file"
            }
          }
        }
      }
    }
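The backend above can be exercised with a minimal hello-world WDL along these lines (a sketch; the task name and container image are assumptions, not the exact workflow I'm running):

    version 1.0

    # Hypothetical minimal workflow; names and the image are assumptions.
    workflow hello_wdl_workflow {
      call say_hello
    }

    task say_hello {
      command <<<
        echo "Hello from TESK"
      >>>
      runtime {
        docker: "ubuntu:22.04"   # TES tasks run in a container, so an image is required
      }
      output {
        String greeting = read_string(stdout())
      }
    }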

I've noticed that every workflow execution deploys a temporary task PVC and a filer pod to copy the necessary input and output data for each task.

My question is: is it possible to avoid deploying a temporary task PVC and filer pod for every task, and instead use the main transfer PVC as the primary storage for the entire workflow (all tasks)? We have a significant amount of data on the main transfer PVC that will be reused, and we would prefer to work directly within the PV's filesystem structure.

The documentation mentions Shared Filesystem (SFS) support for Cromwell + TESK, but I'm unsure whether this means the filer must still be invoked for every task. Is there a way to operate directly on the main PVC, especially since it mounts a volume backed by NFS?

Regards,

Jose E Torres
