us.gcr.io/broad-dsp-gcr-public/terra-jupyter-hail:latest and us.gcr.io/broad-dsp-gcr-public/terra-jupyter-aou:latest pyspark dependency issue #506

@andrewjordank

Description

I am trying to run a Hail MatrixTable extraction using dsub. With either of these two Docker images, I keep getting this error:

Stopped running "user-command": exit status 1:
Traceback (most recent call last):
  File "/mnt/data/script/Aim_1_extracting_genomic_data.py", line 16, in <module>
    import hail as hl
  File "/opt/conda/lib/python3.10/site-packages/hail/__init__.py", line 54, in <module>
    from . import (
  File "/opt/conda/lib/python3.10/site-packages/hail/backend/__init__.py", line 1, in <module>
    from .backend import Backend
  File "/opt/conda/lib/python3.10/site-packages/hail/backend/backend.py", line 22, in <module>
    from ..linalg.blockmatrix import BlockMatrix
  File "/opt/conda/lib/python3.10/site-packages/hail/linalg/__init__.py", line 2, in <module>
    from .blockmatrix import BlockMatrix, _breeze_from_ndarray, _eigh, _jarray_from_ndarray, _svd
  File "/opt/conda/lib/python3.10/site-packages/hail/linalg/blockmatrix.py", line 54, in <module>
    from hail.table import Table
  File "/opt/conda/lib/python3.10/site-packages/hail/table.py", line 9, in <module>
    import pyspark
ModuleNotFoundError: No module named 'pyspark'

Any ideas on how to fix this issue? Or maybe update the source code to include pyspark? Thank you!
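
For anyone trying to reproduce this, one way to confirm that the interpreter dsub launches really cannot locate pyspark (as opposed to a different Python being picked up) is to run a small check before `import hail`. This is only a diagnostic sketch: `missing_modules` is a hypothetical helper written for this post, not part of Hail, dsub, or the Terra images, and it does not fix the missing dependency.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found.

    find_spec() only locates a top-level module; it does not import it,
    so this is safe to run even when importing hail itself would fail.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Show which interpreter is running, to rule out a conda/system Python mix-up.
    print("python executable:", sys.executable)
    # pyspark is the module named in the traceback; hail is checked for comparison.
    print("modules not found:", missing_modules(["pyspark", "hail"]))
```

If `pyspark` appears in the "modules not found" list while `hail` does not, that matches the traceback above: hail is installed in the image but the pyspark package it imports is not visible to that interpreter.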
