Running a Python 3 shell inside a container spawned from the Charmed Spark rock image, import pyspark fails with a missing py4j dependency.
Steps to Reproduce:
docker pull ghcr.io/canonical/charmed-spark:3.5.1-22.04_edge
docker run -d ghcr.io/canonical/charmed-spark:3.5.1-22.04_edge
docker container exec -it <container_name> bash
python3
>>> import pyspark
Observed Error:
>>> import pyspark
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark/python/pyspark/__init__.py", line 58, in <module>
from pyspark.conf import SparkConf
File "/opt/spark/python/pyspark/conf.py", line 23, in <module>
from py4j.java_gateway import JVMView, JavaObject
ModuleNotFoundError: No module named 'py4j'
P.S. This was found in both the base Spark image and the JupyterLab image, in versions 3.4 and 3.5.
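As a possible workaround (not verified against these images): in a standard Spark layout, py4j ships as a zip under $SPARK_HOME/python/lib, so the import can succeed without a PyPI install if that zip is put on sys.path. The sketch below assumes SPARK_HOME is /opt/spark, as the traceback suggests; the helper name and the default path are illustrative, not part of the image.

```python
import glob
import os
import sys

def add_py4j_to_path(spark_home="/opt/spark"):
    """Add Spark's bundled py4j zip(s) to sys.path, if present.

    Assumption: py4j ships as py4j-*.zip under $SPARK_HOME/python/lib,
    which is the standard Spark distribution layout. Returns the list of
    zips found (empty if none), so callers can fall back to pip install.
    """
    zips = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip"))
    for z in zips:
        if z not in sys.path:
            sys.path.insert(0, z)  # prepend so the bundled py4j wins
    return zips
```

If this returns an empty list, the zip is absent from the image and installing py4j explicitly (e.g. via pip) would be the remaining option.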