You can also inspect the `hdfs` folder where the `core-site.xml` and `hdfs-site.xml` from the discovery ConfigMap of the HDFS cluster are located.
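For example, the files can be listed and inspected from a notebook cell (assuming the `hdfs` folder is mounted in the notebook's working directory):

[source,console]
----
!ls hdfs/
!cat hdfs/core-site.xml
----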
The image defined for the Spark job must contain all dependencies needed for that job to run.
For PySpark jobs, this will mean that Python libraries either need to be baked into the image or {spark-pkg}[packaged in some other way].
This demo uses a custom image, built from a Dockerfile, that contains scikit-learn, pandas and their dependencies.
This is described below.
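As a minimal sketch of the second option (this is not the mechanism used in this demo, and the file names are placeholders), pure-Python dependencies can also be shipped alongside the job via `spark-submit`:

[source,console]
----
# Illustrative only: place dependencies.zip on the PYTHONPATH of the driver and executors
spark-submit --py-files dependencies.zip job.py
----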

=== Install the libraries into a product image
Libraries can be added to a custom *product* image launched by the notebook.
This image is created by taking a Spark image, in this case `docker.stackable.tech/stackable/spark-k8s:3.5.0-stackable24.3.0`, installing specific Python libraries into it, and re-tagging the image:

[source,console]
----
FROM docker.stackable.tech/stackable/spark-k8s:3.5.0-stackable24.3.0
# Illustrative only: the demo image installs scikit-learn, pandas and their dependencies
RUN pip install --no-cache-dir scikit-learn pandas
----
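The image can then be built and re-tagged locally; the image name used here is only a placeholder:

[source,console]
----
docker build -t my-registry.example/spark-k8s-custom:3.5.0-stackable24.3.0 .
----

The Spark job definition then references this custom tag as its image, so that all required libraries are present at runtime.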
The notebook reads the measurement data in windowed batches using a loop, computes some predictions for each batch and persists the scores in a separate TimescaleDB table.

=== Adding libraries
There are two ways of making additional libraries available to the notebook:

==== Install from within the notebook
This can be done by executing `!pip install` from within a notebook cell, as shown below:

[source,console]
----
!pip install psycopg2-binary
!pip install alibi-detect
----

==== Install the libraries into a custom image
Alternatively, dependencies can be added to the base image used for JupyterHub.
This could make use of any Dockerfile mechanism (downloading via `curl`, using a package manager, etc.) and is not limited to Python libraries.
To achieve the same imports as in the previous section, build the image from a Dockerfile like this:

[source,console]
----
FROM jupyter/pyspark-notebook:python-3.9
COPY demos/signal-processing/requirements.txt .
RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir -r ./requirements.txt
----
Where `requirements.txt` contains:

[source,console]
----
psycopg2-binary==2.9.9
alibi-detect==0.11.4
----
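The custom image can then be built and pushed to a registry that the cluster can pull from (the registry and tag below are placeholders):

[source,console]
----
docker build -t my-registry.example/pyspark-notebook-custom:python-3.9 .
docker push my-registry.example/pyspark-notebook-custom:python-3.9
----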
NOTE: Using a custom image requires access to a repository where the image can be made available.

== Model details
The enriched data is calculated using an online, unsupervised https://docs.seldon.io/projects/alibi-detect/en/stable/od/methods/sr.html[model] that uses a technique called http://www.houxiaodi.com/assets/papers/cvpr07.pdf[Spectral Residuals].