" WHEN price >= 10.0 AND price < 15.0 THEN 'Medium'\n",
" ELSE 'High'\n",
" END AS category\n",
" FROM products\n",
"\"\"\"\n",
")\n",
"\n",
"# Select Statement 2\n",
"result2 = spark.sql(\n",
"\"\"\"\n",
" SELECT name,\n",
" CASE\n",
" WHEN price < 10.0 THEN 'Low'\n",
" WHEN price >= 10.0 AND price < 15.0 THEN 'Medium'\n",
" ELSE 'High'\n",
" END AS category\n",
" FROM products\n",
" WHERE quantity > 3\n",
"\"\"\"\n",
")\n",
"\n",
"# Display the results\n",
"result1.show()\n",
"result2.show()"
]
},
{
"cell_type": "markdown",
"id": "f3ba0e79",
"metadata": {},
"source": [
"Spark UDFs (User-Defined Functions) can help address these issues by encapsulating complex logic that is reused across multiple SQL queries. \n",
"\n",
"In the code example above, we define a UDF `assign_category_label` that assigns category labels based on price. This UDF is then reused in two different SQL statements."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "37f4d9c4",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"24/04/15 09:28:11 WARN SimpleFunctionRegistry: The function assign_category_label replaced a previously registered function.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"+---------+-----+--------+--------+\n",
"| name|price|quantity|category|\n",
"+---------+-----+--------+--------+\n",
"|Product 1| 10.0| 5| Medium|\n",
"|Product 2| 15.0| 3| High|\n",
"|Product 3| 8.0| 2| Low|\n",
"+---------+-----+--------+--------+\n",
"\n",
"+---------+--------+\n",
"| name|category|\n",
"+---------+--------+\n",
"|Product 1| Medium|\n",
"+---------+--------+\n",
"\n"
]
}
],
"source": [
"# Define UDF to assign category label based on price\n",
"# (cell body truncated in the source; reconstructed from the CASE WHEN\n",
"#  thresholds and the registered function name shown in the output above)\n",
"from pyspark.sql.types import StringType\n",
"\n",
"def assign_category_label(price):\n",
"    if price < 10.0:\n",
"        return \"Low\"\n",
"    elif price < 15.0:\n",
"        return \"Medium\"\n",
"    return \"High\"\n",
"\n",
"# Register the UDF so it can be called from spark.sql queries\n",
"spark.udf.register(\"assign_category_label\", assign_category_label, StringType())"
]
},
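The threshold logic shared by the CASE WHEN expressions and the `assign_category_label` UDF can be sketched as plain Python, independent of a Spark session. The sketch below is illustrative, not taken from the notebook; the Spark-specific registration step is noted in a comment.

```python
# Standalone sketch of the labeling logic behind the assign_category_label UDF.
# The thresholds mirror the CASE WHEN expression in the SQL queries above.
def assign_category_label(price: float) -> str:
    if price < 10.0:
        return "Low"
    elif price < 15.0:
        return "Medium"
    return "High"

# In a live Spark session, this plain function would be registered for SQL use:
#   spark.udf.register("assign_category_label", assign_category_label, StringType())
# after which assign_category_label(price) can be called inside spark.sql queries.

# Values from the sample data above:
print(assign_category_label(10.0))  # Medium (Product 1)
print(assign_category_label(15.0))  # High   (Product 2)
print(assign_category_label(8.0))   # Low    (Product 3)
```

Keeping the thresholds in one function means a change to the price bands is made once, rather than in every copy of the CASE WHEN expression.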