Chapter5/spark.ipynb: 91 additions & 55 deletions
@@ -1099,7 +1099,7 @@
 "id": "a2d96783",
 "metadata": {},
 "source": [
-"### PySpark SQL: Enhancing Reusability with Parameterized Queries"
+"### Writing Safer and Cleaner Spark SQL with PySpark's Parameterized Queries"
 ]
 },
 {
@@ -1117,120 +1117,156 @@
 ]
 },
 {
-"cell_type": "markdown",
-"id": "0ddc2bc2",
+"cell_type": "code",
+"execution_count": 12,
+"id": "8056b8af",
 "metadata": {},
+"outputs": [],
 "source": [
-"In PySpark, parametrized queries enable the same query structure to be reused with different inputs, without rewriting the SQL.\n",
+"from pyspark.sql import SparkSession\n",
+"import pandas as pd\n",
+"from datetime import date, timedelta\n",
 "\n",
-"Additionally, they safeguard against SQL injection attacks by treating input data as parameters rather than as executable code."
+"spark = SparkSession.builder.getOrCreate()"
 ]
 },
 {
-"cell_type": "code",
-"execution_count": null,
-"id": "8056b8af",
+"cell_type": "markdown",
+"id": "0ddc2bc2",
 "metadata": {},
-"outputs": [],
 "source": [
-"from pyspark.sql import SparkSession\n",
-"import pandas as pd \n",
+"When working with Spark SQL queries, using regular Python string interpolation can lead to security vulnerabilities and require extra steps like creating temporary views. PySpark offers a better solution with parameterized queries, which:\n",
 "\n",
-"spark = SparkSession.builder.getOrCreate()"
+"- Protect against SQL injection\n",
+"- Allow using DataFrame objects directly in queries\n",
+"- Automatically handle date formatting\n",
+"- Provide a more expressive way to write SQL queries\n",
+"\n",
+"Let's compare the traditional approach with parameterized queries:"
+"This method allows for easy parameter substitution and direct use of DataFrames, making your Spark SQL queries both safer and more convenient to write and maintain."