This project explores data optimization with PySpark, covering big data concepts, pipelines, machine learning, and data transformation techniques. The goal is to improve data processing efficiency and model performance.
Using Spark SQL, I determined key metrics about home sales data. I then used Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table was uncached.