Spark presentation and code for the Scala Meetup
Simple ETL and ML examples. The use case is featurizing raw session data and classifying each session as likely or unlikely to lead to a purchase. It is contrived, but good enough for a demo. See the links on the last slide for help with Spark and with running Spark from Jupyter.
Contents:
- spark_pres_jupyter_etl.ipynb : sample ETL with map/reduce operations and Datasets
- spark_pres_jupyter_ml.ipynb : classification examples using contrived data
- spark_pres_slides.pptx : a few introductory slides
- various CSVs : made-up raw/featurized and train/test toy data for ETL and classification
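
For readers who want a feel for the featurization step before opening the notebooks, here is a minimal sketch of the map/reduce shape on session data. It is not taken from the notebooks: it uses plain Scala collections instead of Spark so it runs without a Spark install, and the `Event` and `Features` case classes, their fields, and the "cart" page name are all hypothetical stand-ins for whatever columns the CSVs actually contain.

```scala
// Hypothetical raw event schema; the real CSV columns in this repo may differ.
final case class Event(sessionId: String, page: String, seconds: Int)

// Hypothetical per-session feature row.
final case class Features(
  sessionId: String,
  pageViews: Int,
  totalSeconds: Int,
  sawCart: Boolean
)

object SessionFeatures {
  // Map: turn each raw event into a (sessionId, partial features) pair.
  // Reduce: merge the partial features of each session into one row.
  def featurize(events: Seq[Event]): Map[String, Features] =
    events
      .map(e => e.sessionId -> Features(e.sessionId, 1, e.seconds, e.page == "cart"))
      .groupBy(_._1)
      .map { case (id, pairs) =>
        id -> pairs.map(_._2).reduce { (a, b) =>
          Features(
            id,
            a.pageViews + b.pageViews,
            a.totalSeconds + b.totalSeconds,
            a.sawCart || b.sawCart
          )
        }
      }

  def main(args: Array[String]): Unit = {
    // Made-up toy data, in the same spirit as the CSVs listed above.
    val events = Seq(
      Event("s1", "home", 10),
      Event("s1", "cart", 25),
      Event("s2", "home", 5)
    )
    featurize(events).toSeq.sortBy(_._1).foreach { case (_, f) => println(f) }
  }
}
```

The merge function is associative and commutative, which is the property a distributed reduce relies on, so the same pair-then-merge shape should carry over to a keyed reduce in Spark. The resulting per-session rows are the kind of input a classifier would then be trained on.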