gevaland/hw-2_k8s-spark

Spark on Kubernetes

Initializing Minikube

Install Minikube by following the official guide: https://minikube.sigs.k8s.io/docs/start/

Docker must be installed as well: https://docs.docker.com/engine/install/ubuntu/

Start Minikube:

$ sudo usermod -aG docker $USER
$ minikube start machine
$ minikube dashboard

Deploying Spark on Minikube

Build the Docker image:

$ eval $(minikube docker-env)
$ docker build -t spark-hadoop:latest -f ./docker/Dockerfile ./docker

Create the deployments and services:

$ kubectl create -f ./kubernetes/spark-master-deployment.yaml
$ kubectl create -f ./kubernetes/spark-master-service.yaml
$ kubectl create -f ./kubernetes/spark-worker-deployment.yaml
$ minikube addons enable ingress
$ kubectl apply -f ./kubernetes/minikube-ingress.yaml

Add an entry to hosts:

$ echo "$(minikube ip) " | sudo tee -a /etc/hosts

Checking the new Spark cluster

Run the following word count on Spark to check that the cluster works:

val myWords = "HI HI HOW ARE YOU HAH"
val mySplit = myWords.split(" ").foldLeft(Map.empty[String, Int]) {
    (count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
}
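As a cross-check for the snippet above, here is a minimal plain-Python sketch of the same fold. It does not use Spark; the names `add_word` and `word_counts` are illustrative and not part of the repository. The Scala expression should produce an equivalent map, with `HI` counted twice and every other word once.

```python
from functools import reduce

my_words = "HI HI HOW ARE YOU HAH"


def add_word(counts, word):
    # Return a new dict with `word` incremented, mirroring
    # `count + (word -> n)` on an immutable Scala Map.
    return {**counts, word: counts.get(word, 0) + 1}


# Fold over the words, starting from an empty mapping.
word_counts = reduce(add_word, my_words.split(" "), {})
print(word_counts)
# {'HI': 2, 'HOW': 1, 'ARE': 1, 'YOU': 1, 'HAH': 1}
```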

Example notebook from Databricks

Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads. But it is way more than that!

This is a rewritten notebook example from this blog post by Databricks. The intention is to show why Delta Lake is a big deal and how to run Delta Lake without Databricks services.

Delta Lake examples in this notebook:

  • Convert data to Delta Lake format
  • Create Delta Lake table
  • Spark SQL capabilities
  • Delete data
  • Update data
  • View audit history of table
  • Merge (union) of two tables, which removes duplicates, updates rows and adds a new row
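
The merge in the last item can be pictured with a toy model. The sketch below is plain Python and does not use the Delta Lake API. The function `merge_upsert`, the `id` key and the `amount` column are all hypothetical, chosen only to illustrate the three effects listed: duplicates collapse, matching rows are updated, and unmatched rows are added.

```python
def merge_upsert(target, updates, key):
    """Toy model of an upsert-style merge over lists of dict rows."""
    # Index the target rows by key so matches can be found directly.
    merged = {row[key]: dict(row) for row in target}
    for row in updates:
        # A matching key updates the existing row; a new key inserts one.
        # Repeated keys in `updates` collapse into a single row.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())


target = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 200},
]
updates = [
    {"id": 2, "amount": 250},  # updates an existing row
    {"id": 2, "amount": 250},  # duplicate of the row above
    {"id": 3, "amount": 300},  # new row
]

result = merge_upsert(target, updates, key="id")
print(result)
```

The notebook itself performs this step with Delta Lake's merge operation on Spark tables; the model above only mirrors the outcome on in-memory rows.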

Generate

Generate the py-files from the notebooks:

pip install -r requirements
python ipynb2py.py
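
The contents of `ipynb2py.py` are not shown here. As a rough idea of what such a conversion involves, this is a minimal sketch that assumes the standard notebook JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`). The function name `notebook_to_source` is hypothetical and is not taken from the repository script.

```python
import json


def notebook_to_source(notebook_json):
    """Concatenate the code cells of a notebook (given as JSON text)."""
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):
            # `source` may be stored as a list of lines.
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip("\n"))
    # Separate cells with a blank line and end with a newline.
    return "\n\n".join(chunks) + "\n"


# Small in-memory notebook used only for illustration.
example = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["x = 1\n", "y = x + 1"]},
        {"cell_type": "code", "source": "print(y)"},
    ],
})

py_source = notebook_to_source(example)
print(py_source)
```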

Author

Example notebook here by
