Install Minikube by following the official guide: https://minikube.sigs.k8s.io/docs/start/
Docker is also required; see the installation instructions for Ubuntu: https://docs.docker.com/engine/install/ubuntu/
Add your user to the docker group (log out and back in for the change to take effect), then start Minikube:
$ sudo usermod -aG docker $USER
$ minikube start
$ minikube dashboard
Point your local Docker CLI at Minikube's Docker daemon, then build the image:
$ eval $(minikube docker-env)
$ docker build -t spark-hadoop:latest -f ./docker/Dockerfile ./docker
Create the deployments and services:
$ kubectl create -f ./kubernetes/spark-master-deployment.yaml
$ kubectl create -f ./kubernetes/spark-master-service.yaml
$ kubectl create -f ./kubernetes/spark-worker-deployment.yaml
Enable the Ingress addon and apply the ingress rule:
$ minikube addons enable ingress
$ kubectl apply -f ./kubernetes/minikube-ingress.yaml
Add an entry to /etc/hosts that maps the Minikube IP to the host declared in ./kubernetes/minikube-ingress.yaml (shown here as the placeholder <ingress-host>):
$ echo "$(minikube ip) <ingress-host>" | sudo tee -a /etc/hosts
Run the following in the Spark shell to check that everything works:
// Simple word count: fold the words into a Map of word -> occurrences
val myWords = "HI HI HOW ARE YOU HAH"
val mySplit = myWords.split(" ").foldLeft(Map.empty[String, Int]) {
  (count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
}
// Expected: Map(HI -> 2, HOW -> 1, ARE -> 1, YOU -> 1, HAH -> 1)
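Note that the snippet above runs entirely on the driver. To confirm that work is actually distributed to the executors, here is a minimal PySpark sketch of the same word count; the master URL spark://spark-master:7077 is an assumption and should be replaced with the address of your Spark master service if it differs.

# Hedged sketch: a distributed word count in PySpark.
# The master URL is an assumption -- adjust it to match your Spark master service.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("word-count-check")
         .master("spark://spark-master:7077")  # assumed service name and port
         .getOrCreate())

words = spark.sparkContext.parallelize("HI HI HOW ARE YOU HAH".split(" "))
counts = (words.map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b)
               .collect())
print(counts)  # e.g. [('HI', 2), ('HOW', 1), ('ARE', 1), ('YOU', 1), ('HAH', 1)]

spark.stop()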
Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads. But it is much more than that!
This is a rewritten notebook example from this blog post by Databricks. The intention is to show why Delta Lake is a big deal and how to run Delta Lake without Databricks services; a condensed code sketch follows the list below.
Delta Lake examples in this notebook:
- Convert data to Delta Lake format
- Create Delta Lake table
- Spark SQL capabilities
- Delete data
- Update data
- View the audit history of a table
- Merge (upsert) two tables: remove duplicates, update rows, and add new rows
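Here is a condensed, hedged sketch of these operations in PySpark. It assumes the delta-spark pip package is installed; the table path, table name, and column names (/tmp/loans_delta, loans, loan_id, status) are illustrative, not taken from the notebook.

# Hedged sketch of the Delta Lake operations listed above.
# Assumes: pip install delta-spark. Paths, table and column names are illustrative.
from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

builder = (SparkSession.builder
           .appName("delta-demo")
           .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Convert data to Delta Lake format: write a DataFrame out as Delta
data = spark.createDataFrame([(1, "paid"), (2, "open")], ["loan_id", "status"])
data.write.format("delta").mode("overwrite").save("/tmp/loans_delta")

# Create a Delta Lake table and query it with Spark SQL
spark.sql("CREATE TABLE IF NOT EXISTS loans USING DELTA LOCATION '/tmp/loans_delta'")
spark.sql("SELECT * FROM loans").show()

# Delete and update data in place
table = DeltaTable.forPath(spark, "/tmp/loans_delta")
table.delete(col("status") == "paid")
table.update(col("loan_id") == 2, {"status": "'closed'"})

# View the audit history of the table
table.history().show()

# Merge (upsert): update matching rows, insert new ones
updates = spark.createDataFrame([(2, "open"), (3, "new")], ["loan_id", "status"])
(table.alias("t")
      .merge(updates.alias("u"), "t.loan_id = u.loan_id")
      .whenMatchedUpdateAll()
      .whenNotMatchedInsertAll()
      .execute())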
Generate the .py files from the notebook:
$ pip install -r requirements.txt
$ python ipynb2py.py
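For reference, a minimal sketch of what a converter like ipynb2py.py might do, using the nbformat library; this is an assumption about the script's approach, and the actual implementation may differ.

# Hedged sketch: extract code cells from .ipynb files into .py files.
# nbformat usage is an assumption about how ipynb2py.py works.
import glob
import nbformat

for path in glob.glob("*.ipynb"):
    nb = nbformat.read(path, as_version=4)
    code = "\n\n".join(c.source for c in nb.cells if c.cell_type == "code")
    out = path.replace(".ipynb", ".py")
    with open(out, "w") as f:
        f.write(code + "\n")
    print(f"wrote {out}")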