Exploratory Data Analysis (EDA) using Notebooks in a Kubox Cluster

This repository contains a collection of Jupyter notebooks demonstrating exploratory data analysis techniques using Kubox AI.

Tip

Kubox is in early-stage public preview and under active development. We are continuously improving and refining the platform, so things may change as we grow. We welcome your feedback and suggestions to help shape the future of Kubox AI.

Introduction

In these examples, we show how deploying a data platform co-located with the data (data locality) can reduce costs and time. Consider a scenario where you run a query from a Google Colab notebook to analyse the 300 GB New York taxi dataset stored in AWS S3. This setup would result in USD 20 in AWS egress costs and take over an hour just to download the data. For more information, see this blog post.
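The cost and time figures above come down to simple arithmetic: dataset size times a per-GB egress rate, and dataset size divided by download bandwidth. The following is a minimal sketch of that calculation. The function name `estimate_transfer`, the flat per-GB rate, and the sample rate and bandwidth are illustrative assumptions, not actual AWS pricing or measured throughput.

```shell
# Hypothetical helper: estimate egress cost and download time for a dataset.
# Arguments: size in GB, egress rate in USD per GB, bandwidth in Mbit/s.
# All inputs are illustrative; check your provider's current pricing.
estimate_transfer() {
  awk -v size_gb="$1" -v usd_per_gb="$2" -v mbit_per_s="$3" 'BEGIN {
    cost = size_gb * usd_per_gb            # flat per-GB rate (assumption)
    seconds = size_gb * 8000 / mbit_per_s  # 1 GB = 8000 Mbit (decimal units)
    printf "cost_usd=%.2f minutes=%.1f\n", cost, seconds / 60
  }'
}

# Example with assumed inputs: 300 GB, USD 0.07 per GB, 600 Mbit/s.
estimate_transfer 300 0.07 600
```

With these assumed inputs the function prints `cost_usd=21.00 minutes=66.7`, which is the same order of magnitude as the figures quoted above. Substitute your own rate and bandwidth to estimate a different setup.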

AWS Cloud Setup

Create an AWS IAM role for the EC2 Instances

If you are creating a GPU-based Kubox cluster outside ap-southeast-2, you will need to create an Amazon Machine Image (AMI) in your desired AWS region.

Download and install the kubox command-line tool, a single binary required to create a Kubernetes-based Kubox cluster.

Install Kubox CLI

curl https://kubox.sh | sh

Set up AWS CLI

aws configure

Tip

If you are new to Kubox, see how to create your Hello World AWS Cluster.

Basic EDA on NYC Taxi Dataset

https://docs.kubox.ai/examples/nyc-taxi

Clone the Kubox notebooks repository to your local machine:

git clone https://github.com/kubox-ai/notebooks.git

Create a Kubox cluster from the root of the cloned repository:

kubox create -f cluster-basic.yaml

If you have issues creating the cluster, see the troubleshooting guide: https://docs.kubox.ai/kb/troubleshooting

A kubeconfig file will be generated as part of the cluster creation process. Set it as an environment variable:

export KUBECONFIG=./basic/cluster/config/kubeconfig
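The path in the export above is relative, so it stops resolving once you change directory (for example, when moving into `./basic` for local development). The following is a minimal sketch of one way to guard against that. It assumes the kubeconfig is at the path shown above; the function name `use_kubeconfig` is hypothetical and not part of the Kubox CLI.

```shell
# Hypothetical helper: export KUBECONFIG as an absolute path,
# failing early if the file does not exist.
use_kubeconfig() {
  kc_path="${1:-./basic/cluster/config/kubeconfig}"
  if [ ! -f "$kc_path" ]; then
    echo "kubeconfig not found: $kc_path" >&2
    return 1
  fi
  # Resolve to an absolute path so the variable survives later `cd` commands.
  KUBECONFIG="$(cd "$(dirname "$kc_path")" && pwd)/$(basename "$kc_path")"
  export KUBECONFIG
}
```

Call `use_kubeconfig` from the root of the cloned repository, or pass a different path as the first argument.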

Verify the connection to the Kubernetes cluster by listing the pods:

kubectl get pods -n kubox
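Pods can take some time to become ready after the cluster is created, and port forwarding will fail until the notebook pod is running. The following is a minimal sketch that blocks until the pods are ready, using `kubectl wait`. It assumes every pod in the `kubox` namespace is expected to reach the Ready condition; if the namespace contains pods that run to completion, the wait would time out and a narrower selector would be needed. The function name `wait_for_kubox_pods` and the 300-second default timeout are illustrative choices.

```shell
# Hypothetical helper: block until all pods in the kubox namespace are Ready.
# The default timeout of 300s is an assumption; adjust it for your cluster.
wait_for_kubox_pods() {
  wait_timeout="${1:-300s}"
  kubectl wait --for=condition=Ready pods --all -n kubox --timeout="$wait_timeout"
}
```

Run `wait_for_kubox_pods` (or, for a longer limit, `wait_for_kubox_pods 10m`) before the port-forward step.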

Port forward the notebook server to your local machine:

kubectl port-forward service/notebook 8080:80 -n kubox

Open your browser and navigate to http://localhost:8080

Delete the cluster when you are done:

kubox delete -f cluster-basic.yaml

Local Development

Navigate to the basic or gpu directory

cd ./basic

Create a local Python virtual environment and activate it. We use pyenv to manage Python versions; you can install a Python version with pyenv install.

Set the current Python version to 3.11.9

pyenv shell 3.11.9

Create a virtual environment

python -m venv .venv

Activate the virtual environment

source .venv/bin/activate

Install poetry

cd code
pip install poetry

Install dependencies

poetry install

Contributing

We welcome contributions! If you find a bug, have a feature request, or want to improve the notebooks, feel free to open an issue or submit a pull request.

License

This repository is licensed under the Apache License 2.0. You are free to use, modify, and distribute this project under the terms of the license. See the LICENSE file for more details.
