Changed Mount demo to include steps to install lakeFS Enterprise also #242

Merged
merged 5 commits into from
Feb 7, 2025
45 changes: 37 additions & 8 deletions 01_standalone_examples/lakefs-mount-demo/README.md
@@ -2,13 +2,13 @@

Start by ⭐️ starring [lakeFS open source](https://go.lakefs.io/oreilly-course) project.

This repository includes a Jupyter Notebook which you can run on your local machine.
This demo includes a Jupyter Notebook which you can run on your local machine.

## Prerequisites
* Docker installed on your local machine
* This demo requires connecting to a lakeFS Server. You can either install lakeFS Server locally (https://docs.lakefs.io/quickstart.html), or spin up for free on the lakeFS cloud (https://lakefs.cloud).
* Watch [this video](https://www.youtube.com/watch?v=BgKuoa8LAaU) to understand the use case as well as the demo.
* [Contact lakeFS](https://lakefs.io/contact-sales/) to get the lakeFS Everest binary for Linux x86_64 OS. Download and save the binary on your laptop.
* OPTIONAL: [Contact lakeFS](https://lakefs.io/contact-sales/) to get the token for Fluffy if you want to provision a lakeFS Enterprise server.

## Setup

@@ -18,19 +18,48 @@ This repository includes a Jupyter Notebook which you can run on your local mach
git clone https://github.com/treeverse/lakeFS-samples && cd lakeFS-samples/01_standalone_examples/lakefs-mount-demo
```

2. Run following commands to download and run Docker container which includes Python, Hugging Face datasets library, Pytorch, Jupyter Notebook and lakeFS Python client (Docker image size is around 10GB):
2. You now have two options:

### **Run a Jupyter Notebook server with your existing lakeFS Server**

If you have already [installed lakeFS](https://docs.lakefs.io/deploy/) or are utilizing [lakeFS cloud](https://lakefs.cloud/), all you need to run is the Jupyter notebook server:

```bash
docker build -t lakefs-mount-demo .
docker compose up
```

Once you've finished, run the following to remove all the containers:

```bash
docker compose down
```

docker run -d -p 8892:8888 --privileged --user root -e GRANT_SUDO=yes -v $PWD:/home/jovyan -v $PWD/jupyter_notebook_config.py:/home/jovyan/.jupyter/jupyter_notebook_config.py --name lakefs-mount-demo lakefs-mount-demo
### **Don't have a lakeFS Server or Object Store?**

If you want to provision a lakeFS Enterprise server and MinIO as your object store, plus Jupyter, first log in to [Treeverse Dockerhub](https://hub.docker.com/u/treeverse) using the granted token so that the proprietary Fluffy image can be retrieved:

```bash
docker login -u externallakefs
```

3. Copy the Everest binary for Linux x86_64 OS on your laptop inside "lakeFS-samples/01_standalone_examples/lakefs-mount-demo" folder.
Then bring up the full stack:
```bash
docker compose --profile local-lakefs-enterprise up
```

3. Copy the Everest binary for Linux x86_64 OS on your laptop inside "lakeFS-samples/01_standalone_examples/lakefs-mount-demo" folder.

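As a rough illustration of how the two options in step 2 differ, the sketch below models Compose profile selection in Python: a service with no `profiles` key always starts, while a service that lists profiles starts only when one of them is enabled with `--profile`. The `SERVICES` mapping is hand-copied from this PR's docker-compose.yml, `active_services` is a hypothetical helper (not part of Docker or lakeFS), and `depends_on` is ignored.

```python
# Hand-copied from docker-compose.yml in this PR: service name -> declared profiles.
SERVICES = {
    "jupyter-notebook": [],
    "lakefs": ["local-lakefs-enterprise"],
    "minio-setup": ["local-lakefs-enterprise"],
    "minio": ["local-lakefs-enterprise"],
    "postgres": ["local-lakefs-enterprise"],
    "fluffy": ["local-lakefs-enterprise"],
}


def active_services(services, enabled_profiles=()):
    """Hypothetical helper: names of the services that `docker compose up` would start."""
    enabled = set(enabled_profiles)
    return sorted(
        name
        for name, profiles in services.items()
        # No profiles declared: always started. Otherwise one profile must be enabled.
        if not profiles or enabled.intersection(profiles)
    )


if __name__ == "__main__":
    # Plain `docker compose up`
    print(active_services(SERVICES))
    # `docker compose --profile local-lakefs-enterprise up`
    print(active_services(SERVICES, ["local-lakefs-enterprise"]))
```

Under these assumptions, the first option starts only the Jupyter container, and the second starts the whole stack.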
## URLs and login details

4. Open JupyterLab UI [http://127.0.0.1:8892/](http://127.0.0.1:8892/) in your web browser.
* JupyterLab UI http://localhost:8892/
* lakeFS Enterprise (if provisioned) http://localhost:8084/ (`AKIAIOSFOLKFSSAMPLES` / `wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY`)
* MinIO (if provisioned) http://localhost:9005/ (`minioadmin`/`minioadmin`)
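
Once the containers are up, one way to confirm that the endpoints listed above are listening is a plain TCP probe. The sketch below is a convenience and not part of the demo: `ENDPOINTS` and `port_open` are hypothetical names, and the host ports are the ones given in the list above.

```python
import socket

# Host ports taken from the README list above; lakeFS and MinIO exist only
# when the local-lakefs-enterprise profile was started.
ENDPOINTS = {
    "JupyterLab": ("localhost", 8892),
    "lakeFS": ("localhost", 8084),
    "MinIO console": ("localhost", 9005),
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    for name, (host, port) in ENDPOINTS.items():
        state = "up" if port_open(host, port) else "not reachable"
        print(f"{name:14} {host}:{port} {state}")
```

An open port only shows that something is listening; it does not verify the credentials or that the service finished initializing.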

## Demo Instructions

1. Once you have successfully completed setup then open "lakeFS Mount Demo" notebook from JupyterLab UI and follow the instructions.
The demo includes the following three notebooks. Open any notebook from the JupyterLab UI and follow the instructions.
1. "lakeFS Mount Demo" notebook demonstrates how to mount lakeFS datasets on a laptop or server as a local filesystem.
1. "lakeFS Mount Demo with Git Integration" notebook demonstrates the lakeFS Mount feature as well as how it integrates with Git. In this demo, Git is used to version control your code while lakeFS is used to version control your data and model.
1. "lakeFS Hugging Face Mount Demo" notebook demonstrates the lakeFS Mount feature using a Hugging Face dataset.
142 changes: 142 additions & 0 deletions 01_standalone_examples/lakefs-mount-demo/docker-compose.yml
@@ -0,0 +1,142 @@
version: "3.5"
name: lakefs-mount-demo
services:
  jupyter-notebook:
    build: ./jupyter
    privileged: true
    environment:
      # log-level is set to WARN because of noisy stdout problem
      # -> See https://github.com/jupyter-server/jupyter_server/issues/1279
      - NOTEBOOK_ARGS=--log-level='WARN' --NotebookApp.token='' --NotebookApp.password='' --notebook-dir=/root --allow-root
      - NB_USER=root
      - NB_UID=0
      - NB_GID=0
      - CHOWN_HOME=yes
    working_dir: /root
    ports:
      - 8892:8888 # Jupyter
    volumes:
      - $PWD:/root
      - $PWD/../../data/alpaca_training_imgs:/root/alpaca_training_imgs

  lakefs:
    image: treeverse/lakefs:1
    pull_policy: always
    ports:
      - "8084:8000"
    depends_on:
      postgres:
        condition: service_healthy
      minio-setup:
        condition: service_completed_successfully
    environment:
      - LAKEFS_BLOCKSTORE_TYPE=s3
      - LAKEFS_BLOCKSTORE_S3_FORCE_PATH_STYLE=true
      - LAKEFS_BLOCKSTORE_S3_ENDPOINT=http://minio:9000
      - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=minioadmin
      - LAKEFS_BLOCKSTORE_S3_CREDENTIALS_SECRET_ACCESS_KEY=minioadmin
      - LAKEFS_AUTH_ENCRYPT_SECRET_KEY=some random secret string
      - LAKEFS_LOGGING_LEVEL=INFO
      - LAKEFS_STATS_ENABLED=${LAKEFS_STATS_ENABLED:-1}
      - LAKECTL_CREDENTIALS_ACCESS_KEY_ID=AKIAIOSFOLKFSSAMPLES
      - LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
      - LAKECTL_SERVER_ENDPOINT_URL=http://localhost:8000
      - LAKEFS_DATABASE_TYPE=postgres
      - LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING=postgres://lakefs:lakefs@postgres/postgres?sslmode=disable
      - LAKEFS_AUTH_API_ENDPOINT=http://fluffy:9006/api/v1
      - LAKEFS_AUTH_UI_CONFIG_RBAC=internal
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        lakefs setup --user-name everything-bagel --access-key-id "$$LAKECTL_CREDENTIALS_ACCESS_KEY_ID" --secret-access-key "$$LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY" || true
        lakefs run &
        echo "---- Creating repository ----"
        wait-for -t 60 lakefs:8000 -- curl -u "$$LAKECTL_CREDENTIALS_ACCESS_KEY_ID":"$$LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY" -X POST -H "Content-Type: application/json" -d '{ "name": "quickstart", "storage_namespace": "s3://quickstart", "default_branch": "main", "sample_data": true }' http://localhost:8000/api/v1/repositories || true
        echo ""
        wait-for -t 60 minio:9000 && echo '------------------------------------------------

        MinIO admin: http://127.0.0.1:9005/

        Username : minioadmin
        Password : minioadmin
        '
        echo "------------------------------------------------"
        wait-for -t 60 jupyter-notebook:8888 && echo '

        Jupyter: http://127.0.0.1:8892/
        '
        echo "------------------------------------------------"
        echo ""
        echo "lakeFS Web UI: http://127.0.0.1:8084/ >(._.)<"
        echo " ( )_ "
        echo ""
        echo " Access Key ID : $$LAKECTL_CREDENTIALS_ACCESS_KEY_ID"
        echo " Secret Access Key: $$LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY"
        echo ""
        echo "-------- Let's go and have axolotl fun! --------"
        echo ""
        wait
    profiles:
      - local-lakefs-enterprise

  minio-setup:
    image: minio/mc:RELEASE.2023-05-18T16-59-00Z
    environment:
      - MC_HOST_lakefs=http://minioadmin:minioadmin@minio:9000
    depends_on:
      - minio
    volumes:
      - ../../data:/data
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        mc mb lakefs/quickstart lakefs/example lakefs/sample-data
        mc cp --recursive /data/* lakefs/sample-data 1>/dev/null # don't be so noisy 🤫
    profiles:
      - local-lakefs-enterprise

  minio:
    image: minio/minio:RELEASE.2023-05-18T00-05-36Z
    ports:
      - "9004:9000"
      - "9005:9001"
    entrypoint: ["minio", "server", "/data", "--console-address", ":9001"]
    profiles:
      - local-lakefs-enterprise

  postgres:
    image: postgres:14
    ports:
      - "5433:5432"
    environment:
      POSTGRES_USER: lakefs
      POSTGRES_PASSWORD: lakefs
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "lakefs"]
      interval: 10s
      retries: 5
      start_period: 5s
    profiles:
      - local-lakefs-enterprise

  fluffy:
    image: "${FLUFFY_REPO:-treeverse}/fluffy:${TAG:-latest}"
    command: "${COMMAND:-run}"
    ports:
      - "8085:8000"
      - "9006:9000"
    depends_on:
      - "postgres"
    environment:
      - FLUFFY_LOGGING_LEVEL=INFO
      - FLUFFY_DATABASE_TYPE=postgres
      - FLUFFY_DATABASE_POSTGRES_CONNECTION_STRING=postgres://lakefs:lakefs@postgres/postgres?sslmode=disable
      - FLUFFY_AUTH_ENCRYPT_SECRET_KEY=some random secret string
      - FLUFFY_AUTH_SERVE_LISTEN_ADDRESS=0.0.0.0:9006
    entrypoint: [ "/app/wait-for", "postgres:5432", "--", "/app/fluffy" ]
    profiles:
      - local-lakefs-enterprise

networks:
  default:
    name: bagel
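
For readers who would rather create the repository from a notebook than from the container's startup script, the `curl` call in the `lakefs` service command above can be approximated with Python's standard library. This is a minimal sketch under stated assumptions: `build_create_repo_request` is a hypothetical helper, the URL uses the host port 8084 from the compose file (so it assumes the code runs on the host, not inside the container), and the credentials are the sample keys defined above.

```python
import base64
import json
import urllib.request

# Sample credentials and host port as defined in the compose file above.
ACCESS_KEY_ID = "AKIAIOSFOLKFSSAMPLES"
SECRET_ACCESS_KEY = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
LAKEFS_URL = "http://localhost:8084"  # assumption: called from the host machine


def build_create_repo_request(name, storage_namespace, default_branch="main", sample_data=True):
    """Hypothetical helper: build (but do not send) the POST that the compose file issues with curl."""
    body = json.dumps({
        "name": name,
        "storage_namespace": storage_namespace,
        "default_branch": default_branch,
        "sample_data": sample_data,
    }).encode("utf-8")
    # Equivalent of curl's `-u key:secret` (HTTP basic authentication).
    token = base64.b64encode(f"{ACCESS_KEY_ID}:{SECRET_ACCESS_KEY}".encode()).decode()
    return urllib.request.Request(
        f"{LAKEFS_URL}/api/v1/repositories",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    req = build_create_repo_request("quickstart", "s3://quickstart")
    print(req.get_method(), req.full_url)
    # To send it once the stack is up: urllib.request.urlopen(req)
```

The request is only constructed here, so the sketch can be inspected without a running server; the compose file itself tolerates a failed call (`|| true`), for example when the repository already exists.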
@@ -1,4 +1,4 @@
FROM jupyter/scipy-notebook:notebook-7.0.6
FROM jupyter/tensorflow-notebook:notebook-7.0.6

USER root

This file was deleted.
