Lakevision is a tool that provides insights into your Data Lakehouse based on the Apache Iceberg table format.
It lists every namespace and table in your Lakehouse, including nested namespaces, along with each table’s schema, properties, snapshots, partitions, sort orders, references/tags, and sample data. This helps you quickly understand data layout, file locations, and change history.
Lakevision is built with pyiceberg, a FastAPI backend, and a SvelteKit frontend, keeping other dependencies to a minimum.
- Search and view all namespaces in your Lakehouse
- Search and view all tables in your Lakehouse
- Display schema, properties, partition specs, and a summary of each table
- Show record count, file count, and size per partition
- List all snapshots with details
- Graphical summary of record additions over time
- OIDC/OAuth-based authentication support
- Pluggable authorization
- Optional “Chat with Lakehouse” capability
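The namespace and table listing described above can be sketched with the two catalog methods PyIceberg exposes for it, `list_namespaces` and `list_tables`. This is a minimal illustration, not Lakevision's actual code: `walk_catalog` and `CatalogLike` are hypothetical names, and the sketch assumes the catalog returns full namespace identifiers (tuples) for nested namespaces.

```python
from typing import Iterator, List, Optional, Protocol, Set, Tuple

Identifier = Tuple[str, ...]


class CatalogLike(Protocol):
    """The two catalog methods this sketch relies on."""

    def list_namespaces(self, namespace: Identifier = ()) -> list: ...

    def list_tables(self, namespace: Identifier) -> list: ...


def walk_catalog(
    catalog: CatalogLike,
    parent: Identifier = (),
    _seen: Optional[Set[Identifier]] = None,
) -> Iterator[Tuple[Identifier, List[Identifier]]]:
    """Yield (namespace, tables) for every namespace under `parent`, depth-first."""
    seen = set() if _seen is None else _seen
    for child in catalog.list_namespaces(parent):
        ns = tuple(child)
        # Guard against catalogs that echo the parent back or repeat entries.
        if ns == parent or ns in seen:
            continue
        seen.add(ns)
        yield ns, [tuple(t) for t in catalog.list_tables(ns)]
        # Recurse to pick up nested namespaces.
        yield from walk_catalog(catalog, ns, seen)
```

With PyIceberg installed, a catalog object obtained from `pyiceberg.catalog.load_catalog` can be passed straight to `walk_catalog`; the catalog name and its connection properties depend on your own configuration.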
Before running Lakevision, create and configure your local .env file:
cp my.env .env
Then edit .env to provide values for:
- Your Iceberg catalog configuration (URI, warehouse path, etc.)
  🧪 Don’t have a catalog yet? You can start with a sample one. See make sample-catalog in the Makefile section.
- Authentication details (e.g., token or credentials)
- Optional cloud settings (S3, GCP, etc.)
This avoids modifying my.env, which is version-controlled and serves as a template.
The easiest way to run Lakevision is with Docker.
- Clone the repository and cd into the project root.
- Build the image:
  docker build -t lakevision:1.0 .
- Run the container. Make sure you’ve completed the Environment Setup step first.
  docker run --env-file .env -p 8081:8081 lakevision:1.0 /app/start.sh
Once started, the backend listens on port 8000 and Nginx runs on port 8081. Visit http://localhost:8081 to explore the UI.
✅ Tested on Linux and macOS with the Iceberg REST catalog. Other PyIceberg-compatible catalogs should work too.
🧪 Want to try the in-memory sample catalog?
To build the image with the sample in-memory Iceberg catalog included:
docker build --build-arg ENABLE_SAMPLE_CATALOG=true -t lakevision:1.0 .
- In your .env, comment out the default catalog settings and uncomment the sample catalog lines.
- Then run the container as above.
- Python 3.10+
- Node.js 18+
- A running Iceberg catalog
Make sure you’ve completed the Environment Setup step first.
You can use the Makefile to automate common setup steps:
make init-be # Set up Python backend
make sample-catalog # Populate a local Iceberg catalog with sample data
make init-fe # Install frontend dependencies
make run-be # Start backend (FastAPI)
make run-fe # Start frontend (SvelteKit)
make help # List all Makefile commands
Once running, visit http://localhost:8081 to use the app.
Make sure you’ve completed the Environment Setup step first.
💡 Frontend note: All environment variables that begin with PUBLIC_ must be available in a separate .env file inside the /fe folder. You can do this manually, or by running:
make prepare-fe-env
This ensures the frontend build system (Vite) can access the variables during development.
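If you prefer to do this step manually, the idea can be sketched as a small filter that copies only the PUBLIC_ assignments into the frontend's .env file. This is an illustration of the concept, not the actual implementation of make prepare-fe-env; the function names `public_env_lines` and `write_frontend_env` are hypothetical.

```python
from pathlib import Path


def public_env_lines(env_text: str, prefix: str = "PUBLIC_") -> list[str]:
    """Return the KEY=VALUE lines whose key starts with `prefix`."""
    kept = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith(prefix):
            kept.append(line)
    return kept


def write_frontend_env(root_env: Path, fe_env: Path) -> int:
    """Copy PUBLIC_* assignments from root_env into fe_env; return the count written."""
    lines = public_env_lines(root_env.read_text(encoding="utf-8"))
    fe_env.write_text("\n".join(lines) + ("\n" if lines else ""), encoding="utf-8")
    return len(lines)
```

Called from the project root, `write_frontend_env(Path(".env"), Path("fe/.env"))` would overwrite fe/.env with only the public variables, so secrets in the root .env never reach the frontend build.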
cd be # Backend
python -m venv .venv # Create a virtual environment
source .venv/bin/activate
pip install -r requirements.txt
set -a; source ../.env; set +a # Export every variable defined in .env
PYTHONPATH=app uvicorn app.api:app --reload --port 8000
cd ../fe # Frontend
npm install
npm run dev -- --port 8081
To plug in custom authorization, implement your own module in the backend. It must follow the interface in app/be/authz.py:
- init: Authz class configuration
- has_access: Determines whether a user has access to a specific table
- get_namespace_special_properties: Provides namespace properties from the Authz point of view, e.g., the namespace's owners.
- get_table_special_properties: Provides table properties from the Authz point of view, e.g., whether the table is restricted, the table's owners, etc.
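As a rough sketch of what such a module might look like, the class below implements the four methods with a simple in-memory allow-list. The method names come from the list above, but every signature, argument, and return shape here is an assumption made for illustration; check app/be/authz.py for the real interface before adapting it.

```python
# my_authz.py: illustrative only; signatures and return shapes are assumptions.
from typing import Any, Dict, Iterable, Optional


class MyAuthz:
    def __init__(
        self,
        restricted: Optional[Dict[str, Iterable[str]]] = None,
        owners: Optional[Dict[str, Iterable[str]]] = None,
    ) -> None:
        # restricted: "namespace.table" -> users allowed to read it (assumed shape)
        self.restricted = {k: set(v) for k, v in (restricted or {}).items()}
        # owners: namespace or "namespace.table" -> owner names (assumed shape)
        self.owners = {k: sorted(v) for k, v in (owners or {}).items()}

    def has_access(self, user: str, table_id: str) -> bool:
        """Unrestricted tables are open to everyone; restricted ones need an entry."""
        allowed = self.restricted.get(table_id)
        return allowed is None or user in allowed

    def get_namespace_special_properties(self, namespace: str) -> Dict[str, Any]:
        return {"owners": self.owners.get(namespace, [])}

    def get_table_special_properties(self, table_id: str) -> Dict[str, Any]:
        return {
            "restricted": table_id in self.restricted,
            "owners": self.owners.get(table_id, []),
        }
```

A real implementation would typically replace the in-memory dictionaries with calls to your organization's policy or entitlement service.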
Configure the following properties in your environment:
- PUBLIC_AUTH_ENABLED=true
- PUBLIC_OPENID_CLIENT_ID=
- OPEN_ID_CLIENT_SECRET=
- PUBLIC_OPENID_PROVIDER_URL=
- PUBLIC_REDIRECT_URI=http://localhost:8081 # e.g., for local usage (or https://localhost:8081)
- AUTHZ_MODULE_NAME=my_authz
- AUTHZ_CLASS_NAME=MyAuthz
Then run the backend, e.g., with make run-be.
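To illustrate how a module/class pair such as AUTHZ_MODULE_NAME and AUTHZ_CLASS_NAME can be resolved at startup, here is a minimal sketch using Python's standard `importlib`. It shows the general plugin-loading pattern only and is not taken from Lakevision's backend; `load_authz` is a hypothetical helper.

```python
import importlib
import os
from typing import Any, Mapping


def load_authz(env: Mapping[str, str] = os.environ, **init_kwargs: Any) -> Any:
    """Import the configured module and instantiate the configured class."""
    module_name = env.get("AUTHZ_MODULE_NAME")
    class_name = env.get("AUTHZ_CLASS_NAME")
    if not module_name or not class_name:
        raise RuntimeError("AUTHZ_MODULE_NAME and AUTHZ_CLASS_NAME must both be set")
    # The module must be importable, i.e. located somewhere on PYTHONPATH.
    module = importlib.import_module(module_name)
    try:
        cls = getattr(module, class_name)
    except AttributeError as exc:
        raise RuntimeError(f"{module_name} has no class {class_name}") from exc
    return cls(**init_kwargs)
```

The practical consequence of this pattern is that a typo in either variable surfaces as an import or attribute error when the backend starts, so it is worth checking the backend logs after changing these settings.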
If you need to run the frontend over HTTPS, follow these steps:
- Install a version of plugin-basic-ssl that is compatible with the Vite version used in fe. Add
  "@vitejs/plugin-basic-ssl": "^1.2.0"
  under devDependencies in package.json and install the dependencies (see the Running Locally section).
- Update the Vite config (vite.config.js):
  ...
  import basicSsl from '@vitejs/plugin-basic-ssl';
  export default defineConfig({
    plugins: [
      sveltekit(),
      // Optimize CSS from `carbon-components-svelte` when building for production.
      optimizeCss(),
      basicSsl()
    ],
  ...
  This auto-generates a self-signed certificate for development. Your browser will show a warning page that you can bypass.
- Run the frontend, e.g.:
  make run-fe
Want to deploy Lakevision on Kubernetes or OpenShift?
Sample manifests are provided in k8s/, including example Deployment, Service, ConfigMap, and Secret YAMLs for running the unified (backend + frontend) container.
- See k8s/README.md for quickstart instructions and customization notes.
- You’ll need to edit the image name and environment variables before deploying.
- Chat with Lakehouse capability using an LLM
- Table-level reports (most snapshots, partitions, columns, size, etc.)
- Optimization recommendations
- Limited SQL capabilities ✅
- Partition details (name, file count, records, size) ✅
- Sample data by partition ✅
- Table-level insights
- Time-travel queries
Contributions are welcome!
- Fork the repository and clone it locally.
- Create a branch for your change, referencing an issue if one exists.
- Add tests for new functionality where appropriate.
- Open a pull request with a clear description of the changes.