From c990122f685e42fbd5c54e930c4ecdd646890176 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Fri, 4 Oct 2024 16:50:08 +0100 Subject: [PATCH 01/24] Remove virtual machines from setup/admin page --- docs/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/index.md b/docs/index.md index ef0d82c4c..406edaf62 100644 --- a/docs/index.md +++ b/docs/index.md @@ -265,7 +265,7 @@ Hopsworks provides projects as a secure sandbox in which teams can collaborate a Hopsworks provides a FTI (feature/training/inference) pipeline architecture for ML systems. Each part of the pipeline is defined in a Hopsworks job which corresponds to a Jupyter notebook, a python script or a jar. The production pipelines are then orchestrated with Airflow which is bundled in Hopsworks. Hopsworks provides several python environments that can be used and customized for each part of the FTI pipeline, for example switching between using PyTorch or TensorFlow in the training pipeline. You can train models on as many GPUs as are installed in a Hopsworks cluster and easily share them among users. You can also run Spark, Spark Streaming, or Flink programs on Hopsworks. JupyterLab is also bundled which can be used to run Python and Spark interactively. ## Available on any Platform -Hopsworks is available as a both managed platform in the cloud on AWS, Azure, and GCP, and can be installed on any Linux-based virtual machines (Ubuntu/Redhat compatible), even in air-gapped data centers. Hopsworks is also available as a serverless platform that manages and serves both your features and models. +Hopsworks can be installed on a Kubernetes cluster in the cloud on AWS, Azure, and GCP, or on-premises (Ubuntu/RedHat compatible), even in air-gapped data centers. Hopsworks is also available as a serverless platform that manages and serves both your features and models.
## Join the community - Ask questions and give us feedback in the [Hopsworks Community](https://community.hopsworks.ai/) From 4f3e18412826d42d35aebe7730069717863e7c53 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Fri, 4 Oct 2024 17:03:57 +0100 Subject: [PATCH 02/24] Move content around for kubernetes --- .../common/adding_removing_workers.md | 0 .../common/api_key.md | 0 .../common/arrow_flight_duckdb.md | 0 .../common/autoscaling.md | 0 .../{setup_installation => }/common/backup.md | 0 .../common/dashboard.md | 0 docs/{setup_installation => }/common/rondb.md | 0 .../common/scalingup.md | 0 .../common/services.md | 0 .../common/settings.md | 0 .../common/sso/ldap.md | 0 .../common/sso/oauth.md | 0 .../common/terraform.md | 0 .../common/user_management.md | 0 docs/{ => setup_installation}/admin/alert.md | 0 .../admin/audit/audit-logs.md | 0 .../admin/audit/export-audit-logs.md | 0 docs/{ => setup_installation}/admin/auth.md | 0 .../admin/ha-dr/dr.md | 0 .../admin/ha-dr/ha.md | 0 .../admin/ha-dr/intro.md | 0 docs/{ => setup_installation}/admin/index.md | 0 .../admin/ldap/configure-krb.md | 0 .../admin/ldap/configure-ldap.md | 0 .../admin/ldap/configure-project-mapping.md | 0 .../admin/ldap/configure-server.md | 0 .../admin/monitoring/export-metrics.md | 0 .../admin/monitoring/grafana.md | 0 .../admin/monitoring/services-logs.md | 0 .../admin/oauth2/create-azure-client.md | 0 .../admin/oauth2/create-client.md | 0 .../admin/oauth2/create-okta-client.md | 0 .../{ => setup_installation}/admin/project.md | 0 .../admin/roleChaining.md | 0 .../admin/services.md | 0 docs/{ => setup_installation}/admin/user.md | 0 .../admin/variables.md | 0 mkdocs.yml | 116 +++++++----------- 38 files changed, 43 insertions(+), 73 deletions(-) rename docs/{setup_installation => }/common/adding_removing_workers.md (100%) rename docs/{setup_installation => }/common/api_key.md (100%) rename docs/{setup_installation => }/common/arrow_flight_duckdb.md (100%) rename docs/{setup_installation => }/common/autoscaling.md (100%) rename docs/{setup_installation => }/common/backup.md (100%) rename docs/{setup_installation => }/common/dashboard.md (100%) rename docs/{setup_installation => }/common/rondb.md (100%) rename docs/{setup_installation => }/common/scalingup.md (100%) rename docs/{setup_installation => }/common/services.md (100%) rename docs/{setup_installation => }/common/settings.md (100%) rename docs/{setup_installation => }/common/sso/ldap.md (100%) rename docs/{setup_installation => }/common/sso/oauth.md (100%) rename docs/{setup_installation => }/common/terraform.md (100%) rename docs/{setup_installation => }/common/user_management.md (100%) rename docs/{ => setup_installation}/admin/alert.md (100%) rename docs/{ => setup_installation}/admin/audit/audit-logs.md (100%) rename docs/{ => setup_installation}/admin/audit/export-audit-logs.md (100%) rename docs/{ => setup_installation}/admin/auth.md (100%) rename docs/{ => setup_installation}/admin/ha-dr/dr.md (100%) rename docs/{ => setup_installation}/admin/ha-dr/ha.md (100%) rename docs/{ => setup_installation}/admin/ha-dr/intro.md (100%) rename docs/{ => setup_installation}/admin/index.md (100%) rename docs/{ => setup_installation}/admin/ldap/configure-krb.md (100%) rename docs/{ => setup_installation}/admin/ldap/configure-ldap.md (100%) rename docs/{ => setup_installation}/admin/ldap/configure-project-mapping.md (100%) rename docs/{ => setup_installation}/admin/ldap/configure-server.md (100%) rename docs/{ => 
setup_installation}/admin/monitoring/export-metrics.md (100%) rename docs/{ => setup_installation}/admin/monitoring/grafana.md (100%) rename docs/{ => setup_installation}/admin/monitoring/services-logs.md (100%) rename docs/{ => setup_installation}/admin/oauth2/create-azure-client.md (100%) rename docs/{ => setup_installation}/admin/oauth2/create-client.md (100%) rename docs/{ => setup_installation}/admin/oauth2/create-okta-client.md (100%) rename docs/{ => setup_installation}/admin/project.md (100%) rename docs/{ => setup_installation}/admin/roleChaining.md (100%) rename docs/{ => setup_installation}/admin/services.md (100%) rename docs/{ => setup_installation}/admin/user.md (100%) rename docs/{ => setup_installation}/admin/variables.md (100%) diff --git a/docs/setup_installation/common/adding_removing_workers.md b/docs/common/adding_removing_workers.md similarity index 100% rename from docs/setup_installation/common/adding_removing_workers.md rename to docs/common/adding_removing_workers.md diff --git a/docs/setup_installation/common/api_key.md b/docs/common/api_key.md similarity index 100% rename from docs/setup_installation/common/api_key.md rename to docs/common/api_key.md diff --git a/docs/setup_installation/common/arrow_flight_duckdb.md b/docs/common/arrow_flight_duckdb.md similarity index 100% rename from docs/setup_installation/common/arrow_flight_duckdb.md rename to docs/common/arrow_flight_duckdb.md diff --git a/docs/setup_installation/common/autoscaling.md b/docs/common/autoscaling.md similarity index 100% rename from docs/setup_installation/common/autoscaling.md rename to docs/common/autoscaling.md diff --git a/docs/setup_installation/common/backup.md b/docs/common/backup.md similarity index 100% rename from docs/setup_installation/common/backup.md rename to docs/common/backup.md diff --git a/docs/setup_installation/common/dashboard.md b/docs/common/dashboard.md similarity index 100% rename from docs/setup_installation/common/dashboard.md rename to docs/common/dashboard.md diff --git a/docs/setup_installation/common/rondb.md b/docs/common/rondb.md similarity index 100% rename from docs/setup_installation/common/rondb.md rename to docs/common/rondb.md diff --git a/docs/setup_installation/common/scalingup.md b/docs/common/scalingup.md similarity index 100% rename from docs/setup_installation/common/scalingup.md rename to docs/common/scalingup.md diff --git a/docs/setup_installation/common/services.md b/docs/common/services.md similarity index 100% rename from docs/setup_installation/common/services.md rename to docs/common/services.md diff --git a/docs/setup_installation/common/settings.md b/docs/common/settings.md similarity index 100% rename from docs/setup_installation/common/settings.md rename to docs/common/settings.md diff --git a/docs/setup_installation/common/sso/ldap.md b/docs/common/sso/ldap.md similarity index 100% rename from docs/setup_installation/common/sso/ldap.md rename to docs/common/sso/ldap.md diff --git a/docs/setup_installation/common/sso/oauth.md b/docs/common/sso/oauth.md similarity index 100% rename from docs/setup_installation/common/sso/oauth.md rename to docs/common/sso/oauth.md diff --git a/docs/setup_installation/common/terraform.md b/docs/common/terraform.md similarity index 100% rename from docs/setup_installation/common/terraform.md rename to docs/common/terraform.md diff --git a/docs/setup_installation/common/user_management.md b/docs/common/user_management.md similarity index 100% rename from docs/setup_installation/common/user_management.md 
rename to docs/common/user_management.md diff --git a/docs/admin/alert.md b/docs/setup_installation/admin/alert.md similarity index 100% rename from docs/admin/alert.md rename to docs/setup_installation/admin/alert.md diff --git a/docs/admin/audit/audit-logs.md b/docs/setup_installation/admin/audit/audit-logs.md similarity index 100% rename from docs/admin/audit/audit-logs.md rename to docs/setup_installation/admin/audit/audit-logs.md diff --git a/docs/admin/audit/export-audit-logs.md b/docs/setup_installation/admin/audit/export-audit-logs.md similarity index 100% rename from docs/admin/audit/export-audit-logs.md rename to docs/setup_installation/admin/audit/export-audit-logs.md diff --git a/docs/admin/auth.md b/docs/setup_installation/admin/auth.md similarity index 100% rename from docs/admin/auth.md rename to docs/setup_installation/admin/auth.md diff --git a/docs/admin/ha-dr/dr.md b/docs/setup_installation/admin/ha-dr/dr.md similarity index 100% rename from docs/admin/ha-dr/dr.md rename to docs/setup_installation/admin/ha-dr/dr.md diff --git a/docs/admin/ha-dr/ha.md b/docs/setup_installation/admin/ha-dr/ha.md similarity index 100% rename from docs/admin/ha-dr/ha.md rename to docs/setup_installation/admin/ha-dr/ha.md diff --git a/docs/admin/ha-dr/intro.md b/docs/setup_installation/admin/ha-dr/intro.md similarity index 100% rename from docs/admin/ha-dr/intro.md rename to docs/setup_installation/admin/ha-dr/intro.md diff --git a/docs/admin/index.md b/docs/setup_installation/admin/index.md similarity index 100% rename from docs/admin/index.md rename to docs/setup_installation/admin/index.md diff --git a/docs/admin/ldap/configure-krb.md b/docs/setup_installation/admin/ldap/configure-krb.md similarity index 100% rename from docs/admin/ldap/configure-krb.md rename to docs/setup_installation/admin/ldap/configure-krb.md diff --git a/docs/admin/ldap/configure-ldap.md b/docs/setup_installation/admin/ldap/configure-ldap.md similarity index 100% rename from docs/admin/ldap/configure-ldap.md rename to docs/setup_installation/admin/ldap/configure-ldap.md diff --git a/docs/admin/ldap/configure-project-mapping.md b/docs/setup_installation/admin/ldap/configure-project-mapping.md similarity index 100% rename from docs/admin/ldap/configure-project-mapping.md rename to docs/setup_installation/admin/ldap/configure-project-mapping.md diff --git a/docs/admin/ldap/configure-server.md b/docs/setup_installation/admin/ldap/configure-server.md similarity index 100% rename from docs/admin/ldap/configure-server.md rename to docs/setup_installation/admin/ldap/configure-server.md diff --git a/docs/admin/monitoring/export-metrics.md b/docs/setup_installation/admin/monitoring/export-metrics.md similarity index 100% rename from docs/admin/monitoring/export-metrics.md rename to docs/setup_installation/admin/monitoring/export-metrics.md diff --git a/docs/admin/monitoring/grafana.md b/docs/setup_installation/admin/monitoring/grafana.md similarity index 100% rename from docs/admin/monitoring/grafana.md rename to docs/setup_installation/admin/monitoring/grafana.md diff --git a/docs/admin/monitoring/services-logs.md b/docs/setup_installation/admin/monitoring/services-logs.md similarity index 100% rename from docs/admin/monitoring/services-logs.md rename to docs/setup_installation/admin/monitoring/services-logs.md diff --git a/docs/admin/oauth2/create-azure-client.md b/docs/setup_installation/admin/oauth2/create-azure-client.md similarity index 100% rename from docs/admin/oauth2/create-azure-client.md rename to 
docs/setup_installation/admin/oauth2/create-azure-client.md diff --git a/docs/admin/oauth2/create-client.md b/docs/setup_installation/admin/oauth2/create-client.md similarity index 100% rename from docs/admin/oauth2/create-client.md rename to docs/setup_installation/admin/oauth2/create-client.md diff --git a/docs/admin/oauth2/create-okta-client.md b/docs/setup_installation/admin/oauth2/create-okta-client.md similarity index 100% rename from docs/admin/oauth2/create-okta-client.md rename to docs/setup_installation/admin/oauth2/create-okta-client.md diff --git a/docs/admin/project.md b/docs/setup_installation/admin/project.md similarity index 100% rename from docs/admin/project.md rename to docs/setup_installation/admin/project.md diff --git a/docs/admin/roleChaining.md b/docs/setup_installation/admin/roleChaining.md similarity index 100% rename from docs/admin/roleChaining.md rename to docs/setup_installation/admin/roleChaining.md diff --git a/docs/admin/services.md b/docs/setup_installation/admin/services.md similarity index 100% rename from docs/admin/services.md rename to docs/setup_installation/admin/services.md diff --git a/docs/admin/user.md b/docs/setup_installation/admin/user.md similarity index 100% rename from docs/admin/user.md rename to docs/setup_installation/admin/user.md diff --git a/docs/admin/variables.md b/docs/setup_installation/admin/variables.md similarity index 100% rename from docs/admin/variables.md rename to docs/setup_installation/admin/variables.md diff --git a/mkdocs.yml b/mkdocs.yml index 127bf13b0..b1bbc90f3 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -204,85 +204,55 @@ nav: - Vector Database: user_guides/mlops/vector_database/index.md - Migration: - 2.X to 3.0: user_guides/migration/30_migration.md - - Setup and Installation: + - Setup and Administration: - setup_installation/index.md - Client Installation: - user_guides/client_installation/index.md - - AWS: - - Getting Started: setup_installation/aws/getting_started.md - - Cluster Creation: setup_installation/aws/cluster_creation.md - - EKS integration: setup_installation/aws/eks_ecr_integration.md - - Limiting Permissions: setup_installation/aws/restrictive_permissions.md - - Custom Domain: setup_installation/aws/custom_domain_name.md - - Cluster upgrade: - - Version 3.0 or newer: setup_installation/aws/upgrade_3.0.md - - Version 2.4 or newer: setup_installation/aws/upgrade_2.4.md - - Version 2.2 or older: setup_installation/aws/upgrade.md - - Troubleshooting: setup_installation/aws/troubleshooting.md - - Azure: - - Getting Started: setup_installation/azure/getting_started.md - - Cluster Creation: setup_installation/azure/cluster_creation.md - - AKS integration: setup_installation/azure/aks_acr_integration.md - - Limiting Permissions: setup_installation/azure/restrictive_permissions.md - - Cluster upgrade: - - Version 3.0 or newer: setup_installation/azure/upgrade_3.0.md - - Version 2.4 or newer: setup_installation/azure/upgrade_2.4.md - - Version 2.2 or older: setup_installation/azure/upgrade.md - - GCP: - - Getting Started: setup_installation/gcp/getting_started.md - - Cluster Creation: setup_installation/gcp/cluster_creation.md - - Limiting Permissions: setup_installation/gcp/restrictive_permissions.md - - GKE integration: setup_installation/gcp/gke_integration.md - - Common: - - The dashboard: setup_installation/common/dashboard.md - - Settings: setup_installation/common/settings.md - - Services: setup_installation/common/services.md - - Adding and Removing workers: 
setup_installation/common/adding_removing_workers.md - - Autoscaling: setup_installation/common/autoscaling.md - - Backup: setup_installation/common/backup.md - - Scaling up: setup_installation/common/scalingup.md - - User management: setup_installation/common/user_management.md - - Managed RonDB: setup_installation/common/rondb.md - - ArrowFlight Server with DuckDB: setup_installation/common/arrow_flight_duckdb.md - - Single Sign On: - - Hopsworks: - - OAuth2: setup_installation/common/sso/oauth.md - - LDAP: setup_installation/common/sso/ldap.md - - API Key: setup_installation/common/api_key.md - - Terraform: setup_installation/common/terraform.md + - Cloud Installation: + - AWS - Getting Started: setup_installation/aws/getting_started.md + - Azure - Getting Started: setup_installation/azure/getting_started.md + - GCP - Getting Started: setup_installation/gcp/getting_started.md - On-Prem: - Hopsworks Installer: setup_installation/on_prem/hopsworks_installer.md - External Kafka cluster: setup_installation/on_prem/external_kafka_cluster.md - - Administration: - - Introduction: admin/index.md - - Cluster Configuration: admin/variables.md - - User Management: admin/user.md - - Project Management: admin/project.md - - Configure Alerts: admin/alert.md - - Manage Services: admin/services.md - - IAM Role Chaining: admin/roleChaining.md - - Monitoring: - - Services Dashboards: admin/monitoring/grafana.md - - Export metrics: admin/monitoring/export-metrics.md - - Services Logs: admin/monitoring/services-logs.md - - Authentication: - - Configure Authentication: admin/auth.md - - Configure OAuth2: - - Register an Identity Provider: admin/oauth2/create-client.md - - Create Okta Client: admin/oauth2/create-okta-client.md - - Create Azure Client: admin/oauth2/create-azure-client.md - - Configure LDAP/Kerberos: - - Configure LDAP: admin/ldap/configure-ldap.md - - Configure Kerberos: admin/ldap/configure-krb.md - - Configure server for LDAP and Kerberos: admin/ldap/configure-server.md - - Configure Project Mapping: admin/ldap/configure-project-mapping.md - - High availability / Disaster Recovery: - - Overview: admin/ha-dr/intro.md - - High Availability: admin/ha-dr/ha.md - - Disaster Recovery: admin/ha-dr/dr.md - - Audit: - - Access Audit Logs: admin/audit/audit-logs.md - - Export Audit Logs: admin/audit/export-audit-logs.md + - Administration: + - Cluster Configuration: setup_installation/admin/variables.md + - Introduction: setup_installation/admin/index.md + - User Management: setup_installation/admin/user.md + - Project Management: setup_installation/admin/project.md + - Configure Alerts: setup_installation/admin/alert.md + - Manage Services: setup_installation/admin/services.md + - IAM Role Chaining: setup_installation/admin/roleChaining.md + - Monitoring: + - Services Dashboards: setup_installation/admin/monitoring/grafana.md + - Export metrics: setup_installation/admin/monitoring/export-metrics.md + - Services Logs: setup_installation/admin/monitoring/services-logs.md + - Authentication: + - Configure Authentication: setup_installation/admin/auth.md + - Configure OAuth2: + - Register an Identity Provider: setup_installation/admin/oauth2/create-client.md + - Create Okta Client: setup_installation/admin/oauth2/create-okta-client.md + - Create Azure Client: setup_installation/admin/oauth2/create-azure-client.md + - Configure LDAP/Kerberos: + - Configure LDAP: setup_installation/admin/ldap/configure-ldap.md + - Configure Kerberos: setup_installation/admin/ldap/configure-krb.md + - Configure server for 
LDAP and Kerberos: setup_installation/admin/ldap/configure-server.md + - Configure Project Mapping: setup_installation/admin/ldap/configure-project-mapping.md + - High availability / Disaster Recovery: + - Overview: setup_installation/admin/ha-dr/intro.md + - High Availability: setup_installation/admin/ha-dr/ha.md + - Disaster Recovery: setup_installation/admin/ha-dr/dr.md + - Audit: + - Access Audit Logs: setup_installation/admin/audit/audit-logs.md + - Export Audit Logs: setup_installation/admin/audit/export-audit-logs.md + - Managed: + - The dashboard: common/dashboard.md + - Settings: common/settings.md + - Services: common/services.md + - Adding and Removing workers: common/adding_removing_workers.md + - Autoscaling: common/autoscaling.md + - Backup: common/backup.md + - Scaling up: common/scalingup.md - : https://docs.hopsworks.ai - Community ↗: https://community.hopsworks.ai/ From 71fe59500784e8c4144979620d9e212ebf9b1815 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Fri, 4 Oct 2024 17:05:19 +0100 Subject: [PATCH 03/24] Change VM content to kubernetes in index.md --- docs/setup_installation/index.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/setup_installation/index.md b/docs/setup_installation/index.md index 0c5594768..fe4b9544d 100644 --- a/docs/setup_installation/index.md +++ b/docs/setup_installation/index.md @@ -1,12 +1,12 @@ -# Setup and Installation +# Setup and Administration -This section contains installation guides for the **Hopsworks Platform**, on +This section contains installation guides for the **Hopsworks Platform** using Kubernetes, on - [AWS](aws/getting_started.md) - [Azure](azure/getting_started.md) - [GCP](gcp/getting_started.md) - [On-Prem](on_prem/hopsworks_installer.md) environments -and [common](common/dashboard.md) setup instructions. +and [common](admin/index.md) administration instructions. -For instructions on installing the **Hopsworks Client** libraries, see the [Client Installation](../user_guides/client_installation/index.md) guide. \ No newline at end of file +For instructions on installing the **Hopsworks Client** libraries, see the [Client Installation](../user_guides/client_installation/index.md) guide. From 1cf7922637ad5713676b51c780a20a9130318882 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Fri, 4 Oct 2024 17:38:08 +0100 Subject: [PATCH 04/24] Fix broken assets after admin move.
--- docs/setup_installation/admin/alert.md | 18 +++++++-------- docs/setup_installation/admin/auth.md | 4 ++-- .../admin/ldap/configure-krb.md | 8 +++---- .../admin/ldap/configure-ldap.md | 8 +++---- .../admin/ldap/configure-project-mapping.md | 18 +++++++-------- .../admin/ldap/configure-server.md | 4 ++-- .../admin/monitoring/grafana.md | 6 ++--- .../admin/monitoring/services-logs.md | 4 ++-- .../admin/oauth2/create-azure-client.md | 22 +++++++++---------- .../admin/oauth2/create-client.md | 8 +++---- .../admin/oauth2/create-okta-client.md | 14 ++++++------ docs/setup_installation/admin/project.md | 4 ++-- docs/setup_installation/admin/roleChaining.md | 8 +++---- docs/setup_installation/admin/services.md | 8 +++---- docs/setup_installation/admin/user.md | 18 +++++++-------- docs/setup_installation/admin/variables.md | 4 ++-- 16 files changed, 78 insertions(+), 78 deletions(-) diff --git a/docs/setup_installation/admin/alert.md b/docs/setup_installation/admin/alert.md index eaa1e3ab4..4c9962b04 100644 --- a/docs/setup_installation/admin/alert.md +++ b/docs/setup_installation/admin/alert.md @@ -14,7 +14,7 @@ Cluster Settings from the dropdown menu. In the Cluster Settings' Alerts tab you manager to send alerts via email, slack or pagerduty.
Configure alerts
@@ -23,7 +23,7 @@ To send alerts via email you need to configure an SMTP server. Click on the _Con button on the left side of the **email** row and fill out the form that pops up.
Configure Email Alerts
@@ -34,7 +34,7 @@ button on the left side of the **email** row and fill out the form that pops up. CRAM-MD5, LOGIN or PLAIN. Optionally cluster wide Email alert receivers can be added in _Default receiver emails_. -These receivers will be available to all users when they create event triggered [alerts](../../user_guides/fs/feature_group/data_validation_best_practices#setup-alerts). +These receivers will be available to all users when they create event triggered [alerts](../../../user_guides/fs/feature_group/data_validation_best_practices#setup-alerts). ### Step 3: Configure Slack Alerts Alerts can also be sent via Slack messages. To be able to send Slack messages you first need to configure @@ -42,19 +42,19 @@ a Slack webhook. Click on the _Configure_ button on the left side of the **slack [Slack webhook](https://api.slack.com/messaging/webhooks) in _Webhook_.
Configure slack Alerts
Optionally cluster wide Slack alert receivers can be added in _Slack channel/user_. -These receivers will be available to all users when they create event triggered [alerts](../../user_guides/fs/feature_group/data_validation_best_practices/#setup-alerts). +These receivers will be available to all users when they create event triggered [alerts](../../../user_guides/fs/feature_group/data_validation_best_practices/#setup-alerts). ### Step 4: Configure Pagerduty Alerts Pagerduty is another way you can send alerts from Hopsworks. Click on the _Configure_ button on the left side of the **pagerduty** row and fill out the form that pops up.
Configure Pagerduty Alerts
@@ -76,7 +76,7 @@ If you are familiar with Prometheus' [Alert manager](https://prometheus.io/docs/ you can also configure alerts by editing the _yaml/json_ file directly.
Advanced configuration
@@ -93,7 +93,7 @@ global: ... ``` -To test the alerts by creating triggers from Jobs and Feature group validations see [Alerts](../../user_guides/fs/feature_group/data_validation_best_practices/#setup-alerts). +To test the alerts by creating triggers from Jobs and Feature group validations see [Alerts](../../../user_guides/fs/feature_group/data_validation_best_practices/#setup-alerts). The yaml syntax in the UI is slightly different in that it does not allow double quotes (it will ignore the values but give no error). Below is an example configuration, that can be used in the UI, with both email and slack receivers configured for system alerts. @@ -144,4 +144,4 @@ receivers: ``` ## Conclusion -In this guide you learned how to configure alerts in Hopsworks. \ No newline at end of file +In this guide you learned how to configure alerts in Hopsworks. diff --git a/docs/setup_installation/admin/auth.md b/docs/setup_installation/admin/auth.md index 20b6c57c2..abedd6d7f 100644 --- a/docs/setup_installation/admin/auth.md +++ b/docs/setup_installation/admin/auth.md @@ -35,7 +35,7 @@ In the **Cluster Settings** _Authentication_ tab you can configure how users aut [Configure LDAP](../ldap/configure-ldap) and [Configure Kerberos](../ldap/configure-krb).
Setup Authentication Methods
@@ -43,4 +43,4 @@ In the figure above we see a cluster with Two-factor authentication disabled, OA identity provider and LDAP authentication enabled. ## Conclusion -In this guide you learned how to configure authentication methods in Hopsworks. \ No newline at end of file +In this guide you learned how to configure authentication methods in Hopsworks. diff --git a/docs/setup_installation/admin/ldap/configure-krb.md b/docs/setup_installation/admin/ldap/configure-krb.md index 8b7a8ed41..cc5b20e81 100644 --- a/docs/setup_installation/admin/ldap/configure-krb.md +++ b/docs/setup_installation/admin/ldap/configure-krb.md @@ -19,7 +19,7 @@ If LDAP/Kerberos checkbox is not checked, make sure that you configured your app clicking on the checkbox.
Setup Authentication Methods
@@ -27,7 +27,7 @@ clicking on the checkbox. Finally, click on edit configuration and fill in the attributes.
Configure Kerberos
@@ -59,7 +59,7 @@ All defaults are taken from [OpenLDAP](https://www.openldap.org/). The login page will now have the choice to use Kerberos for authentication.
Log in using Kerberos
@@ -69,4 +69,4 @@ The login page will now have the choice to use Kerberos for authentication. Kerberos support must also be configured on the browser to use Kerberos for authentication. ## Conclusion -In this guide you learned how to configure Kerberos for authentication. \ No newline at end of file +In this guide you learned how to configure Kerberos for authentication. diff --git a/docs/setup_installation/admin/ldap/configure-ldap.md b/docs/setup_installation/admin/ldap/configure-ldap.md index 272de44a0..cda815d19 100644 --- a/docs/setup_installation/admin/ldap/configure-ldap.md +++ b/docs/setup_installation/admin/ldap/configure-ldap.md @@ -20,7 +20,7 @@ If LDAP/Kerberos checkbox is not checked make sure that you configured your appl clicking on the checkbox.
Setup Authentication Methods
@@ -28,7 +28,7 @@ clicking on the checkbox. Finally, click on edit configuration and fill in the attributes.
Configure LDAP
@@ -53,7 +53,7 @@ All defaults are taken from [OpenLDAP](https://www.openldap.org/). The login page will now have the choice to use LDAP for authentication.
Log in using LDAP
@@ -65,4 +65,4 @@ The login page will now have the choice to use LDAP for authentication. ## Conclusion -In this guide you learned how to configure LDAP for authentication. \ No newline at end of file +In this guide you learned how to configure LDAP for authentication. diff --git a/docs/setup_installation/admin/ldap/configure-project-mapping.md b/docs/setup_installation/admin/ldap/configure-project-mapping.md index 34ce48f0d..fdeca6231 100644 --- a/docs/setup_installation/admin/ldap/configure-project-mapping.md +++ b/docs/setup_installation/admin/ldap/configure-project-mapping.md @@ -21,16 +21,16 @@ corner of the navigation bar and choosing *Cluster Settings* from the dropdown m In the _Project mapping_ tab, you can create a new mapping by clicking on _Create new mapping_.
Project mapping
This will take you to the create mapping page shown below
Create mapping
@@ -41,8 +41,8 @@ You can also choose the _Project role_ users will be assigned when they are adde Finally, click on _Create mapping_ and go back to mappings. You should see the newly created mapping(s) as shown below.
Project mappings
@@ -63,8 +63,8 @@ From the list of mappings click on the edit button (:material-pencil:). This wil the _remote group_, _project name_, and _project role_ of a mapping.
Edit mapping
@@ -88,4 +88,4 @@ add or remove users from projects. ## Conclusion -In this guide you learned how to configure LDAP group to project mapping. \ No newline at end of file +In this guide you learned how to configure LDAP group to project mapping. diff --git a/docs/setup_installation/admin/ldap/configure-server.md b/docs/setup_installation/admin/ldap/configure-server.md index 52d15ddb1..501b3af09 100644 --- a/docs/setup_installation/admin/ldap/configure-server.md +++ b/docs/setup_installation/admin/ldap/configure-server.md @@ -39,7 +39,7 @@ An already deployed instance can be configured to connect to LDAP without re-run Go to the payara admin UI and create a new JNDI external resource. The name of the resource should be __ldap/LdapResource__.
LDAP Resource
@@ -95,4 +95,4 @@ Both Kerberos and LDAP attributes need to be specified to configure Kerberos. Th Initiator should be set to false. ## Conclusion -In this guide you learned how to configure the application server for LDAP and Kerberos. \ No newline at end of file +In this guide you learned how to configure the application server for LDAP and Kerberos. diff --git a/docs/setup_installation/admin/monitoring/grafana.md b/docs/setup_installation/admin/monitoring/grafana.md index c06a9370b..f915affc9 100644 --- a/docs/setup_installation/admin/monitoring/grafana.md +++ b/docs/setup_installation/admin/monitoring/grafana.md @@ -17,7 +17,7 @@ You can access the admin page of your Hopsworks cluster by clicking on your name You can then navigate to the _Monitoring_ tab. The _Monitoring_ tab gives you access to several of the observability tools that are already deployed to help you manage the health of the cluster.
Monitoring tab
@@ -36,7 +36,7 @@ Dashboards are organized into three folders: - **Kubernetes**: If you have integrated Hopsworks with a Kubernetes cluster, this folder contains the dashboards to monitor the health of the Kubernetes cluster.
Grafana view
@@ -50,4 +50,4 @@ The default dashboards are read only and cannot be edited. Additional dashboards In this guide you learned how to access the Grafana dashboards to monitor the health and performance of the Hopsworks services. -You can find additional documentation on Grafana itself at: [https://grafana.com/docs/](https://grafana.com/docs/) \ No newline at end of file +You can find additional documentation on Grafana itself at: [https://grafana.com/docs/](https://grafana.com/docs/) diff --git a/docs/setup_installation/admin/monitoring/services-logs.md b/docs/setup_installation/admin/monitoring/services-logs.md index 09ca46dad..4ccdafab5 100644 --- a/docs/setup_installation/admin/monitoring/services-logs.md +++ b/docs/setup_installation/admin/monitoring/services-logs.md @@ -17,7 +17,7 @@ You can access the admin page of your Hopsworks cluster by clicking on your name You can then navigate to the _Monitoring_ tab. The _Monitoring_ tab gives you access to several of the observability tools that are already deployed to help you manage the health of the cluster.
Monitoring tab
@@ -32,7 +32,7 @@ You can filter the logs of a specific service by searching for the term `service Currently only the logs of the following services are collected and indexed: Hopsworks web application (called `domain1` in the log entries), namenodes, resource managers, datanodes, nodemanagers, Kafka brokers, Hive services and RonDB. These are the core component of the platform, additional logs will be added in the future.
OpenSearch Dashboards displaying the logs
diff --git a/docs/setup_installation/admin/oauth2/create-azure-client.md b/docs/setup_installation/admin/oauth2/create-azure-client.md index f0112e1ad..beb4c513a 100644 --- a/docs/setup_installation/admin/oauth2/create-azure-client.md +++ b/docs/setup_installation/admin/oauth2/create-azure-client.md @@ -15,7 +15,7 @@ Navigate to the [Microsoft Azure Portal](https://portal.azure.com) and authentic


@@ -24,7 +24,7 @@ Enter a name for the client such as *hopsworks_oauth_client*. Verify the Support


@@ -35,7 +35,7 @@ In the Overview section, copy the *Application (client) ID field*. We will use i


@@ -45,7 +45,7 @@ We will use it in [Identity Provider registration](../create-client) under the n


@@ -54,7 +54,7 @@ Click on *Certificates & secrets*, then Click on *New client secret*.


@@ -63,7 +63,7 @@ Add a *description* of the secret. Select an expiration period. And, Click *Add*


@@ -73,7 +73,7 @@ Copy the secret. This will be used in [Identity Provider registration](../create


@@ -82,7 +82,7 @@ Click on *Authentication*. Then click on *Add a platform*


@@ -91,7 +91,7 @@ In *Configure platforms* click on *Web*.


@@ -100,7 +100,7 @@ Enter the *Redirect URI* and click on *Configure*. The redirect URI is *HOPSWORK


@@ -114,4 +114,4 @@ Enter the *Redirect URI* and click on *Configure*. The redirect URI is *HOPSWORK ## Conclusion In this guide you learned how to create a client in your Azure identity provider and -acquire a _client id_ and a _client secret_. \ No newline at end of file +acquire a _client id_ and a _client secret_. diff --git a/docs/setup_installation/admin/oauth2/create-client.md b/docs/setup_installation/admin/oauth2/create-client.md index cd00fabe7..8199a545c 100644 --- a/docs/setup_installation/admin/oauth2/create-client.md +++ b/docs/setup_installation/admin/oauth2/create-client.md @@ -17,7 +17,7 @@ in the login page as an alternative login method) and set the _client id_ and _c fields, as shown in the figure below.
Application overview
@@ -41,7 +41,7 @@ top right corner of the navigation bar and choosing *Cluster Settings* from the Settings* _Configuration_ tab search for _oauth\_group\_mapping_ and click on the edit button.
Set Configuration variables
@@ -61,7 +61,7 @@ Users will now see a new button on the login page. The button has the name you s redirect to your identity provider.
Login with OAuth2
@@ -73,4 +73,4 @@ redirect to your identity provider. `https://dev-86723251.okta.com/.well-known/openid-configuration`. ## Conclusion -In this guide you learned how to register an identity provider in Hopsworks. \ No newline at end of file +In this guide you learned how to register an identity provider in Hopsworks. diff --git a/docs/setup_installation/admin/oauth2/create-okta-client.md b/docs/setup_installation/admin/oauth2/create-okta-client.md index ce3986300..9eea97c62 100644 --- a/docs/setup_installation/admin/oauth2/create-okta-client.md +++ b/docs/setup_installation/admin/oauth2/create-okta-client.md @@ -11,14 +11,14 @@ Okta development account. To create a developer account go to [Okta developer](h After creating a developer account register a client by going to _Applications_ and click on **Create App Integration**.
Okta Applications
This will open a popup as shown in the figure below. Select **OIDC** as _Sign-in-method_ and **Web Application** as _Application type_ and click next.
Create new Application
@@ -27,7 +27,7 @@ that is your Hopsworks cluster domain name (including the port number if needed) redirect URI_ that is Hopsworks cluster domain name (including the port number if needed) with no path.
New Application
@@ -35,7 +35,7 @@ If you want to limit who can access your Hopsworks cluster select _Limit access select group(s) you want to give access to. Here we will allow everyone in the organization to access the cluster.
Group assignment
@@ -48,7 +48,7 @@ filter_ add **groups** as the claim name, select **Match Regex** from the dropdo match all groups. See [Group mapping](../create-client/#group-mapping) on how to do the mapping in Hopsworks.
Group claim
@@ -58,7 +58,7 @@ _Okta domain_ (_Connection URL_), _client id_ and _client secret_ generated for [Identity Provider registration](../create-client) in Hopsworks.
Application overview
@@ -69,4 +69,4 @@ _Okta domain_ (_Connection URL_), _client id_ and _client secret_ generated for ## Conclusion In this guide you learned how to create a client in your Okta identity provider and -acquire a _client id_ and a _client secret_. \ No newline at end of file +acquire a _client id_ and a _client secret_. diff --git a/docs/setup_installation/admin/project.md b/docs/setup_installation/admin/project.md index 443243c11..9a6818e61 100644 --- a/docs/setup_installation/admin/project.md +++ b/docs/setup_installation/admin/project.md @@ -17,14 +17,14 @@ You need to be an administrator on a Hopsworks cluster. You can find the Project management page by clicking on your name, in the top right corner of the navigation bar, and choosing _Cluster Settings_ from the dropdown menu and going to the _Project_ tab.
Project page
This page will list all the projects in a cluster, their name, owner and when its quota was last updated. By clicking on the _edit configuration_ link of a project you will be able to edit the quotas of that project.
Project quotas
diff --git a/docs/setup_installation/admin/roleChaining.md b/docs/setup_installation/admin/roleChaining.md index 9b9e72a3a..8815acfd7 100644 --- a/docs/setup_installation/admin/roleChaining.md +++ b/docs/setup_installation/admin/roleChaining.md @@ -64,20 +64,20 @@ In Hopsworks, click on your name in the top right corner of the navigation bar a In the Cluster Settings' _IAM Role Chaining_ tab you can configure the mappings between projects and IAM roles.
Role Chaining
Add mappings by clicking on *New role chaining*. Enter the project name. Select the type of user that can assume the role. Enter the role ARN. And click on *Create new role chaining*
Create Role Chaining
Project member can now create connectors using *temporary credentials* to assume the role you configured. More detail about using temporary credentials can be found [here](../user_guides/fs/storage_connector/creation/s3.md#temporary-credentials). -Project member can see the list of role they can assume by going the _Project Settings_ -> [Assuming IAM Roles](../../user_guides/projects/iam_role/iam_role_chaining) page. +Project member can see the list of role they can assume by going the _Project Settings_ -> [Assuming IAM Roles](../../../user_guides/projects/iam_role/iam_role_chaining) page. ## Conclusion -In this guide you learned how to configure and map AWS IAM roles to project roles in Hopsworks. \ No newline at end of file +In this guide you learned how to configure and map AWS IAM roles to project roles in Hopsworks. diff --git a/docs/setup_installation/admin/services.md b/docs/setup_installation/admin/services.md index 950a53a56..e5de44fa4 100644 --- a/docs/setup_installation/admin/services.md +++ b/docs/setup_installation/admin/services.md @@ -12,7 +12,7 @@ You can find the Services page by clicking on your name, in the top right corner _Cluster Settings_ from the dropdown menu and going to the _Services_ tab.
Services page
@@ -29,14 +29,14 @@ Services are divided into groups, and you can search for a service by its name o by their host name.
Services
### Step 3: Manage a service After you find the correct service you will be able to **start**, **stop** or **restart** it, by clicking on its status.
Start, Stop and Restart a service
@@ -46,4 +46,4 @@ After you find the correct service you will be able to **start**, **stop** or ** access the machine running the service and start it with ```systemctl start glassfish_domain1```. ## Conclusion -In this guide you learned how to manage services in Hopsworks. \ No newline at end of file +In this guide you learned how to manage services in Hopsworks. diff --git a/docs/setup_installation/admin/user.md b/docs/setup_installation/admin/user.md index a5243322f..883a81d6a 100644 --- a/docs/setup_installation/admin/user.md +++ b/docs/setup_installation/admin/user.md @@ -15,7 +15,7 @@ Settings_ from the dropdown menu and going to the _Users_ tab (You need to have _Cluster Settings_ page).
Active Users
@@ -37,7 +37,7 @@ First, a user with an admin role needs to validate their account. By clicking on the _Review Requests_ button you can open a _user request review_ popup as shown in the image below.
Review user request
@@ -50,7 +50,7 @@ deleted manually in the cluster using the command line. You can block a user by clicking on the block icon on the right side of the user in the list.
Blocked Users
@@ -67,7 +67,7 @@ cluster click on the _select dropdown_ to the right of the search box and choose If you want to allow users to login without registering you can pre-create them by clicking on _New user_.
Create new user
@@ -80,7 +80,7 @@ A temporary password will be generated and displayed when you click on _Create n it securely to the user.
Copy temporary password
@@ -92,21 +92,21 @@ In the case where a user loses her/his password and can not recover it with the On the bottom of the _Users_ page click on the _Reset a user password_ link. A popup window with a dropdown for searching users by name or email will open. Find the user and click on _Reset new password_.
Reset user password
A temporary password will be displayed. Copy the password and pass it to the user securely.
Copy temporary password
A user with a temporary password will see a warning message when going to _Account settings_ **Authentication** tab.
Change password
@@ -115,4 +115,4 @@ A user with a temporary password will see a warning message when going to _Accou A temporary password should be changed as soon as possible. ## Conclusion -In this guide you learned how to manage users in Hopsworks. \ No newline at end of file +In this guide you learned how to manage users in Hopsworks. diff --git a/docs/setup_installation/admin/variables.md b/docs/setup_installation/admin/variables.md index 4ebf814ff..d68b4e989 100644 --- a/docs/setup_installation/admin/variables.md +++ b/docs/setup_installation/admin/variables.md @@ -18,7 +18,7 @@ You can find the configuration page by navigating in the UI: 2. Among the cluster settings, you will find a tab *Configuration*
Configuration settings
@@ -39,6 +39,6 @@ To do so, click on *New Variable*, where you can then configure the new setting Once you have set the desired properties, you can persist them by clicking *Create Configuration*
Adding a new configuration property
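The asset fixes in PATCH 04/24 all follow one pattern: the admin pages moved one directory deeper (docs/admin/ to docs/setup_installation/admin/), so every relative link and image reference gains one `../` segment, as the surviving link fixes show (`../../user_guides/...` becomes `../../../user_guides/...`). A sketch of the shape of each image hunk (the filename here is hypothetical, not taken from the patch):

```diff
 <figure>
-  <img src="../../assets/images/admin/alerts/configure-alerts.png" alt="Configure alerts">
+  <img src="../../../assets/images/admin/alerts/configure-alerts.png" alt="Configure alerts">
   <figcaption>Configure alerts</figcaption>
 </figure>
```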
From 9f26d5e52343304be38410fc4decf1a9dc27cf36 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Fri, 4 Oct 2024 17:42:42 +0100 Subject: [PATCH 05/24] Remove references to Karamel in kerberos/ldap config. --- docs/setup_installation/admin/ldap/configure-server.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/docs/setup_installation/admin/ldap/configure-server.md b/docs/setup_installation/admin/ldap/configure-server.md index 501b3af09..32ab46e12 100644 --- a/docs/setup_installation/admin/ldap/configure-server.md +++ b/docs/setup_installation/admin/ldap/configure-server.md @@ -1,7 +1,7 @@ # Configure Server for LDAP and Kerberos ## Introduction -LDAP and Kerberos integration need some configuration in the [Karamel](https://github.com/logicalclocks/karamel-chef) +LDAP and Kerberos integration needs some configuration in the Helm charts for your cluster definition used to deploy your Hopsworks cluster. This tutorial shows an administrator how to configure the application server for LDAP and Kerberos integration. @@ -34,8 +34,7 @@ ldap: - security_credentials: contains the password of the user that will be used to query LDAP. - referral: whether to follow or ignore an alternate location in which an LDAP Request may be processed. -### Without Karamel/Chef -An already deployed instance can be configured to connect to LDAP without re-running Karamel/Chef. +An already deployed instance can be configured to connect to LDAP. Go to the payara admin UI and create a new JNDI external resource. The name of the resource should be __ldap/LdapResource__.
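PATCH 05/24 documents only two of the LDAP attributes (`security_credentials` and `referral`). As a minimal sketch of the Helm values block it points to, assuming illustrative key names (the real schema comes from the values.yaml of your chart version):

```yaml
# Hypothetical excerpt of a Hopsworks Helm values file. Key names are
# assumptions grounded only in the attribute descriptions quoted above.
ldap:
  # Password of the user that will be used to query LDAP.
  security_credentials: "changeme"
  # Whether to follow or ignore an alternate location in which an
  # LDAP request may be processed.
  referral: "ignore"
```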
From 21b17f7deab716856502dd6ee5fa97e4f8c5c0d9 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Fri, 4 Oct 2024 18:59:20 +0100 Subject: [PATCH 06/24] Change Azure instructions to use kubernetes. - Remove other files in subdirectory that reference managed.hopsworks.ai. --- .../azure/aks_acr_integration.md | 53 --- .../azure/azure_permissions.json | 44 --- .../azure/cluster_creation.md | 243 ------------- .../azure/getting_started.md | 318 +++++------------ .../azure/restrictive_permissions.md | 188 ---------- docs/setup_installation/azure/upgrade.md | 334 ------------------ docs/setup_installation/azure/upgrade_2.4.md | 177 ---------- docs/setup_installation/azure/upgrade_3.0.md | 127 ------- 8 files changed, 86 insertions(+), 1398 deletions(-) delete mode 100644 docs/setup_installation/azure/aks_acr_integration.md delete mode 100644 docs/setup_installation/azure/azure_permissions.json delete mode 100644 docs/setup_installation/azure/cluster_creation.md delete mode 100644 docs/setup_installation/azure/restrictive_permissions.md delete mode 100644 docs/setup_installation/azure/upgrade.md delete mode 100644 docs/setup_installation/azure/upgrade_2.4.md delete mode 100644 docs/setup_installation/azure/upgrade_3.0.md diff --git a/docs/setup_installation/azure/aks_acr_integration.md b/docs/setup_installation/azure/aks_acr_integration.md deleted file mode 100644 index e4625fe7d..000000000 --- a/docs/setup_installation/azure/aks_acr_integration.md +++ /dev/null @@ -1,53 +0,0 @@ -# Integration with Azure AKS - -This guide shows how to create a cluster in [managed.hopsworks.ai](https://managed.hopsworks.ai) with integrated support for Azure Kubernetes Service (AKS). This enables Hopsworks to launch Python jobs, Jupyter servers, and serve models on top of AKS. - -This guide provides an example setup with a private AKS cluster. - -!!!Note - If you prefer to use Terraform over command lines you can refer to our Terraform example [here](https://github.com/logicalclocks/terraform-provider-hopsworksai/tree/main/examples/complete/azure/aks). - -## Step 1: Create a Virtual network and a subnet -First, you need to create the virtual network and the subnet in which Hopsworks and the AKS nodes will run. To do this run the following commands, replacing *\$RESOURCE_GROUP* with the resource group in which you will run your cluster. - -```bash -az network vnet create --resource-group $RESOURCE_GROUP --name hopsworks-vnet --address-prefixes 172.18.0.0/16 -az network vnet subnet create --resource-group $RESOURCE_GROUP --name hopsworks-subnet --vnet-name hopsworks-vnet --address-prefixes 172.18.0.0/24 -``` - -## Step 2: Create the AKS cluster. -Run the following command to create the AKS cluster. Replace *\$RESOURCE_GROUP* with the resource group in which you will run your cluster and *\$SUBSCRIPTION_ID* . - -```bash -aksidentity=$(az aks create --resource-group $RESOURCE_GROUP --name hopsworks-aks --network-plugin azure --enable-private-cluster --enable-managed-identity --vnet-subnet-id /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.Network/virtualNetworks/hopsworks-vnet/subnets/hopsworks-subnet --query identityProfile.kubeletidentity.objectId -o tsv) -``` - -## Step 3: Add permissions to the managed identity -You need to add permission to [the managed identity you will assign to your Hopsworks cluster](getting_started.md#step-4-create-a-managed-identity) to access the AKS cluster. 
To do it run the following command, replacing *\$RESOURCE_GROUP* with the resource group in which you will run your cluster and $identityId with the *id* of the identity you will assign to your Hopsworks cluster. - -```bash -az role assignment create --scope /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP --role "Azure Kubernetes Service Cluster User Role" --assignee $identityId -``` - -You also need to grant permission to pull images from the [ACR](getting_started.md#step-3-create-an-acr-container-registry) to the AKS nodes. To do it run the following command, replacing *\$RESOURCE_GROUP* with the resource group in which you will run your cluster - -```bash -az role assignment create --scope /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP --role "AcrPull" --assignee $aksidentity -``` - - -## Step 4: Create the Hopsworks cluster - -Create the hopsworks cluster by following the same steps as in the [getting started](getting_started.md#step-6-deploy-a-hopsworks-cluster) until the backup tab. Then click on *next* to get to the *Managed containers* tab. Set *Use Azure AKS* as enabled. One new field will pop up. Fill it with the name of the AKS you created above (hopsworks-aks). Click *next* and in the *Virtual Network* tab select the virtual network we created above (hopsworks-vnet). Finally, click *next* and select the subnet we created above (hopsworks-subnet) in the *Subnet* tab. You can now click on *Review and Submit* and create your cluster. - -

-

- -
Hopsworks AKS configuration
-
-

- - -## Going further - -You can also deploy the AKS cluster and the Hopsworks cluster in two different virtual networks by using network peering. For this we recommend to use our [terraform provider](../common/terraform.md) and to refer to our [example](https://github.com/logicalclocks/terraform-provider-hopsworksai/tree/main/examples/complete/azure/advanced/aks-with-peering) \ No newline at end of file diff --git a/docs/setup_installation/azure/azure_permissions.json b/docs/setup_installation/azure/azure_permissions.json deleted file mode 100644 index 47568c5ec..000000000 --- a/docs/setup_installation/azure/azure_permissions.json +++ /dev/null @@ -1,44 +0,0 @@ -{ - "Name": "hopsworks.ai", - "IsCustom": true, - "Description": "Allows hopsworks.ai to start and manage clusters", - "Actions": [ - "Microsoft.Compute/virtualMachines/write", - "Microsoft.Compute/virtualMachines/start/action", - "Microsoft.Compute/virtualMachines/delete", - "Microsoft.Compute/virtualMachines/read", - "Microsoft.Compute/virtualMachines/deallocate/action", - "Microsoft.Compute/disks/write", - "Microsoft.Compute/disks/delete", - "Microsoft.Network/networkInterfaces/read", - "Microsoft.Network/networkInterfaces/join/action", - "Microsoft.Network/networkInterfaces/write", - "Microsoft.Network/networkInterfaces/delete", - "Microsoft.Network/networkSecurityGroups/read", - "Microsoft.Network/networkSecurityGroups/join/action", - "Microsoft.Network/networkSecurityGroups/write", - "Microsoft.Network/networkSecurityGroups/delete", - "Microsoft.Network/publicIPAddresses/join/action", - "Microsoft.Network/publicIPAddresses/read", - "Microsoft.Network/publicIPAddresses/write", - "Microsoft.Network/publicIPAddresses/delete", - "Microsoft.Network/virtualNetworks/write", - "Microsoft.Network/virtualNetworks/delete", - "Microsoft.Network/virtualNetworks/read", - "Microsoft.Network/virtualNetworks/subnets/read", - "Microsoft.Network/virtualNetworks/subnets/join/action", - "Microsoft.Resources/subscriptions/resourceGroups/read", - "Microsoft.Compute/sshPublicKeys/read", - "Microsoft.ManagedIdentity/userAssignedIdentities/assign/action", - "Microsoft.ManagedIdentity/userAssignedIdentities/read", - "Microsoft.Storage/storageAccounts/read", - "Microsoft.Compute/snapshots/write", - "Microsoft.Compute/snapshots/read", - "Microsoft.Compute/snapshots/delete", - "Microsoft.Compute/disks/beginGetAccess/action", - "Microsoft.Compute/disks/read", - "Microsoft.Authorization/roleAssignments/read" - ], - "NotActions": [], - "AssignableScopes": ["/subscriptions/SUBSCRIPTION_ID/resourceGroups/RESOURCE_GROUP"] - } \ No newline at end of file diff --git a/docs/setup_installation/azure/cluster_creation.md b/docs/setup_installation/azure/cluster_creation.md deleted file mode 100644 index 77caef425..000000000 --- a/docs/setup_installation/azure/cluster_creation.md +++ /dev/null @@ -1,243 +0,0 @@ -# Getting started with managed.hopsworks.ai (Azure) -This guide goes into detail for each of the steps of the cluster creation in [managed.hopsworks.ai](https://managed.hopsworks.ai) - -### Step 1 starting to create a cluster - -In [managed.hopsworks.ai](https://managed.hopsworks.ai), select *Create cluster*: - -

-

Create a Hopsworks cluster
-
-

- -### Step 2 setting the General information - -Select the *Resource Group* (1) you want to use. - -!!! note - If the *Resource Group* does not appear in the drop-down, make sure that you properly [created and set the custom role](#step-12-creating-a-custom-role-for-hopsworksai) for this resource group. - -Name your cluster (2). Your cluster will be deployed in the *Location* of your *Resource Group* (3). - -Select the *Instance type* (4) and *Local storage* (5) size for the cluster *Head node*. - -Optional (6): Enable customer-managed encryption keys and specify a [disk encryption set](https://docs.microsoft.com/en-us/azure/virtual-machines/disks-enable-customer-managed-keys-portal#set-up-your-disk-encryption-set) to be used for encryption of local storage. The disk encryption set has to be specified using the format: `/subscriptions/[SUBSCRIPTION_ID]/resourceGroups/[RESOURCE_GROUP]/providers/Microsoft.Compute/diskEncryptionSets/[DISK_ENCRYPTION_SET]`. Note that you have to grant the service principal of managed.hopsworks.ai `Reader` access to the disk encryption set and `Key Vault Reader` and ` Key Vault Secrets User` on the key vault used with the disk encryption set. Refer to the Azure documentation for more details: [Server-side encryption of Azure Disk Storage](https://docs.microsoft.com/en-us/azure/virtual-machines/disk-encryption). - -To provide the capacity of adding and removing workers on demand, the Hopsworks clusters deployed by [managed.hopsworks.ai](https://managed.hopsworks.ai) store their data in an Azure storage container. In this step, you select which storage account and container to use for this purpose. Select the *storage account* (7) you want to use in *Azure Storage account name*. The name of the container in which the data will be stored is displayed in *Azure Container name* (8). You can change this name. For more details on how to create and configure a storage in Azure refer to [Creating and configuring a storage](getting_started.md#step-2-creating-and-configuring-a-storage) - -!!! note - You can choose to use a container already existing in your *storage account* by using the name of this container, but you need to first make sure that this container is empty. - -Enter the *Azure container registry name* (9) to be used as the managed docker registry for the cluster. - -

-

- General configuration -
General configuration
-
-

- -On this page, you can also choose to opt out of [managed.hopsworks.ai](https://managed.hopsworks.ai) log collection. [Managed.hopsworks.ai](https://managed.hopsworks.ai) collects logs about the services running in your cluster to help us improve our system and to provide support. If you choose to opt-out from the log collection and need support you will have to provide us the log by yourself, which will slow down the support process. - -### Step 3 workers configuration - -In this step, you configure the workers. There are two possible setups static or autoscaling. In the static setup, the cluster has a fixed number of workers that you decide. You can then add and remove workers manually, for more details: [the documentation](../common/adding_removing_workers.md). In the autoscaling setup, you configure conditions to add and remove workers and the cluster will automatically add and remove workers depending on the demand. - -#### Static workers configuration -You can set the static configuration by selecting *Disabled* in the first drop-down (1). Then you select the number of workers you want to start the cluster with (2). And, select the *Instance type* (3) and *Local storage* size (4) for the *worker nodes*. - -

-

- Create a Hopsworks cluster, static workers configuration -
Create a Hopsworks cluster, static workers configuration
-
-

- -#### Autoscaling workers configuration -You can set the autoscaling configuration by selecting enabled in the first drop-down (1). You can configure: - -1. The instance type you want to use. -2. The size of the instances' disk. -3. The minimum number of workers. -4. The maximum number of workers. -5. The targeted number of standby workers. Setting some resources in standby ensures that there are always some free resources in your cluster. This ensures that requests for new resources are fulfilled promptly. You configure the standby by setting the amount of workers you want to be in standby. For example, if you set a value of *0.5* the system will start a new worker every time the aggregated free cluster resources drop below 50% of a worker's resources. If you set this value to 0 new workers will only be started when a job or notebook request the resources. -6. The time to wait before removing unused resources. One often starts a new computation shortly after finishing the previous one. To avoid having to wait for workers to stop and start between each computation it is recommended to wait before shutting down workers. Here you set the amount of time in seconds resources need to be unused before they get removed from the system. - -!!! note - The standby will not be taken into account if you set the minimum number of workers to 0 and no resources are used in the cluster. This ensures that the number of nodes can fall to 0 when no resources are used. The standby will start to take effect as soon as you start using resources. - -

-

- Create a Hopsworks cluster, autoscale workers configuration -
Create a Hopsworks cluster, autoscale workers configuration
-
-

- -### Step 4 select a SSH key - -When deploying clusters, [managed.hopsworks.ai](https://managed.hopsworks.ai) installs a ssh key on the cluster's instances so that you can access them if necessary. -Select the *SSH key* that you want to use to access cluster instances. For more detail on how to add a shh key in Azure refer to [Adding a ssh key to your resource group](getting_started.md#step-4-adding-a-ssh-key-to-your-resource-group) - -

-

- Choose SSH key -
Choose SSH key
-
-

- -### Step 5 select the User assigned managed identity: - -In order to let the cluster instances access to the Azure storage we need to attach a *User assigned managed identity* to the virtual machines. In this step you choose which identity to use. This identity need to have access right to the *storage account* you selected in [Step 2](#step-2-setting-the-general-information). For more information about how to create this identity and give it access to the storage account refer to [Creating and configuring a storage](getting_started.md#step-2-creating-and-configuring-a-storage): - -

-

- Choose the User assigned managed identity -
Choose the User assigned managed identity
-
-

- -### Step 6 set the backup retention policy: - -To backup the Azure blob storage data when taking a cluster backups we need to set a retention policy for the blob storage. In this step, you choose the retention period in days. You can deactivate the retention policy by setting this value to 0 but this will block you from taking any backup of your cluster. - -

-

- Choose the backup retention policy -
Choose the backup retention policy
-
-

- -### Step 7 Virtual network selection -In this step, you can select the virtual network which will be used by the Hopsworks cluster. You can either select an existing virtual network or let [managed.hopsworks.ai](https://managed.hopsworks.ai) create one for you. If you decide to let [managed.hopsworks.ai](https://managed.hopsworks.ai) create the virtual network for you, you can choose the CIDR block for this virtual network. -Refer to [Create a virtual network and subnet](restrictive_permissions.md#step-1-create-a-virtual-network-and-subnet) for more details on how to create your own virtual network in Azure. - -

-

- Choose virtual network -
Choose virtual network
-
-

- -### Step 8 Subnet selection -If you selected an existing virtual network in the previous step, this step lets you select which subnet of this virtual network to use. For more information about creating your own subnet refer to [Create a virtual network and subnet](restrictive_permissions.md#step-1-create-a-virtual-network-and-subnet). - -If you did not select an existing virtual network in the previous step [managed.hopsworks.ai](https://managed.hopsworks.ai) will create the subnet for you. You can choose the CIDR block this subnet will use. - -

-

- Choose subnet -
Choose subnet
-
-

- -### Step 9 Network Security group selection -In this step, you can select the network security group you want to use to manage the inbound and outbound network rules. You can either let [managed.hopsworks.ai](https://managed.hopsworks.ai) create a network security group for you or select an existing security group. For more information about how to create your own network security group in Azure refer to [Create a network security group](restrictive_permissions.md#step-2-create-a-network-security-group). - -!!! note - [Managed.hopsworks.ai](https://managed.hopsworks.ai) require some rules for inbound and outbound traffic in your security group, for more details refer to [inbound traffic rules](restrictive_permissions.md#inbound-traffic) and [outbound traffic rules](restrictive_permissions.md#outbound-traffic). - -!!! note - [Managed.hopsworks.ai](https://managed.hopsworks.ai) attaches a public ip to your cluster by default. However, you can disable this behavior by unchecking the *Attach Public IP* checkbox. - -

-

- Choose security group -
Choose security group
-
-

- -#### Limiting outbound traffic to managed.hopsworks.ai - -Clusters created on [managed.hopsworks.ai](https://managed.hopsworks.ai) need to be able to send http requests to api.hopsworks.ai. If you have strict regulation regarding outbound traffic, you can enable the *Use static IPs to communicate with [managed.hopsworks.ai](https://managed.hopsworks.ai)* checkbox to get the list of IPs to be allowed as shown below: - -

-

- Enable static IPs -
Enable static IPs
-
-

- - -### Step 10 User management selection -In this step, you can choose which user management system to use. You have four choices: - -* *Managed*: [managed.hopsworks.ai](https://managed.hopsworks.ai) automatically adds and removes users from the Hopsworks cluster when you add and remove users from your organization (more details [here](../common/user_management.md)). -* *OAuth2*: integrate the cluster with your organization's OAuth2 identity provider. See [Use OAuth2 for user management](../common/sso/oauth.md) for more detail. -* *LDAP*: integrate the cluster with your organization's LDAP/ActiveDirectory server. See [Use LDAP for user management](../common/sso/ldap.md) for more detail. -* *Disabled*: let you manage users manually from within Hopsworks. - -

-

- Choose user management type -
Choose user management type
-
-

- -### Step 12 Managed RonDB -Hopsworks uses [RonDB](https://www.rondb.com/) as a database engine for its online Feature Store. By default database will run on its -own VM. Premium users can scale-out database services to multiple VMs -to handle increased workload. - -For details on how to configure RonDB check our guide [here](../common/rondb.md). - -

-

- Configure RonDB -
Configure RonDB
-
-

- -If you need to deploy a RonDB cluster instead of a single node please contact [us](mailto:sales@logicalclocks.com). - -### Step 13 add tags to your instances. -In this step, you can define tags that will be added to the cluster virtual machines. - -

-

- Add tags -
Add tags
-
-

- -### Step 14 add an init script to your instances. -In this step, you can enter an initialization script that will be run at startup on every instance. - -!!! note - this init script must be a bash script starting with *#!/usr/bin/env bash* - -

-

- Add initialization script -
Add initialization script
-
-

- -### Step 15 Review and create -Review all information and select *Create*: - -

-

- Review cluster information -
Review cluster information
-
-

- -The cluster will start. This will take a few minutes: - -

-

- Booting Hopsworks cluster -
Booting Hopsworks cluster
-
-

- -As soon as the cluster has started, you will be able to log in to your new Hopsworks cluster. You will also be able to stop, restart, or terminate the cluster. - -

-

- Running Hopsworks cluster -
Running Hopsworks cluster
-
-

diff --git a/docs/setup_installation/azure/getting_started.md b/docs/setup_installation/azure/getting_started.md
index 715e64b65..fd82dbc55 100644
--- a/docs/setup_installation/azure/getting_started.md
+++ b/docs/setup_installation/azure/getting_started.md
@@ -1,310 +1,164 @@
-# Getting started with managed.hopsworks.ai (Azure)
+# Azure - Getting started with AKS

-[Managed.hopsworks.ai](https://managed.hopsworks.ai) is our managed platform for running Hopsworks and the Feature Store
-in the cloud. It integrates seamlessly with third-party platforms such as Databricks,
-SageMaker and KubeFlow. This guide shows how to set up [managed.hopsworks.ai](https://managed.hopsworks.ai) with your organization's Azure account.
+Kubernetes and Helm are used to install and run Hopsworks and the Feature Store
+in the cloud. They both integrate seamlessly with third-party platforms such as Databricks,
+SageMaker and KubeFlow. This guide shows how to set up the Hopsworks platform in your organization's Azure account.

 ## Prerequisites

 To follow the instructions on this page you will need the following:

+- Kubernetes Version: Hopsworks can be deployed on AKS clusters running Kubernetes >= 1.27.0.
 - An Azure resource group in which the Hopsworks cluster will be deployed.
 - The [azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) installed and [logged in](https://docs.microsoft.com/en-us/cli/azure/authenticate-azure-cli).
+- kubectl (to manage the AKS cluster)
+- helm (to deploy Hopsworks)

 ### Permissions

-To run all the commands on this page the user needs to have at least the following permissions on the Azure resource group:
-```json
-Microsoft.Authorization/roleDefinitions/write
-Microsoft.Authorization/roleAssignments/write
-Microsoft.Compute/sshPublicKeys/generateKeyPair/action
-Microsoft.Compute/sshPublicKeys/read
-Microsoft.Compute/sshPublicKeys/write
-Microsoft.ContainerRegistry/registries/operationStatuses/read
-Microsoft.ContainerRegistry/registries/read
-Microsoft.ContainerRegistry/registries/write
-Microsoft.ManagedIdentity/userAssignedIdentities/write
-Microsoft.Resources/subscriptions/resourcegroups/read
-Microsoft.Storage/storageAccounts/write
-```
+The deployment requires cluster admin access to create ClusterRoles, ServiceAccounts, and ClusterRoleBindings in AKS.
+
+A namespace is also required for deploying the Hopsworks stack. If you don't have permissions to create a namespace, ask your AKS administrator to provision one for you.
+
+To run all the commands on this page, the user also needs sufficient permissions on the Azure resource group to create the storage, registry, and AKS resources described below.

 You will also need to have a role such as *Application Administrator* on the Azure Active Directory to be able to create the hopsworks.ai service principal.

-### Resource providers
-For [managed.hopsworks.ai](https://managed.hopsworks.ai) to deploy a cluster the following resource providers need to be registered on your Azure subscription.
+## Step 1: Azure AKS Setup

-```json
-Microsoft.Network
-Microsoft.Compute
-Microsoft.Storage
-Microsoft.ManagedIdentity
-Microsoft.ContainerRegistry
-```
-This can be done by running the following commands:
+### Step 1.1: Create an Azure Blob Storage Account

-!!!note
-    To run these commands you need to have the following permission on your subscription: *Microsoft.Network/register/action*
+Create a storage account to host project data.
+Ensure that the storage account is in the same region as the AKS cluster for performance and cost reasons:

 ```bash
-az provider register --namespace 'Microsoft.Network'
-az provider register --namespace 'Microsoft.Compute'
-az provider register --namespace 'Microsoft.Storage'
-az provider register --namespace 'Microsoft.ManagedIdentity'
-az provider register --namespace 'Microsoft.ContainerRegistry'
+az storage account create --name $storage_account_name --resource-group $resource_group --location $region
 ```

-### Other
-All the commands have been written for a Unix system. These commands will need to be adapted to your terminal if it is not directly compatible.
-
-All the commands use your default location. Add the *--location* parameter if you want to run your cluster in another location. Make sure to create the resources in the same location as you are going to run your cluster.
-
-## Step 1: Connect your Azure account
-
-[Managed.hopsworks.ai](https://managed.hopsworks.ai) deploys Hopsworks clusters to your Azure account. To enable this, you have to
-create a service principal and a custom role for [managed.hopsworks.ai](https://managed.hopsworks.ai) granting access to your resource group.
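+!!! note
+    The `az` commands in this guide reference shell variables such as `$resource_group` and `$storage_account_name` without defining them. A minimal sketch of how they could be set is shown below; the values are placeholders to adapt, not required names.
+
+```bash
+# Placeholder values, adapt to your environment (illustrative only)
+export resource_group=my-hopsworks-rg         # existing resource group
+export region=westeurope                      # region of the resource group
+export storage_account_name=myhopsworksstore  # globally unique, 3-24 lowercase letters and digits
+export container_name=hopsworks-data
+export registry_name=myhopsworksacr           # globally unique
+export cluster_name=my-hopsworks-aks
+```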

- -

-
-### Step 1.1: Connect your Azure account
-
-In [managed.hopsworks.ai](https://managed.hopsworks.ai/) click on *Connect to Azure* or go to *Settings* and click on *Configure* next to *Azure*. This will direct you to a page with the instructions needed to create the service principal and set up the connection. Follow the instructions.
+Also create a corresponding container:

-!!! note
-    it is possible to limit the permissions that are set up during this phase. For more details see [restrictive-permissions](restrictive_permissions.md).

-

- Cloud account settings -
Cloud account settings
-
-

+```bash
+az storage container create --account-name $storage_account_name --name $container_name
+```

-## Step 2: Create a storage
-!!! note
-    If you prefer using terraform, you can skip this step and the remaining steps, and instead, follow [this guide](../common/terraform.md#getting-started-with-azure).
+### Step 1.2: Create an Azure Container Registry (ACR)

-The Hopsworks clusters deployed by [managed.hopsworks.ai](https://managed.hopsworks.ai) store their data in a storage container in your Azure account. To enable this you need to create a storage account.
-This is done by running the following command, replacing *$RESOURCE_GROUP* with the name of your resource group.
+Create an ACR to store the images used by Hopsworks:

 ```bash
-az storage account create --resource-group $RESOURCE_GROUP --name hopsworksstorage$RANDOM
+az acr create --resource-group $resource_group --name $registry_name --sku Basic --location $region
 ```

-## Step 3: Create an ACR Container Registry
+### Step 1.3: Create an AKS Kubernetes Cluster

-The Hopsworks clusters deployed by [managed.hopsworks.ai](https://managed.hopsworks.ai) store their docker images in a container registry in your Azure account.
-To create this storage account run the following command, replacing *$RESOURCE_GROUP* with the name of your resource group.
+Provision an AKS cluster with a number of nodes:

 ```bash
-az acr create --resource-group $RESOURCE_GROUP --name hopsworksecr --sku Premium
+az aks create --resource-group $resource_group --name $cluster_name --enable-cluster-autoscaler --min-count 1 --max-count 4 --node-count 3 --node-vm-size Standard_D16_v4 --network-plugin azure --enable-managed-identity --generate-ssh-keys
 ```

-To prevent the registry from filling up with unnecessary images and artifacts you can enable a retention policy. A retention policy will automatically remove untagged manifests after a specified number of days. To enable a retention policy, run the following command, replacing *$RESOURCE_GROUP* with the name of your resource group.
+### Step 1.4: Retrieve setup Identifiers

+Create a set of environment variables for use in later steps.

 ```bash
-az acr config retention update --resource-group $RESOURCE_GROUP --registry hopsworksecr --status Enabled --days 7 --type UntaggedManifests
-```
+export managed_id=`az aks show --resource-group $resource_group --name $cluster_name --query "identity.principalId" --output tsv`

-## Step 4: Create a managed identity
-To allow the hopsworks cluster instances to access the storage account and the container registry, [managed.hopsworks.ai](https://managed.hopsworks.ai) assigns a managed identity to the cluster nodes. To enable this you need to:
+export storage_id=`az storage account show --name $storage_account_name --resource-group $resource_group --query "id" --output tsv`

-- Create a managed identity
-- Create a role with appropriate permission and assign it to the managed identity
+export acr_id=`az acr show --name $registry_name --resource-group $resource_group --query "id" --output tsv`
+```

-### Step 4.1: Create a managed identity
-You create a managed identity by running the following command, replacing *$RESOURCE_GROUP* with the name of your resource group.
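+Optionally, check that the identifiers were captured; an empty variable makes the role assignments below fail with a confusing error:
+
+```bash
+# Fail fast if any identifier came back empty (assumes the exports above)
+for v in managed_id storage_id acr_id; do
+  [ -n "${!v}" ] || { echo "error: \$$v is empty" >&2; exit 1; }
+done
+echo "managed_id=$managed_id"
+```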
+### Step 1.5: Assign Roles to Managed Identity

 ```bash
-identityId=$(az identity create --name hopsworks-instance --resource-group $RESOURCE_GROUP --query principalId -o tsv)
-```
+az role assignment create --assignee $managed_id --role "Storage Blob Data Contributor" --scope $storage_id

-### Step 4.2: Create a role for the managed identity
-To create a new role for the managed identity, first, create a file called *instance-role.json* with the following content. Replace *SUBSCRIPTION_ID* by your subscription id and *RESOURCE_GROUP* by your resource group
-
-```json
-{
-    "Name": "hopsworks-instance",
-    "IsCustom": true,
-    "Description": "Allow the hopsworks instance to access the storage and the docker repository",
-    "Actions": [
-        "Microsoft.Storage/storageAccounts/blobServices/containers/write",
-        "Microsoft.Storage/storageAccounts/blobServices/containers/read",
-        "Microsoft.Storage/storageAccounts/blobServices/write",
-        "Microsoft.Storage/storageAccounts/blobServices/read",
-        "Microsoft.Storage/storageAccounts/listKeys/action",
-        "Microsoft.ContainerRegistry/registries/artifacts/delete",
-        "Microsoft.ContainerRegistry/registries/pull/read",
-        "Microsoft.ContainerRegistry/registries/push/write"
-    ],
-    "NotActions": [
-
-    ],
-    "DataActions": [
-        "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
-        "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
-        "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
-        "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"
-    ],
-    "AssignableScopes": [
-        "/subscriptions/SUBSCRIPTION_ID/resourceGroups/RESOURCE_GROUP"
-    ]
-}
+az role assignment create --assignee $managed_id --role AcrPull --scope $acr_id
+az role assignment create --assignee $managed_id --role "AcrPush" --scope $acr_id
+az role assignment create --assignee $managed_id --role "AcrDelete" --scope $acr_id
 ```

+### Step 1.6: Allow AKS cluster access to ACR repository.

 ```bash
-az role definition create --role-definition instance-role.json
+az aks update --resource-group $resource_group --name $cluster_name --attach-acr $registry_name
 ```

-Finally assign the role to the managed identity by running the following command, replacing *$RESOURCE_GROUP* with the name of your resource group.
+## Step 2: Configure kubectl

 ```bash
-az role assignment create --scope /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP --role hopsworks-instance --assignee $identityId
+az aks get-credentials --resource-group $resource_group --name $cluster_name --file ~/my-aks-kubeconfig.yaml
+export KUBECONFIG=~/my-aks-kubeconfig.yaml
+kubectl config current-context
 ```

-!!!note
-    It takes several minutes between the time you create the managed identity and the time a role can be assigned to it. So if we get an error message starting by the following wait and retry: *Cannot find user or service principal in graph database*
-
-## Step 5: Add an ssh key to your resource group
-
-When deploying clusters, [managed.hopsworks.ai](https://managed.hopsworks.ai) installs an ssh key on the cluster's instances so that you can access them if necessary. For this purpose, you need to add an ssh key to your resource group.
-
-To create an ssh key in your resource group run the following command, replacing *$RESOURCE_GROUP* with the name of your resource group.
+## Step 3: Setup Hopsworks for Deployment
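+Before moving on, you can verify that kubectl is pointed at the new cluster and that the nodes are ready:
+
+```bash
+# Expect the AKS nodes provisioned in Step 1.3 in Ready state
+kubectl get nodes -o wide
+```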
+### Step 3.1: Add the Hopsworks Helm repository

 ```bash
-az sshkey create --resource-group $RESOURCE_GROUP --name hopsworksKey
+helm repo add hopsworks-dev https://nexus.hops.works/repository/hopsworks-helm-dev --username $NEXUS_USER --password $NEXUS_PASS
+helm repo update hopsworks-dev
 ```

-!!!note
-    the command returns the path to the private and public keys associated with this ssh key. You can also create a key from an existing public key as indicated in the [Azure documentation](https://learn.microsoft.com/en-us/cli/azure/sshkey?view=azure-cli-latest#az-sshkey-create)
-
-## Step 6: Deploy a Hopsworks cluster
-
-In [managed.hopsworks.ai](https://managed.hopsworks.ai), select *Create cluster*:
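+To confirm that the repository was added and to see which chart versions are available (the versions listed depend on your Nexus credentials):
+
+```bash
+# --devel includes pre-release chart versions
+helm search repo hopsworks-dev --devel
+```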

-

- Create a Hopsworks cluster -
Create a Hopsworks cluster
-
-

+### Step 3.2: Create Hopsworks namespace & secrets

-Select the *Resource Group* (1) in which you created your *storage account* and *managed identity* (see above).
-
-!!! note
-    If the *Resource Group* does not appear in the drop-down, make sure that the custom role you created in step [1.1](#step-11-connect-your-azure-account) has the *Microsoft.Resources/subscriptions/resourceGroups/read* permission and is assigned to the *hopsworks.ai* user.
-
-Name your cluster (2). Your cluster will be deployed in the *Location* of your *Resource Group* (3).
-
-Select the *Instance type* (4) and *Local storage* (5) size for the cluster *Head node*.
-
-Check if you want to *Use customer-managed encryption key* (6)
-
-Select the *storage account* (7) you created above in *Azure Storage account name*. The name of the container in which the data will be stored is displayed in *Azure Container name* (8), you can modify it if needed.
-
-!!! note
-    You can choose to use a container already existing in your *storage account* by using the name of this container, but you need to first make sure that this container is empty.
-
-Enter the *Azure container registry name* (9) of the ACR registry created in [Step 3.1](#step-3-create-an-acr-container-registry)
-
-Press *Next*:

-

- General configuration -
General configuration
-
-

- -Select the number of workers you want to start the cluster with (2). -Select the *Instance type* (3) and *Local storage* size (4) for the *worker nodes*. - -!!! note - It is possible to [add or remove workers](../common/adding_removing_workers.md) or to [enable autoscaling](../common/autoscaling.md) once the cluster is running. - -Press *Next*: - -

-

- Create a Hopsworks cluster, static workers configuration -
Create a Hopsworks cluster, static workers configuration
-
-

+```bash
+kubectl create namespace hopsworks
+
+kubectl create secret docker-registry regcred --namespace=hopsworks --docker-server=docker.hops.works --docker-username=$NEXUS_USER --docker-password=$NEXUS_PASS --docker-email=$NEXUS_EMAIL_ADDRESS
+```

-Select the *SSH key* that you want to use to access cluster instances:
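+You can verify that the namespace and pull secret exist before deploying:
+
+```bash
+kubectl get secret regcred --namespace hopsworks
+```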

-

- Choose SSH key -
Choose SSH key
-
-

+### Step 3.3: Create helm values file

-Select the *User assigned managed identity* that you created above:
+Below is a simplified values.azure.yaml file to get started, which can be updated for improved performance and further customisation.

-

- Choose the User assigned managed identity -
Choose the User assigned managed identity
-
-

+```yaml
+global:
+  _hopsworks:
+    storageClassName: null
+    cloudProvider: "AWS"
+    managedDockerRegistry:
+      enabled: true
+      domain: "rchopsworksrepo.azurecr.io"
+      namespace: "hopsworks"
+
+    managedObjectStorage:
+      enabled: true
+      endpoint: "https://rchopsworksbucket.blob.core.windows.net"
+    minio:
+      enabled: false
+```
+
+## Step 4: Deploy Hopsworks

-To backup the Azure blob storage data when taking a cluster backup we need to set a retention policy for the blob storage. You can deactivate the retention policy by setting this value to 0 but this will block you from taking any backup of your cluster. Choose the retention period in days and click on *Review and submit*.
+Deploy Hopsworks in the created namespace.
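+!!! note
+    The `domain` and `endpoint` values above are example names; substitute your own ACR login server and storage account. A sketch that derives them from the variables used earlier (assumes the exports above):
+
+```bash
+# Write values.azure.yaml using the names created in Step 1
+cat > values.azure.yaml <<EOF
+global:
+  _hopsworks:
+    storageClassName: null
+    cloudProvider: "AWS"
+    managedDockerRegistry:
+      enabled: true
+      domain: "${registry_name}.azurecr.io"
+      namespace: "hopsworks"
+
+    managedObjectStorage:
+      enabled: true
+      endpoint: "https://${storage_account_name}.blob.core.windows.net"
+    minio:
+      enabled: false
+EOF
+```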

-

- Choose the backup retention policy -
Choose the backup retention policy
-
-

+```bash
+helm install hopsworks hopsworks-dev/hopsworks --devel --namespace hopsworks --values values.azure.yaml --timeout=600s
+```

-Review all information and select *Create*:
+Check that Hopsworks is installing on your provisioned AKS cluster.
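+The install takes a while; `helm status` summarizes the release while it progresses:
+
+```bash
+helm status hopsworks --namespace hopsworks
+```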

-

- Review cluster information -
Review cluster information
-
-

+```bash
+kubectl get pods --namespace=hopsworks
+
+kubectl get svc -n hopsworks -o wide
+```

-!!! note
-    We skipped cluster creation steps that are not mandatory. You can find more details about these steps [here](cluster_creation.md)

-The cluster will start. This will take a few minutes:
+Upon completion (circa 20 minutes), set up a load balancer to access Hopsworks:
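+If you prefer to block until the pods settle, a rough sketch (the timeout is an assumption based on the install time above):
+
+```bash
+# Wait for all pods in the namespace to report Ready (may not suit completed Job pods)
+kubectl wait --for=condition=ready pod --all --namespace hopsworks --timeout=1200s
+```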

-

- Booting Hopsworks cluster -
Booting Hopsworks cluster
-
-

+```bash
+kubectl expose deployment hopsworks --type=LoadBalancer --name=hopsworks-service --namespace hopsworks
+```

-As soon as the cluster has started, you will be able to log in to your new Hopsworks cluster. You will also be able to stop, restart or terminate the cluster.
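+Once the service is created, the external address can be retrieved as follows (the field name assumes an IP-based Azure load balancer):
+
+```bash
+kubectl get svc hopsworks-service --namespace hopsworks -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
+```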

-

- Running Hopsworks cluster -
Running Hopsworks cluster
-
-

## Step 7: Next steps Check out our other guides for how to get started with Hopsworks and the Feature Store: -* Make Hopsworks services [accessible from outside services](../common/services.md) * Get started with the [Hopsworks Feature Store](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/quickstart.ipynb){:target="_blank"} * Follow one of our [tutorials](../../tutorials/index.md) * Follow one of our [Guide](../../user_guides/index.md) diff --git a/docs/setup_installation/azure/restrictive_permissions.md b/docs/setup_installation/azure/restrictive_permissions.md deleted file mode 100644 index 08ccff320..000000000 --- a/docs/setup_installation/azure/restrictive_permissions.md +++ /dev/null @@ -1,188 +0,0 @@ -# Limiting Azure permissions - -[Managed.hopsworks.ai](https://managed.hopsworks.ai) requires a set of permissions to be able to manage resources in the user’s Azure resource group. -By default, these permissions are set to easily allow a wide range of different configurations and allow -us to automate as many steps as possible. While we ensure to never access resources we shouldn’t, -we do understand that this might not be enough for your organization or security policy. -This guide explains how to lock down access permissions following the IT security policy principle of least privilege. - -## Default permissions -This is the list of default permissions that are required by [managed.hopsworks.ai](https://managed.hopsworks.ai). If you prefer to limit these permissions, then proceed to the [next section](# Limiting the cross-account role permissions). - -```json -{!setup_installation/azure/azure_permissions.json!} -``` - -## Limiting the cross-account role permissions - -### Step 1: Create a virtual network and subnet - -To restrict [managed.hopsworks.ai](https://managed.hopsworks.ai) from having write and delete access on virtual networks and subnet you need to create them manually. -This can be achieved in the Azure portal following this guide: [Create a virtual network](https://docs.microsoft.com/en-us/azure/virtual-network/quick-create-portal). -Make sure to use the resource group and location in which you intend to deploy your Hopsworks cluster. For the remaining of the configuration, the default options proposed by the portal should work out of the box. -Note the names of the virtual network and subnet you want to use for the following steps. - -### Step 2: Create a network security group - -To restrict [managed.hopsworks.ai](https://managed.hopsworks.ai) from having write and delete access on network security groups you need to create it manually. -This can be achieved in the Azure portal following this guide: [Create a network security group](https://docs.microsoft.com/en-us/azure/virtual-network/manage-network-security-group#create-a-network-security-group). -Make sure to use the resource group and location in which you intend to deploy your Hopsworks cluster. - -#### Inbound traffic - -For [managed.hopsworks.ai](https://managed.hopsworks.ai) to create the SSL certificates the network security group needs to allow inbound traffic on port 80. -For this, you need to add an inbound security rule to your network security group. -This can be achieved in the Azure portal following this guide: [Create a security rule](https://docs.microsoft.com/en-us/azure/virtual-network/manage-network-security-group#create-a-security-rule>). 
-Setting the destination port ranges to 80 and letting the default values for the other fields should work out of the box. - -!!! note - If you intend to use the managed users option on your Hopsworks cluster you should also add a rule to open port 443. - -#### Outbound traffic - -Clusters created on [managed.hopsworks.ai](https://managed.hopsworks.ai) need to be able to send http requests to *api.hopsworks.ai*. The *api.hopsworks.ai* domain use a content delivery network for better performance. This result in the impossibility to predict which IP the request will be sent to. If you require a list of static IPs to allow outbound traffic from your security group, use the *static IPs* option during [cluster creation](../cluster_creation/#limiting-outbound-traffic-to-hopsworksai). - -!!! note - If you intend to use the managed users option on your Hopsworks cluster you should also allow outbound traffic to [cognito-idp.us-east-2.amazonaws.com](https://cognito-idp.us-east-2.amazonaws.com) and [managedhopsworks-prod.auth.us-east-2.amazoncognito.com](https://managedhopsworks-prod.auth.us-east-2.amazoncognito.com). - -### Step 3: Set permissions of the cross-account role -During the account setup for [managed.hopsworks.ai](https://managed.hopsworks.ai), you were asked to create create a custom role for your resource group. -Edit this role in the Azure portal by going to your resource group, clicking on *Access control (IAM)*, opening the tab *Roles*, searching for the role you created, clicking on the three dots at the end of the role line and clicking on edit. -You can then navigate to the JSON tab and overwrite the "action" field with the following: - -```json -"actions": [ - "Microsoft.Compute/virtualMachines/write", - "Microsoft.Compute/virtualMachines/start/action", - "Microsoft.Compute/virtualMachines/delete", - "Microsoft.Compute/virtualMachines/read", - "Microsoft.Compute/virtualMachines/deallocate/action", - "Microsoft.Compute/disks/write", - "Microsoft.Compute/disks/delete", - "Microsoft.Network/networkInterfaces/read", - "Microsoft.Network/networkInterfaces/join/action", - "Microsoft.Network/networkInterfaces/write", - "Microsoft.Network/networkInterfaces/delete", - "Microsoft.Network/networkSecurityGroups/read", - "Microsoft.Network/networkSecurityGroups/join/action", - "Microsoft.Network/publicIPAddresses/join/action", - "Microsoft.Network/publicIPAddresses/read", - "Microsoft.Network/publicIPAddresses/write", - "Microsoft.Network/publicIPAddresses/delete", - "Microsoft.Network/virtualNetworks/read", - "Microsoft.Network/virtualNetworks/subnets/read", - "Microsoft.Network/virtualNetworks/subnets/join/action", - "Microsoft.Resources/subscriptions/resourceGroups/read", - "Microsoft.Compute/sshPublicKeys/read", - "Microsoft.ManagedIdentity/userAssignedIdentities/assign/action", - "Microsoft.ManagedIdentity/userAssignedIdentities/read", - "Microsoft.Storage/storageAccounts/read", - "Microsoft.Compute/snapshots/write", - "Microsoft.Compute/snapshots/read", - "Microsoft.Compute/snapshots/delete", - "Microsoft.Compute/disks/beginGetAccess/action", - "Microsoft.Compute/disks/read" - - ] -``` - -### Step 4: Create your Hopsworks instance - -You can now create a new Hopsworks instance in [managed.hopsworks.ai](https://managed.hopsworks.ai) by selecting the virtual network, subnet, and network security group during the instance configuration. 
- -### Backup permissions - -The following permissions are only needed for the backup feature: - -```json -"actions": [ - "Microsoft.Compute/snapshots/write", - "Microsoft.Compute/snapshots/read", - "Microsoft.Compute/snapshots/delete", - "Microsoft.Compute/disks/beginGetAccess/action", - ] -``` - -If you are not going to create backups or if you do not have access to this Enterprise feature, you can further limit the permission of the cross-account role to the following: - -```json -"actions": [ - "Microsoft.Compute/virtualMachines/write", - "Microsoft.Compute/virtualMachines/start/action", - "Microsoft.Compute/virtualMachines/delete", - "Microsoft.Compute/virtualMachines/read", - "Microsoft.Compute/virtualMachines/deallocate/action", - "Microsoft.Compute/disks/write", - "Microsoft.Compute/disks/delete", - "Microsoft.Network/networkInterfaces/read", - "Microsoft.Network/networkInterfaces/join/action", - "Microsoft.Network/networkInterfaces/write", - "Microsoft.Network/networkInterfaces/delete", - "Microsoft.Network/networkSecurityGroups/read", - "Microsoft.Network/networkSecurityGroups/join/action", - "Microsoft.Network/publicIPAddresses/join/action", - "Microsoft.Network/publicIPAddresses/read", - "Microsoft.Network/publicIPAddresses/write", - "Microsoft.Network/publicIPAddresses/delete", - "Microsoft.Network/virtualNetworks/read", - "Microsoft.Network/virtualNetworks/subnets/read", - "Microsoft.Network/virtualNetworks/subnets/join/action", - "Microsoft.Resources/subscriptions/resourceGroups/read", - "Microsoft.Compute/sshPublicKeys/read", - "Microsoft.ManagedIdentity/userAssignedIdentities/assign/action", - "Microsoft.ManagedIdentity/userAssignedIdentities/read", - "Microsoft.Storage/storageAccounts/read", - ] -``` - -### Public IP Addresses permissions - -The following permissions are used to create and attach a public IP Address to the head node. If you do not want to use a public IP Address for the head node, you can remove them: - -```json -"actions": [ - "Microsoft.Network/publicIPAddresses/join/action", - "Microsoft.Network/publicIPAddresses/read", - "Microsoft.Network/publicIPAddresses/write", - "Microsoft.Network/publicIPAddresses/delete", - ] -``` -You then have to make sure that you uncheck the *Attach Public IP* check box in the *Security Group* section of the cluster creation: -

-

- Attach Public IP -
Attach Public IP
-
-

- -### Other removable permissions - -The following permission is only needed to select the Azure Storage account through a drop-down during cluster creation. You can remove it from the cross-account role and enter the value manually - -```json -"actions": [ - "Microsoft.Storage/storageAccounts/read", - ] -``` - -The following permission is only needed, during cluster creation, to check that the managed identity has the proper permission. If you remove it, this check will not be done and the deployment may fail later if the managed identity does not have the proper permissions - -```json -"actions": [ - "Microsoft.Authorization/roleAssignments/read" -] -``` - -## Limiting the User Assigned Managed Identity permissions - -### Backups - -If you do not intend to take backups or if you do not have access to this Enterprise feature you can remove the permissions that are only used by the backup feature when configuring your managed identity storage permissions. -For this remove the following actions from [your user assigned managed identity](getting_started.md#step-21-creating-a-restrictive-role-for-accessing-storage): - -```json - "actions": [ - "Microsoft.Storage/storageAccounts/blobServices/write", - "Microsoft.Storage/storageAccounts/listKeys/action" - ] -``` diff --git a/docs/setup_installation/azure/upgrade.md b/docs/setup_installation/azure/upgrade.md deleted file mode 100644 index 71704ae8b..000000000 --- a/docs/setup_installation/azure/upgrade.md +++ /dev/null @@ -1,334 +0,0 @@ -# Upgrade existing clusters on managed.hopsworks.ai from version 2.2 or older (Azure) -This guide shows you how to upgrade your existing Hopsworks cluster to a newer version of Hopsworks. - -## Step 1: Make sure your cluster is running - -It is important that your cluster is **Running**. Otherwise you will not be able to upgrade. As soon as a new version is available an upgrade notification will appear: - -

-

- New version notification -
A new Hopsworks version is available
-
-

- -## Step 2: Add upgrade permissions to your user assigned managed identity - -!!! note - You can skip this step if you already have the following permissions in [your user assigned managed identity](../getting_started/#step-4-create-a-managed-identity): - ```json - [ - "Microsoft.Compute/virtualMachines/read", - "Microsoft.Compute/virtualMachines/write", - "Microsoft.Compute/disks/read", - "Microsoft.Compute/disks/write", - "Microsoft.Storage/storageAccounts/listKeys/action" - ] - ``` - Make sure that the scope of these permissions is your resource group. - -We require extra permissions to be added to the user assigned managed identity attached to your cluster to proceed with the upgrade. First to get the name of your user assigned managed identity and the resource group of your cluster, click on the *Details* tab as shown below: - -

-

- Azure details tab -
Get the resource group name (1) and the user assigned managed identity (2) of your cluster
-
-

- -### Step 2.1: Add custom role for upgrade permissions - -Once you get the names of the resource group and user-assigned managed identity, navigate to [Azure portal](https://portal.azure.com/#home), then click on *Resource groups* and then search for your resource group and click on it. Go to the *Access control (IAM)* tab, select *Add*, and click on *Add custom role* - -

-

- Azure add custom role -
Add a custom role for upgrade
-
-

- -Name the custom role and then click on next till you reach the *JSON* tab. - -

-

- Azure add custom role -
Name the custom role for upgrade
-
-

- -Once you reach the *JSON* tab, click on *Edit* to edit the role permissions: - -

-

- Azure add custom role -
Edit the JSON permissions for the custom role for upgrade
-
-

- -Once you have clicked on *Edit*, replace the *permissions* array with the following snippet: - -```json -"permissions": [ - { - "actions": [ - "Microsoft.Compute/virtualMachines/read", - "Microsoft.Compute/virtualMachines/write", - "Microsoft.Compute/disks/read", - "Microsoft.Compute/disks/write", - "Microsoft.Storage/storageAccounts/listKeys/action" - ], - "notActions": [], - "dataActions": [], - "notDataActions": [] - } -] -``` - -Then, click on *Save* to save the updated permissions - -

-

- Azure add custom role -
Save permissions for the custom role for upgrade
-
-

- -Click on *Review and create* and then click on *Create* to create the custom role: - -

-

- Azure add custom role -
Save permissions for the custom role for upgrade
-
-

- -### Step 2.2: Assign the custom role to your user-assigned managed identity - -Navigate back to the your Resource group home page at [Azure portal](https://portal.azure.com/#home), click on *Add* and then click on *Add role assignment* - -

-

- Azure add custom role -
Assign upgrade role to your user assigned managed identity
-
-

- -(1) choose the upgrade role that you have just created in [Step 2.1](#step-21-add-custom-role-for-upgrade-permissions), (2) choose *User Assigned Managed Identity*, (3) search for the user assigned managed identity attached to your cluster and select it. Finally, (4) click on *Save* to save the role assignment. - -

-

- Azure add custom role -
Assign upgrade role to your user assigned managed identity
-
-

- - -!!! warning - [When you assign roles or remove role assignments, it can take up to 30 minutes for changes to take effect.](https://docs.microsoft.com/en-us/azure/role-based-access-control/troubleshooting#role-assignment-changes-are-not-being-detected) - -## Step 3: Add disk read permissions to your role connected to managed.hopsworks.ai - -We require extra permission ("Microsoft.Compute/disks/read") to be added to the role you used to connect to [managed.hopsworks.ai](https://managed.hopsworks.ai), the one that you have created when [connecting your azure account](../getting_started/#step-11-connect-your-azure-account). -If you don't remember the name of the role that you have created when [connecting your azure account](../getting_started/#step-11-connect-your-azure-account), you can navigate to your Resource group, (1) click on *Access Control*, (2) navigate to the *Check Access* tab, (3) search for *hopsworks.ai*, (4) click on it, (5) now you have the name of your custom role used to connect to [managed.hopsworks.ai](https://managed.hopsworks.ai). - -

-

- Get your connected role to hopswork.ai -
Get your role connected to hopswork.ai
-
-

- -To edit the permissions associated with your role, stay on the same *Access Control* page, (1) click on the *Roles* tab, (2) search for your role name (the one you obtained above), (3) click on **...**, (4) click on *Edit*. - - -

-

- Edit your connected role to hopswork.ai -
Edit your role connected to hopswork.ai
-
-

- -You will arrive at the *Update a custom role* page as shown below: - -

-

- Edit your connected role to hopswork.ai 1 -
Edit your role connected to hopswork.ai
-
-

- -Navigate to the *JSON* tab, then click on *Edit*, as shown below: - -

-

- Edit your connected role to hopswork.ai 2 -
Edit your role connected to hopswork.ai
-
-

- -Now, add the missing permission *"Microsoft.Compute/disks/read"* to the list of actions, then click on *Save*, click on *Review + update*, and finally click on *Update*. - -

-

- Edit your connected role to hopswork.ai 3 -
Add missing permissions to your role connected to hopswork.ai
-
-

- -## Step 4: Run the upgrade process - -You need to click on *Upgrade* to start the upgrade process. You will be prompted with the screen shown below to confirm your intention to upgrade: - -!!! note - No need to worry about the following message since this is done already in [Step 2](#step-2-add-upgrade-permissions-to-your-user-assigned-managed-identity) - - **Make sure that your user assigned managed identity (hopsworks-doc-identity) includes the following permissions: - [ "Microsoft.Compute/virtualMachines/read", "Microsoft.Compute/virtualMachines/write", "Microsoft.Compute/disks/read", "Microsoft.Compute/disks/write", "Microsoft.Storage/storageAccounts/listKeys/action" ]** - -

-

- Azure Upgrade Prompt -
Upgrade confirmation
-
-

- -Check the *Yes, upgrade cluster* checkbox to proceed, then the *Upgrade* button will be activated as shown below: - -!!! warning - Currently, we only support upgrade for the head node and you will need to recreate your workers once the upgrade is successfully completed. - - -

-

- Azure Upgrade Prompt -
Upgrade confirmation
-
-

- - -Depending on how big your current cluster is, the upgrade process may take from 1 hour to a few hours until completion. - -!!! note - We don't delete your old cluster until the upgrade process is successfully completed. - - -

-

- Azure Upgrade starting -
Upgrade is running
-
-

- -Once the upgrade is completed, you can confirm that you have the new Hopsworks version by checking the *Details* tab of your cluster as below: - -

-

- Azure Upgrade complete -
Upgrade is complete
-
-

- -## Error handling -There are two categories of errors that you may encounter during an upgrade. First, a permission error due to a missing permission in your role connected to [managed.hopsworks.ai](https://managed.hopsworks.ai), see [Error 1](#error-1-missing-permissions-error). Second, an error during the upgrade process running on your cluster, see [Error 2](#error-2-upgrade-process-error). - -### Error 1: Missing permissions error -If you encounter the following permission error right after starting an upgrade, then you need to make sure that the role you used to connect to [managed.hopsworks.ai](https://managed.hopsworks.ai), the one that you have created when [connecting your azure account](../getting_started/#step-11-connect-your-azure-account), have permissions to read and write disks -*("Microsoft.Compute/disks/read", "Microsoft.Compute/disks/write")*. - -

-

- Azure upgrade permission error -
Missing permission error
-
-

- - -If you don't remember the name of the role that you have created when [connecting your azure account](../getting_started/#step-11-connect-your-azure-account), you can navigate to your Resource group, (1) click on *Access Control*, (2) navigate to the *Check Access* tab, (3) search for *hopsworks.ai*, (4) click on it, (5) now you have the name of your custom role used to connect to [managed.hopsworks.ai](https://managed.hopsworks.ai). - -

-

- Get your connected role to hopswork.ai -
Get your role connected to hopswork.ai
-
-

- -To edit the permissions associated with your role, stay on the same *Access Control* page, (1) click on the *Roles* tab, (2) search for your role name (the one you obtained above), (3) click on **...**, (4) click on *Edit*. - - -

-

- Edit your connected role to hopswork.ai -
Edit your role connected to hopswork.ai
-
-

- -You will arrive at the *Update a custom role* page as shown below: - -

-

- Edit your connected role to hopswork.ai 1 -
Edit your role connected to hopswork.ai
-
-

- -Navigate to the *JSON* tab, then click on *Edit*, as shown below: - -

-

- Edit your connected role to hopswork.ai 2 -
Edit your role connected to hopswork.ai
-
-

- -In our example, we were missing only the read permission ("Microsoft.Compute/disks/read"). First, add the missing permission, then click on *Save*, click on *Review + update*, and finally click on *Update*. - -

-

- Edit your connected role to hopswork.ai 3 -
Add missing permissions to your role connected to hopswork.ai
-
-

- -Once you have updated your role, click on *Retry* to retry the upgrade process. - -

-

- Azure upgrade permission error -
Retry the upgrade process
-
-

- -### Error 2: Upgrade process error - -If an error occurs during the upgrade process, you will have the option to rollback to your old cluster as shown below: - -

-

- Error during upgrade -
Error occurred during upgrade
-
-

- -Click on *Rollback* to recover your old cluster before upgrade. - -

-

- Rollback prompt -
Upgrade rollback confirmation
-
-

- -Check the *Yes, rollback cluster* checkbox to proceed, then the *Rollback* button will be activated as shown below: - -

-

- Rollback prompt -
Upgrade rollback confirmation
-
-

- -Once the rollback is completed, you will be able to continue working as normal with your old cluster. - -!!! note - The old cluster will be **stopped** after the rollback. You have to click on the *Start* button. - diff --git a/docs/setup_installation/azure/upgrade_2.4.md b/docs/setup_installation/azure/upgrade_2.4.md deleted file mode 100644 index e49e9c8df..000000000 --- a/docs/setup_installation/azure/upgrade_2.4.md +++ /dev/null @@ -1,177 +0,0 @@ -# Upgrade existing clusters on managed.hopsworks.ai from version 2.4 or newer (Azure) -This guide shows you how to upgrade your existing Hopsworks cluster to a newer version of Hopsworks. - -## Step 1: Make sure your cluster is running - -It is important that your cluster is **Running**. Otherwise you will not be able to upgrade. As soon as a new version is available an upgrade notification will appear: - -

-

- New version notification -
A new Hopsworks version is available
-
-

- -## Step 2: Add backup permissions to your role connected to managed.hopsworks.ai - -We require extra permission to be added to the role you used to connect to [managed.hopsworks.ai](https://managed.hopsworks.ai), the one that you have created when [connecting your Azure account](../getting_started/#step-11-connect-your-azure-account). These permissions are required to create a snapshot of your cluster before proceeding with the upgrade. - -```json -"actions": [ - "Microsoft.Compute/snapshots/write", - "Microsoft.Compute/snapshots/read", - "Microsoft.Compute/snapshots/delete", - "Microsoft.Compute/disks/beginGetAccess/action", - ] -``` - -If you don't remember the name of the role that you have created when [connecting your Azure account](../getting_started/#step-11-connect-your-azure-account), you can navigate to your Resource group, (1) click on *Access Control*, (2) navigate to the *Check Access* tab, (3) search for *hopsworks.ai*, (4) click on it, (5) now you have the name of your custom role used to connect to [managed.hopsworks.ai](https://managed.hopsworks.ai). - -

-

- Get your connected role to managed.hopsworks.ai -
Get your role connected to managed.hopsworks.ai
-
-

- -To edit the permissions associated with your role, stay on the same *Access Control* page, (1) click on the *Roles* tab, (2) search for your role name (the one you obtained above), (3) click on **...**, (4) click on *Edit*. - - -

-

- Edit your connected role to managed.hopsworks.ai -
Edit your role connected to managed.hopsworks.ai
-
-

- -You will arrive at the *Update a custom role* page as shown below: - -

-

- Edit your connected role to managed.hopsworks.ai 1 -
Edit your role connected to managed.hopsworks.ai
-
-

- -Navigate to the *JSON* tab, then click on *Edit*, as shown below: - -

-

- Edit your connected role to managed.hopsworks.ai 2 -
Edit your role connected to managed.hopsworks.ai
-
-

- -Now, add the following permissions to the list of actions, then click on *Save*, click on *Review + update*, and finally click on *Update*. - -```json - "Microsoft.Compute/snapshots/write", - "Microsoft.Compute/snapshots/read", - "Microsoft.Compute/snapshots/delete", - "Microsoft.Compute/disks/beginGetAccess/action", -``` - -

-

- Edit your connected role to managed.hopsworks.ai 3 -
Add missing permissions to your role connected to managed.hopsworks.ai
-
-

- -## Step 3: Run the upgrade process - -You need to click on *Upgrade* to start the upgrade process. You will be prompted with the screen shown below to confirm your intention to upgrade: - -!!! note - No need to worry about the following message since this is done already in [Step 2](#step-2-add-backup-permissions-to-your-role-connected-to-hopsworksai) - - **Make sure that your custom role which you have connected to [managed.hopsworks.ai](https://managed.hopsworks.ai) has the following permissions: - [ "Microsoft.Compute/snapshots/write", "Microsoft.Compute/snapshots/read", "Microsoft.Compute/snapshots/delete", "Microsoft.Compute/disks/beginGetAccess/action", ]** - -

-

- Azure Upgrade Prompt -
Upgrade confirmation
-
-

- -Check the *Yes, upgrade cluster* checkbox to proceed, then the *Upgrade* button will be activated as shown below: - -!!! warning - Currently, we only support upgrade for the head node and you will need to recreate your workers once the upgrade is successfully completed. - - -

-

- Azure Upgrade Prompt -
Upgrade confirmation
-
-

- - -Depending on how big your current cluster is, the upgrade process may take from 1 hour to a few hours until completion. - -!!! note - We don't delete your old cluster until the upgrade process is successfully completed. - - -

-

- Azure Upgrade starting -
Upgrade is running
-
-

- -Once the upgrade is completed, you can confirm that you have the new Hopsworks version by checking the version number on the *Details* tab of your cluster. - -## Error handling -There are two categories of errors that you may encounter during an upgrade. First, a permission error due to a missing permission in your role connected to [managed.hopsworks.ai](https://managed.hopsworks.ai), see [Error 1](#error-1-missing-permissions-error). Second, an error during the upgrade process running on your cluster, see [Error 2](#error-2-upgrade-process-error). - -### Error 1: Missing permissions error - -If one or more backup permissions are missing, or if the resource is not set correctly, you will be notified with an error message as shown below: - -

-

- Azure upgrade permission error -
Missing permission error
-
-

- - -Update your cross custom role as described in [Step 2](#step-2-add-backup-permissions-to-your-role-connected-to-hopsworksai), then click *Start*. Once the cluster is up and running, you can try running the upgrade again. - -### Error 2: Upgrade process error - -If an error occurs during the upgrade process, you will have the option to rollback to your old cluster as shown below: - -

-

- Error during upgrade -
Error occurred during upgrade
-
-

- -Click on *Rollback* to recover your old cluster before upgrade. - -

-

- Rollback prompt -
Upgrade rollback confirmation
-
-

- -Check the *Yes, rollback cluster* checkbox to proceed, then the *Rollback* button will be activated as shown below: - -

-

- Rollback prompt -
Upgrade rollback confirmation
-
-

- -Once the rollback is completed, you will be able to continue working as normal with your old cluster. - -!!! note - The old cluster will be **stopped** after the rollback. You have to click on the *Start* button. - diff --git a/docs/setup_installation/azure/upgrade_3.0.md b/docs/setup_installation/azure/upgrade_3.0.md deleted file mode 100644 index c59797476..000000000 --- a/docs/setup_installation/azure/upgrade_3.0.md +++ /dev/null @@ -1,127 +0,0 @@ -# Upgrade existing clusters on managed.hopsworks.ai from version 3.0 or newer (Azure) -This guide shows you how to upgrade your existing Hopsworks cluster to a newer version of Hopsworks. - -## Step 1: Make sure your cluster is running - -It is important that your cluster is **Running**. Otherwise you will not be able to upgrade. As soon as a new version is available an upgrade notification will appear: - -

-

- New version notification -
A new Hopsworks version is available
-
-

- -## Step 2: Add backup permissions to your role connected to managed.hopsworks.ai - -We require extra permission to be added to the role you used to connect to [managed.hopsworks.ai](https://managed.hopsworks.ai), the one that you have created when [connecting your Azure account](../getting_started/#step-11-connect-your-azure-account). These permissions are required to create a snapshot of your cluster before proceeding with the upgrade. - -```json -"actions": [ - "Microsoft.Compute/snapshots/write", - "Microsoft.Compute/snapshots/read", - "Microsoft.Compute/snapshots/delete", - "Microsoft.Compute/disks/beginGetAccess/action", - ] -``` - -If you don't remember the name of the role that you have created when [connecting your Azure account](../getting_started/#step-11-connect-your-azure-account), you can navigate to your Resource group, (1) click on *Access Control*, (2) navigate to the *Check Access* tab, (3) search for *hopsworks.ai*, (4) click on it, (5) now you have the name of your custom role used to connect to [managed.hopsworks.ai](https://managed.hopsworks.ai). - -

-

- Get your connected role to managed.hopsworks.ai -
Get your role connected to managed.hopsworks.ai
-
-

- -To edit the permissions associated with your role, stay on the same *Access Control* page, (1) click on the *Roles* tab, (2) search for your role name (the one you obtained above), (3) click on **...**, (4) click on *Edit*. - - -

-

-*Figure: Edit your role connected to managed.hopsworks.ai*
-
-

- -You will arrive at the *Update a custom role* page as shown below: - -

-

-*Figure: Edit your role connected to managed.hopsworks.ai*
-
-

- -Navigate to the *JSON* tab, then click on *Edit*, as shown below: - -

-

-*Figure: Edit your role connected to managed.hopsworks.ai*
-
-

- -Now, add the following permissions to the list of actions, then click on *Save*, click on *Review + update*, and finally click on *Update*. - -```json - "Microsoft.Compute/snapshots/write", - "Microsoft.Compute/snapshots/read", - "Microsoft.Compute/snapshots/delete", - "Microsoft.Compute/disks/beginGetAccess/action", -``` - -

-

-*Figure: Add missing permissions to your role connected to managed.hopsworks.ai*
-
-
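
For reference, a hypothetical CLI variant of the same role update is sketched below, assuming the Azure CLI is available and `ROLE_NAME` is the custom role found earlier; the portal flow above remains the documented path.

```bash
# Sketch: export the current role definition, add the four snapshot/disk
# actions listed above to its "actions" array, then push the update back.
az role definition list --name "ROLE_NAME" --query '[0]' > role.json
# ... edit role.json to add the Microsoft.Compute snapshot/disk actions ...
az role definition update --role-definition @role.json
```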

- -## Step 3: Create an ACR Container Registry -The use of a managed Docker registry (ACR) is enforced starting from Hopsworks version 3.1.0, so you need to create an ACR container registry and configure your managed identity to allow access to it. First, get the name of the managed identity used in your cluster by clicking on the *Details* tab and checking the name shown in front of *Managed Identity*. Then, follow [this guide](../getting_started/#step-3-create-an-acr-container-registry) to create and configure an ACR container registry. - -## Step 4: Run the upgrade process - -You need to click on *Upgrade* to start the upgrade process. You will be prompted with the screen shown below to confirm your intention to upgrade: - -!!! note - No need to worry about the steps shown below if you have already completed [Step 2](#step-2-add-backup-permissions-to-your-role-connected-to-hopsworksai) and [Step 3](#step-3-create-an-acr-container-registry) - -

-

-*Figure: Upgrade confirmation*
-
-

- -Enter the name of your ACR container registry that you have created in [Step 3](#step-3-create-an-acr-container-registry) and check the *Yes, upgrade cluster* checkbox to proceed, then the *Upgrade* button will be activated as shown below: - -!!! warning - Currently, we only support upgrade for the head node and you will need to recreate your workers once the upgrade is successfully completed. - - -

-

-*Figure: Upgrade confirmation*
-
-

- - -Depending on the size of your current cluster, the upgrade process may take from one to several hours to complete. - -!!! note - We don't delete your old cluster until the upgrade process is successfully completed. - -

-

-*Figure: Upgrade is running*
-
-

- -Once the upgrade is completed, you can confirm that you have the new Hopsworks version by checking the version number on the *Details* tab of your cluster. - -For more details about error handling check [this guide](../upgrade_2.4/#error-handling) \ No newline at end of file From 77d235214d283aa35c6c95e807cfbcc0cfe9728e Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Fri, 4 Oct 2024 19:42:45 +0100 Subject: [PATCH 07/24] Change AWS instructions to use kubernetes. - Remove all references to managed.hopsworks.ai --- .../aws/cluster_creation.md | 277 ---------- .../aws/custom_domain_name.md | 136 ----- .../aws/eks_ecr_integration.md | 310 ------------ .../setup_installation/aws/getting_started.md | 475 +++++++++++------- .../aws/restrictive_permissions.md | 328 ------------ .../setup_installation/aws/troubleshooting.md | 46 -- docs/setup_installation/aws/upgrade.md | 168 ------- docs/setup_installation/aws/upgrade_2.4.md | 164 ------ docs/setup_installation/aws/upgrade_3.0.md | 176 ------- 9 files changed, 298 insertions(+), 1782 deletions(-) delete mode 100644 docs/setup_installation/aws/cluster_creation.md delete mode 100644 docs/setup_installation/aws/custom_domain_name.md delete mode 100644 docs/setup_installation/aws/eks_ecr_integration.md delete mode 100644 docs/setup_installation/aws/restrictive_permissions.md delete mode 100644 docs/setup_installation/aws/troubleshooting.md delete mode 100644 docs/setup_installation/aws/upgrade.md delete mode 100644 docs/setup_installation/aws/upgrade_2.4.md delete mode 100644 docs/setup_installation/aws/upgrade_3.0.md diff --git a/docs/setup_installation/aws/cluster_creation.md b/docs/setup_installation/aws/cluster_creation.md deleted file mode 100644 index afb190005..000000000 --- a/docs/setup_installation/aws/cluster_creation.md +++ /dev/null @@ -1,277 +0,0 @@ -# Getting started with managed.hopsworks.ai (AWS) -This guide goes into detail for each of the steps of the cluster creation in [managed.hopsworks.ai](https://managed.hopsworks.ai) - -### Step 1 starting to create a cluster - -In [managed.hopsworks.ai](https://managed.hopsworks.ai), select *Create cluster*: - -

-

-*Figure: Create a Hopsworks cluster*
-
-

- -### Step 2 setting the General information - -

-

-*Figure: Create a Hopsworks cluster, general information*
-
-

- -Select the *Region* in which you want your cluster to run (1), name your cluster (2). - -Select the *Instance type* (3) and *Local storage* (4) size for the cluster *Head node*. - -#### Enable EBS encryption - -Select the checkbox (5) to enable encryption of EBS drives and snapshots. After enabling, the KMS key to be used for encryption can be specified by its alias, ID or ARN. Leaving the KMS key unspecified results in the EC2 default encryption key being used. - -#### S3 bucket configuration -Enter the name of the *S3 bucket* (6) you want the cluster to store its data in. - -!!! note - The S3 bucket you are using must be empty. - -Premium users have the option to use **encrypted S3 buckets**. To configure -an encrypted bucket click on the *Advanced* tab and select the -appropriate encryption type. - -!!! note - Encryption must have been **already** enabled for the bucket - -We support the following encryption schemes: - -1. SSE-S3 -2. SSE-KMS - 1. S3 managed key - 2. User managed key - -Users can also select an AWS *canned* **ACL** for the objects: - -1. `bucket-owner-full-control` - -#### ECR AWS Account Id -Enter the *AWS account Id* (7) to set up ECR repositories for the cluster. It is set by default to the AWS account id where you set the cross account role. - -On the main page, you can also choose to aggregate logs in CloudWatch and to opt out of [managed.hopsworks.ai](https://managed.hopsworks.ai) log collection. The first one is to aggregate the logs of services running in your cluster in the CloudWatch service in your configured AWS account. This can be useful if you want to understand what is happening on your cluster, without having to ssh into the instances. The second one is for [managed.hopsworks.ai](https://managed.hopsworks.ai) to collect logs about the services running in your cluster. These logs will help us improve our system and provide support. If you choose to opt out of log collection and need support, you will have to provide us the logs yourself, which will slow down the support process. - -### Step 3 workers configuration - -In this step, you configure the workers. There are two possible setups: static or autoscaling. In the static setup, the cluster has a fixed number of workers that you decide. You can then add and remove workers manually; for more details, see [the documentation](../common/adding_removing_workers.md). In the autoscaling setup, you configure conditions to add and remove workers and the cluster will automatically add and remove workers depending on the demand. - -#### Static workers configuration -You can set the static configuration by selecting *Disabled* in the first drop-down (1). Then you select the number of workers you want to start the cluster with (2). Finally, select the *Instance type* (3) and *Local storage* size (4) for the *worker nodes*. - -

-

-*Figure: Create a Hopsworks cluster, static workers configuration*
-
-

- -#### Autoscaling workers configuration -You can set the autoscaling configuration by selecting *Enabled* in the first drop-down (1). You can configure: - -1. The instance type you want to use. -2. The size of the instances' disk. -3. The minimum number of workers. -4. The maximum number of workers. -5. The targeted number of standby workers. Setting some resources in standby ensures that there are always some free resources in your cluster. This ensures that requests for new resources are fulfilled promptly. You configure the standby by setting the number of workers you want to be in standby. For example, if you set a value of *0.5* the system will start a new worker every time the aggregated free cluster resources drop below 50% of a worker's resources. If you set this value to 0, new workers will only be started when a job or notebook requests the resources. -6. The time to wait before removing unused resources. One often starts a new computation shortly after finishing the previous one. To avoid having to wait for workers to stop and start between each computation, it is recommended to wait before shutting down workers. Here you set the amount of time in seconds resources need to be unused before they get removed from the system. - -!!! note - The standby will not be taken into account if you set the minimum number of workers to 0 and no resources are used in the cluster. This ensures that the number of nodes can fall to 0 when no resources are used. The standby will start to take effect as soon as you start using resources. - -

-

-*Figure: Create a Hopsworks cluster, autoscale workers configuration*
-
-

- -### Step 4 select an SSH key - -When deploying clusters, [managed.hopsworks.ai](https://managed.hopsworks.ai) installs an SSH key on the cluster's instances so that you can access them if necessary. -Select the *SSH key* that you want to use to access cluster instances. For more details on how to add an SSH key in AWS, refer to [Create an ssh key](getting_started.md#step-4-create-an-ssh-key) - -

-

-*Figure: Choose SSH key*
-
-
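
If the key is not yet registered in the target region, a minimal sketch of importing an existing public key with the AWS CLI is shown below; the key name, path, and region are example values.

```bash
# Sketch: register a local public key as an EC2 key pair so it appears
# in the SSH key drop-down above (names and region are examples).
aws ec2 import-key-pair \
    --key-name my-hopsworks-key \
    --public-key-material fileb://~/.ssh/id_rsa.pub \
    --region us-east-2
```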

- -### Step 5 select the Instance Profile - -To let the cluster instances access the S3 bucket we need to attach an *instance profile* to the virtual machines. In this step, you choose which profile to use. This profile needs to have access rights to the *S3 bucket* you selected in [Step 2](#step-2-setting-the-general-information). For more details on how to create the instance profile and give it access to the S3 bucket refer to [Creating an instance profile and giving it access to the bucket](getting_started.md#step-3-creating-instance-profile) - -If you want to use [role chaining](../../admin/roleChaining.md), it is recommended to use a different *instance profile* for the head node and the other cluster's nodes. You do this by clicking the *Advanced configuration* check box and selecting an instance profile for the head node. This profile should have the same permissions as the profile you selected above, plus the extra permissions for the role chaining. - -

-

-*Figure: Choose the instance profile*
-
-

- -### Step 6 set the backup retention policy - -To back up the S3 bucket data when taking a cluster backup, we need to set a retention policy for S3. In this step, you choose the retention period in days. You can deactivate the retention policy by setting this value to 0, but this will block you from taking any backup of your cluster. - -

-

-*Figure: Choose the backup retention policy*
-
-

- -### Step 7 Managed Containers -Hopsworks can integrate with Amazon Elastic Kubernetes Service (EKS) to launch Python jobs, Jupyter servers, and ML model servings on top of Amazon EKS. For more details on how to set up this integration, refer to [Integration with Amazon EKS](eks_ecr_integration.md). -

-

-*Figure: Add EKS cluster name*
-
-
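
To double-check the exact EKS cluster name before entering it in the form above, something along the following lines can be used (the region is an example value):

```bash
# Sketch: list the EKS clusters visible to your credentials to confirm
# the cluster name expected by the form.
eksctl get cluster --region us-east-2
```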

- -### Step 8 VPC selection -In this step, you can select the VPC which will be used by the Hopsworks cluster. You can either select an existing VPC or let [managed.hopsworks.ai](https://managed.hopsworks.ai) create one for you. If you decide to let [managed.hopsworks.ai](https://managed.hopsworks.ai) create the VPC for you, you can choose the CIDR block for this virtual network. -Refer to [Create a VPC](restrictive_permissions.md#step-1-create-a-vpc) for more details on how to create your own VPC in AWS. - -

-

-*Figure: Choose a VPC*
-
-
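
If you create the VPC yourself, a minimal sketch with the AWS CLI is shown below; the CIDR block and region are example values, and the second command enables the DNS hostnames that Hopsworks clusters rely on.

```bash
# Sketch: create a VPC and enable DNS hostnames on it (example values).
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
    --query 'Vpc.VpcId' --output text --region us-east-2)
aws ec2 modify-vpc-attribute --vpc-id "$VPC_ID" \
    --enable-dns-hostnames '{"Value":true}' --region us-east-2
```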

- -!!! note - If the VPC uses a custom domain, read our [guide](./custom_domain_name.md) on how to set this up. - -### Step 9 Availability Zone selection -If you selected an existing VPC in the previous step, this step lets you select which availability zone of this VPC to use. - -If you did not select an existing virtual network in the previous step, [managed.hopsworks.ai](https://managed.hopsworks.ai) will create a subnet for you. You can choose the CIDR block this subnet will use. - -

-

-*Figure: Choose an availability zone*
-
-
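
To see which availability zones the subnets of the selected VPC cover, a query along these lines can help; `VPC_ID` is a placeholder.

```bash
# Sketch: list the subnets of the selected VPC with their availability zones.
aws ec2 describe-subnets \
    --filters Name=vpc-id,Values=VPC_ID \
    --query 'Subnets[].{Subnet:SubnetId,AZ:AvailabilityZone,CIDR:CidrBlock}' \
    --output table
```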

- -### Step 10 Security group selection -If you selected an existing VPC in the previous step, this step lets you select which security group to use. - -!!! note - [Managed.hopsworks.ai](https://managed.hopsworks.ai) require some rules for inbound and outbound traffic in your security group, for more details refer to [inbound traffic rules](restrictive_permissions.md#inbound-traffic) and [outbound traffic rules](restrictive_permissions.md#outbound-traffic). - -!!! note - [Managed.hopsworks.ai](https://managed.hopsworks.ai) attaches a public ip to your cluster by default. However, you can disable this behavior by unchecking the *Attach Public IP* checkbox. - -

-

-*Figure: Choose security group*
-
-
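
If you prefer the CLI to the console for opening the HTTP/HTTPS ports discussed above, a sketch is shown below; `SG_ID` is a placeholder for your security group id.

```bash
# Sketch: open ports 80 and 443 for inbound traffic on the security group.
aws ec2 authorize-security-group-ingress --group-id SG_ID --protocol tcp --port 80 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id SG_ID --protocol tcp --port 443 --cidr 0.0.0.0/0
```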

- -#### Limiting outbound traffic to managed.hopsworks.ai - -Clusters created on [managed.hopsworks.ai](https://managed.hopsworks.ai) need to be able to send HTTP requests to api.hopsworks.ai. If you have strict regulations regarding outbound traffic, you can enable the *Use static IPs to communicate with [managed.hopsworks.ai](https://managed.hopsworks.ai)* checkbox to get a list of IPs to be allowed, as shown below: - -

-

-*Figure: Enable static IPs*
-
-

- -### Step 11 User management selection -In this step, you can choose which user management system to use. You have four choices: - -* *Managed*: [Managed.hopsworks.ai](https://managed.hopsworks.ai) automatically adds and removes users from the Hopsworks cluster when you add and remove users from your organization (more details [here](../common/user_management.md)). -* *OAuth2*: integrate the cluster with your organization's OAuth2 identity provider. See [Use OAuth2 for user management](../common/sso/oauth.md) for more detail. -* *LDAP*: integrate the cluster with your organization's LDAP/ActiveDirectory server. See [Use LDAP for user management](../common/sso/ldap.md) for more detail. -* *Disabled*: let you manage users manually from within Hopsworks. - -

-

-*Figure: Choose user management type*
-
-

- -### Step 12 Managed RonDB -Hopsworks uses [RonDB](https://www.rondb.com/) as a database engine for its online Feature Store. By default, the database runs on its own VM. Premium users can scale out database services to multiple VMs to handle increased workloads. - -For details on how to configure RonDB, check our guide [here](../common/rondb.md). - -

-

-*Figure: Configure RonDB*
-
-

- -If you need to deploy a RonDB cluster instead of a single node, please contact [us](mailto:sales@logicalclocks.com). - -### Step 13 add tags to your instances. -In this step, you can define tags that will be added to the cluster virtual machines. - -

-

-*Figure: Add tags*
-
-

- -### Step 14 add an init script to your instances. -In this step, you can enter an initialization script that will be run at startup on every instance. - -You can select whether this script will run before or after the VM -configuration. **Be cautious** if you select to run it before the VM -configuration, as this might affect cluster creation. - -!!! note - This init script must be a bash script starting with *#!/usr/bin/env bash* - -

-

-*Figure: Add initialization script*
-
-
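
As an illustration, a minimal init script of the required shape could look like the sketch below; the log path is an arbitrary example.

```bash
#!/usr/bin/env bash
# Sketch: record that the init script ran, so its execution can be
# verified on each instance afterwards.
set -e
echo "init script ran at $(date -u)" >> /var/log/hopsworks-init.log
```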

- -### Step 15 Review and create -Review all information and select *Create*: - -

-

-*Figure: Review cluster information*
-
-

- -The cluster will start. This will take a few minutes: - -

-

-*Figure: Booting Hopsworks cluster*
-
-

- -As soon as the cluster has started, you will be able to log in to your new Hopsworks cluster. You will also be able to stop, restart, or terminate the cluster. - -

-

-*Figure: Running Hopsworks cluster*
-
-

diff --git a/docs/setup_installation/aws/custom_domain_name.md b/docs/setup_installation/aws/custom_domain_name.md deleted file mode 100644 index d57de0082..000000000 --- a/docs/setup_installation/aws/custom_domain_name.md +++ /dev/null @@ -1,136 +0,0 @@ -# Deploy in a VPC with a custom domain name - -Some organizations follow network patterns which impose a specific domain -name for Instances. In that case, the instance's hostname instead of `ip-10-0-0-175.us-east-2.compute.internal` would be `ip-10-0-0-175.bar.foo` - -The control plane at [managed.hopsworks.ai](https://managed.hopsworks.ai) needs to be -aware of such case in order to properly initialize the Cluster. - -!!! note - This feature is enabled **only** upon request. If you want this feature to be enable for your account please contact [sales](mailto:sales@logicalclocks.com) - -There are multiple ways to use custom domain names in your organization -which are beyond the scope of this guide. We assume your cloud/network -team has already setup the infrastructure. - -If you are using a resolver such as [Amazon Route 53](https://aws.amazon.com/route53/), it is advised to update the record sets automatically. See our -guide below for more information. - -## Set cluster domain name - -If this feature is enabled for your account, then in the [VPC selection](../cluster_creation/#step-8-vpc-selection) -step you will have the option to specify the custom domain name as -shown in the figure below. - -

-

-*Figure: VPC with custom domain name*
-
-

- -In this case, the hostname of the Instance would be `INSTANCE_ID.dev.hopsworks.domain` - -Hostnames **must** be resolvable by all Virtual Machines in the cluster. For -that reason we suggest, if possible, to automatically register the hostnames -with your DNS. - -In the following section we present an example of automatic -name registration in Amazon Route 53 - -## Auto registration with Amazon Route 53 - -It is quite common for organizations in AWS to use Route 53 for DNS or for hosted zones. -You can configure a cluster in [managed.hopsworks.ai](https://managed.hopsworks.ai) to execute some custom -initialization script **before** any other action. This script will -be executed on all nodes of the cluster. - -Since the hostname of the VM is in the form of `INSTANCE_ID.DOMAIN_NAME` -it is easy to automate the zone update. The script below creates an A record -in a configured Route 53 hosted zone. - -!!! warning - If you want the VM to register itself with Route 53 you **must** amend the - [Instance Profile](../cluster_creation) with the following permissions - -```json -{ - "Sid": "Route53RecordSet", - "Effect": "Allow", - "Action": [ - "route53:ChangeResourceRecordSets" - ], - "Resource": "arn:aws:route53:::hostedzone/YOUR_HOSTED_ZONE_ID" -}, -{ - "Sid": "Route53GetChange", - "Effect": "Allow", - "Action": [ - "route53:GetChange" - ], - "Resource": "arn:aws:route53:::change/*" -} -``` - -The following script will get the instance ID from the EC2 metadata server -and add an A record to the hosted zone in Route53. **Update** the -`YOUR_HOSTED_ZONE_ID` and `YOUR_CUSTOM_DOMAIN_NAME` to match yours. - -```bash -#!/usr/bin/env bash -set -e - -HOSTED_ZONE_ID=YOUR_HOSTED_ZONE_ID -ZONE=YOUR_CUSTOM_DOMAIN_NAME - -record_set_file=/tmp/record_set.json - -instance_id=$(curl --silent http://169.254.169.254/2016-09-02/meta-data/instance-id) - -domain_name="${instance_id}.${ZONE}" - -local_ip=$(curl --silent http://169.254.169.254/2016-09-02/meta-data/local-ipv4) - -cat << EOC | tee $record_set_file -{ - "Changes": [ - { - "Action": "UPSERT", - "ResourceRecordSet": { - "Name": "${domain_name}", - "Type": "A", - "TTL": 60, - "ResourceRecords": [ - { - "Value": "${local_ip}" - } - ] - } - } - ] -} -EOC - -echo "Adding A record ${domain_name} -> ${local_ip} to Hosted Zone ${HOSTED_ZONE_ID}" -change_resource_id=$(aws route53 change-resource-record-sets --hosted-zone-id ${HOSTED_ZONE_ID} --change-batch file://${record_set_file} | jq -r '.ChangeInfo.Id') - -echo "Change resource ID: ${change_resource_id}" -aws route53 wait resource-record-sets-changed --id ${change_resource_id} -echo "Added resource record set" - -rm -f ${record_set_file} -``` - -## Set VM initialization script -As a final step you need to configure the Cluster to use the script above -during VM creation with the [user init script](../cluster_creation/#step-14-add-an-init-script-to-your-instances) option. - -Paste the script to the text box and **make sure** you select this script -to be executed before anything else on the VM. - -

-

-*Figure: Automatic domain name registration with Route53*
-
-
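
To sanity-check the registration, the freshly created record can be looked up from any VM in the VPC; the hostname below is an example following the `INSTANCE_ID.dev.hopsworks.domain` pattern used above.

```bash
# Sketch: verify that the A record created by the init script resolves.
dig +short INSTANCE_ID.dev.hopsworks.domain
```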

diff --git a/docs/setup_installation/aws/eks_ecr_integration.md b/docs/setup_installation/aws/eks_ecr_integration.md deleted file mode 100644 index e8b7e7040..000000000 --- a/docs/setup_installation/aws/eks_ecr_integration.md +++ /dev/null @@ -1,310 +0,0 @@ -# Integration with Amazon EKS - -This guide shows how to create a cluster in [managed.hopsworks.ai](https://managed.hopsworks.ai) with integrated support for Amazon Elastic Kubernetes Service (EKS). So that Hopsworks can launch Python jobs, Jupyter servers, and ML model servings on top of Amazon EKS. - -!!! warning - In the current version, we don't support sharing EKS clusters between Hopsworks clusters. That is, an EKS cluster can be only used by one Hopsworks cluster. - -## Step 1: Create an EKS cluster on AWS - -If you have an existing EKS cluster, skip this step and go directly to Step 2. - -Amazon provides two getting started guides using [AWS management console](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html) or [`eksctl`](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html) to help you create an EKS cluster. -The easiest way is to use the eksctl command. - -### Step 1.1: Installing eksctl, aws, and kubectl - -Follow the prerequisites section in [getting started with `eksctl`](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html) to install aws, eksctl, and kubectl. - -### Step 1.2: Create an EKS cluster using eksctl - -You can create a sample EKS cluster with the name *my-eks-cluster* using Kubernetes version *1.20* with *2* managed nodes in the *us-east-2* region by running the following command. For more details on the eksctl usage, check the [`eksctl` documentation](https://eksctl.io/usage/creating-and-managing-clusters/). - -```bash -eksctl create cluster --name my-eks-cluster --version 1.20 --region us-east-2 --nodegroup-name my-nodes --nodes 2 --managed -``` - -Output: - -```bash -[ℹ] eksctl version 0.26.0 -[ℹ] using region us-east-2 -[ℹ] setting availability zones to [us-east-2b us-east-2a us-east-2c] -[ℹ] subnets for us-east-2b - public:192.168.0.0/19 private:192.168.96.0/19 -[ℹ] subnets for us-east-2a - public:192.168.32.0/19 private:192.168.128.0/19 -[ℹ] subnets for us-east-2c - public:192.168.64.0/19 private:192.168.160.0/19 -[ℹ] using Kubernetes version 1.20 -[ℹ] creating EKS cluster "my-eks-cluster" in "us-east-2" region with managed nodes -[ℹ] will create 2 separate CloudFormation stacks for cluster itself and the initial managed nodegroup -[ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-2 --cluster=my-eks-cluster' -[ℹ] CloudWatch logging will not be enabled for cluster "my-eks-cluster" in "us-east-2" -[ℹ] you can enable it with 'eksctl utils update-cluster-logging --region=us-east-2 --cluster=my-eks-cluster' -[ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "my-eks-cluster" in "us-east-2" -[ℹ] 2 sequential tasks: { create cluster control plane "my-eks-cluster", 2 sequential sub-tasks: { no tasks, create managed nodegroup "my-nodes" } } -[ℹ] building cluster stack "eksctl-my-eks-cluster-cluster" -[ℹ] deploying stack "eksctl-my-eks-cluster-cluster" -[ℹ] building managed nodegroup stack "eksctl-my-eks-cluster-nodegroup-my-nodes" -[ℹ] deploying stack "eksctl-my-eks-cluster-nodegroup-my-nodes" -[ℹ] waiting for the control plane availability... 
-[✔] saved kubeconfig as "/Users/maism/.kube/config"
-[ℹ] no tasks
-[✔] all EKS cluster resources for "my-eks-cluster" have been created
-[ℹ] nodegroup "my-nodes" has 2 node(s)
-[ℹ] node "ip-192-168-21-142.us-east-2.compute.internal" is ready
-[ℹ] node "ip-192-168-62-117.us-east-2.compute.internal" is ready
-[ℹ] waiting for at least 2 node(s) to become ready in "my-nodes"
-[ℹ] nodegroup "my-nodes" has 2 node(s)
-[ℹ] node "ip-192-168-21-142.us-east-2.compute.internal" is ready
-[ℹ] node "ip-192-168-62-117.us-east-2.compute.internal" is ready
-[ℹ] kubectl command should work with "/Users/maism/.kube/config", try 'kubectl get nodes'
-[✔] EKS cluster "my-eks-cluster" in "us-east-2" region is ready
-``` - -Once the cluster is created, eksctl will write the cluster credentials for the newly created cluster to your local kubeconfig file (~/.kube/config). -To test the cluster credentials, you can run the following command to get the list of nodes in the cluster. - -```bash
kubectl get nodes
``` - -Output: - -```bash
NAME STATUS ROLES AGE VERSION
ip-192-168-21-142.us-east-2.compute.internal Ready 2m35s v1.17.9-eks-4c6976
ip-192-168-62-117.us-east-2.compute.internal Ready 2m34s v1.17.9-eks-4c6976
``` - -## Step 2: Create an instance profile role on AWS - -You need to add a permission to [the instance profile you use for instances deployed by managed.hopsworks.ai](getting_started.md#step-3-creating-instance-profile) to give them access to EKS. -Go to the [*IAM service*](https://console.aws.amazon.com/iam) in the *AWS management console*, click *Roles*, search for your role, and click on it. Click on *Add inline policy*. Go to the *JSON* tab and replace the existing JSON permissions with the JSON permissions below. - -```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDescribeEKS",
            "Effect": "Allow",
            "Action": "eks:DescribeCluster",
            "Resource": "arn:aws:eks:*:*:cluster/*"
        }
    ]
}
``` - -Click on *Review policy*. Give a name to your policy and click on *Create policy*. - -Copy the *Role ARN* of your profile (not to be confused with the *Instance Profile ARNs* two lines below). - -

-

-*Figure: Copy the Role ARN*
-
-
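
A quick way to confirm that the added `eks:DescribeCluster` permission works is to call it with credentials for that role, for example from an instance using the instance profile; the cluster name and region below match the example used in this guide.

```bash
# Sketch: should succeed once the inline policy above is attached.
aws eks describe-cluster --name my-eks-cluster --region us-east-2
```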

- -## Step 3: Allow your role to use your EKS cluster - -You need to configure your EKS cluster to accept connections from the role you created above. This is done by using the following kubectl command. For more details, check [Managing users or IAM roles for your cluster](https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html). - -!!! note - The kubectl edit command uses the *vi* editor by default, however, you can [override this behavior by setting *KUBE_EDITOR* to your preferred editor](https://kubernetes.io/docs/reference/kubectl/cheatsheet/#editing-resources). - -```bash -KUBE_EDITOR="vi" kubectl edit configmap aws-auth -n kube-system -``` - -Output: - -```bash -# Please edit the object below. Lines beginning with a '#' will be ignored, -# and an empty file will abort the edit. If an error occurs while saving this file will be -# reopened with the relevant failures. -# -apiVersion: v1 -data: -mapRoles: | - - groups: - - system:bootstrappers - - system:nodes - rolearn: arn:aws:iam::xxxxxxxxxxxx:role/eksctl-my-eks-cluster-nodegroup-m-NodeInstanceRole-FQ7L0HQI4NCC - username: system:node:{{EC2PrivateDNSName}} -kind: ConfigMap -metadata: -creationTimestamp: "2020-08-24T07:42:31Z" -name: aws-auth -namespace: kube-system -resourceVersion: "770" -selfLink: /api/v1/namespaces/kube-system/configmaps/aws-auth -uid: c794b2d8-9f10-443d-9072-c65d0f2eb552 -``` - -Follow the example below (lines 13-16) to add your role to *mapRoles* and assign the *system:masters* group to your role. Make sure to replace 'YOUR ROLE RoleARN' with the *Role ARN* you copied in the previous step before saving. - -!!! warning - Make sure to keep the same formatting as in the example below. The configuration format is sensitive to indentation and copy-pasting does not always keep the correct indentation. - - -```bash hl_lines="13 14 15 16" -# Please edit the object below. Lines beginning with a '#' will be ignored, -# and an empty file will abort the edit. If an error occurs while saving this file will be -# reopened with the relevant failures. -# -apiVersion: v1 -data: - mapRoles: | - - groups: - - system:bootstrappers - - system:nodes - rolearn: arn:aws:iam::xxxxxxxxxxxx:role/eksctl-my-eks-cluster-nodegroup-m-NodeInstanceRole-FQ7L0HQI4NCC - username: system:node:{{EC2PrivateDNSName}} - - groups: - - system:masters - rolearn: - username: hopsworks -kind: ConfigMap -metadata: -creationTimestamp: "2020-08-24T07:42:31Z" -name: aws-auth -namespace: kube-system -resourceVersion: "770" -selfLink: /api/v1/namespaces/kube-system/configmaps/aws-auth -uid: c794b2d8-9f10-443d-9072-c65d0f2eb552 -``` - -Once you are done with editing the configmap, save it and exit the editor. The output should be: - -```bash -configmap/aws-auth edited -``` - -## Step 4: Setup network connectivity - -For Hopsworks to be able to start containers in the EKS cluster and for these containers to be able to use Hopsworks we need to establish network connectivity between Hopsworks and EKS. For this, we have two solutions. The first option (*A*) is to run Hopsworks and EKS in the same virtual network and security group. The second option (*B*) is to pair the EKS and Hopsworks virtual networks. If you choose this option, make sure to create the peering before starting the Hopsworks cluster as it connects to EKS at startup. - -### Option *A*: run Hopsworks and EKS in the same virtual network. -Running EKS and Hopsworks in the same security group is the simplest of the two solutions when it comes to setting up the system. 
All you need to do is open the ports needed by Hopsworks in the security group created by the EKS cluster. Then you can just select this security group during the Hopsworks cluster creation. We will now see how to open the ports for HTTP (80) and HTTPS (443) to allow Hopsworks to run with all its functionalities. - -!!! Note - It is possible not to open ports 80 and 443 at the cost of some features. See [Limiting permissions](restrictive_permissions.md#step-1-create-a-vpc) for more details. - -First, you need to get the name of the security group of your EKS cluster by using the following eksctl command. Notice that you need to replace *my-eks-cluster* with the name of your cluster. - -```bash
eksctl utils describe-stacks --region=us-east-2 --cluster=my-eks-cluster | grep 'OutputKey: "ClusterSecurityGroupId"' -a1
``` - -Check the output for *OutputValue*, which will be the id of your EKS security group. - -```bash
ExportName: "eksctl-my-eks-cluster-cluster::ClusterSecurityGroupId",
OutputKey: "ClusterSecurityGroupId",
OutputValue: "YOUR_EKS_SECURITY_GROUP_ID"
``` - -Go to the [*Security Groups* section of *EC2* in the *AWS management console*](https://us-east-2.console.aws.amazon.com/ec2/v2/home?#SecurityGroups:) and search for your security group using the id obtained above. Note the *VPC ID*; you will need it when creating the Hopsworks cluster. Click on the security group, go to the *Inbound rules* tab, and click on *Edit inbound rules*. You should now see the following screen. - -

-

-*Figure: Edit inbound rules*
-
-

- -Add two rules for HTTP and HTTPS as follows: - -

-

-*Figure: Edit inbound rules*
-
-

- -Click *Save rules* to save the updated rules to the security group. - -### Option *B*: create a pairing between Hopsworks and EKS -To establish virtual peering between the Kubernetes cluster and Hopsworks, you need to select or create a virtual network for Hopsworks. You can create the virtual network by following the [AWS documentation steps to create the virtual private network](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-getting-started.html#getting-started-create-vpc). Make sure to configure your subnet to use an address space that does not overlap with the address space in the Kubernetes network. - -You then need to select or create a security group for the Hopsworks VPC. You can create the security group by following the steps in the [AWS documentation](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html#creating-security-groups). Remember to open the port for HTTP (80) and HTTPS (443) to allow Hopsworks to run with all its functionalities. - -!!! Note - It is possible not to open ports 80 and 443 at the cost of some features. See [Limiting permissions](restrictive_permissions.md#step-1-create-a-vpc) for more details. - -Once the Hopsworks VPC and security group are created you need to create a peering between the Hopsworks VPC and the EKS VPC. For this follow the AWS documentation [here](https://docs.aws.amazon.com/vpc/latest/peering/create-vpc-peering-connection.html). - -Finally, you need to edit the security groups for the EKS cluster and for Hopsworks to allow full communication between both VPC. This can be done following the instruction in the [AWS documentation](https://docs.aws.amazon.com/vpc/latest/peering/vpc-peering-security-groups.html). - -You can get the name of the security group of your EKS cluster by using the following eksctl command. Notice that you need to replace *my-eks-cluster* with the name of your cluster. - -```bash -eksctl utils describe-stacks --region=us-east-2 --cluster=my-eks-cluster | grep 'OutputKey: "ClusterSecurityGroupId"' -a1 -``` - -Check the output for *OutputValue*, which will be the id of your EKS security group. - -```bash -ExportName: "eksctl-my-eks-cluster-cluster::ClusterSecurityGroupId", -OutputKey: "ClusterSecurityGroupId", -OutputValue: "YOUR_EKS_SECURITY_GROUP_ID" -``` - -## Step 6: Create a Hopsworks cluster with EKS support - -In [managed.hopsworks.ai](https://managed.hopsworks.ai), follow the same instructions as in [the cluster creation guide](cluster_creation.md) except when setting *Managed Containers*, *VPC*, *Subnet*, and *Security group*. - -Choose **Enabled** to enable the use of Amazon EKS: - -

-

-*Figure: Choose Enabled*
-
-

- -Add your EKS cluster name, then click Next: - -

-

-*Figure: Add EKS cluster name*
-
-

- -If you followed option *A* when setting up the network, choose the VPC of your EKS cluster. Its name should have the form *eksctl-YOUR-CLUSTER-NAME-cluster* (click on the refresh button if the VPC is not in the list). - -If you followed option *B*, choose the VPC you created during [Step 4](#step-4-setup-network-connectivity). - -

-

-*Figure: Choose VPC*
-
-

- -Choose any of the subnets in the VPC, then click Next. - -!!! note - Avoid private subnets if you want to enjoy [all the Hopsworks features](restrictive_permissions.md). - -

-

-*Figure: Choose Subnet*
-
-

- -Choose the security group that you have updated/created in [Step 4](#step-4-setup-network-connectivity), then click Next: - -!!! note - If you followed option *A* select the Security Group with the same id as in [Step 4](#step-4-setup-network-connectivity) and NOT the ones containing ControlPlaneSecurity or ClusterSharedNode in their name. - -

-

-*Figure: Choose Security Group*
-
-

- -Click *Review and submit*, then Create. Once the cluster is created, Hopsworks will use EKS to launch Python jobs, Jupyter servers, and ML model servings. diff --git a/docs/setup_installation/aws/getting_started.md b/docs/setup_installation/aws/getting_started.md index 36722406e..0b6a69f2c 100644 --- a/docs/setup_installation/aws/getting_started.md +++ b/docs/setup_installation/aws/getting_started.md @@ -1,265 +1,386 @@ -# Getting started with managed.hopsworks.ai (AWS) +# AWS - Getting started -[Managed.hopsworks.ai](https://managed.hopsworks.ai) is our managed platform for running Hopsworks and the Feature Store -in the cloud. It integrates seamlessly with third-party platforms such as Databricks, -SageMaker and KubeFlow. This guide shows how to set up [managed.hopsworks.ai](https://managed.hopsworks.ai) with your organization's AWS account. +Kubernetes and Helm are used to install & run Hopsworks and the Feature Store +in the cloud. They both integrate seamlessly with third-party platforms such as Databricks, +SageMaker and KubeFlow. This guide shows how to set up the Hopsworks platform in your organization's AWS account. ## Prerequisites -To run the commands in this guide, you must have the AWS CLI installed and configured and your user must have at least the set of permission listed below. See the [Getting started guide](https://docs.aws.amazon.com/cli/v1/userguide/cli-chap-install.html) in the AWS CLI User Guide for more information about installing and configuring the AWS CLI. + +To follow the instruction on this page you will need the following: + +- Kubernetes Version: Hopsworks can be deployed on AKS clusters running Kubernetes >= 1.27.0. +- [aws-cli](https://aws.amazon.com/cli/) to provision the AWS resources +- [eksctl](https://eksctl.io/) to interact with the AWS APIs and provision the EKS cluster +- [helm](https://helm.sh/) to deploy Hopsworks + +## ECR Registry + +Hopsworks allows users to customize the images used by Python jobs, Jupyter Notebooks and (Py)Spark applications running in their projects. The images are stored in ECR. Hopsworks needs access to an ECR repository to push the project images. + +## Permissions + +By default, the deployment requires cluster admin level access to be able to create a set of ClusterRoles, ServiceAccounts and ClusterRoleBindings. If you don’t have cluster admin level access, you can ask your administrator to provision the necessary ClusterRoles, ServiceAccounts and ClusterRoleBindings as described in the section below. + +A namespace is required to deploy the Hopsworks stack. If you don’t have permissions to create a namespace you should ask your K8s administrator to provision one for you. + + +## EKS Deployment + +The following steps describe how to deploy an EKS cluster and related resources so that it’s compatible with Hopsworks. + + +## Step 1: AWS EKS Setup + +### Step 1.1: Create S3 Bucket + +```bash +aws s3 mb s3://BUCKET_NAME --region REGION --profile PROFILE +``` + +### Step 1.2: Create ECR Repository + +Create the repository to host the projects images. 
+ +```bash +aws --profile PROFILE ecr create-repository --repository-name NAMESPACE/hopsworks-base --region REGION +``` + +### Step 1.3: Create IAM Policies ```json { - "Version": "2012-10-17", - "Statement": [ - { - "Sid": "VisualEditor0", - "Effect": "Allow", - "Action": [ - "iam:CreateInstanceProfile", - "iam:PassRole", - "iam:CreateRole", - "iam:PutRolePolicy", - "iam:AddRoleToInstanceProfile", - "ec2:ImportKeyPair", - "ec2:CreateKeyPair", - "s3:CreateBucket" - ], - "Resource": "*" - } - ] + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "hopsworksaiInstanceProfile", + "Effect": "Allow", + "Action": [ + "S3:PutObject", + "S3:ListBucket", + "S3:GetObject", + "S3:DeleteObject", + "S3:AbortMultipartUpload", + "S3:ListBucketMultipartUploads", + "S3:PutLifecycleConfiguration", + "S3:GetLifecycleConfiguration", + "S3:PutBucketVersioning", + "S3:GetBucketVersioning", + "S3:ListBucketVersions", + "S3:DeleteObjectVersion" + ], + "Resource": [ + "arn:aws:s3:::BUCKET_NAME/*", + "arn:aws:s3:::BUCKET_NAME" + ] + }, + { + "Sid": "AllowPushandPullImagesToUserRepo", + "Effect": "Allow", + "Action": [ + "ecr:GetDownloadUrlForLayer", + "ecr:BatchGetImage", + "ecr:CompleteLayerUpload", + "ecr:UploadLayerPart", + "ecr:InitiateLayerUpload", + "ecr:BatchCheckLayerAvailability", + "ecr:PutImage", + "ecr:ListImages", + "ecr:BatchDeleteImage", + "ecr:GetLifecyclePolicy", + "ecr:PutLifecyclePolicy", + "ecr:TagResource" + ], + "Resource": [ + "arn:aws:ecr:REGION:ECR_AWS_ACCOUNT_ID:repository/*/hopsworks-base" + ] + } + ] } ``` -All the commands have unix-like quotation rules. These commands will need to be adapted to your terminal's quoting rules. See [Using quotation marks with strings](https://docs.aws.amazon.com/cli/v1/userguide/cli-usage-parameters-quoting-strings.html) in the AWS CLI User Guide. +```bash +aws --profile PROFILE iam create-policy --policy-name POLICY_NAME --policy-document file://policy.json +``` -All the commands use the default AWS profile. Add the *--profile* parameter to use another profile. +### Step 1.4: Create EKS cluster using eksctl -## Step 1: Connecting your AWS account +When creating the cluster using eksctl the following parameters are required in the cluster configuration YAML file (eksctl.yaml): -[Managed.hopsworks.ai](https://managed.hopsworks.ai) deploys Hopsworks clusters to your AWS account. To enable this you have to permit us to do so. This is done using an AWS cross-account role. +- amiFamily should either be AmazonLinux2023 or Ubuntu2404 -

- -

+- Instance type should be Intel based or AMD (i.e not ARM) -In [managed.hopsworks.ai](https://managed.hopsworks.ai/) click on *Connect to AWS* or go to *Settings* and click on *Configure* next to *AWS*. This will direct you to a page with the instructions needed to create the Cross account role and set up the connection. Follow the instructions. +- The following policies are required: [IAM policies - eksctl](https://eksctl.io/usage/iam-policies/#attaching-policies-by-arn) -!!! note - it is possible to limit the permissions that are set up during this phase. For more details see [restrictive-permissions](restrictive_permissions.md). +```bash +- arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy +- arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy +- arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly +``` -

-

-*Figure: Instructions to create the cross account role*
-
-

+The following is required if you are using the EKS AWS Load Balancer Controller to grant permissions to the controller to provision the necessary load balancers [Welcome: AWS Load Balancer Controller](https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/) -## Step 2: Creating storage +```bash + withAddonPolicies: + awsLoadBalancerController: true +``` -!!! note - If you prefer using terraform, you can skip this step and the remaining steps, and instead, follow [this guide](../common/terraform.md#getting-started-with-aws). +You need to update the CLUSTER NAME and the POLICY ARN generated above -The Hopsworks clusters deployed by [managed.hopsworks.ai](https://managed.hopsworks.ai) store their data in an S3 bucket in your AWS account. +```bash +apiVersion: eksctl.io/v1alpha5 +kind: ClusterConfig + +metadata: + name: CLUSTER_NAME + region: REGION + version: "1.29" + +iam: + withOIDC: true + +managedNodeGroups: + - name: ng-1 + amiFamily: AmazonLinux2023 + instanceType: m6i.2xlarge + minSize: 1 + maxSize: 4 + desiredCapacity: 4 + volumeSize: 100 + ssh: + allow: true # will use ~/.ssh/id_rsa.pub as the default ssh key + iam: + attachPolicyARNs: + - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy + - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy + - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly + - arn:aws:iam::827555229956:policy/POLICYNAME + withAddonPolicies: + awsLoadBalancerController: true +addons: + - name: aws-ebs-csi-driver + wellKnownPolicies: # add IAM and service account + ebsCSIController: true +``` +You can create the EKS cluster using the following eksctl command: -To create the bucket run the following command, replacing *BUCKET_NAME* with the name you want for your bucket and setting the region to the aws region in which you want to run your cluster. +```bash +eksctl create cluster -f eksctl.yaml --profile PROFILE +``` -!!! warning - The bucket must be in the same region as the hopsworks cluster you are going to run +Once the creation process is completed, you should be able to access the cluster using the kubectl CLI tool: ```bash -aws s3 mb s3://BUCKET_NAME --region us-east-2 +kubectl get nodes ``` +You should see the list of nodes provisioned for the cluster. + +### Step 1.4: Install the AWS LoadBalancer Addon -## Step 3: Creating Instance profile +For Hopsworks to provision the necessary network and application load balancers, we need to install the AWS LoadBalancer plugin (See [AWS Documentation](https://docs.aws.amazon.com/eks/latest/userguide/lbc-helm.html) ) +```bash +helm repo add eks https://aws.github.io/eks-charts +helm repo update eks +helm install aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=CLUSTER_NAME +``` -Hopsworks cluster nodes need access to certain resources such as the S3 bucket you created above, an ecr repository, and CloudWatch. +### Step 1.5: (Optional) Create GP3 Storage Class +By default EKS comes with GP2 as storage class. GP3 is more cost effective, we can use it with Hopsworks by creating the storage class -First, create an instance profile by running: ```bash -aws iam create-instance-profile --instance-profile-name hopsworksai-instances +kubectl apply -f - < -
-*Figure: Create a Hopsworks cluster*
-
-

+```bash +kubectl -n hopsworks get pods +``` -Select the *Region* in which you want your cluster to run (1), name your cluster (2). -Select the *Instance type* (3) and *Local storage* (4) size for the cluster *Head node*. +## Step 3: Resources Created -Check if you want to *Enable EBS encryption* (5) +Using the Helm chart and the values files the following resources are created: -Enter the name of the *S3 bucket* (6) you created in [step 2](#step-2-creating-storage). +Load Balancers: +```bash + externalLoadBalancers: + enabled: true + class: null + annotations: + service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing +``` -!!! note - The S3 bucket you are using must be empty. +Enabling the external load balancer in the values.yml file provisions the following load balancers for the following services: -Make sure that the *ECR AWS Account Id* (7) is correct. It is set by default to the AWS account id where you set the cross-account role and need to match the permissions you set in [step 3](#step-3-creating-instance-profile). -Press *Next*: +- arrowflight : This load balancer is used to send queries from external clients to the Hopsworks Query Service -

-

-*Figure: Create a Hopsworks cluster, general information*
-
-

+- kafka : This load balancer is used to send data to the Apache Kafka brokers for ingestion to the online feature store. +- rdrs: This load balancer is used to query online feature store data using the REST APIs -Select the number of workers you want to start the cluster with (2). -Select the *Instance type* (3) and *Local storage* size (4) for the *worker nodes*. +- mysql: This load balancer is used to query online feature store data using the MySQL APIs -!!! note - It is possible to [add or remove workers](../common/adding_removing_workers.md) or to [enable autoscaling](../common/autoscaling.md) once the cluster is running. +- opensearch : This load balancer is used to query the Hopsworks vector database -Press *Next*: -

-

-*Figure: Create a Hopsworks cluster, static workers configuration*
-
-

+On EKS using the AWS Load Balancers, the AWS controller deployed above will be responsible to provision the necessary load balancers. You can configure the load balancers using the annotations documented in the [AWS Load Balancer controller guide](https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/ingress/annotations/) -Select the *SSH key* you created in [step 4](#step-4-create-an-ssh-key): +You can enable/disable individual load balancers provisioning using the following values in the values.yml file: -

-

-*Figure: Choose SSH key*
-
-

+- kafka.externalLoadBalancer.enabled -Select the *Instance Profile* that you created in [step 3](#step-3-creating-instance-profile): +- opensearch.externalLoadBalancer.enabled -

-

-*Figure: Choose the instance profile*
-
-

+- rdrs.externalLoadBalancer.enabled -To backup the S3 bucket data when taking a cluster backup we need to set a retention policy for S3. You can deactivate the retention policy by setting this value to 0 but this will block you from taking any backup of your cluster. Choose the retention period in days and click on *Review and submit*: +- mysql.externalLoadBalancer.enabled -

-

-*Figure: Choose the backup retention policy*
-
-

+Other load balancer providers are also supported by providing the appropriate controller, class and annotations. -Review all information and select *Create*: +Ingress: -

-

-*Figure: Review cluster information*
-
-

+```bash + ingress: + enabled: true + ingressClassName: alb + annotations: + alb.ingress.kubernetes.io/scheme: internet-facing +``` + +Hopsworks UI and REST interface is available outside the K8s cluster using an Ingress. On AWS this is implemented by provisioning an application load balancer using the AWS load balancer controller. +As per the load balancer above, the controller checks for the following annotations: [Annotations - AWS Load Balancer Controller](https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/ingress/annotations/) -The cluster will start. This will take a few minutes: +HTTPS is required to access the Hopsworks UI, therefore you need to add the following annotation: + +```bash +alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:xxxxx:certificate/xxxxxxx +``` -

-

-*Figure: Booting Hopsworks cluster*
-
-

+To configure the TLS certificate the Application Load Balancer should use to terminate the connection. The certificate should be available in the [AWS Certificate Manager](https://aws.amazon.com/certificate-manager/) -As soon as the cluster has started, you will be able to log in to your new Hopsworks cluster. You will also be able to stop, restart, or terminate the cluster. +Cluster Roles and Cluster Role Bindings: +By default a set of cluster roles are provisioned, if you don’t have permissions to provision cluster roles or cluster role bindings, you should reach out to your K8s administrator. You should then provide the appropriate resource names as value in the values.yml file. -

-

-*Figure: Running Hopsworks cluster*
-
-

## Step 6: Next steps Check out our other guides for how to get started with Hopsworks and the Feature Store: -* Make Hopsworks services [accessible from outside services](../common/services.md) * Get started with the [Hopsworks Feature Store](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/quickstart.ipynb){:target="_blank"} * Follow one of our [tutorials](../../tutorials/index.md) * Follow one of our [Guide](../../user_guides/index.md) diff --git a/docs/setup_installation/aws/restrictive_permissions.md b/docs/setup_installation/aws/restrictive_permissions.md deleted file mode 100644 index d711a999b..000000000 --- a/docs/setup_installation/aws/restrictive_permissions.md +++ /dev/null @@ -1,328 +0,0 @@ -# Limiting AWS permissions - -[Managed.hopsworks.ai](https://managed.hopsworks.ai) requires a set of permissions to be able to manage resources in the user’s AWS account. -By default, [these permissions](#default-permissions) are set to easily allow a wide range of different configurations and allow -us to automate as many steps as possible. While we ensure to never access resources we shouldn’t, -we do understand that this might not be enough for your organization or security policy. -This guide explains how to lock down AWS permissions following the IT security policy principle of least privilege allowing -[managed.hopsworks.ai](https://managed.hopsworks.ai) to only access resources in a specific VPC. - -## Default permissions -This is the list of default permissions that are required by [managed.hopsworks.ai](https://managed.hopsworks.ai). If you prefer to limit these permissions, then proceed to the [next section](#limiting-the-cross-account-role-permissions). - -```json -{!setup_installation/aws/aws_permissions.json!} -``` - -## Limiting the cross-account role permissions - -### Step 1: Create a VPC - -To restrict [managed.hopsworks.ai](https://managed.hopsworks.ai) from accessing resources outside of a specific VPC, you need to create a new VPC -connected to an Internet Gateway. This can be achieved in the AWS Management Console following this guide: -[Create the VPC](https://docs.aws.amazon.com/vpc/latest/userguide/working-with-vpcs.html#Create-VPC). -Follow the steps of `Create a VPC, subnets, and other VPC resources`, naming your vpc and changing the `Number of Availability Zones` to 1 are the only changes you need to make to the default configuration. -Alternatively, an existing VPC such as the default VPC can be used and [managed.hopsworks.ai](https://managed.hopsworks.ai) will be restricted to this VPC. -Note the VPC ID of the VPC you want to use for the following steps. - -!!! note - Make sure you enable `DNS hostnames` for your VPC - -!!! note - If you use VPC endpoints to managed access to services such as S3 and ECR you need to ensure that the endpoints provide the same permissions as set in the [instance profile](../getting_started/#step-3-creating-instance-profile) - -After you have created the VPC either [Create a Security Group](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html#CreatingSecurityGroups) or use VPC's default. Make sure that the VPC allow the following traffic. - -#### Inbound traffic - -It is _**imperative**_ that the [Security Group](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html#AddRemoveRules) allows Inbound traffic from any Instance within the same Security Group in any (TCP) port. All VMs of the Cluster should be able to communicate with each other. 
It is also recommended to open TCP port `80` to sign the certificates. If you do not open port `80`, you will have to use a self-signed certificate in your Hopsworks cluster. This can be done by checking the `Continue with self-signed certificate` check box in the `Security Group` step of the cluster creation. - - -We recommend configuring the [Network ACLs](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-network-acls.html#Rules) to be open to all inbound traffic and let the security group handle the access restriction. But if you want to set limitations at the Network ACLs level, they must be configured so that at least the TCP ephemeral ports `32768 - 65535` are open to the internet (this is so that outbound traffic can receive answers). It is also recommended to open TCP port `80` to sign the certificates. If you do not open port `80`, you will have to use a self-signed certificate in your Hopsworks cluster. This can be done by checking the `Continue with self-signed certificate` check box in the `Security Group` step of the cluster creation. - -#### Outbound traffic - -Clusters created on [managed.hopsworks.ai](https://managed.hopsworks.ai) need to be able to send http requests to *api.hopsworks.ai*. The *api.hopsworks.ai* domain uses a content delivery network for better performance. This makes it impossible to predict which IP the requests will be sent to. If you require a list of static IPs to allow outbound traffic from your security group, use the *static IPs* option during [cluster creation](../cluster_creation/#limiting-outbound-traffic-to-hopsworksai). - -Similar to Inbound traffic, the Security Group in place _**must**_ allow Outbound traffic in any (TCP) port towards any VM within the same Security Group. - -!!! note - If you intend to use the managed users option on your Hopsworks cluster you should also allow outbound traffic to [cognito-idp.us-east-2.amazonaws.com](https://cognito-idp.us-east-2.amazonaws.com) and [managedhopsworks-prod.auth.us-east-2.amazoncognito.com](https://managedhopsworks-prod.auth.us-east-2.amazoncognito.com). - -### Step 2: Create an instance profile - -You need to create an instance profile that will identify all instances started by [managed.hopsworks.ai](https://managed.hopsworks.ai). -Follow this guide to create a role to be used by EC2 with no permissions attached: -[Creating a Role for an AWS Service (Console)](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-service.html). -Take note of the ARN of the role you just created. - -You will need to add permissions to the instance profile to give access to the S3 bucket where Hopsworks will store its data. For more details about these permissions, check [our guide here](../getting_started/#step-3-creating-instance-profile). -Check [below](#limiting-the-instance-profile-permissions) for more information on restricting the permissions given to the instance profile. - -### Step 3: Set permissions of the cross-account role - -During the account setup for [managed.hopsworks.ai](https://managed.hopsworks.ai), you were asked to create and provide a cross-account role. -If you don’t remember which role you used then you can find it in Settings/Account Settings in [managed.hopsworks.ai](https://managed.hopsworks.ai). -Edit this role in the AWS Management Console and overwrite the existing inline policy with the following policy. - -Note that you have to replace `[INSTANCE_PROFILE_NAME]` and `[VPC_ID]` for multiple occurrences in the given policy. 
-
-If you want to learn more about how this policy works, check out:
-[How to Help Lock Down a User’s Amazon EC2 Capabilities to a Single VPC](https://aws.amazon.com/blogs/security/how-to-help-lock-down-a-users-amazon-ec2-capabilities-to-a-single-vpc/).
-
-```json
-{
-  "Version": "2012-10-17",
-  "Statement": [
-    {
-      "Sid": "NonResourceBasedPermissions",
-      "Effect": "Allow",
-      "Action": [
-        "ec2:DescribeInstances",
-        "ec2:DescribeVpcs",
-        "ec2:DescribeVolumes",
-        "ec2:DescribeSubnets",
-        "ec2:DescribeKeyPairs",
-        "ec2:DescribeInstanceStatus",
-        "iam:ListInstanceProfiles",
-        "ec2:DescribeSecurityGroups",
-        "ec2:DescribeVpcAttribute",
-        "ec2:DescribeRouteTables"
-      ],
-      "Resource": "*"
-    },
-    {
-      "Sid": "IAMPassRoleToInstance",
-      "Effect": "Allow",
-      "Action": "iam:PassRole",
-      "Resource": "arn:aws:iam::*:role/[INSTANCE_PROFILE_NAME]"
-    },
-    {
-      "Sid": "EC2RunInstancesOnlyWithGivenRole",
-      "Effect": "Allow",
-      "Action": "ec2:RunInstances",
-      "Resource": "arn:aws:ec2:*:*:instance/*",
-      "Condition": {
-        "ArnLike": {
-          "ec2:InstanceProfile": "arn:aws:iam::*:instance-profile/[INSTANCE_PROFILE_NAME]"
-        }
-      }
-    },
-    {
-      "Sid": "EC2RunInstancesOnlyInGivenVpc",
-      "Effect": "Allow",
-      "Action": "ec2:RunInstances",
-      "Resource": "arn:aws:ec2:*:*:subnet/*",
-      "Condition": {
-        "ArnLike": {
-          "ec2:vpc": "arn:aws:ec2:*:*:vpc/[VPC_ID]"
-        }
-      }
-    },
-    {
-      "Sid": "AllowInstanceActions",
-      "Effect": "Allow",
-      "Action": [
-        "ec2:StopInstances",
-        "ec2:TerminateInstances",
-        "ec2:StartInstances",
-        "ec2:CreateTags",
-        "ec2:AssociateIamInstanceProfile"
-      ],
-      "Resource": "arn:aws:ec2:*:*:instance/*",
-      "Condition": {
-        "ArnLike": {
-          "ec2:InstanceProfile": "arn:aws:iam::*:instance-profile/[INSTANCE_PROFILE_NAME]"
-        }
-      }
-    },
-    {
-      "Sid": "RemainingRunInstancePermissions",
-      "Effect": "Allow",
-      "Action": [
-        "ec2:RunInstances",
-        "ec2:CreateTags"
-      ],
-      "Resource": [
-        "arn:aws:ec2:*:*:volume/*",
-        "arn:aws:ec2:*::image/*",
-        "arn:aws:ec2:*::snapshot/*",
-        "arn:aws:ec2:*:*:network-interface/*",
-        "arn:aws:ec2:*:*:key-pair/*",
-        "arn:aws:ec2:*:*:security-group/*"
-      ]
-    },
-    {
-      "Sid": "EC2VpcNonResourceSpecificActions",
-      "Effect": "Allow",
-      "Action": [
-        "ec2:AuthorizeSecurityGroupIngress",
-        "ec2:RevokeSecurityGroupIngress"
-      ],
-      "Resource": "*",
-      "Condition": {
-        "ArnLike": {
-          "ec2:vpc": "arn:aws:ec2:*:*:vpc/[VPC_ID]"
-        }
-      }
-    },
-    {
-      "Sid": "EC2BackupCreation",
-      "Effect": "Allow",
-      "Action": [
-        "ec2:RegisterImage",
-        "ec2:DeregisterImage",
-        "ec2:DescribeImages",
-        "ec2:CreateSnapshot",
-        "ec2:DeleteSnapshot",
-        "ec2:DescribeSnapshots"
-      ],
-      "Resource": "*"
-    }
-  ]
-}
-```
-
-### Step 4: Create your Hopsworks instance
-
-You can now create a new Hopsworks instance in [managed.hopsworks.ai](https://managed.hopsworks.ai) by selecting the configured instance profile,
-VPC and security group during instance configuration. Selecting any other VPCs or instance profiles will result in permissions errors.
-
-### Step 5: Supporting multiple VPCs
-
-The policy can be extended to give [managed.hopsworks.ai](https://managed.hopsworks.ai) access to multiple VPCs.
-See: [Creating a Condition with Multiple Keys or Values](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_multi-value-conditions.html).
-
-### Other removable permissions
-
-There are other permissions that are required by Hopsworks to provide certain product capabilities to the users. In this section, we go through these permissions and the implications of removing them.
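-Whichever trimmed policy you settle on below, it can also be applied without the console by overwriting the role's inline policy from the command line. A minimal sketch, assuming the AWS CLI is configured and that the role and policy names (`my-cross-account-role`, `hopsworksai`) are placeholders for your own:
-
-```bash
-# Hypothetical example: replace the inline policy on the cross-account role
-# with a trimmed policy document saved locally as policy.json.
-aws iam put-role-policy \
-    --role-name my-cross-account-role \
-    --policy-name hopsworksai \
-    --policy-document file://policy.json
-```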
-
-#### Backup permissions
-
-The following permissions are only needed for the backup feature. You can remove them if you are not going to create backups or if you do not have access to this Enterprise feature.
-
-```json
-    {
-      "Sid": "EC2BackupCreation",
-      "Effect": "Allow",
-      "Action": [
-        "ec2:RegisterImage",
-        "ec2:DeregisterImage",
-        "ec2:DescribeImages",
-        "ec2:CreateSnapshot",
-        "ec2:DeleteSnapshot",
-        "ec2:DescribeSnapshots"
-      ],
-      "Resource": "*"
-    }
-```
-
-#### Early warnings if the VPC is not configured correctly
-
-The following permissions are needed to give an early warning if your VPC and security groups are badly configured. You can remove them if you don't need this.
-```json
-    "ec2:DescribeVpcAttribute",
-    "ec2:DescribeRouteTables"
-```
-
-#### Open and close ports from within Hopsworks.ai
-
-The following permissions are used to let you close and open ports on your cluster from hopsworks.ai. You can remove them if you do not want to open ports on your cluster, or if you want to manually open ports in EC2.
-
-```json
-    {
-      "Sid": "EC2VpcNonResourceSpecificActions",
-      "Effect": "Allow",
-      "Action": [
-        "ec2:AuthorizeSecurityGroupIngress",
-        "ec2:RevokeSecurityGroupIngress"
-      ],
-      "Resource": "*",
-      "Condition": {
-        "ArnLike": {
-          "ec2:vpc": "arn:aws:ec2:*:*:vpc/[VPC_ID]"
-        }
-      }
-    }
-```
-
-#### Non-resource-based permissions used for listing
-
-If you are using Terraform, then you can also remove most of the *Describe* permissions in `NonResourceBasedPermissions` and use the following permissions instead:
-
-```json
-    {
-      "Sid": "NonResourceBasedPermissions",
-      "Effect": "Allow",
-      "Action": [
-        "ec2:DescribeInstances",
-        "ec2:DescribeVolumes",
-        "ec2:DescribeSecurityGroups"
-      ],
-      "Resource": "*"
-    },
-```
-
-#### Load balancer permissions for external access
-If you plan to access your Hopsworks cluster from an external Python environment, especially if you plan to use [ArrowFlight with DuckDB](../../common/arrow_flight_duckdb), then it is required to create a network load balancer that forwards requests to the ArrowFlight server(s) co-located with the RonDB MySQL Server(s). If you are not planning to use ArrowFlight server(s) or multiple MySQL server(s), you can skip adding the following permissions. If you still wish to use the ArrowFlight server(s) but without adding the following permissions to your cross-account role, check [this advanced Terraform example for more details](https://github.com/logicalclocks/terraform-provider-hopsworksai/tree/main/examples/complete/aws/advanced/arrowflight-no-loadbalancer-permissions).
-
-```json
-    {
-      "Sid": "ManageLoadBalancersForExternalAccess",
-      "Effect": "Allow",
-      "Action": [
-        "elasticloadbalancing:CreateLoadBalancer",
-        "elasticloadbalancing:CreateListener",
-        "elasticloadbalancing:CreateTargetGroup",
-        "elasticloadbalancing:RegisterTargets",
-        "elasticloadbalancing:AddTags",
-        "elasticloadbalancing:DescribeTargetGroups",
-        "elasticloadbalancing:DeleteLoadBalancer",
-        "elasticloadbalancing:DeleteTargetGroup"
-      ],
-      "Resource": "*"
-    }
-```
-
-## Limiting the instance profile permissions
-
-### Backups
-If you do not intend to take backups, or if you do not have access to this Enterprise feature, you can remove the permissions that are only used by the backup feature when [configuring instance profile permissions](../getting_started/#step-3-creating-instance-profile).
-To do this, remove the following permissions from the instance profile:
-
-```json
-    "S3:PutLifecycleConfiguration",
-    "S3:GetLifecycleConfiguration",
-    "S3:PutBucketVersioning",
-    "S3:ListBucketVersions",
-    "S3:DeleteObjectVersion",
-```
-
-### CloudWatch Logs
-Hopsworks puts its logs in Amazon CloudWatch so that you can access them without having to SSH into the machine. If you are not interested in this feature, you can remove the following from your instance profile policy:
-
-```json
-    {
-      "Effect": "Allow",
-      "Action": [
-        "cloudwatch:PutMetricData",
-        "ec2:DescribeVolumes",
-        "ec2:DescribeTags",
-        "logs:PutLogEvents",
-        "logs:DescribeLogStreams",
-        "logs:DescribeLogGroups",
-        "logs:CreateLogStream",
-        "logs:CreateLogGroup"
-      ],
-      "Resource": "*"
-    },
-    {
-      "Effect": "Allow",
-      "Action": [
-        "ssm:GetParameter"
-      ],
-      "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*"
-    }
-```
diff --git a/docs/setup_installation/aws/troubleshooting.md b/docs/setup_installation/aws/troubleshooting.md
deleted file mode 100644
index 15e1b53be..000000000
--- a/docs/setup_installation/aws/troubleshooting.md
+++ /dev/null
@@ -1,46 +0,0 @@
-# Troubleshooting
-
-A list of common problems that you might encounter during cluster creation and how to solve them.
-
-## Unauthorized error during cluster creation
-
-If you encounter the following error right after creating your cluster, then it is likely that you have either missed or misconfigured one of the permissions in the [cross account role setup](../getting_started/#step-1-connecting-your-aws-account).
-

-*(figure: Unauthorized error during cluster creation)*

- -In order to get more details regarding the authorization error above, you need to use the AWS Command Line Interface (AWS CLI) to decode the message using the `aws sts decode-authorization-message --encoded-message` command. - -``` -$ aws sts decode-authorization-message --encoded-message hg-Sh5-CUNT5jgB305YbOp_FDp2P70ZPw5iwoextxcdWmoc4wgm_K0pAUZEvTtCpvCk_-EwtaqaRS0act1BM-Bz-id4NOwo-OVZES5q9fLQIqk5_typL767idkb4jdzrrwNLD3h7iaaoleKGQpaW5kzI_oHEtibBRY2uWhU07oiwDHOAwb-cQ-kIA4nJIay7wVoL7QRx8nECpb56s68lMWhrdbqKj6uRQwsAILY7eoV-sDCbWWjnr98ja_olixhlV95txiV-oCR2qW6GKn4TVKl2raGbwjWRdS2GACP0fm7RUI_glPl7q65Erhrcr7Z2uF2SRF46VI5vfXkjXxv58e0x6SSRmKXF397e4QpPM6RyopmgDa9sSWAbkBxC86O9b30l47GX9w98trc76jsfU-UcdqK-Vu7Qy3-j8ehYMDpNvZRFNX64fUrsfusLJcHnhAPqUgCbvjfmEa605GkH7amlP2j23vprb94auzCVk8rgVkrSrBMek6YlWA0nzXtSjq8mVAvFE-n6x3ByLdt68Ldgc602FsFqifuzUm7CnjapIIwSAat_TXQCs-mjXyB983AEw__RwiXN -``` - -Then you will get the following message as response -```json -{ - "DecodedMessage": "{\"allowed\":false,\"explicitDeny\":false,\"matchedStatements\":{\"items\":[]},\"failures\":{\"items\":[]},\"context\":{\"principal\":{\"id\":\"AROA27VDEGQLGDB4JOSOI:1f708920-18a6-11ed-8dd4-f162dca8fc19\",\"arn\":\"arn:aws:sts::xxxxx:assumed-role/cross-acount-role/1f708920-18a6-11ed-8dd4-f162dca8fc19\"},\"action\":\"ec2:CreateVpc\",\"resource\":\"arn:aws:ec2:us-east-2:xxxxx:vpc/*\",\"conditions\":{\"items\":[{\"key\":\"aws:Region\",\"values\":{\"items\":[{\"value\":\"us-east-2\"}]}},{\"key\":\"aws:Service\",\"values\":{\"items\":[{\"value\":\"ec2\"}]}},{\"key\":\"aws:Resource\",\"values\":{\"items\":[{\"value\":\"vpc/*\"}]}},{\"key\":\"aws:Type\",\"values\":{\"items\":[{\"value\":\"vpc\"}]}},{\"key\":\"aws:Account\",\"values\":{\"items\":[{\"value\":\"xxxxxx\"}]}},{\"key\":\"ec2:VpcID\",\"values\":{\"items\":[{\"value\":\"*\"}]}},{\"key\":\"aws:ARN\",\"values\":{\"items\":[{\"value\":\"arn:aws:ec2:us-east-2:xxxx:vpc/*\"}]}}]}}}" -} -``` - -From the above response we can see that the cross-account role is missing the `ec2:CreateVpc` permission. The solution is to terminate the cluster in error and update [cross account role setup](../getting_started/#step-1-connecting-your-aws-account) with the missing permission(s) and then try to create a new cluster. - -## Missing permissions error during cluster creation - -If you encounter the following error right after creating your cluster, then the issue is with the instance profile permissions. - -

-*(figure: Missing permissions error during cluster creation)*

-
-This issue could be caused by one of the following:
-
-* The [instance profile that you have chosen during cluster creation](../cluster_creation/#step-5-select-the-instance-profile) is actually missing the permissions stated in the error on your [chosen S3 bucket](../cluster_creation/#step-2-setting-the-general-information). In that case, update your instance profile accordingly and then click *Retry* to retry the cluster creation operation.
-
-* Your AWS organization is using an [SCP policy](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html) that disallows policy simulation. You can do a simple test to confirm the issue by using [the AWS PolicySim on the AWS console](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_testing-policies.html). If policy simulation is disallowed, you can configure managed.hopsworks.ai to skip the policy simulation step by removing the `iam:SimulatePrincipalPolicy` permission from [your cross account role](../getting_started/#step-1-connecting-your-aws-account): navigate to the [AWS Roles console](https://us-east-1.console.aws.amazon.com/iamv2/home#/roles), search for your cross account role name and click on it, on the permissions tab click edit on the hopsworks inline policy, choose the JSON tab, remove `iam:SimulatePrincipalPolicy`, click *Review Policy*, then click *Save Changes*, and finally navigate back to managed.hopsworks.ai and click *Retry* to retry the cluster creation.
diff --git a/docs/setup_installation/aws/upgrade.md b/docs/setup_installation/aws/upgrade.md
deleted file mode 100644
index cab11495a..000000000
--- a/docs/setup_installation/aws/upgrade.md
+++ /dev/null
@@ -1,168 +0,0 @@
-# Upgrade existing clusters on managed.hopsworks.ai from version 2.2 or older (AWS)
-
-This guide shows you how to upgrade your existing Hopsworks cluster to a newer version of Hopsworks.
-
-## Step 1: Make sure your cluster is running
-
-It is important that your cluster is **Running**; otherwise you will not be able to upgrade. As soon as a new version is available, an upgrade notification will appear.
-
-You can proceed by clicking the *Upgrade* button.
-

-*(figure: A new Hopsworks version is available)*

-
-
-## Step 2: Add upgrade permissions to your instance profile
-
-!!! note
-    You can skip this step if you already have the following permissions in your instance profile:
-    ```json
-    [ "ec2:DescribeVolumes", "ec2:DetachVolume", "ec2:AttachVolume", "ec2:ModifyInstanceAttribute"]
-    ```
-
-We require extra permissions to be added to the instance profile attached to your cluster to proceed with the upgrade. First, to get the name of your instance profile, click on the *Details* tab as shown below:
-

-*(figure: Getting the name of your instance profile)*

- - -Once you get your instance profile name, navigate to [AWS management console](https://console.aws.amazon.com/iam/home#), then click on *Roles* and then search for your role name and click on it. Go to the *Permissions* tab, click on *Add inline policy*, and then go to the *JSON* tab. Paste the following snippet, click on *Review policy*, name it, and click *Create policy*. - -!!! note - You can restrict the upgrade permissions given to your instance profile. Refer to [this guide](restrictive_permissions.md#limiting-upgrade-permissions) for more information. - -```json -{ - "Version": "2012-10-17", - "Statement": [ - { - "Sid": "UpgradePermissions", - "Effect": "Allow", - "Action": [ - "ec2:DescribeVolumes", - "ec2:DetachVolume", - "ec2:AttachVolume", - "ec2:ModifyInstanceAttribute" - ], - "Resource": "*" - } - ] -} -``` - -## Step 3: Run the upgrade process - -You need to click on *Upgrade* to start the upgrade process. You will be prompted with the screen shown below to confirm your intention to upgrade: - -!!! note - No need to worry about the following message since this is done already in [Step 2](#step-2-add-upgrade-permissions-to-your-instance-profile) - - **Make sure that your instance profile (hopsworks-doc) includes the following permissions: - [ "ec2:DetachVolume", "ec2:AttachVolume", "ec2:ModifyInstanceAttribute" ]** - -

-*(figure: Upgrade confirmation)*

-
-Check the *Yes, upgrade cluster* checkbox to proceed, then the *Upgrade* button will be activated as shown below:
-
-!!! warning
-    Currently, we only support upgrading the head node, and you will need to recreate your workers once the upgrade is successfully completed.
-

-*(figure: Upgrade confirmation)*

- -Depending on how big your current cluster is, the upgrade process may take from 1 hour to a few hours until completion. - -!!! note - We don't delete your old cluster until the upgrade process is successfully completed. - - -

-*(figure: Upgrade is running)*

- -Once the upgrade is completed, you can confirm that you have the new Hopsworks version by checking the *Details* tab of your cluster as below: - -

-*(figure: Upgrade is complete)*

-## Error handling
-There are two categories of errors that you may encounter during an upgrade. First, a permission error due to a missing permission or a misconfigured policy in your instance profile, see [Error 1](#error-1-misconfigured-upgrade-permissions). Second, an error during the upgrade process running on your cluster, see [Error 2](#error-2-upgrade-process-error).
-
-### Error 1: Misconfigured upgrade permissions
-
-During the upgrade process, [managed.hopsworks.ai](https://managed.hopsworks.ai) starts by validating your instance profile permissions to ensure that it includes the required upgrade permissions. If one or more permissions are missing, or if the resource is not set correctly, you will be notified with an error message and a *Retry* button will appear as shown below:
-

-*(figure: Upgrade permissions are missing)*

-
-Update your instance profile accordingly, then click *Retry*.
-

-*(figure: Upgrade is running)*

-
-### Error 2: Upgrade process error
-
-If an error occurs during the upgrade process, you will have the option to roll back to your old cluster as shown below:
-

-*(figure: Error occurred during upgrade)*

-
-Click on *Rollback* to recover your old cluster as it was before the upgrade.
-

-*(figure: Upgrade rollback confirmation)*

- -Check the *Yes, rollback cluster* checkbox to proceed, then the *Rollback* button will be activated as shown below: - -

-*(figure: Upgrade rollback confirmation)*

- -Once the rollback is completed, you will be able to continue working as normal with your old cluster. - -!!! note - The old cluster will be **stopped** after the rollback. You have to click on the *Start* button. - diff --git a/docs/setup_installation/aws/upgrade_2.4.md b/docs/setup_installation/aws/upgrade_2.4.md deleted file mode 100644 index 8f27dc1ac..000000000 --- a/docs/setup_installation/aws/upgrade_2.4.md +++ /dev/null @@ -1,164 +0,0 @@ -# Upgrade existing clusters on managed.hopsworks.ai from version 2.4 or newer (AWS) - -This guide shows you how to upgrade your existing Hopsworks cluster to a newer version of Hopsworks. - -## Step 1: Make sure your cluster is running - -It is important that your cluster is **Running**. Otherwise you will not be able to upgrade. As soon as a new version is available an upgrade notification will appear. - -You can proceed by clicking the *Upgrade* button. - -

-*(figure: A new Hopsworks version is available)*

-
-## Step 2: Add backup permissions to your cross account role
-
-!!! note
-    You can skip this step if you already have the following permissions in your cross account role:
-    ```json
-    [ "ec2:RegisterImage", "ec2:DeregisterImage", "ec2:DescribeImages", "ec2:CreateSnapshot", "ec2:DeleteSnapshot", "ec2:DescribeSnapshots"]
-    ```
-
-We require some extra permissions to be added to the role you created when connecting your AWS account, as described in the [getting started guide](../getting_started/#step-1-connecting-your-aws-account). These permissions are required to create a snapshot of your cluster before proceeding with the upgrade.
-
-
-First, check which role or access key you have added to managed.hopsworks.ai: go to the *Settings* tab, and then click *Edit* next to the AWS cloud account.
-

-*(figure: Cloud Accounts)*

-
-Once you have clicked on *Edit*, you will be able to see the currently assigned role.
-

-*(figure: AWS Cross-Account Role)*

- -Once you get your role name, navigate to [AWS management console](https://console.aws.amazon.com/iam/home#), then click on *Roles* and then search for your role name and click on it. Go to the *Permissions* tab, click on *Add inline policy*, and then go to the *JSON* tab. Paste the following snippet, click on *Review policy*, name it, and click *Create policy*. - -```json -{ - "Version": "2012-10-17", - "Statement": [ - { - "Sid": "HopsworksAIBackup", - "Effect": "Allow", - "Action": [ - "ec2:RegisterImage", - "ec2:DeregisterImage", - "ec2:DescribeImages", - "ec2:CreateSnapshot", - "ec2:DeleteSnapshot", - "ec2:DescribeSnapshots" - ], - "Resource": "*" - } - ] -} -``` - -## Step 3: Run the upgrade process - -You need to click on *Upgrade* to start the upgrade process. You will be prompted with the screen shown below to confirm your intention to upgrade: - -!!! note - No need to worry about the following message since this is done already in [Step 2](#step-2-add-backup-permissions-to-your-cross-account-role) - - **Make sure that your cross-account role which you have connected to managed.hopsworks.ai has the following permissions: - [ "ec2:RegisterImage", "ec2:DeregisterImage", "ec2:DescribeImages", "ec2:CreateSnapshot", "ec2:DeleteSnapshot", "ec2:DescribeSnapshots"]** - -

-*(figure: Upgrade confirmation)*

-
-Check the *Yes, upgrade cluster* checkbox to proceed, then the *Upgrade* button will be activated as shown below:
-
-!!! warning
-    Currently, we only support upgrading the head node, and you will need to recreate your workers once the upgrade is successfully completed.
-

-*(figure: Upgrade confirmation)*

- -Depending on how big your current cluster is, the upgrade process may take from 1 hour to a few hours until completion. - -!!! note - We don't delete your old cluster until the upgrade process is successfully completed. - - -

-*(figure: Upgrade is running)*

-
-Once the upgrade is completed, you can confirm that you have the new Hopsworks version by checking the version on the *Details* tab of your cluster.
-
-## Error handling
-There are two categories of errors that you may encounter during an upgrade. First, a permission error due to a missing permission or a misconfigured policy in your cross-account role, see [Error 1](#error-1-missing-backup-permissions). Second, an error during the upgrade process running on your cluster, see [Error 2](#error-2-upgrade-process-error).
-
-### Error 1: Missing backup permissions
-
-If one or more backup permissions are missing, or if the resource is not set correctly, you will be notified with an error message as shown below:
-

-*(figure: Upgrade permissions are missing)*

-
-Update your cross account role as described in [Step 2](#step-2-add-backup-permissions-to-your-cross-account-role), then click *Start*. Once the cluster is up and running, you can try running the upgrade again.
-
-
-### Error 2: Upgrade process error
-
-If an error occurs during the upgrade process, you will have the option to roll back to your old cluster as shown below:
-

-*(figure: Error occurred during upgrade)*

-
-Click on *Rollback* to recover your old cluster as it was before the upgrade.
-

-*(figure: Upgrade rollback confirmation)*

- -Check the *Yes, rollback cluster* checkbox to proceed, then the *Rollback* button will be activated as shown below: - -

-*(figure: Upgrade rollback confirmation)*

- -Once the rollback is completed, you will be able to continue working as normal with your old cluster. - -!!! note - The old cluster will be **stopped** after the rollback. You have to click on the *Start* button. - diff --git a/docs/setup_installation/aws/upgrade_3.0.md b/docs/setup_installation/aws/upgrade_3.0.md deleted file mode 100644 index 8c95a5c67..000000000 --- a/docs/setup_installation/aws/upgrade_3.0.md +++ /dev/null @@ -1,176 +0,0 @@ -# Upgrade existing clusters on managed.hopsworks.ai from version 3.0 or newer (AWS) - -This guide shows you how to upgrade your existing Hopsworks cluster to a newer version of Hopsworks. - -## Step 1: Make sure your cluster is running - -It is important that your cluster is **Running**. Otherwise you will not be able to upgrade. As soon as a new version is available an upgrade notification will appear. - -You can proceed by clicking the *Upgrade* button. - -

-*(figure: A new Hopsworks version is available)*

-
-## Step 2: Add backup permissions to your cross account role
-
-!!! note
-    You can skip this step if you already have the following permissions in your cross account role:
-    ```json
-    [ "ec2:RegisterImage", "ec2:DeregisterImage", "ec2:DescribeImages", "ec2:CreateSnapshot", "ec2:DeleteSnapshot", "ec2:DescribeSnapshots"]
-    ```
-
-We require some extra permissions to be added to the role you created when connecting your AWS account, as described in the [getting started guide](../getting_started/#step-1-connecting-your-aws-account). These permissions are required to create a snapshot of your cluster before proceeding with the upgrade.
-
-
-First, check which role or access key you have added to managed.hopsworks.ai: go to the *Settings* tab, and then click *Edit* next to the AWS cloud account.
-

-*(figure: Cloud Accounts)*

-
-Once you have clicked on *Edit*, you will be able to see the currently assigned role.
-

-*(figure: AWS Cross-Account Role)*

-
-Once you get your role name, navigate to the [AWS management console](https://console.aws.amazon.com/iam/home#), then click on *Roles* and search for your role name and click on it. Go to the *Permissions* tab, click on *Add inline policy*, and then go to the *JSON* tab. Paste the following snippet, click on *Review policy*, name it, and click *Create policy*.
-
-```json
-{
-  "Version": "2012-10-17",
-  "Statement": [
-    {
-      "Sid": "HopsworksAIBackup",
-      "Effect": "Allow",
-      "Action": [
-        "ec2:RegisterImage",
-        "ec2:DeregisterImage",
-        "ec2:DescribeImages",
-        "ec2:CreateSnapshot",
-        "ec2:DeleteSnapshot",
-        "ec2:DescribeSnapshots"
-      ],
-      "Resource": "*"
-    }
-  ]
-}
-```
-
-## Step 3: Update the instance profile permissions
-
-We have enforced the use of a managed docker registry (ECR) starting from Hopsworks version 3.1.0, so you need to update your instance profile to include extra permissions that allow access to ECR. First, get the instance profile of your cluster by clicking on the *Details* tab and checking the IAM role ARN shown in front of *IAM Instance Profile*. Once you get your role name, navigate to the [AWS management console](https://console.aws.amazon.com/iam/home#), then click on *Roles* and search for your role name and click on it. Go to the *Permissions* tab, click on *Add inline policy*, and then go to the *JSON* tab. Paste the following snippet, click on *Review policy*, name it, and click *Create policy*.
-
-```json
-{
-   "Version":"2012-10-17",
-   "Statement":[
-      {
-         "Sid":"AllowPullImagesFromHopsworkAi",
-         "Effect":"Allow",
-         "Action":[
-            "ecr:GetDownloadUrlForLayer",
-            "ecr:BatchGetImage"
-         ],
-         "Resource":[
-            "arn:aws:ecr:*:822623301872:repository/filebeat",
-            "arn:aws:ecr:*:822623301872:repository/base",
-            "arn:aws:ecr:*:822623301872:repository/onlinefs",
-            "arn:aws:ecr:*:822623301872:repository/airflow",
-            "arn:aws:ecr:*:822623301872:repository/git"
-         ]
-      },
-      {
-         "Sid":"AllowCreateRepository",
-         "Effect":"Allow",
-         "Action":"ecr:CreateRepository",
-         "Resource":"*"
-      },
-      {
-         "Sid":"AllowPushandPullImagesToUserRepo",
-         "Effect":"Allow",
-         "Action":[
-            "ecr:GetDownloadUrlForLayer",
-            "ecr:BatchGetImage",
-            "ecr:CompleteLayerUpload",
-            "ecr:UploadLayerPart",
-            "ecr:InitiateLayerUpload",
-            "ecr:BatchCheckLayerAvailability",
-            "ecr:PutImage",
-            "ecr:ListImages",
-            "ecr:BatchDeleteImage",
-            "ecr:GetLifecyclePolicy",
-            "ecr:PutLifecyclePolicy",
-            "ecr:TagResource"
-         ],
-         "Resource":[
-            "arn:aws:ecr:*:*:repository/*/filebeat",
-            "arn:aws:ecr:*:*:repository/*/base",
-            "arn:aws:ecr:*:*:repository/*/onlinefs",
-            "arn:aws:ecr:*:*:repository/*/airflow",
-            "arn:aws:ecr:*:*:repository/*/git"
-         ]
-      },
-      {
-         "Sid":"AllowGetAuthToken",
-         "Effect":"Allow",
-         "Action":"ecr:GetAuthorizationToken",
-         "Resource":"*"
-      }
-   ]
-}
-```
-
-## Step 4: Run the upgrade process
-
-You need to click on *Upgrade* to start the upgrade process. You will be prompted with the screen shown below to confirm your intention to upgrade:
-
-!!! note
-    No need to worry about the steps shown below if you have already completed [Step 2](#step-2-add-backup-permissions-to-your-cross-account-role) and [Step 3](#step-3-update-the-instance-profile-permissions)
-

-*(figure: Upgrade confirmation)*

-
-Check the *Yes, upgrade cluster* checkbox to proceed, then the *Upgrade* button will be activated as shown below:
-
-!!! warning
-    Currently, we only support upgrading the head node, and you will need to recreate your workers once the upgrade is successfully completed.
-

-*(figure: Upgrade confirmation)*

- -Depending on how big your current cluster is, the upgrade process may take from 1 hour to a few hours until completion. - -!!! note - We don't delete your old cluster until the upgrade process is successfully completed. - - -

-*(figure: Upgrade is running)*

- -Once the upgrade is completed, you can confirm that you have the new Hopsworks version by checking the version on the *Details* tab of your cluster. - -For more details about error handling check [this guide](../upgrade_2.4/#error-handling) \ No newline at end of file From 4d3e735613c81600930e27139832c0b126cf2c7b Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Fri, 4 Oct 2024 20:10:00 +0100 Subject: [PATCH 08/24] Fix step sub numbers in azure. --- docs/setup_installation/azure/getting_started.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/setup_installation/azure/getting_started.md b/docs/setup_installation/azure/getting_started.md index fd82dbc55..9dda5af6b 100644 --- a/docs/setup_installation/azure/getting_started.md +++ b/docs/setup_installation/azure/getting_started.md @@ -102,7 +102,7 @@ helm repo add hopsworks-dev https://nexus.hops.works/repository/hopsworks-helm-d helm repo update hopsworks-dev ``` -### Step 3.3: Create Hopsworks namespace & secrets +### Step 3.2: Create Hopsworks namespace & secrets ```bash kubectl create namespace hopsworks From abdf4e5444f260322bb34b315993af48d5ee96f6 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Fri, 4 Oct 2024 20:10:58 +0100 Subject: [PATCH 09/24] Switch to kubernetes steps for GCP/GKE - Remove any references to managed.hopsworks.ai --- .../gcp/cluster_creation.md | 224 ---------------- .../setup_installation/gcp/getting_started.md | 250 ++++++++---------- .../setup_installation/gcp/gke_integration.md | 108 -------- .../gcp/restrictive_permissions.md | 96 ------- 4 files changed, 111 insertions(+), 567 deletions(-) delete mode 100644 docs/setup_installation/gcp/cluster_creation.md delete mode 100644 docs/setup_installation/gcp/gke_integration.md delete mode 100644 docs/setup_installation/gcp/restrictive_permissions.md diff --git a/docs/setup_installation/gcp/cluster_creation.md b/docs/setup_installation/gcp/cluster_creation.md deleted file mode 100644 index f8681e357..000000000 --- a/docs/setup_installation/gcp/cluster_creation.md +++ /dev/null @@ -1,224 +0,0 @@ -# Cluster creation in managed.hopsworks.ai (GCP) -This guide goes into detail for each of the steps of the cluster creation in [managed.hopsworks.ai](https://managed.hopsworks.ai) - -### Step 1 starting to create a cluster - -In [managed.hopsworks.ai](https://managed.hopsworks.ai), select *Create cluster*: - -

-*(figure: Create a Hopsworks cluster)*

-### Step 2 setting the General information
-
-Select the *GCP Project* (1) in which you want the cluster to run.
-
-!!! note
-    If the *Project* does not appear in the drop-down, make sure that you properly [Connected your GCP account](./getting_started.md#step-1-connecting-your-gcp-account) for this project.
-
-Name your cluster (2). Choose the *Region* (3) and *Zone* (4) in which to deploy the cluster.
-
-Select the *Instance type* (5) and *Local storage* (6) size for the cluster *Head node*.
-
-Optional: Specify a [customer-managed encryption key](https://cloud.google.com/compute/docs/disks/customer-managed-encryption) to be used for encryption of local storage. The key has to be specified using the format: `projects/PROJECT_ID/locations/REGION/keyRings/KEY_RING/cryptoKeys/KEY`. Note that your project needs to be configured to allow usage of the key. This can be achieved by executing the gcloud command below. Refer to the GCP documentation for more details: [Protect resources by using Cloud KMS keys](https://cloud.google.com/compute/docs/disks/customer-managed-encryption#before_you_begin).
-
-    gcloud projects add-iam-policy-binding KMS_PROJECT_ID \
-      --member serviceAccount:service-PROJECT_NUMBER@compute-system.iam.gserviceaccount.com \
-      --role roles/cloudkms.cryptoKeyEncrypterDecrypter
-
-Enter the name of the bucket in which the Hopsworks cluster will store its data in *Cloud Storage Bucket* (8).
-
-!!! warning
-    The bucket must be empty and must be in a region accessible from the region in which the cluster is deployed.
-
-Select the artifact registry used to store the cluster's docker images (9).
-

-*(figure: General configuration)*

- -### Step 3 workers configuration - -In this step, you configure the workers. There are two possible setups: static or autoscaling. In the static setup, the cluster has a fixed number of workers that you decide. You can then add and remove workers manually, for more details: [documentation](../common/adding_removing_workers.md). In the autoscaling setup, you configure conditions to add and remove workers and the cluster will automatically add and remove workers depending on the demand, for more details: [documentation](../common/autoscaling.md). - -#### Static workers configuration -You can set the static configuration by selecting *Disabled* in the first drop-down (1). Then you select the number of workers you want to start the cluster with (2). And, select the *Instance type* (3) and *Local storage* size (4) for the *worker nodes*. - -

-*(figure: Create a Hopsworks cluster, static workers configuration)*

- -#### Autoscaling workers configuration -You can set the autoscaling configuration by selecting enabled in the first drop-down (1). You can configure: - -1. The instance type you want to use. -2. The size of the instances' disk. -3. The minimum number of workers. -4. The maximum number of workers. -5. The targeted number of standby workers. Setting some resources on standby ensures that there are always some free resources in your cluster. This ensures that requests for new resources are fulfilled promptly. You configure the standby by setting the number of workers you want to be on standby. For example, if you set a value of *0.5* the system will start a new worker every time the aggregated free cluster resources drop below 50% of a worker's resources. If you set this value to 0 new workers will only be started when a job or notebook requests the resources. -6. The time to wait before removing unused resources. One often starts a new computation shortly after finishing the previous one. To avoid having to wait for workers to stop and start between each computation it is recommended to wait before shutting down workers. Here you set the amount of time in seconds resources need to be unused before they get removed from the system. - -!!! note - The standby will not be taken into account if you set the minimum number of workers to 0 and no resources are used in the cluster. This ensures that the number of nodes can fall to 0 when no resources are used. The standby will start to take effect as soon as you start using resources. - -

-*(figure: Create a Hopsworks cluster, autoscale workers configuration)*

-### Step 4 select the Service Account
-Hopsworks clusters store their data in a storage bucket. To let the cluster instances access the bucket, we need to attach a *Service Account* to the virtual machines. In this step, you set which *Service Account* to use by entering its *Email*. This *Service Account* needs to have access rights to the *bucket* you selected in [Step 2](#step-2-setting-the-general-information). For more details on how to create the *Service Account* and give it access to the bucket, refer to [Creating and configuring a storage](getting_started.md#step-2-creating-and-configuring-a-storage).
-

-*(figure: Set the instance service account)*

-### Step 5 set the backup retention policy
-
-To back up the storage bucket data when taking a cluster backup, we need to set a retention policy for the bucket. In this step, you choose the retention period in days. You can deactivate the retention policy by setting this value to 0, but this will block you from taking any backup of your cluster.
-

-*(figure: Choose the backup retention policy)*

-### Step 6 Managed Containers
-Hopsworks clusters can integrate with Google Kubernetes Engine (GKE) to launch Python jobs, Jupyter servers, and ML model serving on top of GKE. For more details on how to set up this integration, refer to [Integration with Google GKE](gke_integration.md).

-*(figure: Add GKE cluster name)*

- - -### Step 7 VPC and Subnet selection - -You can select the VPC which will be used by the Hopsworks cluster. -You can either select an existing VPC or let [managed.hopsworks.ai](https://managed.hopsworks.ai) create one for you. -If you decide to use restricted [managed.hopsworks.ai](https://managed.hopsworks.ai) permissions (see [restrictive-permissions](../restrictive_permissions/#create-a-vpc-permissions) for more details) -you will need to select an existing VPC here. - -

-*(figure: Select the VPC)*

- -If you selected an existing VPC in the previous step, this step lets you select which subnet of this VPC to use. - -If you did not select an existing virtual network in the previous step [managed.hopsworks.ai](https://managed.hopsworks.ai) will create a subnet for you. -You can choose the CIDR block this subnet will use. -Select the *Subnet* to be used by your cluster and press *Next*. - -

-*(figure: Select the subnet)*

-### Step 8 User management selection
-In this step, you can choose which user management system to use. You have four choices:
-
-* *Managed*: [managed.hopsworks.ai](https://managed.hopsworks.ai) automatically adds and removes users from the Hopsworks cluster when you add and remove users from your organization (more details [here](../common/user_management.md)).
-* *OAuth2*: integrate the cluster with your organization's OAuth2 identity provider. See [Use OAuth2 for user management](../common/sso/oauth.md) for more detail.
-* *LDAP*: integrate the cluster with your organization's LDAP/ActiveDirectory server. See [Use LDAP for user management](../common/sso/ldap.md) for more detail.
-* *Disabled*: lets you manage users manually from within Hopsworks.
-

-*(figure: Choose user management type)*

-### Step 9 Managed RonDB
-Hopsworks uses [RonDB](https://www.rondb.com/) as a database engine for its online Feature Store. By default, the database will run on its
-own VM. Premium users can scale out database services to multiple VMs
-to handle increased workload.
-
-For details on how to configure RonDB, check our guide [here](../common/rondb.md).
-

-*(figure: Configure RonDB)*

- -If you need to deploy a RonDB cluster instead of a single node please contact [us](mailto:sales@logicalclocks.com). - -### Step 10 add tags to your instances. -In this step, you can define tags that will be added to the cluster virtual machines. - -

-*(figure: Add tags)*

-### Step 11 add an init script to your instances.
-In this step, you can enter an initialization script that will be run at startup on every instance.
-
-You can select whether this script will run before or after the VM
-configuration. **Be cautious** if you select to run it before the VM
-configuration, as this might affect cluster creation.
-
-!!! note
-    the init script must be a bash script starting with *#!/usr/bin/env bash*
-
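-For illustration, a minimal init script could look like the sketch below; the package installed here is purely a hypothetical example of an extra setup step:
-
-```bash
-#!/usr/bin/env bash
-# Hypothetical example init script: runs on every instance at startup.
-set -e
-# Install an extra diagnostic tool (replace with your own setup steps).
-# Clusters may run Ubuntu or Redhat-compatible images, so handle both.
-if command -v apt-get >/dev/null; then
-    apt-get update -y && apt-get install -y htop
-else
-    yum install -y htop
-fi
-```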

-*(figure: Add initialization script)*

- -### Step 12 Review and create -Review all information and select *Create*: - -

-*(figure: Review cluster information)*

- -The cluster will start. This will take a few minutes: - -

-*(figure: Booting Hopsworks cluster)*

- -As soon as the cluster has started, you will be able to log in to your new Hopsworks cluster. You will also be able to stop, restart, or terminate the cluster. - -

-*(figure: Running Hopsworks cluster)*

diff --git a/docs/setup_installation/gcp/getting_started.md b/docs/setup_installation/gcp/getting_started.md index 2a8fddb11..2b36e9d85 100644 --- a/docs/setup_installation/gcp/getting_started.md +++ b/docs/setup_installation/gcp/getting_started.md @@ -1,75 +1,46 @@ -# Getting started with managed.hopsworks.ai (Google Cloud Platform) +# GCP - Getting started with GKE + +Kubernetes and Helm are used to install & run Hopsworks and the Feature Store +in the cloud. They both integrate seamlessly with third-party platforms such as Databricks, +SageMaker and KubeFlow. This guide shows how to set up the Hopsworks platform in your organization's Google Cloud Platform's (GCP) account. -[Managed.hopsworks.ai](https://managed.hopsworks.ai/) is our managed platform for running Hopsworks and the Feature Store -in the cloud. It integrates seamlessly with third-party platforms such as Databricks, -SageMaker and KubeFlow. This guide shows how to set up [managed.hopsworks.ai](https://managed.hopsworks.ai/) with your organization's Google Cloud Platform's (GCP) account. ## Prerequisites -To follow the instruction of this page you will need the following: +To follow the instruction on this page you will need the following: -- A GCP project in which the Hopsworks cluster will be deployed. +- Kubernetes Version: Hopsworks can be deployed on AKS clusters running Kubernetes >= 1.27.0. - The [gcloud CLI](https://cloud.google.com/sdk/gcloud) - The [gsutil tool](https://cloud.google.com/storage/docs/gsutil) +- kubectl (to manage the AKS cluster) +- helm (to deploy Hopsworks) -To run all the commands on this page the user needs to have at least the following permissions on the GCP project: -``` - iam.roles.create - iam.roles.list - iam.serviceAccountKeys.create - iam.serviceAccounts.create - resourcemanager.projects.getIamPolicy - resourcemanager.projects.setIamPolicy - serviceusage.services.enable - storage.buckets.create -``` - -Make sure to enable *Compute Engine API*, *Cloud Resource Manager API*, *Identity and Access Management (IAM) API*, and *Artifact Registry* on the GCP project. This can be done by running the following commands. Replacing *$PROJECT_ID* with the id of your GCP project. -```bash -gcloud --project=$PROJECT_ID services enable compute.googleapis.com -gcloud --project=$PROJECT_ID services enable cloudresourcemanager.googleapis.com -gcloud --project=$PROJECT_ID services enable iam.googleapis.com -gcloud --project=$PROJECT_ID services enable artifactregistry.googleapis.com -``` -You can find more information about GCP cloud APIs in the [GCP documentation](https://cloud.google.com/apis/docs/getting-started). -## Step 1: Connecting your GCP account +## GCR Registry -[Managed.hopsworks.ai](https://managed.hopsworks.ai/) deploys Hopsworks clusters to a project in your GCP account. [Managed.hopsworks.ai](https://managed.hopsworks.ai/) uses service account keys to connect to your GCP project. To enable this, you need to create a service account in your GCP project. Assign to the service account the required permissions. And, create a service account key JSON. For more details about creating and managing service accounts steps in GCP, see [documentation](https://cloud.google.com/iam/docs/creating-managing-service-accounts). +Hopsworks allows users to customize images for Python jobs, Jupyter Notebooks, and (Py)Spark applications. These images should be stored in Google Container Registry (GCR). The GKE cluster needs access to a GCR repository to push project images. 
-In [managed.hopsworks.ai](https://managed.hopsworks.ai/) click on *Connect to GCP* or go to *Settings* and click on *Configure* next to *GCP*. This will direct you to a page with the instructions needed to create the service account and set up the connection. Follow the instructions. +### Permissions -!!! note - it is possible to limit the permissions that step up during this phase. For more details see [restrictive-permissions](restrictive_permissions.md). +- The deployment requires cluster admin access to create ClusterRoles, ServiceAccounts, and ClusterRoleBindings. +- A namespace is required to deploy the Hopsworks stack. If you don’t have permissions to create a namespace, ask your GKE administrator to provision one. -

-*(figure: GCP configuration page)*

-## Step 2: Creating storage +## Step 1: GCP GKE Setup -The Hopsworks clusters deployed by [managed.hopsworks.ai](https://managed.hopsworks.ai/) store their data in a bucket in your GCP account. This bucket needs to be created before creating the Hopsworks cluster. +### Step 1.1: Create a Google Cloud Storage (GCS) bucket -Execute the following gsutil command to create a bucket. Replace all occurrences $PROJECT_ID with your GCP project id and $BUCKET_NAME with the name you want to give to your bucket. You can also replace US with another location if you are not going to run your cluster in this *Multi-Region (see note below for more details). +Create a bucket to store project data. Ensure the bucket is in the same region as your GKE cluster for performance and cost optimization. +```bash +gsutil mb -l $region gs://$bucket_name ``` -gsutil mb -p $PROJECT_ID -l US gs://$BUCKET_NAME -``` - -!!! note - The Hopsworks cluster created by [managed.hopsworks.ai](https://managed.hopsworks.ai/) must be in the same region as the bucket. The above command will create the bucket in the US so in the following steps, you must deploy your cluster in a US region. If you want to deploy your cluster in another part of the world us the *-l* option of *gsutil mb*. For more details about creating buckets with gsutil, see the [google documentation](https://cloud.google.com/storage/docs/creating-buckets) +### Step 1.2: Create Service Account -## Step 3: Creating a service account for your cluster instances -The cluster instances will need to be granted permission to access the storage bucket and the artifact registry. You achieve this by creating a service account that will later be attached to the Hopsworks cluster instances. This service account should be different from the service account created in step 1, as it has only those permissions related to storing objects in a GCP bucket and docker images in an artifact registry repository. +Create a file named hopsworksai_role.yaml with the following content: -### Step 3.1: Creating a custom role for accessing storage -Create a file named *hopsworksai_instances_role.yaml* with the following content: - -```yaml +```bash title: Hopsworks AI Instances description: Role that allows Hopsworks AI Instances to access resources stage: GA @@ -93,138 +64,139 @@ includedPermissions: - artifactregistry.tags.delete ``` -!!! note - it is possible to limit the permissions that set up during this phase. For more details see [restrictive-permissions](restrictive_permissions.md#limiting-the-instances-service-account-permissions). - Execute the following gcloud command to create a custom role from the file. Replace $PROJECT_ID with your GCP project id: -``` -gcloud iam roles create hopsworksai_instances \ - --project=$PROJECT_ID \ - --file=hopsworksai_instances_role.yaml +```bash +gcloud iam roles create hopsworksai_instances --project=$PROJECT_ID --file=hopsworksai_role.yaml ``` -### Step 3.2: Creating a service account +Create a service account: Execute the following gcloud command to create a service account for Hopsworks AI instances. 
Replace $PROJECT_ID with your GCP project id: -``` -gcloud iam service-accounts create hopsworks-ai-instances \ - --project=$PROJECT_ID \ - --description="Service account for Hopsworks AI instances" \ - --display-name="Hopsworks AI instances" +```bash +gcloud iam service-accounts create hopsworksai_instances --project=$PROJECT_ID --description="Service account for Hopsworks AI instances" --display-name="Hopsworks AI instances" ``` Execute the following gcloud command to bind the custom role to the service account. Replace all occurrences $PROJECT_ID with your GCP project id: +```bash +gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:hopsworks-ai-instances@$PROJECT_ID.iam.gserviceaccount.com" --role="projects/$PROJECT_ID/roles/hopsworksai_instances" +``` + + +### Step 1.3: Create a GKE Cluster + +```bash +gcloud container clusters create --zone --machine-type n2-standard-8 --num-nodes 3 --enable-ip-alias --service-account my-service-account@my-project.iam.gserviceaccount.com ``` -gcloud projects add-iam-policy-binding $PROJECT_ID \ - --member="serviceAccount:hopsworks-ai-instances@$PROJECT_ID.iam.gserviceaccount.com" \ - --role="projects/$PROJECT_ID/roles/hopsworksai_instances" + +### Step 1.4: Create GCR repository + +Enable Artifact Registry and create a GCR repository to store images: + +```bash +gcloud artifacts repositories create --repository-format=docker --location= ``` -## Step 4: Deploying a Hopsworks cluster +### Step 1.5: Link the GCS bucket and the GCR repository -In [managed.hopsworks.ai](https://managed.hopsworks.ai/), select *Create cluster*: +```bash +gsutil iam ch serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com:objectViewer gs://YOUR_BUCKET_NAME +gsutil iam ch serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com:objectAdmin gs://YOUR_BUCKET_NAME -

-*(figure: Create a Hopsworks cluster)*

+gsutil iam ch serviceAccount:YOUR_EMAIL_ADDRESS:objectViewer gs://YOUR_BUCKET_NAME +gsutil iam ch serviceAccount:YOUR_EMAIL_ADDRESS:objectAdmin gs://YOUR_BUCKET_NAME -Select the *Project* (1) in which you created your *Bucket* and *Service Account* (see above). +gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" --role="roles/storage.objectViewer" +``` -!!! note - If the *Project* does not appear in the drop-down, make sure that you properly [Connected your GCP account](#step-1-connecting-your-gcp-account) for this project. +## Step 2: Configure kubectl -Name your cluster (2). Choose the *Region*(3) and *Zone*(4) in which to deploy the cluster. +```bash +gcloud auth configure-docker -!!! warning - The cluster must be deployed in a region having access to the bucket you created above. +kubectl get pods +``` -Select the *Instance type* (5) and *Local storage* (6) size for the cluster *Head node*. +## Step 3: Setup Hopsworks for Deployment -Enter the name of the bucket you created [above](#step-23-creating-a-bucket) in *Cloud Storage Bucket* (7) +### Step 3.1: Add the Hopsworks Helm repository -Press *Next*: +Add the Hopsworks Helm repository -

-*(figure: General configuration)*

+```bash +helm repo add hopsworks-dev https://nexus.hops.works/repository/hopsworks-helm-dev --username $NEXUS_USER --password $NEXUS_PASS +helm repo update hopsworks-dev +``` -Select the number of workers you want to start the cluster with (2). -Select the *Instance type* (3) and *Local storage* size (4) for the *worker nodes*. +### Step 3.2: Create Hopsworks namespace & secrets -!!! note - It is possible to [add or remove workers](../common/adding_removing_workers.md) or to [enable autoscaling](../common/autoscaling.md) once the cluster is running. +```bash +kubectl create namespace hopsworks -Press *Next*: +kubectl create secret docker-registry regcred --namespace=hopsworks --docker-server=docker.hops.works --docker-username=$NEXUS_USER --docker-password=$NEXUS_PASS --docker-email=$NEXUS_EMAIL_ADDRESS +``` -

-*(figure: Create a Hopsworks cluster, static workers configuration)*

+### Step 3.3: Create helm values file -Enter *Email* of the instances *service account* that you created [above](#step-22-creating-a-service-account). If you followed the instruction it should be *hopsworks-ai-instances@$PROJECT_ID.iam.gserviceaccount.com* with $PROJECT_ID the name of your project: +Below is a simplifield values.gcp.yaml file to get started which can be updated for improved performance and further customisation. -

-*(figure: Set the instance service account)*

+```bash +global: + _hopsworks: + storageClassName: null + cloudProvider: "GCP" + managedDockerRegistery: + enabled: true + domain: "europe-north1-docker.pkg.dev" + namespace: "PROJECT_ID/hopsworks" + credHelper: + enabled: true + secretName: &gcpregcred "gcpregcred" + + managedObjectStorage: + enabled: true + s3: + bucket: + name: &bucket "hopsworks" + region: ®ion "europe-north1" + endpoint: &gcpendpoint "https://storage.cloud.google.com" + secret: + name: &gcpcredentials "gcp-credentials" + acess_key_id: &gcpaccesskey "access-key-id" + secret_key_id: &gcpsecretkey "secret-access-key" + minio: + enabled: false +``` -To backup the storage bucket data when taking a cluster backup we need to set a retention policy for the bucket. You can deactivate the retention policy by setting this value to 0 but this will block you from taking any backup of your cluster. Choose the retention period in days and click on *Review and submit*. +## Step 4: Deploy Hopsworks -

-*(figure: Choose the backup retention policy)*

+Deploy Hopsworks in the created namespace. -Review all information and select *Create*: +```bash +helm install hopsworks hopsworks-dev/hopsworks --devel --namespace hopsworks --values values.gcp.yaml --timeout=600s +``` -

-*(figure: Review cluster information)*

+Check that Hopsworks is installing on your provisioned AKS cluster. -!!! note - We skipped cluster creation steps that are not mandatory. +```bash +kubectl get pods --namespace=hopsworks -The cluster will start. This will take a few minutes: +kubectl get svc -n hopsworks -o wide +``` -

-*(figure: Booting Hopsworks cluster)*

+Upon completion (circa 20 minutes), setup a load balancer to access Hopsworks: -As soon as the cluster has started, you will be able to log in to your new Hopsworks cluster. You will also be able to stop, restart or terminate the cluster. +```bash +kubectl expose deployment hopsworks --type=LoadBalancer --name=hopsworks-service --namespace +``` -

-*(figure: Running Hopsworks cluster)*

## Step 5: Next steps

Check out our other guides for how to get started with Hopsworks and the Feature Store:

* Get started with the [Hopsworks Feature Store](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/quickstart.ipynb){:target="_blank"}
* Follow one of our [tutorials](../../tutorials/index.md)
* Follow one of our [Guides](../../user_guides/index.md)
diff --git a/docs/setup_installation/gcp/gke_integration.md b/docs/setup_installation/gcp/gke_integration.md
deleted file mode 100644
index b17b8749f..000000000
--- a/docs/setup_installation/gcp/gke_integration.md
+++ /dev/null
@@ -1,108 +0,0 @@
-# Integration with Google GKE
-
-This guide demonstrates the step-by-step process to create a cluster in [managed.hopsworks.ai](https://managed.hopsworks.ai) with integrated support for Google Kubernetes Engine (GKE). This enables Hopsworks to launch Python jobs, Jupyter servers, and serve models on top of GKE.
-
-!!! note
-    Currently, we don't support sharing GKE clusters between Hopsworks clusters. That is, a GKE cluster can only be used by one Hopsworks cluster. Also, we only support integration with GKE in the same project as the Hopsworks cluster.
-
-!!! note
-    If you prefer to use Terraform over the gcloud command line, you can refer to our Terraform example [here](https://github.com/logicalclocks/terraform-provider-hopsworksai/tree/main/examples/complete/gcp/gke).
-
-## Step 1: Attach Kubernetes developer role to the service account for cluster instances
-
-Ensure that the Hopsworks cluster has access to the GKE cluster by attaching the Kubernetes Engine Developer role to the [service account you will attach to the cluster nodes](getting_started.md#step-3-creating-a-service-account-for-your-cluster-instances). Execute the following gcloud command to attach `roles/container.developer` to the cluster service account. Replace *\$PROJECT_ID* with your GCP project id and *\$SERVICE_ACCOUNT* with the service account that you have created during getting started [Step 3](getting_started.md#step-3-creating-a-service-account-for-your-cluster-instances).
-
-```bash
-gcloud projects add-iam-policy-binding $PROJECT_ID --member=$SERVICE_ACCOUNT --role="roles/container.developer"
-```
-
-## Step 2: Create a virtual network to be used by Hopsworks and GKE
-
-You need to create a virtual network and a subnet in which Hopsworks and the GKE nodes will run. To do this, run the following commands, replacing *\$PROJECT_ID* with the id of the GCP project in which you will run your cluster and *\$SERVICE_ACCOUNT* with the service account that you have updated in [Step 1](#step-1-attach-kubernetes-developer-role-to-the-service-account-for-cluster-instances). In this step, we will create a virtual network `hopsworks`, a subnetwork `hopsworks-eu-north`, and 3 firewall rules to allow communication within the virtual network and allow inbound http and https traffic.
- -```bash -gcloud compute networks create hopsworks --project=$PROJECT_ID --subnet-mode=custom --mtu=1460 --bgp-routing-mode=regional - -gcloud compute networks subnets create hopsworks-eu-north --project=$PROJECT_ID --range=10.1.0.0/24 --stack-type=IPV4_ONLY --network=hopsworks --region=europe-north1 - -gcloud compute firewall-rules create hopsworks-nodetonode --network=hopsworks --allow=all --direction=INGRESS --target-service-accounts=$SERVICE_ACCOUNT --source-service-accounts=$SERVICE_ACCOUNT --project=$PROJECT_ID - -gcloud compute firewall-rules create hopsworks-inbound-http --network=hopsworks --allow=all --direction=INGRESS --target-service-accounts=$SERVICE_ACCOUNT --allow=tcp:80 --source-ranges="0.0.0.0/0" --project=$PROJECT_ID - -gcloud compute firewall-rules create hopsworks-inbound-https --network=hopsworks --allow=all --direction=INGRESS --target-service-accounts=$SERVICE_ACCOUNT --allow=tcp:443 --source-ranges="0.0.0.0/0" --project=$PROJECT_ID - -``` - -## Step 3: Create a GKE cluster - -In this step, we create a GKE cluster and we set the cluster pod CIDR to `10.124.0.0/14`. GKE offers two different modes of operation for clusters: `Autopilot` and `Standard` clusters. Choose one of the two following options to create a GKE cluster. - -### Option 1: Standard cluster - -Run the following gcloud command to create a zonal standard GKE cluster. Replace *\$PROJECT_ID* with your GCP project id in which you will run your cluster. - -```bash -gcloud container clusters create hopsworks-gke --project=$PROJECT_ID --machine-type="e2-standard-8" --num-nodes=1 --zone="europe-north1-c" --network="hopsworks" --subnetwork="hopsworks-eu-north" --cluster-ipv4-cidr="10.124.0.0/14" --cluster-version="1.27.3-gke.100" -``` - -Run the following gcloud command to allow all incoming traffic from the GKE cluster to Hopsworks. - -```bash -gcloud compute firewall-rules create hopsworks-allow-traffic-from-gke-pods --project=$PROJECT_ID --network="hopsworks" --direction=INGRESS --priority=1000 --action=ALLOW --rules=all --source-ranges="10.124.0.0/14" -``` - -### Option 2: Autopilot cluster - -Run the following gcloud command to create an autopilot cluster. Replace *\$PROJECT_ID* with your GCP project id in which you will run your cluster. - -```bash -gcloud container clusters create-auto hopsworks-gke --project $PROJECT_ID --region="europe-north1" --network="hopsworks" --subnetwork="hopsworks-eu-north" --cluster-ipv4-cidr="10.124.0.0/14" -``` - -Run the following gcloud command to allow all incoming traffic from the GKE cluster to Hopsworks. - -```bash -gcloud compute firewall-rules create hopsworks-allow-traffic-from-gke-pods --project=$PROJECT_ID --network="hopsworks" --direction=INGRESS --priority=1000 --action=ALLOW --rules=all --source-ranges="10.124.0.0/14" -``` - -## Step 4: Create a Hopsworks cluster - -In [managed.hopsworks.ai](https://managed.hopsworks.ai), follow the same instructions as in [the cluster creation guide](cluster_creation.md) except when setting *Region*, *Managed Containers*, *VPC* and *Subnet*. 
- -- On the General tab, choose the same region as what we use in [Step 2](#steps-2-create-a-virtual-network-to-be-used-by-hopsworks-and-gke) and [Step 3](#step-3-create-a-gke-cluster) (`europe-north1`) -- On the *Managed Containers* tab, choose **Enabled** and input the name of the GKE cluster that we have created in [Step 3](#step-3-create-a-gke-cluster) (`hopsworks-gke`) -- On the VPC and Subnet tabs, choose the name of the network and subnetwork that we have created in [Step 2](#steps-2-create-a-virtual-network-to-be-used-by-hopsworks-and-gke) (`hopsworks`, `hopsworks-eu-north`). - -## Step 5: Configure DNS - -### Option 1: Standard cluster -In the setup described in [Step 3](#option-1-standard-cluster), we are using the default DNS which is `kube-dns`. Hopsworks automatically configures `kube-dns` during cluster initialization, so there is no extra steps that needs to be done here. - -Alternatively, if you configure `Cloud DNS` while creating the standard GKE cluster, then you would need to add the following firewall rule to allow the incoming traffic from `Cloud DNS` on port `53` to Hopsworks. `35.199.192.0/19` is the ip range used by Cloud DNS to issue DNS requests, check [this guide](https://cloud.google.com/dns/docs/zones/forwarding-zones#firewall-rules) for more details. - -```bash -gcloud compute --project=$PROJECT_ID firewall-rules create hopsworks-clouddns-forward-consul --direction=INGRESS --priority=1000 --network="hopsworks" --action=ALLOW --rules=udp:53 --source-ranges="35.199.192.0/19" -``` - - -### Option 2: Autopilot cluster - -Hopsworks internally uses Consul for service discovery and we automatically forward traffic from Standard GKE clusters to the corresponding Hopsworks cluster. However, Autopilot clusters don't allow updating the DNS configurations through `kube-dns` and they use Cloud DNS by default. Therefore, in order to allow seamless communication between pods running on GKE and Hopsworks, we would need to add a [forwarding zone](https://cloud.google.com/dns/docs/zones/forwarding-zones) to Cloud DNS to forward `.consul` DNS Zone to Hopsworks head node. - -First, we need to get the IP of the Hopsworks head node of your cluster. Replace *\$PROJECT_ID* with your GCP project id in which you will run your cluster, *\$CLUSTER_NAME* with the name you gave to your Hopsworks cluster during creation in [Step 4](#step-4-create-a-hopsworks-cluster). Using the following gcloud command, we will be able to get the internal IP of Hopsworks cluster. - -```bash -HOPSWORKS_HEAD_IP=`gcloud compute instances describe --project=$PROJECT_ID $CLUSTER_NAME-master --format='get(networkInterfaces[0].networkIP)'` -``` - -Use the *\$HOPSWORKS_HEAD_IP* you just got from the above command to create the following forwarding zone on Cloud DNS - -```bash -gcloud dns --project=$PROJECT_ID managed-zones create hopsworks-consul --description="Forward .consul DNS requests to Hopsworks" --dns-name="consul." --visibility="private" --networks="hopsworks" --forwarding-targets=$HOPSWORKS_HEAD_IP -``` - -Finally, you would need to add the following firewall rule to allow the incoming traffic from `Cloud DNS` on port `53` to Hopsworks. `35.199.192.0/19` is the IP range used by Cloud DNS to issue DNS requests, check [this guide](https://cloud.google.com/dns/docs/zones/forwarding-zones#firewall-rules) for more details. 
- -```bash -gcloud compute --project=$PROJECT_ID firewall-rules create hopsworks-clouddns-forward-consul --direction=INGRESS --priority=1000 --network="hopsworks" --action=ALLOW --rules=udp:53 --source-ranges="35.199.192.0/19" -``` \ No newline at end of file diff --git a/docs/setup_installation/gcp/restrictive_permissions.md b/docs/setup_installation/gcp/restrictive_permissions.md deleted file mode 100644 index 88a07ab0d..000000000 --- a/docs/setup_installation/gcp/restrictive_permissions.md +++ /dev/null @@ -1,96 +0,0 @@ -# Limiting GCP permissions - -[Managed.hopsworks.ai](https://managed.hopsworks.ai) requires a set of permissions to be able to manage resources in the user’s GCP project. -By default, these permissions are set to easily allow a wide range of different configurations and allow -us to automate as many steps as possible. While we ensure to never access resources we shouldn’t, -we do understand that this might not be enough for your organization or security policy. -This guide explains how to lock down access permissions following the IT security policy principle of least privilege. - -## Default permissions -This is the list of default permissions that are required by [managed.hopsworks.ai](https://managed.hopsworks.ai). If you prefer to limit these permissions, then proceed to the [next section](#limiting-the-account-service-account-permissions). - -```yaml -{!setup_installation/gcp/gcp_permissions.yml!} -``` - -## Limiting the Account Service Account permissions - -Some of the permissions set up when connection your GCP account to [managed.hopsworks.ai](https://managed.hopsworks.ai) ([here](getting_started.md#step-1-connecting-your-gcp-account)) can be removed under certain conditions. - -### Backup permissions - -The following permissions are only needed for the backup feature. If you are not going to create backups or if you do not have access to this Enterprise feature, you can limit the permission of the Service Account by removing them. - -```yaml -- compute.disks.createSnapshot -- compute.snapshots.create -- compute.snapshots.delete -- compute.snapshots.setLabels -- compute.snapshots.get -- compute.snapshots.useReadOnly -``` - -### Instance type modification permissions - -The following permission is only needed to be able to change the head node and RonDB nodes instance type on an existing cluster ([documentation](../common/scalingup.md)). If you are not going to use this feature, you can limit the permission of the Service Account by removing it. - -```yaml -- compute.instances.setMachineType -``` - -### Instance Service account check -The following permissions are only needed to check that the service account you select for your cluster has the proper permissions. If you remove them the check will be skipped. This may result in your cluster taking more time to detect the error if the service account does not have the proper permissions - -```yaml -- iam.roles.get -- resourcemanager.projects.getIamPolicy -``` - -### Create a VPC permissions -The following permissions are only needed if you want [managed.hopsworks.ai](https://managed.hopsworks.ai) to create VPC and subnet for you. - -```yaml -- compute.networks.create -- compute.networks.delete -- compute.networks.get -- compute.subnetworks.create -- compute.subnetworks.delete -- compute.subnetworks.get -- compute.firewalls.create -- compute.firewalls.delete -``` - -You can remove these permissions by creating your own VPC, subnet, and firewalls and selecting them during cluster creation. 
For the VPC to be accepted you will need to associate it with firewall rules with the following constraints: - -- One firewall rule associated with the VPC must allow ingress communication on all ports for communication between sources with the [service account you will attach to the cluster nodes](getting_started.md#step-3-creating-a-service-account-for-your-cluster-instances) (source and target). To create such a rule, run the following command replacing *\$NETWORK* with the name of your VPC, *\$SERVICE_ACCOUNT* with the email of your service account, and *\$PROJECT_ID* with the id of your project: - -```bash -gcloud compute firewall-rules create nodetonode --network=$NETWORK --allow=all --direction=INGRESS --target-service-accounts=$SERVICE_ACCOUNT --source-service-accounts=$SERVICE_ACCOUNT --project=$PROJECT_ID -``` - -- We recommend that you have a firewall rule associated with the VPC allowing TCP ingress communication to ports 443 and 80. If you don't have a rule opening port 443 the cluster will not be accessible from the internet. If you don't have a rule opening port 80 your cluster will be created with a self-signed certificate. You will have to acknowledge it by checking the *Continue with self-signed certificate* check box during [subnet selection](cluster_creation.md#step-6-vpc-and-subnet-selection). Depending on your firewall rules setup at cluster creation [managed.hopsworks.ai](https://managed.hopsworks.ai) may not be able to manage services port at run time. In such a case you will have to open and close the ports yourself in the cluster provider. To create this rule, run the following command replacing *\$NETWORK* with the name of your VPC, *\$SERVICE_ACCOUNT* with the email of your service account and *\$PROJECT_ID* with the id of your project: - -```bash -gcloud compute firewall-rules create inbound --network=$NETWORK --allow=all --direction=INGRESS --target-service-accounts=$SERVICE_ACCOUNT --allow=tcp:80,tcp:443 --source-ranges="0.0.0.0/0" --project=$PROJECT_ID -``` - -### Update Firewall - -The following permission is only needed to open and close service ports on the cluster. If you are not intending to open and close these ports from [managed.hopsworks.ai](https://managed.hopsworks.ai) you can remove the permission. - -```yaml -- compute.firewalls.update -``` - -## Limiting the Instances Service Account permissions - -Some of the permissions set up for the instances service account used during cluster creation ([here](cluster_creation.md#step-4-select-the-service-account)) can be removed under certain conditions. - -### Backups - -If you do not intend to take backups or if you do not have access to this Enterprise feature you can remove the permissions that are only used by the backup feature when configuring your instances service account. -For this remove the following permission from [your instances service account](getting_started.md#step-21-creating-a-custom-role-for-accessing-storage): - -```yaml - storage.buckets.update -``` \ No newline at end of file From 1ffa69505f5200135cb5ad7c08ae58a53897aba7 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Mon, 7 Oct 2024 21:58:09 +0100 Subject: [PATCH 10/24] Remove Managed files and top-level tab. 
-Remove references to managed.hopsworks.ai --- docs/common/adding_removing_workers.md | 72 --- docs/common/api_key.md | 58 -- docs/common/arrow_flight_duckdb.md | 56 -- docs/common/autoscaling.md | 60 -- docs/common/backup.md | 149 ----- docs/common/dashboard.md | 130 ---- docs/common/rondb.md | 365 ----------- docs/common/scalingup.md | 100 ---- docs/common/services.md | 58 -- docs/common/settings.md | 78 --- docs/common/sso/ldap.md | 32 - docs/common/sso/oauth.md | 88 --- docs/common/terraform.md | 565 ------------------ docs/common/user_management.md | 95 --- docs/js/dropdown.js | 4 +- .../admin/oauth2/create-azure-client.md | 6 - docs/setup_installation/admin/user.md | 2 +- docs/setup_installation/admin/variables.md | 2 +- .../fs/feature_group/data_validation.md | 2 +- .../fs/feature_group/feature_monitoring.md | 4 +- .../fs/feature_view/feature_monitoring.md | 4 +- .../integrations/emr/networking.md | 12 - docs/user_guides/integrations/python.md | 1 - mkdocs.yml | 16 - 24 files changed, 9 insertions(+), 1950 deletions(-) delete mode 100644 docs/common/adding_removing_workers.md delete mode 100644 docs/common/api_key.md delete mode 100644 docs/common/arrow_flight_duckdb.md delete mode 100644 docs/common/autoscaling.md delete mode 100644 docs/common/backup.md delete mode 100644 docs/common/dashboard.md delete mode 100644 docs/common/rondb.md delete mode 100644 docs/common/scalingup.md delete mode 100644 docs/common/services.md delete mode 100644 docs/common/settings.md delete mode 100644 docs/common/sso/ldap.md delete mode 100644 docs/common/sso/oauth.md delete mode 100644 docs/common/terraform.md delete mode 100644 docs/common/user_management.md diff --git a/docs/common/adding_removing_workers.md b/docs/common/adding_removing_workers.md deleted file mode 100644 index a9658cf6b..000000000 --- a/docs/common/adding_removing_workers.md +++ /dev/null @@ -1,72 +0,0 @@ -# Adding and removing workers -Once you have started a Hopsworks cluster you can add and remove workers from the cluster to accommodate your workload. - -## Adding workers -If the computation you are running is using all the resources of your Hopsworks cluster you can add workers to your cluster. -To add workers to a cluster, go to the *Details* tab of this cluster and click on *Add workers*. - -
-[Figure: Add worker]
- -Select the number of workers you want to add (1). Select the type of instance you want the workers to run on (2). Select the local storage size for the workers (3). Click on *Next*. - -
-[Figure: Add workers config]
- -Review your request and click *Add*. - -
-[Figure: Add workers review]
- -[Managed.hopsworks.ai](https://managed.hopsworks.ai) will start the new workers and you will be able to use them in your cluster as soon as they have finished starting. - -## Removing workers - -If the load on your Hopsworks cluster is low, you can decide to remove worker nodes from your cluster. - -!!! warning - When removing workers [managed.hopsworks.ai](https://managed.hopsworks.ai) will try to select workers that can be removed while interfering as little as possible with any ongoing computation. It will also wait for the workers to be done with their computation before stopping them. But, if this computation lasts too long, the worker may get stopped before the computation properly finish. This could interfere with your ongoing computation. - -!!! note - You can remove all the workers of your cluster. If you do so the cluster will be able to store data but not run any computations. This may affect feature store functionality. - -To remove workers from a cluster, go to the *Details* tab of this cluster and click on *Remove workers* - -
-[Figure: Remove workers]
- -For each of the types of instances existing in your cluster select the number of workers you want to remove and click on *Next*. - -
-[Figure: Remove workers config]
- -Review your request and click *Remove*. - -
-[Figure: Remove workers review]
- -[Managed.hopsworks.ai](https://managed.hopsworks.ai) will select the workers corresponding to your criteria which can be stopped with as little interferences as possible with any ongoing computation. It will set them to decommission and stop them when they have finished decommissioning. diff --git a/docs/common/api_key.md b/docs/common/api_key.md deleted file mode 100644 index 268423f6a..000000000 --- a/docs/common/api_key.md +++ /dev/null @@ -1,58 +0,0 @@ -# Managed.hopsworks.ai API Key - -[Managed.hopsworks.ai](https://managed.hopsworks.ai) allows users to generate an API Key that can be used to authenticate and access the [managed.hopsworks.ai](https://managed.hopsworks.ai) REST APIs. - -## Generate an API Key - -First, login to your [managed.hopsworks.ai](https://managed.hopsworks.ai) account, then click on the Settings tab as shown below: - -
-[Figure: Click on the Settings tab]
- -Click on the API Key tab, and then click on the *Generate API Key* button: - -
-[Figure: Generate an API Key]
- -Copy the generated API Key and store it in a secure location. - -!!! warning - Make sure to copy your API Key now. You won’t be able to see it again. However, you can always delete it and generate a new one. - - -
-[Figure: Copy the generated API Key]
- -## Use the API Key - -To access the [managed.hopsworks.ai](https://managed.hopsworks.ai) REST APIs, you should pass the API key as a header **x-api-key** when executing requests on [managed.hopsworks.ai](https://managed.hopsworks.ai) as shown below: - -```bash -curl -XGET -H "x-api-key: " https://api.hopsworks.ai/api/clusters -``` - -Alternatively, you can use your API Key with the [Hopsworks.ai terraform provider](https://registry.terraform.io/providers/logicalclocks/hopsworksai/latest) to manage your Hopsworks clusters using [terraform](https://www.terraform.io/). - -## Delete your API Key - -First, login to your [managed.hopsworks.ai](https://managed.hopsworks.ai) account, click on the Settings tab, then click on the API Key tab, and finally click on *Delete API Key* as shown below: - -
-[Figure: Delete your API Key]
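For scripted use of the REST API, it is worth keeping the key out of your shell history and reusing it across calls. A sketch built on the `curl` example above; the key file path and the `jq` pretty-printing are illustrative choices, not part of the product:

```bash
# Create an empty key file readable only by the current user, then fill it in
install -m 600 /dev/null ~/.hopsworksai_api_key
echo "<your-api-key>" > ~/.hopsworksai_api_key

# List clusters, passing the key in the x-api-key header as documented above
curl -s -H "x-api-key: $(cat ~/.hopsworksai_api_key)" \
  https://api.hopsworks.ai/api/clusters | jq '.'
```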
- diff --git a/docs/common/arrow_flight_duckdb.md b/docs/common/arrow_flight_duckdb.md deleted file mode 100644 index bc8df34a2..000000000 --- a/docs/common/arrow_flight_duckdb.md +++ /dev/null @@ -1,56 +0,0 @@ -# ArrowFlight Server with DuckDB -By default, Hopsworks uses big data technologies (Spark or Hive) to create training data and read data for Python clients. -This is great for large datasets, but for small or moderately sized datasets (think of the size of data that would fit in a Pandas -DataFrame in your local Python environment), the overhead of starting a Spark or Hive job and doing distributed data processing can be significant. - -ArrowFlight Server with DuckDB significantly reduces the time that Python clients need to read feature groups -and batch inference data from the Feature Store, as well as creating moderately-sized in-memory training datasets. - -When the service is enabled, clients will automatically use it for the following operations: - -- [reading Feature Groups](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#read) -- [reading Queries](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/query_api/#read) -- [reading Training Datasets](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_training_data) -- [creating In-Memory Training Datasets](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#training_data) -- [reading Batch Inference Data](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_batch_data) - -For larger datasets, clients can still make use of the Spark/Hive backend by explicitly setting -`read_options={"use_hive": True}`. - -## Service configuration - -!!! note - Supported only on AWS at the moment. - -!!! note - Make sure that your cross account role has the load balancer permissions as described in [here](../../aws/restrictive_permissions/#load-balancers-permissions-for-external-access), otherwise you have to create and manage the load balancer yourself. - -The ArrowFlight Server is co-located with RonDB in the Hopsworks cluster. -If the ArrowFlight Server is activated, RonDB and ArrowFlight Server can each use up to 50% -of the available resources on the node, so they can co-exist without impacting each other. -Just like RonDB, the ArrowFlight Server can be replicated across multiple nodes to serve more clients at lower latency. -To guarantee high performance, each individual ArrowFlight Server instance processes client requests sequentially. -Requests will be queued for up to 10 minutes before they are rejected. - -
-[Figure: Activate ArrowFlight Server with DuckDB on a RonDB cluster]
- -To deploy ArrowFlight Server on a cluster: - -1. Select "RonDB cluster" -2. Select an instance type with at least 16GB of memory and 4 cores. (*) -3. Tick the checkbox `Enable ArrowFlight Server`. - -(*) The service should have at least the 2x the amount of memory available that a typical Python client would have. - Because RonDB and ArrowFlight Server share the same node we recommend selecting an instance type with at least 4x the - client memory. For example, if the service serves Python clients with typically 4GB of memory, - an instance with at least 16GB of memory should be selected. - An instance with 16GB of memory will be able to read feature groups and training datasets of up to 10-100M rows, - depending on the number of columns and size of the features (~2GB in parquet). The same instance will be able to create - point-in-time correct training datasets with 1-10M rows, also depending on the number and the size of the features. - Larger instances are able to handle larger datasets. The numbers scale roughly linearly with the instance size. - diff --git a/docs/common/autoscaling.md b/docs/common/autoscaling.md deleted file mode 100644 index 6864f10fa..000000000 --- a/docs/common/autoscaling.md +++ /dev/null @@ -1,60 +0,0 @@ -# Autoscaling -If you run a Hopsworks cluster version 2.2 or above you can enable autoscaling to let [managed.hopsworks.ai](https://managed.hopsworks.ai) start and stop workers depending on the demand. - -## Enabling and configuring the autoscaling -Once you have created a cluster you can enable autoscaling by going to the *Details* tab and clicking on *Configure autoscale*. -You can also set up autoscaling during the cluster creation. For more details about this see the cluster creation documentation ([AWS](../aws/cluster_creation.md#autoscaling-workers-configuration), [AZURE](../azure/cluster_creation.md#autoscaling-workers-configuration)). - -
-[Figure: Configure autoscale]
- -Once you have clicked on *Configure autoscale* you will access a form allowing you to configure the autoscaling. You can configure the following: - -1. The instance type you want to use. -2. The size of the instances' disk. -3. The minimum number of workers. -4. The maximum number of workers. -5. The targeted number of standby workers. Setting some resources in standby ensures that there are always some free resources in your cluster. This ensures that requests for new resources are fulfilled promptly. You configure the standby by setting the amount of workers you want to be in standby. For example, if you set a value of *0.5* the system will start a new worker every time the aggregated free cluster resources drop below 50% of a worker's resources. If you set this value to 0 new workers will only be started when a job or notebook request the resources. -6. The time to wait before removing unused resources. One often starts a new computation shortly after finishing the previous one. To avoid having to wait for workers to stop and start between each computation it is recommended to wait before shutting down workers. Here you set the amount of time in seconds resources need to be unused before they get removed from the system. - -!!! note - The standby will not be taken into account if you set the minimum number of workers to 0 and no resources are used in the cluster. This ensures that the number of nodes can fall to 0 when no resources are used. The standby will start to take effect as soon as you start using resources. - -
-[Figure: Configure autoscale details]
- -Once you have set your configuration you can review it and enable the autoscaling. - -!!! note - There are two scenarios if you already have workers in your cluster when enabling the autoscaling: - - 1. The preexisting workers have the same *instance type* as the one you set up in the autoscaling. In this case, the autoscaling system will manage these workers and start or stop them automatically. - 2. The preexisting workers have a different *instance type* from the one you set up in the autoscaling. In this case, the autoscaling will not manage these nodes but you will still be able to remove them manually. - -
-[Figure: Configure autoscale review]
- -## Modifying the autoscaling configuration -You can update the autoscale configuration by going to the *Details* tab of the cluster and clicking on *Configure autoscale*. You will then go through the same steps as above. Note that if you change the *instance type*, nodes that currently exist in the cluster with a different instance type will not be managed by the autoscale system anymore and you will have to remove them manually. - -## Disabling the autoscaling -To disable the autoscaling go to the *Details* tab, click on *Disable autoscale* and confirm your action. When you disable autoscaling the nodes that are currently running will keep running. You will need to stop them manually. - -
-[Figure: Disable autoscale]
diff --git a/docs/common/backup.md b/docs/common/backup.md deleted file mode 100644 index 7c6bbe666..000000000 --- a/docs/common/backup.md +++ /dev/null @@ -1,149 +0,0 @@ -# How to take, restore and manage backups in managed.hopsworks.ai - -### Introduction -[Managed.hopsworks.ai](https://managed.hopsworks.ai) is our managed platform for running Hopsworks and the Feature Store in the cloud. When managing a cluster it is important to be able to take and restore backups to handle any failure eventuality. In this tutorial you will learn how to [take](), [restore]() and [manage]() backups in [managed.hopsworks.ai](https://managed.hopsworks.ai) - -## Prerequisites -To follow this tutorial you need to create a cluster in [Managed.hopsworks.ai](https://managed.hopsworks.ai). During the [cluster creation](../aws/cluster_creation.md) you need to set a positive number for the maximum retention period for your backups. This is done in the [backups](../aws/cluster_creation.md#step-6-set-the-backup-retention-policy) step of the cluster creation by setting the wanted retention period in _Validity of cluster backup images_. This step is needed because the backup process relies on the cloud bucket retention policy and needs to configure it before any backup is taken. - -!!! Warning - This value cannot be edited later on so make sure to set the proper one. - -!!! Note - To be able to take backup the cluster instances need special permissions. These permissions are cloud provider specific and indicated on the cluster creation page. - -
-[Figure: Choose the backup retention policy]
- - -## Taking a backup -To take a backup go to the backup tab of your cluster (1) and click on _Create backup_ (2). If you wish to give a name to your backup edit the value in _New backup name_ (3) before clicking on _Create backup_. - -!!! Note - The cluster needs to be running to take a backup. - -
-[Figure: Create a backup]
- -Taking a backup takes time and necessitates restarting the cluster. To avoid any risk of accidental restart you will be asked to confirm that you want to take a backup. To confirm check the check box and click on the _Backup_ button. - -!!! Warning - This will interrupt any operation currently running on the cluster. Make sure to stop them properly before taking a backup. - -
-[Figure: Confirm backup creation]
- -You can then wait until the backup is complete. The backup process being underway is indicated next to the cluster status. - -
-[Figure: Backup ongoing]
- -Once the backup is taken your cluster will be back up and running and ready to use. - -## Restoring a backup -Go to the Backup tab of the dashboard (left menu (1)) to the list of all the backups. This list is organized by cluster. For each of the clusters, you can see the state of the cluster (2) and the list of backups for this cluster (3). - -To be able to restore a backup the corresponding cluster needs to be [terminated](./dashboard.md#terminate-the-cluster). If your cluster is not terminated go and terminate it. Once the cluster is terminated you restore a backup by clicking on the _Restore_ button (4) of the backup you want to restore. - -
-[Figure: List of backups]
- - -!!! Warning - Restoring a backup put back the bucket in the state it was at the time of the backup. This makes it impossible to then restore a backup taken more recently. - - If you try to restore a backup that is not the latest backup you will be asked to confirm that you want to restore and thus delete any more recent backup. - -
-[Figure: Delete succeeding backups]
- -Once you have clicked on _Restore_ you will be brought to the [cluster creation](../aws/cluster_creation.md) menu. All the entries should be prefilled with the values corresponding to your cluster configuration. You can go through all the cluster configurations to verify (recommended) or directly click on _Review_ in the left menu and click on the _Create_ button. - -
-[Figure: Review and restore backup]
- -A new cluster will be created and set in the state your cluster was at the time of the backup. - -!!! Note - Restoring a backup does not recreate the workers for this cluster. You need to [add the workers](./adding_removing_workers.md) back once the cluster is created. - -## Managing your backups -To manage your backups either go to the Backup tab of your cluster (1) or go to the backup tab of the dashboard (2). - -
-[Figure: Backup tabs]
- -If you go to the backup tab of your cluster you will see the list of backups associated with this cluster. For each of the backups, you will see their name (1), id (2), date of creation (3), and status (4). The status can be: - -- **Completed**: the backup has been created and is ready to be restored. -- **Expired**: the backup is older than the maximum retention time set during [cluster creation](../aws/cluster_creation.md#step-6-set-the-backup-retention-policy), it will not be possible to restore it. -- **Failed**: the backup failed during its creation. -- **Running**: the backup is currently being created. -- **Deleting**: the backup is being deleted. - - -
-[Figure: Backup info]
- -If you go to the dashboard backup tab you will get a view of all the backups of all the clusters. For each of the backups, you get the same information as above. - -To delete a backup click on the _Delete_ button on the same line as the backup name. - -
-[Figure: Delete backup]
- -Once you have clicked on the _Delete_ button, you will be asked to confirm that you want to delete it. Check the check box and click _Delete_ to confirm. - -
-[Figure: Confirm backup deletion]
- -The backup will then be deleted. - -## Conclusion -During this tutorial, you have created a backup, restored a cluster from this backup, checked the information about this backup, and finally deleted the backup. - -Now that you have restored a cluster you can [add workers](./adding_removing_workers.md) or set up [autoscale](./autoscaling.md) on it. \ No newline at end of file diff --git a/docs/common/dashboard.md b/docs/common/dashboard.md deleted file mode 100644 index 62451aad8..000000000 --- a/docs/common/dashboard.md +++ /dev/null @@ -1,130 +0,0 @@ -# How to manage your clusters in managed.hopsworks.ai - -### Introduction -[Managed.hopsworks.ai](https://managed.hopsworks.ai) is our managed platform for running Hopsworks and the Feature Store in the cloud. On this page, you will get an overview of the different functionalities of the [managed.hopsworks.ai](https://managed.hopsworks.ai) dashboard. - -## Prerequisites -If you want to navigate the to the different tabs presented in this document you will need to connect [managed.hopsworks.ai](https://managed.hopsworks.ai) and create a cluster. Instructions about this process can be found in the getting started pages ([AWS](../aws/getting_started.md), [Azure](../azure/getting_started.md), [GCP](../gcp/getting_started.md)) - -## Dashboard overview -The landing page of [managed.hopsworks.ai](https://managed.hopsworks.ai) can be seen in the picture below. It is composed of three main parts. At the top, you have a menu bar (1) allowing you to navigate between the dashboard and the [settings](./settings.md). Bellow, you have a menu column (2) allowing you to navigate between different functionalities of the dashboard. And finally, in the middle, you find panels representing your different clusters (3) and a button to [create new clusters](../aws/cluster_creation.md) (4). - -
-[Figure: Dashboard overview]
- -The different functionalities of the dashboard are: - -- **Clusters**: the landing page which we will detail below. -- **Members**: the place to manage the members of your organization ([doc](user_management.md)). -- **Backup**: the place to manage your clusters' backup ([doc](backup.md)). -- **Usage**: the place to get your credits consumption. - -## Managing your clusters in the cluster panel -The cluster panels contain the name of the cluster (1), its state (2) a button to [terminate the cluster](#terminate-the-cluster) (3), a button to [stop the cluster](#stop-the-cluster) (4) and different tabs to manage the cluster (5). You will now learn more details about these actions and tabs. - -
-[Figure: Cluster panel structure]
- -### Stop the cluster -To stop a cluster click on the __Stop__ button on the top right part of the cluster panel. Stopping the cluster stop in instances it is running on and delete the workers nodes. This allows you to save credits in [managed.hopsworks.ai](https://managed.hopsworks.ai) and in your cloud provider. - -
-[Figure: Stop a cluster]
- -### Terminate the cluster -To terminate a cluster click on the __Terminate__ button on the top right part of the cluster panel. - -
-[Figure: Terminate a cluster]
- -Terminating the cluster will destroy it and delete all the resources that were automatically created during the cluster creation. To be sure that you are not terminating a cluster by accident you will be asked to confirm that you want to terminate the cluster. To confirm the termination, check the check box and click on __Terminate__. - -!!! Note - Terminating a cluster does not delete or empty the bucket associated with the cluster. This is because this bucket is needed to restore a backup. You can find more information about backups in the [backup documentation](./backup.md). - -
-[Figure: Confirm a cluster termination]
- -### The general tab -The General tab gives you the basic information about your cluster. If you have created a cluster with managed users it will only give you the URL of the cluster. If you have created a cluster without managed cluster it will also give you the user name and password that were set for the admin user at cluster creation. - -
-[Figure: General tab]
- -### Manage the services in the services tab -The services tab shows which service ports are open to the internet on your cluster. More details can be found in the [services documentation](./services.md). - -
-[Figure: Services tab]
- -### Get information about your cluster state in the Console tab -The console tab display more detailed information about the current state of your cluster. If your cluster is running and everything is as planned it will only say "everything is ok". But, if something failed, this is where you will find more details about the error. - -
-[Figure: Console tab]
- - -### Manage your backups in the Backups tab -The backups tab is where you create and manage backups for your cluster. You can find more details about the backups, in the [backups documentation](./backup.md). - -
-[Figure: Backups tab]
- -### Get more details and manage your workers in the Details tab -The Details tab provides you with details about your cluster setup. It is also where you can [add and remove workers](./adding_removing_workers.md) or [configure the autoscaling](./autoscaling.md). - -
-[Figure: Details tab]
- -### Get more details about your cluster RonDB in the RonDB tab -The RonDB tab provides you with details about the instances running RonDB in your cluster. This is also where you can [scale up RonDB](./scalingup.md) if needed. - -
-[Figure: RonDB tab]
- -## Conclusion -You now have an overview of where your different cluster information can be found and how you can manage your cluster. To go further you can learn how to [add and remove workers](./adding_removing_workers.md) or [configure the autoscaling](./autoscaling.md) on your cluster or how to take and restore [backups](./backup.md). \ No newline at end of file diff --git a/docs/common/rondb.md b/docs/common/rondb.md deleted file mode 100644 index 3a4c5e756..000000000 --- a/docs/common/rondb.md +++ /dev/null @@ -1,365 +0,0 @@ -# Managed RonDB -For applications where Feature Store's performance and scalability is paramount we give users the option to create clusters with -Managed [RonDB](https://www.rondb.com/). You don't need to worry about configuration as [managed.hopsworks.ai](https://managed.hopsworks.ai) will -automatically pick the best options for your setup. - -To start a new Hopsworks Cluster with a Managed RonDB installation one first goes to the Dashboard view seen below. - -
-[Figure: Dashboard]
- -After clicking on **Create cluster** we can select the Cloud to use, AWS, GCP or Azure. - -
-[Figure: Choose cloud menu]
- -The configuration of your RonDB cluster is done in the **RonDB database** tab at the left in your **Create cluster menu**. - -## Single node RonDB - -The minimum setup for a Hopsworks cluster is to run all database services on their own virtual machine additionally to the Head node. -This way the RonDB database service can scale independently of other cluster services. You can pick the VM type of this VM -in the scroll bar **RonDB database instance type**. - -It is also possible to change the size of the local storage in the VM -through the scroll bar **Local storage size**. Normally it isn't necessary to change this setting. - -It is possible to reconfigure from a Single Node RonDB to a RonDB cluster. It is not possible to reconfigure from a RonDB cluster -to a Single Node RonDB. - -When reconfiguring from a Single Node RonDB to RonDB cluster the Single Node VM is converted into the -Management Server VM. The management server VM only needs to use 2 VCPUs in a RonDB cluster. - -The Single Node RonDB is mainly intended for experiments and Proof of Concept installations. For production usage -it is better to select a RonDB cluster and use a single replica if a really small cluster is desired. Scaling from a small -RonDB cluster to a very large RonDB cluster can be done as online operations. - -
-[Figure: Configure RonDB]
- -!!! note - For cluster versions <= 2.5.0 the database services run on Head node - -## RonDB cluster -To setup a cluster with multiple RonDB nodes, select `RonDB cluster` during cluster creation. If this option is not available contact [us](mailto:sales@hopsworks.ai). - -### General -If you enable Managed RonDB you will see a basic configuration page where you can configure the database nodes. - -
-[Figure: RonDB basic configuration]
- -#### Data node - -First, you need to select the instance type and local storage size for the data nodes. -These are the database nodes that will store data. - -RonDB in Hopsworks is an in-memory database, so in order to fit more data you need to choose an instance type with more memory. -Local storage is used for offline storage of recovery data. RonDB does support on-disk columns that can be used for on-disk features, this will be accessible from Hopsworks in some future version. - -Since the data node is an in-memory database we only provide options with high memory compared to the CPUs. Most VMs have -8 GByte of memory per VCPU. The VM type is changed with the scroll bar **RonDB Data node instance type**. Local storage size -should be sufficient to use the default. - -#### Number of replicas - -Next you need to select the number of *replicas*. -This is the number of copies the cluster will maintain of your data. - -* Choosing 1 replica is the cheapest option since it requires the lowest number of nodes, but this way you don't have High Availability. - Whenever any of the RonDB data nodes in the cluster fails, the cluster will also fail, so **only** choose 1 replica if you are willing to accept cluster failure. -* The default and recommended is to use 2 replicas, which allows the cluster to continue operating after any one data node fails. -* With 3 replicas, the cluster can continue operating after any two data node failures that happen after each other. - -If you want to try out RonDB with a cheap setup it is possible to select 1 replica when you create the cluster and later -reconfigure the cluster with 2 or 3 replicas. One can also decrease the number of replicas through online reconfiguration. - -#### MySQLd nodes - -Next you can configure the number of MySQLd nodes. -These are dedicated nodes for performing SQL queries against your RonDB cluster. - -The Feature Store will use all the MySQL Servers with load balancing. The load balancing is implemented using Consul. -This means that if a MySQL Server is stopped the application will discover a failure and will reconnect and choose one -of the MySQL Servers that are still up. Similarly when a MySQL Server is started it will be used in the selection of -which MySQL Server to use for a new connection setup. - -Selection of VM type for the MySQL Servers is done in the scroll bar **MySQLd instance type**. As usual the default for -local storage size should be sufficient. - -Feature Stores is a read-heavy application. In such environment it is normal that optimal throughput is to have 50% to -100% more CPUs in the MySQL Servers compared to the Data nodes. MySQL Servers are most efficient up to 32 VCPUs. -Scaling to more MySQL Servers is efficient, thus it is best to first use enough MySQL Servers for high availability, -next to scale them to 32 VCPUs. If more CPUs are needed then scale with even more MySQL Servers. - -MySQL Servers are only clients to the data nodes in RonDB. Thus it will not use a lot of memory, we mainly provide -options with high CPU compared to the memory. - -### Advanced -The advanced tab offers less common options. -One node group will serve most use cases, it is mainly useful to have several node groups if the database doesn't -fit in a single data node VM. API nodes is currently mostly useful for benchmarking, but is intended also for -custom applications with extreme requirements on low latency. - -We recommend keeping the defaults unless you know what you are doing. - -
-[Figure: RonDB advanced configuration]
- -#### RonDB Data node groups -You can choose the number of node groups, also known as database shards. -The default is 1 node group, which means that all nodes will have a complete copy of all data. -Increasing the number of node groups will split the data evenly. -This way, you can create a cluster with higher capacity than a single node. -For use cases where it is possible, we recommend using 1 node group and choose an instance type with enough memory to fit all data. - -Below the number of node groups, you will see a summary of cluster resources. - -* The number of data nodes is an important consideration for the cost of the cluster. - It is calculated as the number of node groups multiplied by the number of replicas. -* The memory available to the cluster is calculated as the number of node groups multiplied by the memory per node. - Note that the number replicas does not affect the available memory. -* The CPUs available to the cluster is calculated as the number of node groups multiplied by the number of CPUs per node. - Note that the number replicas does not affect the available CPUs. - -#### API nodes -API nodes are specialized nodes which can run user code connecting directly to RonDB datanodes for increased performance. - -You can choose the number of nodes, the instance type and local storage size. - -There is also a checkbox to grant access to benchmark tools. -This will let a benchmark user access specific database tables, so that you can benchmark RonDB safely. - -For more information on how to benchmark RonDB, see the [RonDB documentation](https://docs.rondb.com). - -# Online Reconfiguration of RonDB -After creating a RonDB cluster it is possible to resize the RonDB cluster through an online reconfiguration. -This means that both reads and writes of tables and feature groups can continue while the change is ongoing -without any downtime. - -During reconfiguration of data nodes there could be periods where no new tables can be added and tables cannot be -dropped. Reads and writes to all existing tables will however always be possible. - -The online reconfiguration supports increasing and decreasing size of VMs, local storage of MySQL Servers. For -data nodes we support changing to larger VMs and changing numbers of replicas to 1,2 and 3 and changing the -local storage size to a larger size. - -## Why Reconfiguring Storage Client Layer (MySQL Servers) -Reconfiguration of MySQL Servers can be done several times per day to meet current demands by the -application. During a reconfiguration new MySQL Servers are added to the Consul address -onlinefs.mysql.service.consul. Thus new connections will be quickly using the new MySQL Servers. -MySQL Servers that are stopped will be removed from Consul a few seconds before the MySQL Server is stopped. - -Thus an application using the MySQL Server can avoid most, if not all temporary errors due to reconfiguration -by reconnecting to the MySQL Server after using a connection for about 3 seconds. However even without this -the impact on applications due to the reconfiguration will be very small. - -An example of an application could be an online retailer. During nights the activity is low, so the need of -the MySQL Servers is low and one could use 2 x 4 VCPUs. In the morning business picks up and one can increase -the use to 2 x 16 VCPUs. In the evening the business is heavier and one needs 2 x 32 VCPUs to meet the demand. -Finally weekend rushes is handled with 3 x 32 VCPUs. 
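Because the MySQL Servers are registered under the Consul name given above, a client-side way to watch servers join and leave during a reconfiguration is a plain DNS lookup. A sketch, assuming it is run from a host that can resolve the cluster's `.consul` domain (for example a cluster node):

```bash
# List the IPs currently registered for the MySQL service in Consul;
# re-run during a reconfiguration to watch servers being added and removed
dig +short onlinefs.mysql.service.consul
```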
- -## Why Reconfiguring Storage Layer (RonDB Data nodes) -Reconfiguration of the RonDB Data nodes is a bit more rare since the database resides in memory. At the moment -we only support increasing the size of the RonDB data nodes to ensure that we always have room for the database. -Storing the database in-memory provides much lower latency to online applications. - -Thus reconfiguring of the Storage Layer is mainly intended to meet long-term needs of the business in terms of -CPU and memory. In addition one can change the number of replicas to improve the availability of the RonDB -cluster. - -## Reconfiguration of Storage Client Layer -To start a reconfiguration of the MySQL Servers choose the RonDB tab in the Cluster you want to change. -In the Configuration Tab you select the **MySQLd instance type** scroll bar and set the new VM type you -want to use. In the figure above we selected e2-highcpu-16 (previously used e2-highcpu-8), thus doubling -the VCPUs used by the Storage Client Layer. - -
-[Figure: Reconfiguration of MySQL Server]
- -When you made this choice the **Changes to be submitted** is listed above the **Submit** button. - -If you decide to skip the current change you can push the **Reset** button to return to the original settings and -start again the reconfiguration change. - -After clicking the **Submit** button, the below pop-up window appears. You need to check the **Yes, reconfigure RonDB cluster** -button to proceeed. - -
-[Figure: Accept Reconfiguration]
- -After clicking this button you now need to push **Reconfigure** button in the below pop-up window. - -
-[Figure: Final Accept Reconfiguration]
- -After clicking the **Reconfigure** the reconfiguration and you will see the below. - -
-[Figure: Reconfiguration Pending]
- -And shortly thereafter you will see the state change to only reconfiguring. - -
-[Figure: Reconfiguration Ongoing]
- -The process to reconfigure a few MySQL Servers will take a few minutes where a major portion is spent on creating -the new MySQL Server VMs. - -During this process we can follow the process by clicking at the **Console** where the state of the Reconfiguration will -be presented in more details. - -### The Reconfiguration process -The first step in the reconfiguration process is to create the new VMs. During this step the we see the below message in -the Console. The steps below isn't all the states shown and could change in future versions. So this part is mostly to give -an idea of what is happening during the reconfiguration process. - -
-[Figure: Reconfiguration Console Wait For VMs]
- -The next step is to initialise the VMs and the below message is shown while this step is ongoing. - -
-[Figure: Reconfiguration Console Wait For Initialisation of VMs]
- -When all VMs have been created and initialised we see the below message in the Console. - -
-[Figure: Reconfiguration Console All VMs created]
- -With all the new MySQL Server VMs created and initialised we are ready to start the first MySQL Server. -During this we see the following message for a short time. - -
-[Figure: Reconfiguration Console Start MySQL Server]
- -After starting the new MySQL Server we need to insert the MySQL Server into Consul and other post init activities. -During this step we see the message below. Already here the new MySQL Server can start serving queries in the -cluster. - -
-[Figure: Reconfiguration Console Post Init MySQL Server]
- -At some point the old MySQL Servers need to be stopped and removed from the cluster. During this step we see this -message. After this step the old MySQL Servers are no longer serving queries in the cluster. - -The order of starting MySQL Server and stopping them can change, but we always ensure that we never decrease -the number of MySQL Servers until the final step if at all. - -
-[Figure: Reconfiguration Console Deactivate MySQL Server]
- -After starting all the MySQL Servers we have some cleanup steps to go through such as deleting the -old VMs. However during this step the cluster is already reconfigured. After all cleanup steps are completed the -final message arrives. As you can see the state is changed to **running**. This means that a new reconfiguration -can be started again as well. Only one reconfiguration at a time is allowed. - -
-[Figure: Reconfiguration Console Final state]
- -## Reconfiguration of the Storage Layer (RonDB Data nodes) - -When reconfiguring the Storage Layer we can change 3 things. We can change the VM type to choose a VM type with more -memory and more CPUs. We can change the number of replicas and finally we can change the local storage size. - -We can change all three parameters at once. The below shows an example of how to do this. - -
-[Figure: Reconfiguration Data nodes]
- -It is currently not possible to reconfigure API nodes. The VM type differs from cloud -to cloud. - - -The start of a reconfiguration uses the same pop-up windows as when reconfiguring the Storage Client Layer. - -The steps it goes through is slightly different but shares many similarities. However starting a RonDB data node -will take longer time, the time it takes is dependent on the database size. - -## Combined reconfiguration of all Layers -It is possible to reconfigure both the Storage Layer and the Storage Client Layer simultaneously. The process is -the same, but will obviously take a bit more time since more changes are required. - -## RonDB details -Once the cluster is created you can view some details by clicking on the `RonDB` tab, followed by clicking on the -`Nodes` tab as shown in the picture below. - -
-[Figure: RonDB cluster details]
- diff --git a/docs/common/scalingup.md b/docs/common/scalingup.md deleted file mode 100644 index b2a5b684e..000000000 --- a/docs/common/scalingup.md +++ /dev/null @@ -1,100 +0,0 @@ -# Scaling up -If you run into limitations due to the instance types you chose during a cluster creation it is possible to scale up the instances to overcome these limitations. - -## Scaling up the workers -If spark jobs are not starting in your cluster it may come from the fact that you don't have worker resources to run them. As workers are stateless the best way to solve this problem is to [add new workers](adding_removing_workers.md) with enough resources to handle your job. Or to [configure autoscalling](autoscaling.md) to automatically add the workers when needed. - -## Scaling up the head node -You may run into the need to scale up the head node for different reasons. For example: - -* You are running a cluster without [dedicated RonDB nodes](../aws/cluster_creation.md#step-12-managed-rondb) and have a workload with a high demand on the online feature store. -* You are running a cluster without [managed containers](../aws/cluster_creation.md#step-7-managed-containers) and want to run an important number of jupyter notebooks simultaneously. - -While we are working on implementing a solution to add these features to an existing cluster you can use the following approach to run your head node on an instance with more vcores and memory to handle more load. - -To scale up the head node you first have to stop your cluster. - -
-[Figure: Stop the cluster]
- -Once the cluster is stopped you can go to the *Details* tab and click on the head node *instance type*. - -
-[Figure: Go to details tab and click on the head node instance type]
- -This will open a new window. Select the type of instance you want to change to and click on *Review and submit* - -

-

- Select the new instance type for the head node -
Select the new instance type for the head node
-
-

- -Verify your choice and click on *Modify*. - -!!! note - If you set up your account with AWS before this feature was introduced, you may need to add the indicated permission to your [managed.hopsworks.ai](https://managed.hopsworks.ai) role. - -

-

- Validate your choice -
Validate your choice
-
-

- -You can now start your cluster. The head node will be started on an instance of the new type you chose. - -## Scaling up the RonDB nodes - -If you are running a cluster with [dedicated RonDB nodes](../aws/cluster_creation.md#step-12-managed-rondb) and have a workload with a high demand on the online feature store, you may need to scale up the RonDB *Datanodes* and *MySQLd* nodes. To do this, first stop the cluster. - -

-

- Stop the cluster -
Stop the cluster
-
-

- -Once the cluster is stopped you can go to the *RonDB* tab. -To scale MySQLd or API nodes, click on the *instance type* for the node you want to scale up. -To scale all datanodes, click on the *Change* button over their instance types. -Datanodes cannot be scaled individually. - -

-

- Go to the RonDB tab and click on the instance type you want to change -
Go to the RonDB tab and click on the instance type you want to change or, for datanodes, click on the Change button
-
-

- -This will open a new window. Select the type of instance you want to change to and click on *Review and submit*. - -

-

- Select the new instance type for the node -
Select the new instance type for the node
-
-

- -Verify your choice and click on *Modify*. - -!!! note - If you set up your account with AWS before this feature was introduced, you may need to add the indicated permission to your [managed.hopsworks.ai](https://managed.hopsworks.ai) role. - -

-

- Validate your choice -
Validate your choice
-
-

- -You can now start your cluster. The nodes will be started on an instance of the new type you chose. diff --git a/docs/common/services.md b/docs/common/services.md deleted file mode 100644 index 8a762321d..000000000 --- a/docs/common/services.md +++ /dev/null @@ -1,58 +0,0 @@ -# Services -Hopsworks clusters provide several services that can be accessed from outside Hopsworks. In this documentation, we first show how to make these services accessible to external networks. We will then go through the different services to give a short introduction and link to the associated documentation. - -## Outside Access to the Feature Store -By default, only the Hopsworks UI is made available to clients on external networks, like the Internet. -To integrate with external platforms and access APIs for services such as the Feature Store, you have to open the service's ports. - -Open ports by going to the *Services* tab, selecting a service, and pressing *Update*. This will update the *Security Group* attached to the Hopsworks cluster to allow incoming traffic on the relevant ports. - -

-

- Outside Access to the Feature Store -
Outside Access to the Feature Store
-
-

- -If you do not want the ports to be open to the internet you can set up VPC peering between the Hopsworks VPC and your client VPC. You then need to make sure that the ports associated with the services you want to use are open between the two VPCs. The ports associated with each of the services are indicated in the descriptions of the services below. - -## Feature store -The Feature Store is a data management system for managing machine learning features, including the feature engineering code and the feature data. The Feature Store helps ensure that features used during training and serving are consistent and that features are documented and reused within enterprises. You can find more about the feature store [here](../../concepts/fs/index.md) and information about how to connect to the Feature Store from different external services [here](../../user_guides/fs/storage_connector/index.md). - -Ports: 8020, 30010, 9083 and 9085 - -## Online Feature store -The online Feature store is required for online applications, where the goal is to retrieve a single feature vector with low latency and the same logic as was applied to generate the training dataset, such that the vector can subsequently be passed to a machine learning model in production to compute a prediction. You can find a more detailed explanation of the difference between Online and Offline Feature Store [here](../../concepts/fs/feature_group/fg_overview.md#online-and-offline-storage). Once you have opened the ports, the Online Feature store can be used with the same library as the offline feature store. You can find more in the [user guides](../../user_guides/index.md). - -Port: 3306 - -## Kafka -Hopsworks provides Kafka-as-a-Service for streaming applications and to ingest data. You can find more information about how to use Kafka in Hopsworks in this [documentation](../../user_guides/projects/kafka/create_schema.md). - -Port: 9092 - -## SSH -If you want to be able to SSH into the virtual machines running the Hopsworks cluster, you can open the ports using the *Services* tab. You can then SSH into the machine using your cluster operating system (*ubuntu* or *centos*) as the user name and the ssh key you selected during cluster creation. - -Port: 22 - -## ArrowFlight with DuckDB -Hopsworks provides ArrowFlight Server(s) with DuckDB to significantly reduce the time that Python clients need to read feature groups and batch inference data from the Feature Store, as well as to create moderately-sized in-memory training datasets. - -Port: 5005 - -## OpenSearch -Hopsworks includes a vector database to provide similarity search capabilities for embeddings, based on OpenSearch. - -Port: 9200 - -## Limiting outbound traffic to managed.hopsworks.ai - -If you have enabled the use of static IPs to communicate with [managed.hopsworks.ai](https://managed.hopsworks.ai) as described in [AWS](../../aws/cluster_creation/#limiting-outbound-traffic-to-hopsworksai) and [AZURE](../../azure/cluster_creation/#limiting-outbound-traffic-to-hopsworksai), you need to ensure that your security group allows outbound traffic to the two IPs indicated in the service page. - -

-

- Limiting outbound traffic to managed.hopsworks.ai -
Limiting outbound traffic to managed.hopsworks.ai
-
-
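For the VPC peering setup described above, the security group rules can also be managed from the command line. The following is a minimal sketch using the AWS CLI; the security group ID `sg-0123456789abcdef0` and the client VPC CIDR `10.1.0.0/16` are hypothetical placeholders, and only two of the service ports listed above are shown.

```bash
# Allow the online Feature Store (MySQL, port 3306) from the peered client VPC.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 3306 \
  --cidr 10.1.0.0/16

# Repeat for any other service you need, e.g. Kafka (port 9092).
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 9092 \
  --cidr 10.1.0.0/16
```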

diff --git a/docs/common/settings.md b/docs/common/settings.md deleted file mode 100644 index 3dd8d5d8f..000000000 --- a/docs/common/settings.md +++ /dev/null @@ -1,78 +0,0 @@ -# How to manage your managed.hopsworks.ai account - -### Introduction -[Managed.hopsworks.ai](https://managed.hopsworks.ai) is our managed platform for running Hopsworks and the Feature Store in the cloud. On this page, you will get an overview of the [managed.hopsworks.ai](https://managed.hopsworks.ai) settings page. - -## How to get to the settings page and what it looks like -From the [managed.hopsworks.ai](https://managed.hopsworks.ai) [landing page](./dashboard.md), you can access the settings page by clicking on __Settings__ on the top left. - -The settings page contains a menu on the left. The rest of the page displays information depending on the section you have selected in the menu. - -

-

- Getting to the settings page -
Getting to the settings page
-
-

- -## Manage the connection to your cloud accounts under Cloud Accounts -The landing section of the settings page is the __Cloud Accounts__ section. On this page you can edit the link between [managed.hopsworks.ai](https://managed.hopsworks.ai) and your cloud provider by clicking on the __Edit__ button (1). You can delete the link and remove any access from [managed.hopsworks.ai](https://managed.hopsworks.ai) to your cloud manager by clicking on the __Delete__ button (2). Or, you can configure a new connection with a cloud provider by clicking on the __Configure__ button (3). - -For more details about setting up a connection with a cloud provider see the getting started pages for: - -- [AWS](../aws/getting_started.md) -- [Azure](../azure/getting_started.md) -- [GCP](../gcp/getting_started.md) - -

-

- Cloud accounts management -
Cloud accounts management
-
-

- -## Manage your personal information under Profile -The __Profile__ section is where you can edit your personal information such as name, phone number, company, etc. - -

-

- Profile management -
Profile
-
-

- -## Change your password and configure multi-factor authentication under the Security section -### Change your password -To change your password, go to the security section, enter your current password in the __Current password__ field (1), enter the new password in the __New password__ field (2), and click on __Save__ (3). - -

-

- Change password -
Change password
-
-

- -### Set up multi-factor authentication -To set up multi-factor authentication, go to the security section, scan the QR code (1) with your authenticator app (example: [Google Authenticator](https://play.google.com/store/apps/details?id=com.google.android.apps.authenticator2&hl=en&gl=US)). Then enter the security code provided by the authenticator app in the __Security code__ field (2) and click on __Enable TOTP__ (3). - -

-

- Enable MFA -
Enable MFA
-
-

- -## Create and manage API keys under the API keys section -The API key section is where you create or delete your [managed.hopsworks.ai](https://managed.hopsworks.ai) API keys. More details about the API keys can be found in the [API keys documentation](./api_key.md) - -

-

- Generate API Key -
Generate an API Key
-
-
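As a usage note, a newly generated API key is typically consumed by exporting it in your shell before invoking client tooling such as the Hopsworks.ai Terraform provider covered elsewhere in these docs. A minimal sketch, where the key value is a placeholder you substitute:

```bash
# Make the managed.hopsworks.ai API key available to client tooling
# (the Hopsworks.ai Terraform provider reads HOPSWORKSAI_API_KEY).
export HOPSWORKSAI_API_KEY=<your-api-key>
```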

- -## Conclusion -You now know where and how to update your profile, cloud accounts, and API keys. - -To keep familiarizing yourself with [managed.hopsworks.ai](https://managed.hopsworks.ai), check the [dashboard documentation](./dashboard.md). \ No newline at end of file diff --git a/docs/common/sso/ldap.md b/docs/common/sso/ldap.md deleted file mode 100644 index 9435bf46a..000000000 --- a/docs/common/sso/ldap.md +++ /dev/null @@ -1,32 +0,0 @@ -# Configure your Hopsworks cluster to use LDAP for user management. - -If you want to use your organization's LDAP as an identity provider to manage users in your Hopsworks cluster, this document -will guide you through the necessary steps to configure [managed.hopsworks.ai](https://managed.hopsworks.ai) to use LDAP. - -The LDAP attributes below are used to configure JNDI resources in the Hopsworks server. -The JNDI resource will communicate with your LDAP server to perform the authentication. -

-

- Setup LDAP -
Setup LDAP
-
-

- -- _jndilookupname_: should contain the LDAP domain. -- _hopsworks.ldap.basedn_: the LDAP domain; it should be the same as _jndilookupname_. -- _java.naming.provider.url_: the URL of your LDAP server, including the port. -- _java.naming.ldap.attributes.binary_: the binary unique identifier that will be used in subsequent logins to identify the user. -- _java.naming.security.authentication_: how to authenticate to the LDAP server. -- _java.naming.security.principal_: contains the username of the user that will be used to query LDAP. -- _java.naming.security.credentials_: contains the password of the user that will be used to query LDAP. -- _java.naming.referral_: whether to follow or ignore an alternate location in which an LDAP request may be processed. - -After configuring LDAP and creating your cluster, you can log into your Hopsworks cluster and edit the LDAP _attributes to field names_ to match -your server. By default, all _attributes to field names_ are set to the values in [OpenLDAP](https://www.openldap.org/). -See [Configure LDAP](../../../user_guides/projects/auth/ldap.md) on how to edit the LDAP default configurations. - - -!!! note - - A default admin user that can log in with **username** and **password** will be created for the user that is creating the cluster. - This user can be removed after making sure users can log in using LDAP. \ No newline at end of file diff --git a/docs/common/sso/oauth.md b/docs/common/sso/oauth.md deleted file mode 100644 index 44f44d831..000000000 --- a/docs/common/sso/oauth.md +++ /dev/null @@ -1,88 +0,0 @@ -# Configure your Hopsworks cluster to use OAuth2 for user management. - -If you want to use your organization's OAuth 2.0 identity provider to manage users in your Hopsworks cluster, this document -will guide you through the necessary steps to register your identity provider in [managed.hopsworks.ai](https://managed.hopsworks.ai). - -Before registering your identity provider in Hopsworks you need to create a client application in your identity provider and -acquire a _client id_ and a _client secret_. Examples of how to create a client using [Okta](https://www.okta.com/) -and [Azure Active Directory](https://docs.microsoft.com/en-us/azure/active-directory) identity providers can be found -[here](../../../../admin/oauth2/create-okta-client/) and [here](../../../../admin/oauth2/create-azure-client/) respectively. - -In the User management step of cluster creation ([AWS](../../../aws/cluster_creation/#step-11-user-management-selection), -[Azure](../../../azure/cluster_creation/#step-10-user-management-selection)) you can choose which user management system to use. Select -_OAuth2 (OpenId)_ from the dropdown and configure your identity provider. - -

-

- Setup OAuth -
Setup OAuth
-
-

- -Register your identity provider by setting the following fields: - -- _Create Administrator password user_: if checked, an administrator that can log in to the Hopsworks cluster, with email and password, -will be created for the user creating the cluster. If **not** checked, a group mapping that maps at least one group in the identity provider to _HOPS_ADMIN_ is required. -- _ClientId_: the client id generated when registering Hopsworks in your identity provider. -- _Client Secret_: the client secret generated when registering Hopsworks in your identity provider. -- _Provider URI_: the base URI of the identity provider (the URI should contain the scheme http:// or https://). -- _Provider Name_: a unique name to identify the identity provider in your Hopsworks cluster. - This name will be used in the login page as an alternative login method if _Provider DisplayName_ is not set. - - -Optionally you can also set: - -- _Provider DisplayName_: the name to display for the alternative login method (if not set _Provider Name_ will be used) -- _Provider Logo URI_: a logo URL to an image. The logo will be shown on the login page with the provider name. -- _Code Challenge Method_: if your identity provider requires a code challenge for authorization requests, check the code challenge check box. - This will allow you to choose a code challenge method that can be either plain or S256. -- _Group Mapping_: will allow you to map groups in your identity provider to groups in Hopsworks. - You can choose to map all users to HOPS_USER or HOPS_ADMIN. Alternatively you can add mappings as in the example below. - ``` - IT->HOPS_ADMIN;DATA_SCIENCE->HOPS_USER - ``` - This will map users in the IT group in your identity provider to HOPS_ADMIN and users in the DATA_SCIENCE group to HOPS_USER. -- _Verify Email_: if checked, only users with a verified email address (in the identity provider) can log in to Hopsworks. -- _Activate user_: if not checked, an administrator in Hopsworks needs to activate users before they can log in. -- _Need consent_: if checked, users will be asked for consent when logging in for the first time. -- _Disable registration_: if unchecked, users will be able to create accounts in the Hopsworks cluster using user name and password instead of OAuth. -- _Provider Metadata Endpoint Supported_: if your provider defines a discovery mechanism, called OpenID Connect Discovery, - where it publishes its metadata at a well-known URL, typically -``` -https://server.com/.well-known/openid-configuration -``` -you can check this and the metadata will be discovered by Hopsworks. -If your provider does not publish its metadata you need to supply these values manually. - -

-

- provider metadata -
Setup Provider
-
-

- -- _Authorization Endpoint_: the authorization endpoint of your identity provider, typically -``` -https://server.com/oauth2/authorize -``` -- _End Session Endpoint_: the logout endpoint of your identity provider, typically -``` -https://server.com/oauth2/logout -``` -- _Token Endpoint_: the token endpoint of your identity provider, typically -``` -https://server.com/oauth2/token -``` -- _UserInfo Endpoint_: the user info endpoint of your identity provider, typically -``` -https://server.com/oauth2/userinfo -``` -- _JWKS URI_: the JSON Web Key Set endpoint of your identity provider, typically -``` -https://server.com/oauth2/keys -``` - -After configuring OAuth2 you can click on **Next** to configure the rest of your cluster. - -You can also configure OAuth2 once you have created a Hopsworks cluster. For instructions on how to configure OAuth2 on Hopsworks see -[Authentication Methods](../../../../admin/oauth2/create-client/). diff --git a/docs/common/terraform.md b/docs/common/terraform.md deleted file mode 100644 index c1d265b8e..000000000 --- a/docs/common/terraform.md +++ /dev/null @@ -1,565 +0,0 @@ -# Hopsworks.ai Terraform Provider - -[Managed.hopsworks.ai](https://managed.hopsworks.ai) allows users to create and manage their clusters using the [Hopsworks.ai terraform provider](https://registry.terraform.io/providers/logicalclocks/hopsworksai/latest). In this guide, we first provide a brief description of how to get started on AWS, AZURE, and GCP, then we show how to import an existing cluster to be managed by terraform. - -## Getting Started with AWS - -Complete the following steps to start using the Hopsworks.ai Terraform Provider on AWS. - -1. Create a [managed.hopsworks.ai](https://managed.hopsworks.ai) API KEY as described in detail [here](api_key.md), and export the API KEY as follows -```bash -export HOPSWORKSAI_API_KEY= -``` -2. Download the proper Terraform CLI for your OS from [here](https://www.terraform.io/downloads.html). -3. Install the [AWS CLI](https://aws.amazon.com/cli/) and run `aws configure` to configure your AWS credentials. - -### Example -In this section, we provide a simple example to create a Hopsworks cluster on AWS along with all its required resources (ssh key, S3 bucket, and instance profile with the required permissions). - -1. In your terminal, run the following to create a demo directory and cd to it -```bash -mkdir demo -cd demo -``` -2. In this empty directory, create an empty file `main.tf`. Open the file and paste the following configuration into it, then save it. 
-```hcl -terraform { - required_providers { - aws = { - source = "hashicorp/aws" - version = "4.16.0" - } - hopsworksai = { - source = "logicalclocks/hopsworksai" - } - } -} - -variable "region" { - type = string - default = "us-east-2" -} - -provider "aws" { - region = var.region -} - -provider "hopsworksai" { -} - -# Create the required aws resources, an ssh key, an s3 bucket, and an instance profile with the required Hopsworks permissions -module "aws" { - source = "logicalclocks/helpers/hopsworksai//modules/aws" - region = var.region - version = "2.3.0" -} - -# Create a cluster with no workers -resource "hopsworksai_cluster" "cluster" { - name = "tf-hopsworks-cluster" - ssh_key = module.aws.ssh_key_pair_name - - head { - instance_type = "m5.2xlarge" - } - - aws_attributes { - region = var.region - instance_profile_arn = module.aws.instance_profile_arn - bucket { - name = module.aws.bucket_name - } - } - - rondb { - single_node { - instance_type = "t3a.xlarge" - } - } - - autoscale { - non_gpu_workers { - instance_type = "m5.2xlarge" - disk_size = 256 - min_workers = 1 - max_workers = 5 - standby_workers = 0.5 - downscale_wait_time = 300 - } - } - - open_ports { - ssh = true - } -} - -output "hopsworks_cluster_url" { - value = hopsworksai_cluster.cluster.url -} -``` -3. Initialize the terraform directory by running the following command -```bash -terraform init -``` -4. Now you can apply the changes to create all required resources -```bash -terraform apply -``` -5. Once terraform finishes creating the resources, it will output the url to the newly created cluster. Notice that for now, you have to navigate to your [managed.hopsworks.ai dashboard](https://managed.hopsworks.ai/dashboard) to get your login credentials. - -6. After you finish working with the cluster, you can terminate it along with the other AWS resources using the following command -```bash -terraform destroy -``` - - -## Getting Started with AZURE - -Complete the following steps to start using Hopsworks.ai Terraform Provider on AZURE. - -1. Create a [managed.hopsworks.ai](https://managed.hopsworks.ai) API KEY as described in details [here](api_key.md), and export the API KEY as follows -```bash -export HOPSWORKSAI_API_KEY= -``` -2. Download the proper Terraform CLI for your os from [here](https://www.terraform.io/downloads.html). -3. Install the [AZURE CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli) and run `az login` to configure your AZURE credentials. - -### Example -In this section, we provide a simple example to create a Hopsworks cluster on AZURE along with all its required resources (ssh key, storage account, acr registry, and user assigned managed identity with the required permissions). - -1. In your terminal, run the following to create a demo directory and cd to it -```bash -mkdir demo -cd demo -``` -2. In this empty directory, create an empty file `main.tf`. Open the file and paste the following configurations to it then save it. 
-```hcl -terraform { - required_providers { - azurerm = { - source = "hashicorp/azurerm" - version = "3.8.0" - } - hopsworksai = { - source = "logicalclocks/hopsworksai" - } - } -} - -variable "resource_group" { - type = string -} - -provider "azurerm" { - features {} - skip_provider_registration = true -} - -provider "hopsworksai" { -} - -data "azurerm_resource_group" "rg" { - name = var.resource_group -} - -# Create the required azure resources, an ssh key, a storage account, and an user assigned managed identity with the required Hopsworks permissions -module "azure" { - source = "logicalclocks/helpers/hopsworksai//modules/azure" - resource_group = var.resource_group - version = "2.3.0" -} - -# Create an ACR registry -resource "azurerm_container_registry" "acr" { - name = replace(module.azure.storage_account_name, "storageaccount", "acr") - resource_group_name = module.azure.resource_group - location = module.azure.location - sku = "Premium" - admin_enabled = false - retention_policy { - enabled = true - days = 7 - } -} - -# Create a cluster with no workers -resource "hopsworksai_cluster" "cluster" { - name = "tf-hopsworks-cluster" - ssh_key = module.azure.ssh_key_pair_name - - head { - instance_type = "Standard_D8_v3" - } - - azure_attributes { - location = module.azure.location - resource_group = module.azure.resource_group - user_assigned_managed_identity = module.azure.user_assigned_identity_name - container { - storage_account = module.azure.storage_account_name - } - acr_registry_name = azurerm_container_registry.acr.name - } - - rondb { - single_node { - instance_type = "Standard_D4s_v4" - } - } - - autoscale { - non_gpu_workers { - instance_type = "Standard_D8_v3" - disk_size = 256 - min_workers = 1 - max_workers = 5 - standby_workers = 0.5 - downscale_wait_time = 300 - } - } - - open_ports { - ssh = true - } -} - -output "hopsworks_cluster_url" { - value = hopsworksai_cluster.cluster.url -} -``` -3. Initialize the terraform directory by running the following command -```bash -terraform init -``` -4. Now you can apply the changes to create all required resources. Replace the placeholders with your Azure resource group -```bash -terraform apply -var="resource_group=" -``` -5. Once terraform finishes creating the resources, it will output the url to the newly created cluster. Notice that for now, you have to navigate to your [managed.hopsworks.ai dashboard](https://managed.hopsworks.ai/dashboard) to get your login credentials. - -6. After you finish working with the cluster, you can terminate it along with the other AZURE resources using the following command -```bash -terraform destroy -var="resource_group=" -``` - -## Getting Started with GCP - -Complete the following steps to start using Hopsworks.ai Terraform Provider on GCP. - -1. Create a [managed.hopsworks.ai](https://managed.hopsworks.ai) API KEY as described in details [here](api_key.md), and export the API KEY as follows -```bash -export HOPSWORKSAI_API_KEY= -``` -2. Download the proper Terraform CLI for your os from [here](https://www.terraform.io/downloads.html). -3. Install the [Google Cloud CLI](https://cloud.google.com/sdk/docs/install-sdk) and run `gcloud init` to configure your GCP credentials. - -### Example -In this section, we provide a simple example to create a Hopsworks cluster on GCP along with all its required resources (Google storage bucket and service account with the required permissions). - -1. 
In your terminal, run the following to create a demo directory and cd to it -```bash -mkdir demo -cd demo -``` -2. In this empty directory, create an empty file `main.tf`. Open the file and paste the following configurations to it then save it. -```hcl -terraform { - required_version = ">= 0.14.0" - - required_providers { - google = { - source = "hashicorp/google" - version = "5.13.0" - } - hopsworksai = { - source = "logicalclocks/hopsworksai" - } - time = { - source = "hashicorp/time" - version = "0.10.0" - } - } -} - -variable "region" { - type = string - default = "europe-north1" -} - -variable "project" { - type = string -} - -provider "google" { - region = var.region - project = var.project -} - -provider "hopsworksai" { -} - -provider "time" { - -} - -# Create required google resources, a storage bucket and an service account with the required hopsworks permissions -data "hopsworksai_gcp_service_account_custom_role_permissions" "service_account" { - -} - -resource "google_project_iam_custom_role" "service_account_role" { - role_id = "tf.HopsworksAIInstances" - title = "Hopsworks AI Instances" - description = "Role that allows Hopsworks AI Instances to access resources" - permissions = data.hopsworksai_gcp_service_account_custom_role_permissions.service_account.permissions -} - -resource "google_service_account" "service_account" { - account_id = "tf-hopsworks-ai-instances" - display_name = "Hopsworks AI instances" - description = "Service account for Hopsworks AI instances" -} - -resource "google_project_iam_binding" "service_account_role_binding" { - project = var.project - role = google_project_iam_custom_role.service_account_role.id - - members = [ - google_service_account.service_account.member - ] -} - -resource "google_storage_bucket" "bucket" { - name = "tf-hopsworks-bucket" - location = var.region - force_destroy = true -} - -resource "time_sleep" "wait_60_seconds" { - depends_on = [google_project_iam_binding.service_account_role_binding] - create_duration = "60s" -} - -# Create a simple cluster with two workers with two different configuration -data "google_compute_zones" "available" { - region = var.region -} - -locals { - zone = data.google_compute_zones.available.names.0 -} - -resource "hopsworksai_cluster" "cluster" { - name = "tf-cluster" - - head { - instance_type = "e2-standard-8" - } - - gcp_attributes { - project_id = var.project - region = var.region - zone = local.zone - service_account_email = google_service_account.service_account.email - bucket { - name = google_storage_bucket.bucket.name - } - } - - rondb { - single_node { - instance_type = "e2-highmem-4" - } - } - - autoscale { - non_gpu_workers { - instance_type = "e2-standard-8" - disk_size = 256 - min_workers = 1 - max_workers = 5 - standby_workers = 0.5 - downscale_wait_time = 300 - } - } - - open_ports { - ssh = true - } - - # waiting for 60 seconds after service account permissions has been granted - # to avoid permissions validation failure on hopsworks when creating the cluster - depends_on = [time_sleep.wait_60_seconds] -} - -output "hopsworks_cluster_url" { - value = hopsworksai_cluster.cluster.url -} -``` -3. Initialize the terraform directory by running the following command -```bash -terraform init -``` -4. Now you can apply the changes to create all required resources. Replace the placeholders with your GCP project id -```bash -terraform apply -var="project=" -``` -5. Once terraform finishes creating the resources, it will output the url to the newly created cluster. 
Notice that for now, you have to navigate to your [managed.hopsworks.ai dashboard](https://managed.hopsworks.ai/dashboard) to get your login credentials. - -6. After you finish working with the cluster, you can terminate it along with the other GCP resources using the following command -```bash -terraform destroy -var="project=" -``` - -## Importing an existing cluster to terraform - -In this section, we show how to use `terraform import` to manage your existing Hopsworks cluster. - -* **Step 1**: In your [managed.hopsworks.ai dashboard](https://managed.hopsworks.ai/dashboard), choose the cluster you want to import to terraform, then go to the *Details* tab and copy the *Id* as shown in the figure below - -

-

- Details tab -
Click on the Details tab and copy the Id
-
-

- -* **Step 2**: In your terminal, create an empty directory and cd to it. -```bash -mkdir import-demo -cd import-demo -``` - -* **Step 3**: In this empty directory, create an empty file `versions.tf`. Open the file and paste the following configurations. - -!!! note - Notice that you need to change these configurations depending on your cluster; in this example, the Hopsworks cluster resides in region `us-east-2` on AWS. - -```hcl -terraform { - required_providers { - aws = { - source = "hashicorp/aws" - version = "4.16.0" - } - hopsworksai = { - source = "logicalclocks/hopsworksai" - } - } -} - -provider "aws" { - region = "us-east-2" -} - -provider "hopsworksai" { -} -``` - -* **Step 4**: Initialize the terraform directory by running the following command -```bash -terraform init -``` - -* **Step 5**: Create another file `main.tf`. Open the file and paste the following configuration. -```hcl -resource "hopsworksai_cluster" "cluster" { -} -``` - -* **Step 6**: Import the cluster state using `terraform import`; in this step you need the cluster id from Step 1 (33ae7ae0-d03c-11eb-84e2-af555fb63565). -```bash -terraform import hopsworksai_cluster.cluster 33ae7ae0-d03c-11eb-84e2-af555fb63565 -``` - -The output should be similar to the following snippet -```bash -hopsworksai_cluster.cluster: Importing from ID "33ae7ae0-d03c-11eb-84e2-af555fb63565"... -hopsworksai_cluster.cluster: Import prepared! - Prepared hopsworksai_cluster for import -hopsworksai_cluster.cluster: Refreshing state... [id=33ae7ae0-d03c-11eb-84e2-af555fb63565] - -Import successful! - -The resources that were imported are shown above. These resources are now in -your Terraform state and will henceforth be managed by Terraform. -``` - -* **Step 7**: At this point the local terraform state is updated; however, if we try to run `terraform plan` or `terraform apply` it will complain about missing configurations. The reason is that our local resource configuration in `main.tf` is empty; we should populate it using the terraform state commands as shown below: -```bash -terraform show -no-color > main.tf -``` - -* **Step 8**: If you try to run `terraform plan` again, the command will complain that the read-only attributes are set (Computed attributes) as shown below. The solution is to remove these attributes from `main.tf` and retry until you have no errors. -```bash -Error: Computed attributes cannot be set - - on main.tf line 3, in resource "hopsworksai_cluster" "cluster": - 3: activation_state = "stoppable" - -Computed attributes cannot be set, but a value was set for "activation_state". - - -Error: Computed attributes cannot be set - - on main.tf line 6, in resource "hopsworksai_cluster" "cluster": - 6: cluster_id = "33ae7ae0-d03c-11eb-84e2-af555fb63565" - -Computed attributes cannot be set, but a value was set for "cluster_id". - - -Error: Computed attributes cannot be set - - on main.tf line 7, in resource "hopsworksai_cluster" "cluster": - 7: creation_date = "2021-06-18T15:51:07+02:00" - -Computed attributes cannot be set, but a value was set for "creation_date". - - -Error: Invalid or unknown key - - on main.tf line 8, in resource "hopsworksai_cluster" "cluster": - 8: id = "33ae7ae0-d03c-11eb-84e2-af555fb63565" - - - -Error: Computed attributes cannot be set - - on main.tf line 13, in resource "hopsworksai_cluster" "cluster": - 13: start_date = "2021-06-18T15:51:07+02:00" - -Computed attributes cannot be set, but a value was set for "start_date".
- - -Error: Computed attributes cannot be set - - on main.tf line 14, in resource "hopsworksai_cluster" "cluster": - 14: state = "running" - -Computed attributes cannot be set, but a value was set for "state". - - -Error: Computed attributes cannot be set - - on main.tf line 17, in resource "hopsworksai_cluster" "cluster": - 17: url = "https://33ae7ae0-d03c-11eb-84e2-af555fb63565.dev-cloud.hopsworks.ai/hopsworks/#!/" - -Computed attributes cannot be set, but a value was set for "url". -``` - -* **Step 9**: Once you have fixed all the errors, you should get the following output when running `terraform plan`. With that, you can proceed as normal to manage this cluster locally using terraform. -```bash -hopsworksai_cluster.cluster: Refreshing state... [id=33ae7ae0-d03c-11eb-84e2-af555fb63565] - -No changes. Infrastructure is up-to-date. - -This means that Terraform did not detect any differences between your -configuration and real physical resources that exist. As a result, no -actions need to be performed. -``` - -## Next Steps -* Check the [Hopsworks.ai terraform provider documentation](https://registry.terraform.io/providers/logicalclocks/hopsworksai/latest/docs) for more details about the different resources and data sources supported by the provider and a description of their attributes. -* Check the [Hopsworks.ai terraform AWS examples](https://github.com/logicalclocks/terraform-provider-hopsworksai/tree/main/examples/complete/aws); each example contains a README file describing how to run it and more details about configuring it. -* Check the [Hopsworks.ai terraform AZURE examples](https://github.com/logicalclocks/terraform-provider-hopsworksai/tree/main/examples/complete/azure); each example contains a README file describing how to run it and more details about configuring it. -* Check the [Hopsworks.ai terraform GCP examples](https://github.com/logicalclocks/terraform-provider-hopsworksai/tree/main/examples/complete/gcp); each example contains a README file describing how to run it and more details about configuring it. diff --git a/docs/common/user_management.md b/docs/common/user_management.md deleted file mode 100644 index ec258c790..000000000 --- a/docs/common/user_management.md +++ /dev/null @@ -1,95 +0,0 @@ -# User management -In [managed.hopsworks.ai](https://managed.hopsworks.ai) users can be grouped into *organizations* to access the same resources. -When a new user registers with [managed.hopsworks.ai](https://managed.hopsworks.ai) a new organization is created. This user can later -invite other registered users to their organization so they can share access to the same clusters. - -Cloud Accounts configuration is also shared among users of the same organization. So if user Alice has configured -her account with her credentials, all members of her *organization* will automatically deploy clusters in her cloud -account. Credits and cluster usage are also grouped to ease reporting. - -## Adding members to an organization -Organization membership can be edited by clicking **Members** on the left of the [managed.hopsworks.ai](https://managed.hopsworks.ai) Dashboard page. - -

-

- Organization membership -
Organization membership
-
-

- -To add a new member to your organization, add the user's email and click **Add**. The invited user will -receive an email with the invitation. You can set the user as administrator by checking the __Admin__ checkbox. More details about organization administrators can be found [here](#administrator-role). - -An invited user **must accept** the invitation to be part of the organization. An invitation will show up on the invitee's Dashboard. The invitee may have to close the __Welcome__ splash screen to be able to see the invitation. -In this example, Alice has invited Bob to her organization, but Bob hasn't accepted -the invitation yet. - -

-

- Invitation -
Alice has sent the invitation
-
- -
- Accept invitation -
Bob's dashboard
-
-

- -## Sharing resources -Once Bob has accepted the invitation, he does **not** have to configure his account; he and Alice share the same configuration. -Also, he will be able to view **the same** Dashboard as Alice, so he can start, stop or terminate clusters in the organization. - -

-

- Alice dashboard -
Alice's dashboard
-
- -
- Bob dashboard -
Bob's dashboard
-
-

- -If Alice had existing clusters running and she had selected [Managed user management](../../aws/cluster_creation/#step-11-user-management-selection) -during cluster creation, an account is automatically created for Bob on these clusters. - -## Removing members from an organization -To remove a member from your organization, simply go to the **Members** page and click the **Remove** button next to the user you want to remove. -You will **stop** sharing any resource and the user **will be blocked** from any shared cluster. - -

-

- Delete organization member -
Delete organization member
-
-

- -## Organization permissions -The owner and the administrators of an organization can set permissions at the organization level. For this, go to the members tab, check the checkboxes in the __Member permissions__ section, and click on __Update__. - -The supported permissions are: - -- __Non admin members can invite new members to the organization__. If this permission is enabled, any member of the organization will be able to invite other members to the organization. Note that only the owner and the administrators will be able to invite new members as administrators. If this permission is not enabled, only the owner and the administrators can invite new members to the organization. -- __Non admin members can create and terminate clusters__. If this permission is enabled, any member of the organization will be able to create and terminate clusters. If it is not enabled, only the owner and the administrators will be able to create and terminate clusters. -- __Non admin members can open clusters ports__. If this permission is enabled, any member of the organization can open and close [services ports](./services.md) on the organization's clusters. If it is not enabled, only the organization owner and administrators will be able to do so. - -

-

- Modify permissions -
Modify permissions
-
-

- -## Administrator role -Members of an organization can be set as administrators. This can be done by checking the __admin__ checkbox at the time of invitation, or by checking the __admin__ checkbox and then clicking the __Update__ button next to a member's email. - -Administrators can do all the actions described in the [Organization permissions](#organization-permissions) section of this documentation. They can also update the configuration of these permissions and set other users as administrators. Finally, administrators are automatically set as administrators on all the clusters of the organization that have [Managed user enabled](../../aws/cluster_creation/#step-11-user-management-selection) and are version __2.6.0__ or above. - -

-

- Set a member as admin -
Set a member as admin
-
-

\ No newline at end of file diff --git a/docs/js/dropdown.js b/docs/js/dropdown.js index a8528fdf5..4f0a7e8a7 100644 --- a/docs/js/dropdown.js +++ b/docs/js/dropdown.js @@ -1,3 +1,3 @@ -document.getElementsByClassName("md-tabs__link")[7].style.display = "none"; -document.getElementsByClassName("md-tabs__link")[9].style.display = "none"; +document.getElementsByClassName("md-tabs__link")[6].style.display = "none"; +document.getElementsByClassName("md-tabs__link")[8].style.display = "none"; diff --git a/docs/setup_installation/admin/oauth2/create-azure-client.md b/docs/setup_installation/admin/oauth2/create-azure-client.md index beb4c513a..52491a480 100644 --- a/docs/setup_installation/admin/oauth2/create-azure-client.md +++ b/docs/setup_installation/admin/oauth2/create-azure-client.md @@ -105,12 +105,6 @@ Enter the *Redirect URI* and click on *Configure*. The redirect URI is *HOPSWORK

-!!! note - - If your Hopsworks cluster is created on the cloud (managed.hopsworks.ai), - you can find your *HOPSWORKS-URI* by going to the [managed.hopsworks.ai dashboard](https://managed.hopsworks.ai/dashboard) - in the *General* tab of your cluster and copying the URI. - ## Conclusion In this guide you learned how to create a client in your Azure identity provider and diff --git a/docs/setup_installation/admin/user.md b/docs/setup_installation/admin/user.md index 883a81d6a..479122655 100644 --- a/docs/setup_installation/admin/user.md +++ b/docs/setup_installation/admin/user.md @@ -1,7 +1,7 @@ # User Management ## Introduction -Whether you run Hopsworks on-premise, or on the cloud using [managed.hopsworks.ai](https://managed.hopsworks.ai), +Whether you run Hopsworks on-premise, or on the cloud using Kubernetes, you have a Hopsworks cluster which contains all users and projects. ## Prerequisites diff --git a/docs/setup_installation/admin/variables.md b/docs/setup_installation/admin/variables.md index d68b4e989..88a32306a 100644 --- a/docs/setup_installation/admin/variables.md +++ b/docs/setup_installation/admin/variables.md @@ -1,7 +1,7 @@ # Cluster Configuration ## Introduction -Whether you run Hopsworks on-premise, or on the cloud using [managed.hopsworks.ai](https://managed.hopsworks.ai), it is possible to change a variety of configurations on the cluster, changing its default behaviour. +Whether you run Hopsworks on-premise, or on the cloud using Kubernetes, it is possible to change a variety of configurations on the cluster, changing its default behaviour. This section does not go into detail for every setting, since every Hopsworks cluster comes with a robust default setup. However, this guide explains where to find the configurations and if necessary, how to change them. !!! note diff --git a/docs/user_guides/fs/feature_group/data_validation.md b/docs/user_guides/fs/feature_group/data_validation.md index 88c7edaf1..abb91121f 100644 --- a/docs/user_guides/fs/feature_group/data_validation.md +++ b/docs/user_guides/fs/feature_group/data_validation.md @@ -62,7 +62,7 @@ First checkout the pre-requisite and Hopsworks setup to follow the guide below. In order to define and validate an expectation when writing to a Feature Group, you will need: -- A Hopsworks project. If you don't have a project yet you can go to [managed.hopsworks.ai](https://managed.hopsworks.ai), signup with your email and create your first project. +- A Hopsworks project. If you don't have a project yet you can go to [app.hopsworks.ai](https://app.hopsworks.ai), signup with your email and create your first project. - An API key, you can get one by following the instructions [here](../../../setup_installation/common/api_key.md) - The [Hopsworks Python library](https://pypi.org/project/hopsworks) installed in your client. See the [installation guide](../../client_installation/index.md). diff --git a/docs/user_guides/fs/feature_group/feature_monitoring.md b/docs/user_guides/fs/feature_group/feature_monitoring.md index b355cea01..4644016a4 100644 --- a/docs/user_guides/fs/feature_group/feature_monitoring.md +++ b/docs/user_guides/fs/feature_group/feature_monitoring.md @@ -19,7 +19,7 @@ After that, you can optionally define a detection window of data to compute stat In order to setup feature monitoring for a Feature Group, you will need: -- A Hopsworks project. If you don't have a project yet you can go to [managed.hopsworks.ai](https://managed.hopsworks.ai), signup with your email and create your first project. 
+- A Hopsworks project. If you don't have a project yet you can go to [app.hopsworks.ai](https://app.hopsworks.ai), signup with your email and create your first project. - An API key, you can get one by following the instructions [here](../../../setup_installation/common/api_key.md) - The Hopsworks Python library installed in your client. See the [installation guide](../../client_installation/index.md). - A Feature Group @@ -188,4 +188,4 @@ Finally, you can save your feature monitoring configuration by calling the `save ``` !!! info "Next steps" - See the [Advanced guide](../feature_monitoring/feature_monitoring_advanced.md) to learn how to delete, disable or trigger feature monitoring manually. \ No newline at end of file + See the [Advanced guide](../feature_monitoring/feature_monitoring_advanced.md) to learn how to delete, disable or trigger feature monitoring manually. diff --git a/docs/user_guides/fs/feature_view/feature_monitoring.md b/docs/user_guides/fs/feature_view/feature_monitoring.md index 6dbcc6378..6d42d508a 100644 --- a/docs/user_guides/fs/feature_view/feature_monitoring.md +++ b/docs/user_guides/fs/feature_view/feature_monitoring.md @@ -19,7 +19,7 @@ After that, you can optionally define a detection window of data to compute stat In order to setup feature monitoring for a Feature View, you will need: -- A Hopsworks project. If you don't have a project yet you can go to [managed.hopsworks.ai](https://managed.hopsworks.ai), signup with your email and create your first project. +- A Hopsworks project. If you don't have a project yet you can go to [app.hopsworks.ai](https://app.hopsworks.ai), signup with your email and create your first project. - An API key, you can get one by following the instructions [here](../../../setup_installation/common/api_key.md) - The [Hopsworks Python library](https://pypi.org/project/hopsworks) installed in your client. See the [installation guide](../../client_installation/index.md). - A Feature View @@ -201,4 +201,4 @@ Finally, you can save your feature monitoring configuration by calling the `save ``` !!! info "Next steps" - See the [Advanced guide](../feature_monitoring/feature_monitoring_advanced.md) to learn how to delete, disable or trigger feature monitoring manually. \ No newline at end of file + See the [Advanced guide](../feature_monitoring/feature_monitoring_advanced.md) to learn how to delete, disable or trigger feature monitoring manually. diff --git a/docs/user_guides/integrations/emr/networking.md b/docs/user_guides/integrations/emr/networking.md index 87dede772..da468b945 100644 --- a/docs/user_guides/integrations/emr/networking.md +++ b/docs/user_guides/integrations/emr/networking.md @@ -27,19 +27,10 @@ Identify your EMR VPC in the Summary of your EMR cluster:

-!!! info "Hopsworks installer" - If you are performing an installation using the [Hopsworks installer script](../../../setup_installation/on_prem/hopsworks_installer.md), ensure that the virtual machines you install Hopsworks on are deployed in the EMR VPC. - -!!! info "managed.hopsworks.ai" - If you are on **[managed.hopsworks.ai](https://managed.hopsworks.ai)**, you can directly deploy Hopsworks to the EMR VPC, by simply selecting it at the [VPC selection step during cluster creation](https://docs.hopsworks.ai/hopsworks-cloud/latest/aws/cluster_creation/#step-8-vpc-selection). - **Option 2: Set up VPC peering** Follow the guide [VPC Peering](https://docs.aws.amazon.com/vpc/latest/peering/create-vpc-peering-connection.html) to set up VPC peering between the Feature Store and EMR. Get your Feature Store *VPC ID* and *CIDR* by searching for the Feature Store VPC in the AWS Management Console: -!!! info "managed.hopsworks.ai" - On **[managed.hopsworks.ai](https://managed.hopsworks.ai)**, the VPC is shown in the cluster details. -

Identify the Feature Store VPC @@ -51,9 +42,6 @@ Follow the guide [VPC Peering](https://docs.aws.amazon.com/vpc/latest/peering/cr The Feature Store *Security Group* needs to be configured to allow traffic from your EMR clusters to be able to connect to the Feature Store. -!!! note "managed.hopsworks.ai" - If you deployed your Hopsworks Feature Store with [managed.hopsworks.ai](https://managed.hopsworks.ai), you only need to enable [outside access of the Feature Store services](https://docs.hopsworks.ai/hopsworks-cloud/latest/services/#outside-access-to-the-feature-store). - Open your feature store instance under EC2 in the AWS Management Console and ensure that ports *443*, *3306*, *9083*, *9085*, *8020* and *30010* (443,3306,8020,30010,9083,9085) are reachable from the EMR Security Group: diff --git a/docs/user_guides/integrations/python.md b/docs/user_guides/integrations/python.md index 777b22e6f..c12348359 100644 --- a/docs/user_guides/integrations/python.md +++ b/docs/user_guides/integrations/python.md @@ -59,7 +59,6 @@ fs = conn.get_feature_store() # Get the project's default feature stor If you have trouble to connect, please ensure that your Feature Store can receive incoming traffic from your Python environment on ports 443, 9083 and 9085 (443,9083,9085). - If you deployed your Hopsworks Feature Store instance with [managed.hopsworks.ai](https://managed.hopsworks.ai), it suffices to enable [outside access of the Feature Store and Online Feature Store services](https://docs.hopsworks.ai/hopsworks-cloud/latest/services/#outside-access-to-the-feature-store). ## Next Steps diff --git a/mkdocs.yml b/mkdocs.yml index b1bbc90f3..492995fc5 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -108,14 +108,7 @@ nav: - Client Integrations: - user_guides/integrations/index.md - Python: user_guides/integrations/python.md - - Databricks: - - Networking: user_guides/integrations/databricks/networking.md - - Hopsworks API Key: user_guides/integrations/databricks/api_key.md - - Configuration: user_guides/integrations/databricks/configuration.md - AWS Sagemaker: user_guides/integrations/sagemaker.md - - AWS EMR: - - Networking: user_guides/integrations/emr/networking.md - - Configure EMR for the Hopsworks Feature Store: user_guides/integrations/emr/emr_configuration.md - Azure HDInsight: user_guides/integrations/hdinsight.md - Azure Machine Learning: - Designer: user_guides/integrations/mlstudio_designer.md @@ -222,7 +215,6 @@ nav: - Project Management: setup_installation/admin/project.md - Configure Alerts: setup_installation/admin/alert.md - Manage Services: setup_installation/admin/services.md - - IAM Role Chaining: setup_installation/admin/roleChaining.md - Monitoring: - Services Dashboards: setup_installation/admin/monitoring/grafana.md - Export metrics: setup_installation/admin/monitoring/export-metrics.md @@ -245,14 +237,6 @@ nav: - Audit: - Access Audit Logs: setup_installation/admin/audit/audit-logs.md - Export Audit Logs: setup_installation/admin/audit/export-audit-logs.md - - Managed: - - The dashboard: common/dashboard.md - - Settings: common/settings.md - - Services: common/services.md - - Adding and Removing workers: common/adding_removing_workers.md - - Autoscaling: common/autoscaling.md - - Backup: common/backup.md - - Scaling up: common/scalingup.md - : https://docs.hopsworks.ai - Community ↗: https://community.hopsworks.ai/ From 17e519bf65bdd47671f0f9386a7db7dc49572050 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Mon, 7 Oct 2024 22:31:50 +0100 Subject: [PATCH 11/24] Rework 
On-Prem content to retain contact Hopsworks. - Remove content related to hardware requirements. --- ...hopsworks_installer.md => contact_hopsworks.md} | 14 ++------------ mkdocs.yml | 2 +- 2 files changed, 3 insertions(+), 13 deletions(-) rename docs/setup_installation/on_prem/{hopsworks_installer.md => contact_hopsworks.md} (69%) diff --git a/docs/setup_installation/on_prem/hopsworks_installer.md b/docs/setup_installation/on_prem/contact_hopsworks.md similarity index 69% rename from docs/setup_installation/on_prem/hopsworks_installer.md rename to docs/setup_installation/on_prem/contact_hopsworks.md index 57959cf49..fee9e4600 100644 --- a/docs/setup_installation/on_prem/hopsworks_installer.md +++ b/docs/setup_installation/on_prem/contact_hopsworks.md @@ -1,21 +1,11 @@ --- -description: Requirements and instructions on how to install the Hopsworks feature store on-premises. +description: Requirements and instructions on how to install Hopsworks on-premises. --- -# Hopsworks On premises +# Hopsworks On-Premise Installation It is possible to use Hopsworks on-premises, which means that companies can run their machine learning workloads on their own hardware and infrastructure, rather than relying on a cloud provider. This can provide greater flexibility, control, and cost savings, as well as enabling companies to meet specific compliance and security requirements. Working on-premises with Hopsworks typically involves collaboration with the Hopsworks engineering teams, as each infrastructure is unique and requires a tailored approach to deployment and configuration. The process begins with an assessment of the company's existing infrastructure and requirements, including network topology, security policies, and hardware specifications. For further details about on-premise installations, [contact us](https://www.hopsworks.ai/contact). - -## Minimum Requirements - -You need at least one server or virtual machine on which Hopsworks will be installed with at least the following specification: - -* Ubuntu 22.04, Centos or RHEL Linux (Version 8) supported, -* at least 32GB RAM, -* at least 8 CPUs, -* 100 GB of free hard-disk space, -* a UNIX user account with sudo privileges. \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 492995fc5..136af3428 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -206,7 +206,7 @@ nav: - Azure - Getting Started: setup_installation/azure/getting_started.md - GCP - Getting Started: setup_installation/gcp/getting_started.md - On-Prem: - - Hopsworks Installer: setup_installation/on_prem/hopsworks_installer.md + - Background: setup_installation/on_prem/contact_hopsworks.md - External Kafka cluster: setup_installation/on_prem/external_kafka_cluster.md - Administration: - Cluster Configuration: setup_installation/admin/variables.md From f58e193e7d786eb7d1d982b4329f8017143954b4 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Mon, 7 Oct 2024 22:38:49 +0100 Subject: [PATCH 12/24] Fix last step number in each getting started. 
- Remove reference to archived examples repo --- docs/setup_installation/aws/getting_started.md | 3 +-- docs/setup_installation/azure/getting_started.md | 4 ++-- docs/setup_installation/gcp/getting_started.md | 1 - 3 files changed, 3 insertions(+), 5 deletions(-) diff --git a/docs/setup_installation/aws/getting_started.md b/docs/setup_installation/aws/getting_started.md index 0b6a69f2c..fa53b901c 100644 --- a/docs/setup_installation/aws/getting_started.md +++ b/docs/setup_installation/aws/getting_started.md @@ -377,11 +377,10 @@ Cluster Roles and Cluster Role Bindings: By default a set of cluster roles are provisioned; if you don’t have permissions to provision cluster roles or cluster role bindings, you should reach out to your K8s administrator. You should then provide the appropriate resource names as values in the values.yml file. -## Step 6: Next steps +## Step 4: Next steps Check out our other guides for how to get started with Hopsworks and the Feature Store: * Get started with the [Hopsworks Feature Store](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/quickstart.ipynb){:target="_blank"} * Follow one of our [tutorials](../../tutorials/index.md) * Follow one of our [Guides](../../user_guides/index.md) -* Code examples and notebooks: [hops-examples](https://github.com/logicalclocks/hops-examples) diff --git a/docs/setup_installation/azure/getting_started.md b/docs/setup_installation/azure/getting_started.md index 9dda5af6b..217740a14 100644 --- a/docs/setup_installation/azure/getting_started.md +++ b/docs/setup_installation/azure/getting_started.md @@ -155,11 +155,11 @@ kubectl expose deployment hopsworks --type=LoadBalancer --name=hopsworks-service -## Step 7: Next steps +## Step 5: Next steps Check out our other guides for how to get started with Hopsworks and the Feature Store: * Get started with the [Hopsworks Feature Store](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/quickstart.ipynb){:target="_blank"} * Follow one of our [tutorials](../../tutorials/index.md) * Follow one of our [Guides](../../user_guides/index.md) -* Code examples and notebooks: [hops-examples](https://github.com/logicalclocks/hops-examples) + diff --git a/docs/setup_installation/gcp/getting_started.md b/docs/setup_installation/gcp/getting_started.md index 2b36e9d85..db8cda902 100644 --- a/docs/setup_installation/gcp/getting_started.md +++ b/docs/setup_installation/gcp/getting_started.md @@ -200,4 +200,3 @@ Check out our other guides for how to get started with Hopsworks and the Feature * Get started with the [Hopsworks Feature Store](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/quickstart.ipynb){:target="_blank"} * Follow one of our [tutorials](../../tutorials/index.md) * Follow one of our [Guides](../../user_guides/index.md) -* Code examples and notebooks: [hops-examples](https://github.com/logicalclocks/hops-examples) From d10fd2cee156472e2f609c2f565328de9701b884 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Mon, 7 Oct 2024 22:43:01 +0100 Subject: [PATCH 13/24] Remove "--devel" from helm install in getting started. 
--- docs/setup_installation/aws/getting_started.md | 2 +- docs/setup_installation/azure/getting_started.md | 2 +- docs/setup_installation/gcp/getting_started.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/setup_installation/aws/getting_started.md b/docs/setup_installation/aws/getting_started.md index fa53b901c..0f820aa80 100644 --- a/docs/setup_installation/aws/getting_started.md +++ b/docs/setup_installation/aws/getting_started.md @@ -300,7 +300,7 @@ consul: - Run the Helm install ```bash -helm install hopsworks hopsworks-dev/hopsworks --devel --namespace hopsworks --values values.aws.yaml --timeout=600s +helm install hopsworks hopsworks-dev/hopsworks --namespace hopsworks --values values.aws.yaml --timeout=600s ``` diff --git a/docs/setup_installation/azure/getting_started.md b/docs/setup_installation/azure/getting_started.md index 217740a14..8a9d44936 100644 --- a/docs/setup_installation/azure/getting_started.md +++ b/docs/setup_installation/azure/getting_started.md @@ -136,7 +136,7 @@ global: Deploy Hopsworks in the created namespace. ```bash -helm install hopsworks hopsworks-dev/hopsworks --devel --namespace hopsworks --values values.azure.yaml --timeout=600s +helm install hopsworks hopsworks-dev/hopsworks --namespace hopsworks --values values.azure.yaml --timeout=600s ``` Check that Hopsworks is installing on your provisioned AKS cluster. diff --git a/docs/setup_installation/gcp/getting_started.md b/docs/setup_installation/gcp/getting_started.md index db8cda902..34f434abc 100644 --- a/docs/setup_installation/gcp/getting_started.md +++ b/docs/setup_installation/gcp/getting_started.md @@ -175,7 +175,7 @@ global: Deploy Hopsworks in the created namespace. ```bash -helm install hopsworks hopsworks-dev/hopsworks --devel --namespace hopsworks --values values.gcp.yaml --timeout=600s +helm install hopsworks hopsworks-dev/hopsworks --namespace hopsworks --values values.gcp.yaml --timeout=600s ``` Check that Hopsworks is installing on your provisioned AKS cluster. From d10fd2cee156472e2f609c2f565328de9701b884 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Mon, 7 Oct 2024 22:51:18 +0100 Subject: [PATCH 14/24] Change "hopsworks-dev" to hopsworks for helm commands. 
--- docs/setup_installation/aws/getting_started.md | 6 +++--- docs/setup_installation/azure/getting_started.md | 6 +++--- docs/setup_installation/gcp/getting_started.md | 6 +++--- 3 files changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/setup_installation/aws/getting_started.md b/docs/setup_installation/aws/getting_started.md index 0f820aa80..319d4ce3b 100644 --- a/docs/setup_installation/aws/getting_started.md +++ b/docs/setup_installation/aws/getting_started.md @@ -215,8 +215,8 @@ This section describes the steps required to deploy the Hopsworks stack using He - Configure Repo ```bash -helm repo add hopsworks-dev https://nexus.hops.works/repository/hopsworks-helm-dev --username NEXUS_USER --password NEXUS_PASS -helm repo update hopsworks-dev +helm repo add hopsworks https://nexus.hops.works/repository/hopsworks-helm-dev --username NEXUS_USER --password NEXUS_PASS +helm repo update hopsworks ``` - Create Hopsworks namespace @@ -300,7 +300,7 @@ consul: - Run the Helm install ```bash -helm install hopsworks hopsworks-dev/hopsworks --namespace hopsworks --values values.aws.yaml --timeout=600s +helm install hopsworks hopsworks/hopsworks --namespace hopsworks --values values.aws.yaml --timeout=600s ``` diff --git a/docs/setup_installation/azure/getting_started.md b/docs/setup_installation/azure/getting_started.md index 8a9d44936..80d41619a 100644 --- a/docs/setup_installation/azure/getting_started.md +++ b/docs/setup_installation/azure/getting_started.md @@ -98,8 +98,8 @@ kubectl config current-context ### Step 3.1: Add the Hopsworks Helm repository ```bash -helm repo add hopsworks-dev https://nexus.hops.works/repository/hopsworks-helm-dev --username $NEXUS_USER --password $NEXUS_PASS -helm repo update hopsworks-dev +helm repo add hopsworks https://nexus.hops.works/repository/hopsworks-helm-dev --username $NEXUS_USER --password $NEXUS_PASS +helm repo update hopsworks ``` ### Step 3.2: Create Hopsworks namespace & secrets @@ -136,7 +136,7 @@ global: Deploy Hopsworks in the created namespace. ```bash -helm install hopsworks hopsworks-dev/hopsworks --namespace hopsworks --values values.azure.yaml --timeout=600s +helm install hopsworks hopsworks/hopsworks --namespace hopsworks --values values.azure.yaml --timeout=600s ``` Check that Hopsworks is installing on your provisioned AKS cluster. diff --git a/docs/setup_installation/gcp/getting_started.md b/docs/setup_installation/gcp/getting_started.md index 34f434abc..ecdb66c37 100644 --- a/docs/setup_installation/gcp/getting_started.md +++ b/docs/setup_installation/gcp/getting_started.md @@ -126,8 +126,8 @@ kubectl get pods Add the Hopsworks Helm repository ```bash -helm repo add hopsworks-dev https://nexus.hops.works/repository/hopsworks-helm-dev --username $NEXUS_USER --password $NEXUS_PASS -helm repo update hopsworks-dev +helm repo add hopsworks https://nexus.hops.works/repository/hopsworks-helm-dev --username $NEXUS_USER --password $NEXUS_PASS +helm repo update hopsworks ``` ### Step 3.2: Create Hopsworks namespace & secrets @@ -175,7 +175,7 @@ global: Deploy Hopsworks in the created namespace. ```bash -helm install hopsworks hopsworks-dev/hopsworks --namespace hopsworks --values values.gcp.yaml --timeout=600s +helm install hopsworks hopsworks/hopsworks --namespace hopsworks --values values.gcp.yaml --timeout=600s ``` Check that Hopsworks is installing on your provisioned AKS cluster. 
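The three getting-started guides touched above end by telling the reader to "check that Hopsworks is installing" without showing a command for doing so. A minimal verification sketch that could follow the `helm install` step, assuming the `hopsworks` release name and namespace used throughout these patches:

```bash
# Watch the pods come up; they should eventually reach Running or Completed
kubectl get pods --namespace hopsworks --watch

# Inspect the release itself if the rollout looks stuck
helm status hopsworks --namespace hopsworks
```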
From a4fc983a4672817110f3ff1aa6e50e1836016876 Mon Sep 17 00:00:00 2001
From: Raymond Cunningham
Date: Mon, 7 Oct 2024 23:12:20 +0100
Subject: [PATCH 15/24] Add try URL to content and update helm command.

---
 docs/setup_installation/aws/getting_started.md   | 8 +++++++-
 docs/setup_installation/azure/getting_started.md | 8 +++++++-
 docs/setup_installation/gcp/getting_started.md   | 8 ++++++--
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/docs/setup_installation/aws/getting_started.md b/docs/setup_installation/aws/getting_started.md
index 319d4ce3b..c3fbe3df5 100644
--- a/docs/setup_installation/aws/getting_started.md
+++ b/docs/setup_installation/aws/getting_started.md
@@ -214,8 +214,14 @@ This section describes the steps required to deploy the Hopsworks stack using He

- Configure Repo

+To obtain access to the Hopsworks Helm chart repository, request
+an evaluation/startup license [here](https://www.hopsworks.ai/try).
+
+Once you have the Helm chart repository URL, replace the environment
+variable $HOPSWORKS_REPO in the following commands with this URL.
+
```bash
-helm repo add hopsworks https://nexus.hops.works/repository/hopsworks-helm-dev --username NEXUS_USER --password NEXUS_PASS
+helm repo add hopsworks $HOPSWORKS_REPO
helm repo update hopsworks
```

diff --git a/docs/setup_installation/azure/getting_started.md b/docs/setup_installation/azure/getting_started.md
index 80d41619a..1a2c68174 100644
--- a/docs/setup_installation/azure/getting_started.md
+++ b/docs/setup_installation/azure/getting_started.md
@@ -97,8 +97,14 @@ kubectl config current-context

### Step 3.1: Add the Hopsworks Helm repository

+To obtain access to the Hopsworks Helm chart repository, request
+an evaluation/startup license [here](https://www.hopsworks.ai/try).
+
+Once you have the Helm chart repository URL, replace the environment
+variable $HOPSWORKS_REPO in the following commands with this URL.
+
```bash
-helm repo add hopsworks https://nexus.hops.works/repository/hopsworks-helm-dev --username $NEXUS_USER --password $NEXUS_PASS
+helm repo add hopsworks $HOPSWORKS_REPO
helm repo update hopsworks
```

diff --git a/docs/setup_installation/gcp/getting_started.md b/docs/setup_installation/gcp/getting_started.md
index ecdb66c37..7e240f34a 100644
--- a/docs/setup_installation/gcp/getting_started.md
+++ b/docs/setup_installation/gcp/getting_started.md
@@ -123,10 +123,14 @@ kubectl get pods

### Step 3.1: Add the Hopsworks Helm repository

-Add the Hopsworks Helm repository
+To obtain access to the Hopsworks Helm chart repository, request
+an evaluation/startup license [here](https://www.hopsworks.ai/try).
+
+Once you have the Helm chart repository URL, replace the environment
+variable $HOPSWORKS_REPO in the following commands with this URL.
```bash -helm repo add hopsworks https://nexus.hops.works/repository/hopsworks-helm-dev --username $NEXUS_USER --password $NEXUS_PASS +helm repo add hopsworks $HOPSWORKS_REPO helm repo update hopsworks ``` From 839da8c30a022b79d69b8843a7a12cf0e75c0ebf Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Mon, 7 Oct 2024 23:14:58 +0100 Subject: [PATCH 16/24] Remove kubectl create secret commands --- docs/setup_installation/aws/getting_started.md | 11 ----------- docs/setup_installation/azure/getting_started.md | 4 +--- docs/setup_installation/gcp/getting_started.md | 4 +--- 3 files changed, 2 insertions(+), 17 deletions(-) diff --git a/docs/setup_installation/aws/getting_started.md b/docs/setup_installation/aws/getting_started.md index c3fbe3df5..db3c26d6e 100644 --- a/docs/setup_installation/aws/getting_started.md +++ b/docs/setup_installation/aws/getting_started.md @@ -231,17 +231,6 @@ helm repo update hopsworks kubectl create namespace hopsworks ``` -- Create Hopsworks secrets - -```bash -kubectl create secret docker-registry regcred \ - --namespace=hopsworks \ - --docker-server=docker.hops.works \ - --docker-username=NEXUS_USER \ - --docker-password=NEXUS_PASS \ - --docker-email=NEXUS_EMAIL_ADDRESS -``` - - Update values.aws.yml ```bash diff --git a/docs/setup_installation/azure/getting_started.md b/docs/setup_installation/azure/getting_started.md index 1a2c68174..7cfb15a5f 100644 --- a/docs/setup_installation/azure/getting_started.md +++ b/docs/setup_installation/azure/getting_started.md @@ -108,12 +108,10 @@ helm repo add hopsworks $HOPSWORKS_REPO helm repo update hopsworks ``` -### Step 3.2: Create Hopsworks namespace & secrets +### Step 3.2: Create Hopsworks namespace ```bash kubectl create namespace hopsworks - -kubectl create secret docker-registry regcred --namespace=hopsworks --docker-server=docker.hops.works --docker-username=$NEXUS_USER --docker-password=$NEXUS_PASS --docker-email=$NEXUS_EMAIL_ADDRESS ``` ### Step 3.3: Create helm values file diff --git a/docs/setup_installation/gcp/getting_started.md b/docs/setup_installation/gcp/getting_started.md index 7e240f34a..73a943d35 100644 --- a/docs/setup_installation/gcp/getting_started.md +++ b/docs/setup_installation/gcp/getting_started.md @@ -134,12 +134,10 @@ helm repo add hopsworks $HOPSWORKS_REPO helm repo update hopsworks ``` -### Step 3.2: Create Hopsworks namespace & secrets +### Step 3.2: Create Hopsworks namespace ```bash kubectl create namespace hopsworks - -kubectl create secret docker-registry regcred --namespace=hopsworks --docker-server=docker.hops.works --docker-username=$NEXUS_USER --docker-password=$NEXUS_PASS --docker-email=$NEXUS_EMAIL_ADDRESS ``` ### Step 3.3: Create helm values file From 49e812a57ccb9c6a3edf7e31b851b8f04aa838fc Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Mon, 7 Oct 2024 23:23:37 +0100 Subject: [PATCH 17/24] Remove reference to kserve in values.yaml file. 
--- docs/setup_installation/aws/getting_started.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/setup_installation/aws/getting_started.md b/docs/setup_installation/aws/getting_started.md index db3c26d6e..e1626bff8 100644 --- a/docs/setup_installation/aws/getting_started.md +++ b/docs/setup_installation/aws/getting_started.md @@ -256,7 +256,6 @@ global: hopsworks: variables: -  kube_kserve_installed: false docker_operations_managed_docker_secrets: *awsregcred docker_operations_image_pull_secrets: "regcred" dockerRegistry: From b1df44e536a3cc2ed2a6091b0e5c15cd03dde344 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Mon, 7 Oct 2024 23:43:08 +0100 Subject: [PATCH 18/24] Fix broken link to on-prem contact Hopsworks. --- docs/setup_installation/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/setup_installation/index.md b/docs/setup_installation/index.md index fe4b9544d..bf0e220d7 100644 --- a/docs/setup_installation/index.md +++ b/docs/setup_installation/index.md @@ -5,7 +5,7 @@ This section contains installation guides for the **Hopsworks Platform** using k - [AWS](aws/getting_started.md) - [Azure](azure/getting_started.md) - [GCP](gcp/getting_started.md) -- [On-Prem](on_prem/hopsworks_installer.md) environments +- [On-Prem](on_prem/contact_hopsworks.md) environments and [common](admin/index.md) administration instructions. From 6e776b0636bbba2740a2d69505fe7fd0ee1bbcb4 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Tue, 8 Oct 2024 00:34:15 +0100 Subject: [PATCH 19/24] Fix link checks and corresponding content. --- docs/setup_installation/admin/roleChaining.md | 5 +- docs/setup_installation/admin/user.md | 2 +- .../aws/instance_profile_permissions.md | 116 ------------------ .../common/arrow_flight_duckdb.md | 56 +++++++++ .../on_prem/external_kafka_cluster.md | 4 +- .../fs/feature_group/data_validation.md | 2 +- .../data_validation_best_practices.md | 2 +- .../fs/feature_group/feature_monitoring.md | 2 +- .../fs/feature_monitoring/index.md | 6 +- .../fs/feature_view/feature_monitoring.md | 2 +- .../fs/storage_connector/creation/redshift.md | 6 +- .../fs/storage_connector/creation/s3.md | 8 +- .../integrations/databricks/networking.md | 6 - 13 files changed, 76 insertions(+), 141 deletions(-) delete mode 100644 docs/setup_installation/aws/instance_profile_permissions.md create mode 100644 docs/setup_installation/common/arrow_flight_duckdb.md diff --git a/docs/setup_installation/admin/roleChaining.md b/docs/setup_installation/admin/roleChaining.md index 8815acfd7..a4a76c624 100644 --- a/docs/setup_installation/admin/roleChaining.md +++ b/docs/setup_installation/admin/roleChaining.md @@ -13,7 +13,8 @@ Before you begin this guide you'll need the following: - Administrator account on a Hopsworks cluster. ### Step 1: Create an instance profile role -To use role chaining the head node need to be able to impersonate the roles you want to be linked to your project. For this you need to create an instance profile with assume role permissions and attach it to your head node. For more details about the creation of instance profile see the [aws documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html). If running in [managed.hopsworks.ai](https://managed.hopsworks.ai) you can also refer to our [getting started guide](../setup_installation/aws/getting_started.md#step-3-creating-instance-profile). +To use role chaining the head node need to be able to impersonate the roles you want to be linked to your project. 
For this, you need to create an instance profile with assume-role permissions and attach it to your head node. For more details about creating an instance profile, see the [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html).
+

!!!note
    To ensure that the Hopsworks users can't use the head node instance profile and impersonate the roles by their own means, you need to ensure that they can't execute code on the head node. This means having all jobs running on worker nodes and using EKS to run jupyter notebooks.

@@ -75,7 +76,7 @@ Add mappings by clicking on *New role chaining*. Enter the project name. Select
Create Role Chaining
-Project member can now create connectors using *temporary credentials* to assume the role you configured. More detail about using temporary credentials can be found [here](../user_guides/fs/storage_connector/creation/s3.md#temporary-credentials).
+Project members can now create connectors using *temporary credentials* to assume the role you configured. More details about using temporary credentials can be found [here](../../user_guides/fs/storage_connector/creation/s3.md#temporary-credentials).

Project members can see the list of roles they can assume by going to the _Project Settings_ -> [Assuming IAM Roles](../../../user_guides/projects/iam_role/iam_role_chaining) page.

diff --git a/docs/setup_installation/admin/user.md b/docs/setup_installation/admin/user.md
index 479122655..d394bb04d 100644
--- a/docs/setup_installation/admin/user.md
+++ b/docs/setup_installation/admin/user.md
@@ -87,7 +87,7 @@ it securely to the user.

### Step 5: Reset user password
In the case where a user loses her/his password and cannot recover it with the
-[password recovery](../user_guides/projects/auth/recovery.md), an administrator can reset it for them.
+[password recovery](../../user_guides/projects/auth/recovery.md), an administrator can reset it for them.
On the bottom of the _Users_ page click on the _Reset a user password_ link. A popup window with a dropdown for searching users by name or email will open. Find the user and click on _Reset new password_.

diff --git a/docs/setup_installation/aws/instance_profile_permissions.md b/docs/setup_installation/aws/instance_profile_permissions.md
deleted file mode 100644
index 6bd8c63b0..000000000
--- a/docs/setup_installation/aws/instance_profile_permissions.md
+++ /dev/null
@@ -1,116 +0,0 @@
-
-Replace the following placeholders with their appropriate values
-
-* *BUCKET_NAME* - S3 bucket name
-* *REGION* - region where the cluster is deployed
-* *ECR_AWS_ACCOUNT_ID* - AWS account id for ECR repositories
-
-!!! note
-    Some of these permissions can be removed. Refer to [this guide](restrictive_permissions.md#limiting-the-instance-profile-permissions) for more information.
- -```json -{ - "Version": "2012-10-17", - "Statement": [ - { - "Sid": "hopsworksaiInstanceProfile", - "Effect": "Allow", - "Action": [ - "S3:PutObject", - "S3:ListBucket", - "S3:GetObject", - "S3:DeleteObject", - "S3:AbortMultipartUpload", - "S3:ListBucketMultipartUploads", - "S3:PutLifecycleConfiguration", - "S3:GetLifecycleConfiguration", - "S3:PutBucketVersioning", - "S3:GetBucketVersioning", - "S3:ListBucketVersions", - "S3:DeleteObjectVersion" - ], - "Resource": [ - "arn:aws:s3:::BUCKET_NAME/*", - "arn:aws:s3:::BUCKET_NAME" - ] - }, - { - "Sid": "AllowPullImagesFromHopsworkAi", - "Effect": "Allow", - "Action": [ - "ecr:GetDownloadUrlForLayer", - "ecr:BatchGetImage" - ], - "Resource": [ - "arn:aws:ecr:REGION:822623301872:repository/filebeat", - "arn:aws:ecr:REGION:822623301872:repository/base", - "arn:aws:ecr:REGION:822623301872:repository/onlinefs", - "arn:aws:ecr:REGION:822623301872:repository/airflow", - "arn:aws:ecr:REGION:822623301872:repository/git", - "arn:aws:ecr:REGION:822623301872:repository/testconnector", - "arn:aws:ecr:REGION:822623301872:repository/flyingduck" - ] - }, - { - "Sid": "AllowCreateRespositry", - "Effect": "Allow", - "Action": "ecr:CreateRepository", - "Resource": "*" - }, - { - "Sid": "AllowPushandPullImagesToUserRepo", - "Effect": "Allow", - "Action": [ - "ecr:GetDownloadUrlForLayer", - "ecr:BatchGetImage", - "ecr:CompleteLayerUpload", - "ecr:UploadLayerPart", - "ecr:InitiateLayerUpload", - "ecr:BatchCheckLayerAvailability", - "ecr:PutImage", - "ecr:ListImages", - "ecr:BatchDeleteImage", - "ecr:GetLifecyclePolicy", - "ecr:PutLifecyclePolicy", - "ecr:TagResource" - ], - "Resource": [ - "arn:aws:ecr:REGION:ECR_AWS_ACCOUNT_ID:repository/*/filebeat", - "arn:aws:ecr:REGION:ECR_AWS_ACCOUNT_ID:repository/*/base", - "arn:aws:ecr:REGION:ECR_AWS_ACCOUNT_ID:repository/*/onlinefs", - "arn:aws:ecr:REGION:ECR_AWS_ACCOUNT_ID:repository/*/airflow", - "arn:aws:ecr:REGION:ECR_AWS_ACCOUNT_ID:repository/*/git", - "arn:aws:ecr:REGION:ECR_AWS_ACCOUNT_ID:repository/*/testconnector", - "arn:aws:ecr:REGION:ECR_AWS_ACCOUNT_ID:repository/*/flyingduck" - ] - }, - { - "Sid": "AllowGetAuthToken", - "Effect": "Allow", - "Action": "ecr:GetAuthorizationToken", - "Resource": "*" - }, - { - "Effect": "Allow", - "Action": [ - "cloudwatch:PutMetricData", - "ec2:DescribeVolumes", - "ec2:DescribeTags", - "logs:PutLogEvents", - "logs:DescribeLogStreams", - "logs:DescribeLogGroups", - "logs:CreateLogStream", - "logs:CreateLogGroup" - ], - "Resource": "*" - }, - { - "Effect": "Allow", - "Action": [ - "ssm:GetParameter" - ], - "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*" - } - ] -} -``` diff --git a/docs/setup_installation/common/arrow_flight_duckdb.md b/docs/setup_installation/common/arrow_flight_duckdb.md new file mode 100644 index 000000000..bc8df34a2 --- /dev/null +++ b/docs/setup_installation/common/arrow_flight_duckdb.md @@ -0,0 +1,56 @@ +# ArrowFlight Server with DuckDB +By default, Hopsworks uses big data technologies (Spark or Hive) to create training data and read data for Python clients. +This is great for large datasets, but for small or moderately sized datasets (think of the size of data that would fit in a Pandas +DataFrame in your local Python environment), the overhead of starting a Spark or Hive job and doing distributed data processing can be significant. 
+ +ArrowFlight Server with DuckDB significantly reduces the time that Python clients need to read feature groups +and batch inference data from the Feature Store, as well as creating moderately-sized in-memory training datasets. + +When the service is enabled, clients will automatically use it for the following operations: + +- [reading Feature Groups](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_group_api/#read) +- [reading Queries](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/query_api/#read) +- [reading Training Datasets](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_training_data) +- [creating In-Memory Training Datasets](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#training_data) +- [reading Batch Inference Data](https://docs.hopsworks.ai/feature-store-api/{{{ hopsworks_version }}}/generated/api/feature_view_api/#get_batch_data) + +For larger datasets, clients can still make use of the Spark/Hive backend by explicitly setting +`read_options={"use_hive": True}`. + +## Service configuration + +!!! note + Supported only on AWS at the moment. + +!!! note + Make sure that your cross account role has the load balancer permissions as described in [here](../../aws/restrictive_permissions/#load-balancers-permissions-for-external-access), otherwise you have to create and manage the load balancer yourself. + +The ArrowFlight Server is co-located with RonDB in the Hopsworks cluster. +If the ArrowFlight Server is activated, RonDB and ArrowFlight Server can each use up to 50% +of the available resources on the node, so they can co-exist without impacting each other. +Just like RonDB, the ArrowFlight Server can be replicated across multiple nodes to serve more clients at lower latency. +To guarantee high performance, each individual ArrowFlight Server instance processes client requests sequentially. +Requests will be queued for up to 10 minutes before they are rejected. + +
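The `read_options={"use_hive": True}` escape hatch mentioned above is easiest to see from client code. A minimal sketch of the two read paths, assuming a hypothetical feature group name:

```python
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()

# "transactions" is a hypothetical feature group used for illustration
fg = fs.get_feature_group("transactions", version=1)

df_small = fg.read()                                  # served by ArrowFlight Server with DuckDB when enabled
df_large = fg.read(read_options={"use_hive": True})   # explicitly fall back to the Spark/Hive backend
```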

+

+ Configure RonDB +
Activate ArrowFlight Server with DuckDB on a RonDB cluster
+
+

+ +To deploy ArrowFlight Server on a cluster: + +1. Select "RonDB cluster" +2. Select an instance type with at least 16GB of memory and 4 cores. (*) +3. Tick the checkbox `Enable ArrowFlight Server`. + +(*) The service should have at least the 2x the amount of memory available that a typical Python client would have. + Because RonDB and ArrowFlight Server share the same node we recommend selecting an instance type with at least 4x the + client memory. For example, if the service serves Python clients with typically 4GB of memory, + an instance with at least 16GB of memory should be selected. + An instance with 16GB of memory will be able to read feature groups and training datasets of up to 10-100M rows, + depending on the number of columns and size of the features (~2GB in parquet). The same instance will be able to create + point-in-time correct training datasets with 1-10M rows, also depending on the number and the size of the features. + Larger instances are able to handle larger datasets. The numbers scale roughly linearly with the instance size. + diff --git a/docs/setup_installation/on_prem/external_kafka_cluster.md b/docs/setup_installation/on_prem/external_kafka_cluster.md index f112afe44..d7bb893a1 100644 --- a/docs/setup_installation/on_prem/external_kafka_cluster.md +++ b/docs/setup_installation/on_prem/external_kafka_cluster.md @@ -10,7 +10,7 @@ This guide will cover how to configure an Hopsworks cluster to leverage an exter ## Configure the external Kafka cluster integration -To enable the integration with an external Kafka cluster, you should set the `enable_bring_your_own_kafka` [configuration option](../../admin/variables.md) to `true`. +To enable the integration with an external Kafka cluster, you should set the `enable_bring_your_own_kafka` [configuration option](../admin/variables.md) to `true`. This can also be achieved in the cluster definition by setting the following attribute: ``` @@ -64,4 +64,4 @@ As mentioned above, when configuring Hopsworks to use an external Kafka cluster, Users should create a [Kafka storage connector](../../user_guides/fs/storage_connector/creation/kafka.md) named `kafka_connector` which is going to be used by the feature store clients to configure the necessary Kafka producers to send data. The configuration is done for each project to ensure its members have the necessary authentication/authorization. -If the storage connector is not found in the project, default values referring to Hopsworks managed Kafka will be used. \ No newline at end of file +If the storage connector is not found in the project, default values referring to Hopsworks managed Kafka will be used. diff --git a/docs/user_guides/fs/feature_group/data_validation.md b/docs/user_guides/fs/feature_group/data_validation.md index abb91121f..1e5d8886a 100644 --- a/docs/user_guides/fs/feature_group/data_validation.md +++ b/docs/user_guides/fs/feature_group/data_validation.md @@ -63,7 +63,7 @@ First checkout the pre-requisite and Hopsworks setup to follow the guide below. In order to define and validate an expectation when writing to a Feature Group, you will need: - A Hopsworks project. If you don't have a project yet you can go to [app.hopsworks.ai](https://app.hopsworks.ai), signup with your email and create your first project. -- An API key, you can get one by following the instructions [here](../../../setup_installation/common/api_key.md) +- An API key, you can get one by going to "Account Settings" on [app.hopsworks.ai](https://app.hopsworks.ai). 
- The [Hopsworks Python library](https://pypi.org/project/hopsworks) installed in your client. See the [installation guide](../../client_installation/index.md). #### Connect your notebook to Hopsworks diff --git a/docs/user_guides/fs/feature_group/data_validation_best_practices.md b/docs/user_guides/fs/feature_group/data_validation_best_practices.md index 0595a59b1..ddeb32a9f 100644 --- a/docs/user_guides/fs/feature_group/data_validation_best_practices.md +++ b/docs/user_guides/fs/feature_group/data_validation_best_practices.md @@ -101,7 +101,7 @@ timeseries = pd.DataFrame( While checking your feature engineering pipeline executed properly in the morning can be good enough in the development phase, it won't make the cut for demanding production use-cases. In Hopsworks, you can setup alerts if ingestion fails or succeeds. -First you will need to configure your preferred communication endpoint: slack, email or pagerduty. Check out [this page](../../../admin/alert.md) for more information on how to set it up. A typical use-case would be to add an alert on ingestion success to a Feature Group you created to hold data that failed validation. Here is a quick walkthrough: +First you will need to configure your preferred communication endpoint: slack, email or pagerduty. Check out [this page](../../../setup_installation/admin/alert.md) for more information on how to set it up. A typical use-case would be to add an alert on ingestion success to a Feature Group you created to hold data that failed validation. Here is a quick walkthrough: 1. Go the Feature Group page in the UI 2. Scroll down and click on the `Add an alert` button. diff --git a/docs/user_guides/fs/feature_group/feature_monitoring.md b/docs/user_guides/fs/feature_group/feature_monitoring.md index 4644016a4..4720e2e7d 100644 --- a/docs/user_guides/fs/feature_group/feature_monitoring.md +++ b/docs/user_guides/fs/feature_group/feature_monitoring.md @@ -20,7 +20,7 @@ After that, you can optionally define a detection window of data to compute stat In order to setup feature monitoring for a Feature Group, you will need: - A Hopsworks project. If you don't have a project yet you can go to [app.hopsworks.ai](https://app.hopsworks.ai), signup with your email and create your first project. -- An API key, you can get one by following the instructions [here](../../../setup_installation/common/api_key.md) +- An API key, you can get one by going to "Account Settings" on [app.hopsworks.ai](https://app.hopsworks.ai). - The Hopsworks Python library installed in your client. See the [installation guide](../../client_installation/index.md). - A Feature Group diff --git a/docs/user_guides/fs/feature_monitoring/index.md b/docs/user_guides/fs/feature_monitoring/index.md index a8aca0da8..d38e8c44f 100644 --- a/docs/user_guides/fs/feature_monitoring/index.md +++ b/docs/user_guides/fs/feature_monitoring/index.md @@ -12,7 +12,7 @@ in Hopsworks and enable the user to visualise the temporal evolution of statisti - **Statistics Comparison**: Enabled only for individual features, this variant allows the user to schedule the statistics computation on both a _detection_ and a _reference window_. By providing information about how to compare those statistics, you can setup alerts to quickly detect critical change in the data. For more details, see the [Statistics comparison guide](statistics_comparison.md). !!! 
important - To enable feature monitoring in Hopsworks, you need to set the `enable_feature_monitoring` [configuration option](../../../admin/variables.md) to `true`. + To enable feature monitoring in Hopsworks, you need to set the `enable_feature_monitoring` [configuration option](../../../setup_installation/admin/variables.md) to `true`. This can also be achieved in the cluster definition by setting the following attribute: ``` @@ -42,9 +42,9 @@ Hopsworks provides an interactive graph to make the exploration of statistics an ## Alerting -Moreover, feature monitoring integrates with the Hopsworks built-in system for [alerts](../../../admin/alert.md), enabling you to setup alerts that will notify you as soon as shift is detected in your feature values. You can setup alerts for feature monitoring at a Feature Group, Feature View, and project level. +Moreover, feature monitoring integrates with the Hopsworks built-in system for [alerts](../../../setup_installation/admin/alert.md), enabling you to setup alerts that will notify you as soon as shift is detected in your feature values. You can setup alerts for feature monitoring at a Feature Group, Feature View, and project level. !!! tip "Select the correct trigger" When configuring alerts for feature monitoring, make sure you select the `feature monitoring-shift detected` or `feature monitoring-shift undetected` trigger. -![Feature monitoring alerts](../../../assets/images/guides/fs/feature_monitoring/fm-alerts.png) \ No newline at end of file +![Feature monitoring alerts](../../../assets/images/guides/fs/feature_monitoring/fm-alerts.png) diff --git a/docs/user_guides/fs/feature_view/feature_monitoring.md b/docs/user_guides/fs/feature_view/feature_monitoring.md index 6d42d508a..91f25c4b0 100644 --- a/docs/user_guides/fs/feature_view/feature_monitoring.md +++ b/docs/user_guides/fs/feature_view/feature_monitoring.md @@ -20,7 +20,7 @@ After that, you can optionally define a detection window of data to compute stat In order to setup feature monitoring for a Feature View, you will need: - A Hopsworks project. If you don't have a project yet you can go to [app.hopsworks.ai](https://app.hopsworks.ai), signup with your email and create your first project. -- An API key, you can get one by following the instructions [here](../../../setup_installation/common/api_key.md) +- An API key, you can get one by going to "Account Settings" on [app.hopsworks.ai](https://app.hopsworks.ai). - The [Hopsworks Python library](https://pypi.org/project/hopsworks) installed in your client. See the [installation guide](../../client_installation/index.md). - A Feature View - A Training Dataset diff --git a/docs/user_guides/fs/storage_connector/creation/redshift.md b/docs/user_guides/fs/storage_connector/creation/redshift.md index 7dfbd30d1..8939747b1 100644 --- a/docs/user_guides/fs/storage_connector/creation/redshift.md +++ b/docs/user_guides/fs/storage_connector/creation/redshift.md @@ -22,7 +22,7 @@ Before you begin this guide you'll need to retrieve the following information fr - **Database port:** The port of the cluster. Defaults to 5349. - **Authentication method:** There are three options available for authenticating with the Redshift cluster. The first option is to configure a username and a password. The second option is to configure an IAM role. 
With IAM roles, Jobs or notebooks launched on Hopsworks do not need to explicitly authenticate with Redshift, as the HSFS library will transparently use the IAM role to acquire a temporary credential to authenticate the specified user. -Read more about IAM roles in our [AWS credentials pass-through guide](../../../../admin/roleChaining.md). Lastly, +Read more about IAM roles in our [AWS credentials pass-through guide](../../../../setup_installation/admin/roleChaining.md). Lastly, option `Instance Role` will use the default ARN Role configured for the cluster instance. ## Creation in the UI @@ -62,7 +62,7 @@ Enter the details for your Redshift connector. Start by giving it a **name** and By default, the session duration that the role will be assumed for is 1 hour or 3600 seconds. This means if you want to use the storage connector for example to [read or create an external Feature Group from Redshift](../usage.md##creating-an-external-feature-group), the operation cannot take longer than one hour. - Your administrator can change the default session duration for AWS storage connectors, by first [increasing the max session duration of the IAM Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html#id_roles_use_view-role-max-session) that you are assuming. And then changing the `fs_storage_connector_session_duration` [configuration property](../../../../admin/variables.md) to the appropriate value in seconds. + Your administrator can change the default session duration for AWS storage connectors, by first [increasing the max session duration of the IAM Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html#id_roles_use_view-role-max-session) that you are assuming. And then changing the `fs_storage_connector_session_duration` [configuration property](../../../../setup_installation/admin/variables.md) to the appropriate value in seconds. ### Step 3: Upload the Redshift database driver (optional) @@ -106,4 +106,4 @@ file, you can select it using the "From Project" option. To upload the jar file ## Next Steps -Move on to the [usage guide for storage connectors](../usage.md) to see how you can use your newly created Redshift connector. \ No newline at end of file +Move on to the [usage guide for storage connectors](../usage.md) to see how you can use your newly created Redshift connector. diff --git a/docs/user_guides/fs/storage_connector/creation/s3.md b/docs/user_guides/fs/storage_connector/creation/s3.md index a85efab56..f44b6fcb0 100644 --- a/docs/user_guides/fs/storage_connector/creation/s3.md +++ b/docs/user_guides/fs/storage_connector/creation/s3.md @@ -18,7 +18,7 @@ Before you begin this guide you'll need to retrieve the following information fr - **Bucket:** You will need a S3 bucket that you have access to. The bucket is identified by its name. - **Region (Optional):** You will need an S3 region to have complete control over data when managing the feature group that relies on this storage connector. The region is identified by its code. -- **Authentication Method:** You can authenticate using Access Key/Secret, or use IAM roles. If you want to use an IAM role it either needs to be attached to the entire Hopsworks cluster or Hopsworks needs to be able to assume the role. See [IAM role documentation](../../../../admin/roleChaining.md) for more information. +- **Authentication Method:** You can authenticate using Access Key/Secret, or use IAM roles. 
If you want to use an IAM role, it either needs to be attached to the entire Hopsworks cluster or Hopsworks needs to be able to assume the role. See [IAM role documentation](../../../../setup_installation/admin/roleChaining.md) for more information.
- **Server Side Encryption details:** If your bucket has server side encryption (SSE) enabled, make sure you know which algorithm it is using (AES256 or SSE-KMS). If you are using SSE-KMS, you need the resource ARN of the managed key.

## Creation in the UI
@@ -48,13 +48,13 @@ Optionally, specify the region if you wish to have a Hopsworks-managed feature g
Choose instance role if you have an EC2 instance profile attached to your Hopsworks cluster nodes with a role which grants you access to the specified bucket.

#### Temporary Credentials
-Choose temporary credentials if you are using [AWS Role chaining](../../../../admin/roleChaining.md) to control the access permission on a project and user role base. Once you have selected *Temporary Credentials* select the role that give access to the specified bucket. For this role to appear in the list it needs to have been configured by an administrator, see the [AWS Role chaining documentation](../../../../admin/roleChaining.md) for more details.
+Choose temporary credentials if you are using [AWS Role chaining](../../../../setup_installation/admin/roleChaining.md) to control access permissions on a per-project and per-user basis. Once you have selected *Temporary Credentials*, select the role that gives access to the specified bucket. For this role to appear in the list, it needs to have been configured by an administrator; see the [AWS Role chaining documentation](../../../../setup_installation/admin/roleChaining.md) for more details.

!!! warning "Session Duration"
    By default, the session duration that the role will be assumed for is 1 hour or 3600 seconds. This means if you want to use the storage connector for example to write [training data to S3](../usage.md#writing-training-data), the training dataset creation cannot take longer than one hour.

-    Your administrator can change the default session duration for AWS storage connectors, by first [increasing the max session duration of the IAM Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html#id_roles_use_view-role-max-session) that you are assuming. And then changing the `fs_storage_connector_session_duration` [configuration variable](../../../../admin/variables.md) to the appropriate value in seconds.
+    Your administrator can change the default session duration for AWS storage connectors, by first [increasing the max session duration of the IAM Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html#id_roles_use_view-role-max-session) that you are assuming. And then changing the `fs_storage_connector_session_duration` [configuration variable](../../../../setup_installation/admin/variables.md) to the appropriate value in seconds.

#### Access Key/Secret
The simplest authentication method is Access Key/Secret; choose this option to get started quickly if you are able to retrieve the keys using the IAM user administration.

@@ -77,4 +77,4 @@ Here you can specify any additional spark options that you wish to add to the sp
To connect to an S3-compatible storage other than AWS S3, you can add the option with key as `fs.s3a.endpoint` and the endpoint you want to use as value. The storage connector will then be able to read from your specified S3-compatible storage.
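The `fs.s3a.endpoint` option described above is configured in the connector UI, but its effect is easiest to see from the client side. A minimal sketch of reading through such a connector; the connector name, bucket, and endpoint are hypothetical:

```python
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()

# Connector created in the UI with the extra Spark option
# fs.s3a.endpoint -> https://minio.example.com (an assumed S3-compatible endpoint)
conn = fs.get_storage_connector("my_s3_connector")
df = conn.read(data_format="csv", path="s3://my-bucket/raw/data.csv")
```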
## Next Steps -Move on to the [usage guide for storage connectors](../usage.md) to see how you can use your newly created S3 connector. \ No newline at end of file +Move on to the [usage guide for storage connectors](../usage.md) to see how you can use your newly created S3 connector. diff --git a/docs/user_guides/integrations/databricks/networking.md b/docs/user_guides/integrations/databricks/networking.md index 509fd92af..065de136d 100644 --- a/docs/user_guides/integrations/databricks/networking.md +++ b/docs/user_guides/integrations/databricks/networking.md @@ -22,12 +22,6 @@ Identify your Databricks VPC by searching for VPCs containing Databricks in thei

-!!! info "Hopsworks installer" - If you are performing an installation using the [Hopsworks installer script](../../../setup_installation/on_prem/hopsworks_installer.md), ensure that the virtual machines you install Hopsworks on are deployed in the EMR VPC. - -!!! info "managed.hopsworks.ai" - If you are working on **[managed.hopsworks.ai](https://managed.hopsworks.ai)**, you can directly deploy the Hopsworks instance to the Databricks VPC, by simply selecting it at the [VPC selection step during cluster creation](https://docs.hopsworks.ai/hopsworks-cloud/latest/aws/cluster_creation/#step-8-vpc-selection). - **Option 2: Set up VPC peering** Follow the guide [VPC Peering](https://docs.databricks.com/administration-guide/cloud-configurations/aws/vpc-peering.html) to set up VPC peering between the Feature Store cluster and Databricks. Get your Feature Store *VPC ID* and *CIDR* by searching for the Feature Store VPC in the AWS Management Console: From 44deac271d068be87c4707be15ce66d7880d4deb Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Tue, 8 Oct 2024 00:48:18 +0100 Subject: [PATCH 20/24] Fix more link check failures. --- docs/setup_installation/admin/audit/audit-logs.md | 2 +- docs/user_guides/projects/auth/krb.md | 4 ++-- docs/user_guides/projects/auth/ldap.md | 4 ++-- docs/user_guides/projects/auth/oauth.md | 4 ++-- docs/user_guides/projects/iam_role/iam_role_chaining.md | 4 ++-- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/setup_installation/admin/audit/audit-logs.md b/docs/setup_installation/admin/audit/audit-logs.md index 660ccadd0..b15bd391b 100644 --- a/docs/setup_installation/admin/audit/audit-logs.md +++ b/docs/setup_installation/admin/audit/audit-logs.md @@ -17,7 +17,7 @@ Audit logs can be configured from the _Cluster Settings_ Configuration tab. You can access the _Configuration_ page of your Hopsworks cluster by clicking on your name, in the top right corner, and choosing _Cluster Settings_ from the dropdown menu.
- Audit log configuration + Audit log configuration
Audit log configuration
diff --git a/docs/user_guides/projects/auth/krb.md b/docs/user_guides/projects/auth/krb.md index 4e246f022..a0f90abb6 100644 --- a/docs/user_guides/projects/auth/krb.md +++ b/docs/user_guides/projects/auth/krb.md @@ -6,7 +6,7 @@ Hopsworks supports different methods of authentication. Here we will look at aut ## Prerequisites A Hopsworks cluster with Kerberos authentication. -See [Configure Kerberos](../../../../admin/ldap/configure-krb) on how to configure Kerberos on your cluster. +See [Configure Kerberos](../../../../setup_installation/admin/ldap/configure-krb) on how to configure Kerberos on your cluster. ### Step 1: Log in with Kerberos If Kerberos is configured you will see a _Log in using_ alternative on the login page. Choose Kerberos and click on @@ -56,4 +56,4 @@ In the landing page, you will find two buttons. Use these buttons to either crea _demo project_ or [a new project](../../../projects/project/create_project). ## Conclusion -In this guide you learned how to log in to Hopsworks using Kerberos. \ No newline at end of file +In this guide you learned how to log in to Hopsworks using Kerberos. diff --git a/docs/user_guides/projects/auth/ldap.md b/docs/user_guides/projects/auth/ldap.md index 6a45a9407..b79f878d0 100644 --- a/docs/user_guides/projects/auth/ldap.md +++ b/docs/user_guides/projects/auth/ldap.md @@ -5,7 +5,7 @@ Hopsworks supports different methods of authentication. Here we will look at aut ## Prerequisites A Hopsworks cluster with LDAP authentication. -See [Configure LDAP](../../../../admin/ldap/configure-ldap) on how to configure LDAP on your cluster. +See [Configure LDAP](../../../../setup_installation/admin/ldap/configure-ldap) on how to configure LDAP on your cluster. ### Step 1: Log in with LDAP If LDAP is configured you will see a _Log in using_ alternative on the login page. Choose LDAP and type in your @@ -40,4 +40,4 @@ In the landing page, you will find two buttons. Use these buttons to either crea _demo project_ or [a new project](../../../projects/project/create_project). ## Conclusion -In this guide you learned how to log in to Hopsworks using LDAP. \ No newline at end of file +In this guide you learned how to log in to Hopsworks using LDAP. diff --git a/docs/user_guides/projects/auth/oauth.md b/docs/user_guides/projects/auth/oauth.md index f21c04f45..4c52f16ec 100644 --- a/docs/user_guides/projects/auth/oauth.md +++ b/docs/user_guides/projects/auth/oauth.md @@ -5,7 +5,7 @@ Hopsworks supports different methods of authentication. Here we will look at aut ## Prerequisites A Hopsworks cluster with OAuth authentication. -See [Configure OAuth2](../../../../admin/oauth2/create-client) on how to configure OAuth on your cluster. +See [Configure OAuth2](../../../../setup_installation/admin/oauth2/create-client) on how to configure OAuth on your cluster. ### Step 1: Log in with OAuth If OAuth is configured a **Login with ** button will appear in the login page. Use this button to log in to Hopsworks @@ -35,4 +35,4 @@ In the landing page, you will find two buttons. Use these buttons to either crea _demo project_ or [a new project](../../../projects/project/create_project). ## Conclusion -In this guide you learned how to log in to Hopsworks using Third-party Identity Provider. \ No newline at end of file +In this guide you learned how to log in to Hopsworks using Third-party Identity Provider. 
diff --git a/docs/user_guides/projects/iam_role/iam_role_chaining.md b/docs/user_guides/projects/iam_role/iam_role_chaining.md index a63bffeeb..0f70a5fc1 100644 --- a/docs/user_guides/projects/iam_role/iam_role_chaining.md +++ b/docs/user_guides/projects/iam_role/iam_role_chaining.md @@ -10,7 +10,7 @@ Before you begin this guide you'll need the following: - A Hopsworks cluster running on EC2. - [Role chaining](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_terms-and-concepts.html#iam-term-role-chaining) setup in AWS. -- Configure role mappings in Hopsworks. For a guide on how to configure this see [AWS IAM Role Chaining](../../../../admin/roleChaining). +- Configure role mappings in Hopsworks. For a guide on how to configure this see [AWS IAM Role Chaining](../../../../setup_installation/admin/roleChaining). ## UI In this guide, you will learn how to use a mapped IAM role in your project. @@ -29,4 +29,4 @@ In the _Project Settings_ page you can find the _IAM Role Chaining_ section show You can now use the IAM roles listed in your project when creating a storage connector with [Temporary Credentials](../../../fs/storage_connector/creation/s3/#temporary-credentials). ## Conclusion -In this guide you learned how to use IAM roles on a cluster deployed on an EC2 instances. \ No newline at end of file +In this guide you learned how to use IAM roles on a cluster deployed on an EC2 instances. From 19b93cbf345aa3876f1f99525c49251726574da5 Mon Sep 17 00:00:00 2001 From: Raymond Cunningham Date: Tue, 8 Oct 2024 00:54:34 +0100 Subject: [PATCH 21/24] Remove last old link to hopsworks installer --- docs/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/index.md b/docs/index.md index 406edaf62..2d72b7ab1 100644 --- a/docs/index.md +++ b/docs/index.md @@ -185,7 +185,7 @@ pointer-events: initial;
- +
On-premise
From 8b31521fdf8203e662ea1bfa4b17a9cb4d7205f0 Mon Sep 17 00:00:00 2001
From: Raymond Cunningham
Date: Tue, 8 Oct 2024 01:01:26 +0100
Subject: [PATCH 22/24] Remove non-existent reference to AWS permissions.

---
 docs/setup_installation/common/arrow_flight_duckdb.md | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/docs/setup_installation/common/arrow_flight_duckdb.md b/docs/setup_installation/common/arrow_flight_duckdb.md
index bc8df34a2..fd0e745b0 100644
--- a/docs/setup_installation/common/arrow_flight_duckdb.md
+++ b/docs/setup_installation/common/arrow_flight_duckdb.md
@@ -21,9 +21,6 @@ For larger datasets, clients can still make use of the Spark/Hive backend by exp

!!! note
    Supported only on AWS at the moment.
-
-!!! note
-    Make sure that your cross account role has the load balancer permissions as described in [here](../../aws/restrictive_permissions/#load-balancers-permissions-for-external-access), otherwise you have to create and manage the load balancer yourself.

From c6c790523dc4e63bb5cea2525e766aa9a7cca71e Mon Sep 17 00:00:00 2001
From: Raymond Cunningham
Date: Tue, 8 Oct 2024 01:08:07 +0100
Subject: [PATCH 23/24] Fix assets reference in ha.md

---
 docs/setup_installation/admin/ha-dr/ha.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/setup_installation/admin/ha-dr/ha.md b/docs/setup_installation/admin/ha-dr/ha.md
index 3952e7fa2..1bd6cb3e2 100644
--- a/docs/setup_installation/admin/ha-dr/ha.md
+++ b/docs/setup_installation/admin/ha-dr/ha.md
@@ -10,7 +10,7 @@ At a high level a Hopsworks cluster can be divided into 4 groups of nodes. Each

Example deployment:
- Example HA deployment + Example HA deployment
Example Highly Available deployment
From 005b455c8df296b04e0002ea50f41da43cd7c3b7 Mon Sep 17 00:00:00 2001
From: Raymond Cunningham
Date: Thu, 24 Oct 2024 09:29:43 +0100
Subject: [PATCH 24/24] Add 4.0 migration info.

---
 docs/user_guides/index.md                  |   2 +-
 docs/user_guides/migration/40_migration.md | 134 +++++++++++++++++++++
 mkdocs.yml                                 |   2 +-
 3 files changed, 136 insertions(+), 2 deletions(-)
 create mode 100644 docs/user_guides/migration/40_migration.md

diff --git a/docs/user_guides/index.md b/docs/user_guides/index.md
index e57a0043a..549180953 100644
--- a/docs/user_guides/index.md
+++ b/docs/user_guides/index.md
@@ -6,4 +6,4 @@ This section serves to provide guides and examples for the common usage of abstr
- [Feature Store](fs/index.md): Learn about the common usage of the core Hopsworks Feature Store abstractions, such as Feature Groups, Feature Views, Data Validation and Storage Connectors. Also, learn from the [Client Integrations](integrations/index.md) guides how to connect to the Feature Store from external environments such as a local Python environment, Databricks, or AWS Sagemaker
- [MLOps](mlops/index.md): Learn about the common usage of Hopsworks MLOps abstractions, such as the Model Registry or Model Serving.
- [Projects](projects/index.md): The core abstraction on Hopsworks are [Projects](../concepts/projects/governance.md). Learn in this section how to manage your projects and the services therein.
-- [Migration](migration/30_migration.md): Learn how to migrate to newer versions of Hopsworks.
\ No newline at end of file
+- [Migration](migration/40_migration.md): Learn how to migrate to newer versions of Hopsworks.

diff --git a/docs/user_guides/migration/40_migration.md b/docs/user_guides/migration/40_migration.md
new file mode 100644
index 000000000..cbba2b44f
--- /dev/null
+++ b/docs/user_guides/migration/40_migration.md
@@ -0,0 +1,134 @@
+# 4.0 Migration Guide
+
+## Breaking Changes
+
+With the release of Hopsworks 4.0, a number of necessary breaking
+changes have been put in place to improve the overall experience of
+using the Hopsworks platform. These breaking changes can be categorized
+in the following areas:
+
+- Python API
+
+- Multi-Environment Docker Images
+
+- On-Demand Transformation Functions
+
+### Python API
+
+A number of significant changes have been made in the Python API in
+Hopsworks 4.0. Previously, in Hopsworks 3.X, there were three Python
+libraries used (“hopsworks”, “hsfs” & “hsml”) to develop feature,
+training & inference pipelines; with the 4.0 release there is now
+a single “hopsworks” Python library that should be used. For
+backwards compatibility, it will still be possible to import both
+the “hsfs” & “hsml” libraries, but these are now effectively aliases
+for the “hopsworks” Python library, and their use going forward should
+be considered deprecated.
+
+Another significant change in the Hopsworks Python API is the use of
+optional extras that allow a developer to easily install exactly what is
+needed as part of their work. The main ones are great-expectations and
+polars. It is arguable whether this is a breaking change, but it is
+important to note that, depending on how a particular pipeline has been
+written, it may encounter a problem when executing on Hopsworks
+4.0.
+
+Finally, there are a number of relatively small breaking changes and
+deprecated methods to improve the developer experience; these include:
+
+- `connection.init()` is now considered deprecated
+
+- When loading `arrow_flight_client`, an `OptionalDependencyNotFoundError` can now be thrown, providing more detailed information on the error than the previous `ModuleNotFoundError` in 3.X.
+
+- `DatasetApi`'s `zip` and `unzip` will now return `False` when a timeout is exceeded instead of previously throwing an `Exception`
+
+
+### Multi-Environment Docker Images
+
+As part of the Hopsworks 4.0 release, an engineering team using
+Hopsworks can now customize the Docker images that they use for their
+feature, training and inference pipelines. Adding this flexibility
+required a set of breaking changes. Instead of having one common
+Docker image for FTI pipelines, with the release of 4.0 a number of
+specific Docker images are provided to allow an engineering team using
+Hopsworks to install exactly what they need to get their feature,
+training and inference pipelines up and running. This breaking change
+will require existing customers running Hopsworks 3.X to test their
+existing pipelines on Hopsworks 4.0 before upgrading their
+production environments.
+
+
+### On-Demand Transformation Functions
+
+A number of changes have been made to transformation functions in the
+last releases of Hopsworks. With 4.0, On-Demand Transformation Functions
+are now better supported, which has resulted in some breaking changes.
+The following is how transformation functions were used in previous
+versions of Hopsworks and how transformation functions are used
+in the 4.0 release.
+
+
=== "Pre-4.0"
    ```python
    #################################################
    # Creating transformation function Hopsworks 3.8#
    #################################################

    # Define custom transformation function
    def add_one(feature):
        return feature + 1

    # Create transformation function
    add_one = fs.create_transformation_function(add_one,
                  output_type=int,
                  version=1,
                  )

    # Save transformation function
    add_one.save()

    # Retrieve transformation function
    add_one = fs.get_transformation_function(
        name="add_one",
        version=1,
    )

    # Create feature view
    feature_view = fs.get_or_create_feature_view(
        name='serving_fv',
        version=1,
        query=selected_features,
        # Apply your custom transformation functions to the feature `feature_1`
        transformation_functions={
            "feature_1": add_one,
        },
        labels=['target'],
    )
    ```

=== "4.0"
    ```python
    #################################################
    # Creating transformation function Hopsworks 4.0#
    #################################################

    # Define custom transformation function
    @hopsworks.udf(int)
    def add_one(feature):
        return feature + 1

    # Create feature view
    feature_view = fs.get_or_create_feature_view(
        name='serving_fv',
        version=1,
        query=selected_features,
        # Apply the custom transformation function defined above to the feature `feature_1`
        transformation_functions=[
            add_one("feature_1"),
        ],
        labels=['target'],
    )
    ```

Note that the number of lines of code required has been significantly reduced using the “@hopsworks.udf” Python decorator.
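Since the guide names the new single-library workflow and the optional extras only in prose, a short sketch may help; the extra names below are an assumption based on the libraries named above and should be checked against the published package metadata:

```python
# Hypothetical environment setup (extra names assumed from the guide above):
#   pip install "hopsworks[great-expectations,polars]"

import hopsworks  # the single unified 4.0 library; hsfs/hsml imports are now aliases

project = hopsworks.login()        # connect; connection.init() is deprecated as noted above
fs = project.get_feature_store()   # what previously came from hsfs
mr = project.get_model_registry()  # what previously came from hsml
```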
diff --git a/mkdocs.yml b/mkdocs.yml index 136af3428..abd3a1948 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -196,7 +196,7 @@ nav: - Troubleshooting: user_guides/mlops/serving/troubleshooting.md - Vector Database: user_guides/mlops/vector_database/index.md - Migration: - - 2.X to 3.0: user_guides/migration/30_migration.md + - 3.X to 4.0: user_guides/migration/40_migration.md - Setup and Administration: - setup_installation/index.md - Client Installation: