Docs/external feature groups update #483

Closed. Wants to merge 6 commits.

Changes from all commits
4 changes: 2 additions & 2 deletions .github/workflows/mkdocs-release.yml
```diff
@@ -29,5 +29,5 @@ jobs:
       git config --global user.email mike@docs.hopsworks.ai

       # Put this back and increment version when cutting a new release branch
-      # - name: mike deploy docs
-      # run: mike deploy 3.0 latest -u --push
+      - name: mike deploy docs
+        run: mike deploy 4.3 latest -u --push
```
Binary file modified docs/assets/images/guides/mlops/serving/deployment_overview.png
2 changes: 1 addition & 1 deletion docs/setup_installation/admin/auth.md
@@ -1,7 +1,7 @@
# Authentication Methods

## Introduction
Hopsworks can be configured to use different type of authentication methods. In this guide we will look at the
Hopsworks can be configured to use different types of authentication methods. In this guide we will look at the
different authentication methods available in Hopsworks.

## Prerequisites
30 changes: 26 additions & 4 deletions docs/user_guides/fs/feature_group/create_external.md
@@ -134,18 +134,40 @@ Nevertheless, external feature groups defined top of any storage connector can b

## Create using the UI

You can also create a new feature group through the UI. For this, navigate to the `Feature Groups` section and press the `Create` button at the top-right corner.
You can also create a new feature group through the UI. For this, navigate to the `Data Source` section and select existing credentials or create new ones for your preferred data source.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/feature_group/no_feature_group_list.png" alt="List of Feature Groups">
<img src="../../../../assets/images/guides/feature_group/data_source.png" style="border: 10px solid #f5f5f5" alt="Data Source UI">
</figure>
</p>

Subsequently, you will be able to define its properties (such as name, mode, features, and more). Refer to the documentation above for an explanation of the parameters available, they are the same as when you create a feature group using the SDK. Finally, complete the creation by clicking `Create New Feature Group` at the bottom of the page.
If you have existing credentials, simply proceed by clicking `Next: Select Tables`. If not, create and save the credentials first.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/feature_group/create_feature_group.png" alt="Create new Feature Group">
<img src="../../../../assets/images/guides/feature_group/credentials_selection.png" alt="setup credentials in Data Sources">
</figure>
</p>

The database navigation structure depends on your specific data source. You'll navigate through the appropriate hierarchy for your platform, such as Database → Schema → Table for Snowflake, or Project → Dataset → Table for BigQuery. In the UI you can select one or more tables; for each selected table, you must designate one or more primary keys before proceeding. You can also review the names and data types of the individual columns you want to include.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/feature_group/ext_table_selection.png" style="border: 10px solid #f5f5f5" alt="Select Table in Data Sources for External feature Group">
</figure>
</p>

<p align="center">
<figure>
<img src="../../../../assets/images/guides/feature_group/primary_key_selection.png" style="border: 10px solid #f5f5f5" alt="select details of external feature group">
</figure>
</p>

Complete the creation by clicking `Next: Review Configuration` at the bottom of the page. You will be prompted with a final validation window where you can set a name for your external feature group.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/feature_group/validation_ext_feature_group.png" alt="Validate the creation of a new external feature group">
</figure>
</p>
138 changes: 138 additions & 0 deletions docs/user_guides/mlops/serving/external-access.md
@@ -0,0 +1,138 @@
---
description: Documentation on how to configure external access to a model deployment
---

# How To Configure External Access To A Model Deployment

## Introduction

Hopsworks supports role-based access control (RBAC) for project members within a project, where a project's ML assets can only be accessed by Hopsworks users who are members of that project (see [governance](../../../concepts/projects/governance.md)).

However, there are cases where you might want to grant ==external users== access to specific model deployments without requiring them to register in Hopsworks or join the project (which would give them access to all of the project's ML assets). For these cases, Hopsworks supports fine-grained access control to model deployments based on ==user groups== managed by an external Identity Provider.

!!! info "Authentication methods"
Hopsworks can be configured to use different types of authentication methods including OAuth2, LDAP and Kerberos. See the [Authentication Methods Guide](../../../setup_installation/admin/auth.md) for more information.

## GUI (for Hopsworks users)

### Step 1: Navigate to a model deployment

If you have at least one model deployment already created, navigate to the model deployments page by clicking on the `Deployments` tab on the navigation menu on the left.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/mlops/serving/deployments_tab_sidebar_with_list.svg" alt="Deployments navigation tab">
<figcaption>Deployments navigation tab</figcaption>
</figure>
</p>

Once on the model deployments page, find the model deployment you want to configure external access for, and click on the name of the deployment to open the model deployment overview page.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/mlops/serving/deployment_overview.png" alt="Deployment overview">
<figcaption>Deployment overview</figcaption>
</figure>
</p>

### Step 2: Go to External Access

You can find the external access configuration by clicking on `External access` on the navigation menu on the left or scrolling down to the external access section.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/mlops/serving/deployment_external_access.png" alt="Deployment external access">
<figcaption>External access configuration</figcaption>
</figure>
</p>

### Step 3: Add or remove user groups

In this section, you can add and remove user groups by clicking on `edit external user groups` and typing the group name in the **free-text** input field or **selecting** one of the existing ones in the dropdown list. After that, click on the `save` button to persist the changes.


!!! Warn "Case sensitivity"
Inference requests are authorized using a ==case-sensitive exact match== between the group names of the user making the request and the group names granted access to the model deployment. Therefore, a user assigned to the group `lab1` won't have access to a model deployment accessible by group `LAB1`.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/mlops/serving/deployment_external_access_edit.png" alt="Deployment external access">
<figcaption>External access configuration</figcaption>
</figure>
</p>

## GUI (for external users)

### Step 1: Login with the external identity provider

Navigate to Hopsworks, and click on the `Login with` button to sign in using the configured external identity provider (e.g., Keycloak in this example).

<p align="center">
<figure>
<img style="max-width: 50%" src="../../../../assets/images/guides/mlops/serving/login_external_idp.png" alt="Login external identity provider">
<figcaption>Login with External Identity Provider</figcaption>
</figure>
</p>

### Step 2: Explore the model deployments you are granted access to

Once you sign in to Hopsworks, you can see the list of model deployments you are granted access to based on your assigned groups.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/mlops/serving/deployment_external_list.png" alt="Deployments list">
<figcaption>Deployments with external access</figcaption>
</figure>
</p>

### Step 3: Inspect your current groups

You can find the current groups you are assigned to at the top of the page.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/mlops/serving/deployment_external_groups.png" alt="External user groups">
<figcaption>External user groups</figcaption>
</figure>
</p>

### Step 4: Get an API key

Inference requests to model deployments are authenticated and authorized based on your external user and user groups. You can create API keys to authenticate your inference requests by clicking on the `Create API Key` button.

!!! info "Authorization header"
API keys are set in the `Authorization` header following the format `ApiKey <api-key-value>`

<p align="center">
<figure>
<img src="../../../../assets/images/guides/mlops/serving/deployment_external_api_key.png" alt="Get API key">
<figcaption>Get API key</figcaption>
</figure>
</p>

### Step 5: Send inference requests

Depending on the type of model deployment, the URI of the model server can differ (e.g., `/chat/completions` for LLM deployments or `/predict` for traditional model deployments). You can find the corresponding URI on every model deployment card.

In addition to the `Authorization` header containing the API key, the `Host` header needs to be set according to the model deployment the inference requests are sent to. This header is used by the ingress to route the inference requests to the corresponding model deployment. You can find the `Host` header value in the model deployment card.

!!! tip "Code snippets"
For clients sending inference requests using libraries similar to curl or OpenAI API-compatible libraries (e.g., LangChain), you can find code snippet examples by clicking on the `Curl >_` and `LangChain >_` buttons.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/mlops/serving/deployment_external_code_snippets.png" alt="Deployment endpoint">
<figcaption>Deployment endpoint</figcaption>
</figure>
</p>
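
The sketch below illustrates such a request using Python's `requests` library. It is only a sketch: the API key, URL, URI, and `Host` values are placeholders to be taken from your model deployment card, and the payload shape depends on your model server.

```python
import requests

# Placeholder values; take the real ones from the model deployment card
# and the `Create API Key` button.
API_KEY = "<api-key-value>"
PREDICT_URL = "https://<hopsworks-domain>/<uri-from-deployment-card>"
DEPLOYMENT_HOST = "<host-header-from-deployment-card>"

response = requests.post(
    PREDICT_URL,
    headers={
        "Authorization": f"ApiKey {API_KEY}",  # header format from the note above
        "Host": DEPLOYMENT_HOST,               # used by the ingress for routing
    },
    json={"inputs": [[1.0, 2.0, 3.0]]},        # payload shape depends on the model
)
response.raise_for_status()
print(response.json())
```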

## Refreshing External User Groups

Every time an external user signs in to Hopsworks using a pre-configured [authentication method](../../../setup_installation/admin/auth.md), Hopsworks fetches the external user groups and updates the internal state accordingly. Given that groups can be added/removed from users at any time by the Identity Provider, Hopsworks needs to periodically fetch the external user groups to keep the state updated.

Therefore, external users who want to access model deployments are **required to log in periodically** to ensure they are still part of the allowed groups. The timespan between logins is controlled by the configuration parameter `requireExternalUserLoginAfterHours`, available during the Hopsworks installation and upgrade.

The `requireExternalUserLoginAfterHours` configuration parameter controls the ==number of hours== after which external users are required to sign in to Hopsworks to refresh their external user groups.

!!! info "Configuring `requireExternalUserLoginAfterHours`"
Allowed values are -1, 0 and greater than 0, where -1 disables the periodic login requirement and 0 disables external access completely for every model deployment.
6 changes: 5 additions & 1 deletion docs/user_guides/mlops/serving/index.md
@@ -26,4 +26,8 @@ Configure the predictor to log inference requests and predictions, see the [Infe

### Troubleshooting

Inspect the model server logs to troubleshoot your model deployments, see the [Troubleshooting Guide](troubleshooting.md).

### External access

Grant users authenticated by an external Identity Provider access to model deployments, see the [External Access Guide](external-access.md).
32 changes: 29 additions & 3 deletions docs/user_guides/projects/jobs/notebook_job.md
@@ -82,7 +82,7 @@ It is possible to also set following configuration settings for a `PYTHON` job.
* `Environment`: The python environment to use
* `Container memory`: The amount of memory in MB to be allocated to the Jupyter Notebook script
* `Container cores`: The number of cores to be allocated for the Jupyter Notebook script
* `Additional files`: List of files that will be locally accessible by the application
* `Additional files`: List of files that will be locally accessible in the working directory of the application. Recommended only when project datasets are not mounted under `/hopsfs`.
You can always modify the arguments in the job settings.

<p align="center">
@@ -142,7 +142,7 @@ In this snippet we get the `JobsApi` object to get the default job configuration

```python

jobs_api = project.get_jobs_api()
jobs_api = project.get_job_api()

notebook_job_config = jobs_api.get_configuration("PYTHON")

@@ -166,7 +166,33 @@ In this code snippet, we execute the job with arguments and wait until it reache
execution = job.run(args='-p a 2 -p b 5', await_termination=True)
```

### API Reference
## Configuration
The following table describes the JSON payload returned by `jobs_api.get_configuration("PYTHON")`:

| Field | Type | Description | Default |
|-------------------------|----------------|------------------------------------------------------|--------------------------|
| `type` | string | Type of the job configuration | `"pythonJobConfiguration"` |
| `appPath`               | string         | Project path to notebook (e.g. `Resources/foo.ipynb`) | `null`                     |
| `environmentName` | string | Name of the python environment | `"pandas-training-pipeline"` |
| `resourceConfig.cores` | number (float) | Number of CPU cores to be allocated | `1.0` |
| `resourceConfig.memory` | number (int) | Number of MBs to be allocated | `2048` |
| `resourceConfig.gpus` | number (int) | Number of GPUs to be allocated | `0` |
| `logRedirection` | boolean | Whether logs are redirected | `true` |
| `jobType` | string | Type of job | `"PYTHON"` |
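
As a minimal sketch of putting these fields together (assuming the configuration is returned as a nested Python dict matching the table above, a hypothetical notebook at `Resources/foo.ipynb`, and `jobs_api.create_job` as documented in the API Reference below):

```python
import hopsworks

project = hopsworks.login()
jobs_api = project.get_job_api()

# Fetch the default PYTHON configuration and override a few fields
notebook_job_config = jobs_api.get_configuration("PYTHON")
notebook_job_config["appPath"] = "Resources/foo.ipynb"
notebook_job_config["resourceConfig"]["memory"] = 4096  # MB, up from the 2048 default
notebook_job_config["resourceConfig"]["cores"] = 2.0

job = jobs_api.create_job("my_notebook_job", notebook_job_config)
execution = job.run(await_termination=True)
```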


## Accessing project data
!!! notice "Recommended approach if `/hopsfs` is mounted"
If your Hopsworks installation is configured to mount the project datasets under `/hopsfs`, which it is in most cases, then please refer to this section instead of the `Additional files` property to reference file resources.

### Absolute paths
The project datasets are mounted under `/hopsfs`, so you can access `data.csv` from the `Resources` dataset using `/hopsfs/Resources/data.csv` in your notebook.

### Relative paths
The notebook's working directory is the folder it is located in. For example, if it is located in the `Resources` dataset, and you have a file named `data.csv` in that dataset, you simply access it using `data.csv`. Also, if you write a local file, for example `output.txt`, it will be saved in the `Resources` dataset.
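
For example, a small sketch assuming the notebook runs from the `Resources` dataset and a hypothetical `data.csv` lives in the same dataset:

```python
import pandas as pd

# Absolute path through the /hopsfs mount
df = pd.read_csv("/hopsfs/Resources/data.csv")

# Relative path, resolved against the notebook's working directory (Resources)
df = pd.read_csv("data.csv")

# Local writes end up in the same dataset the notebook runs from
df.head().to_csv("output.txt", index=False)
```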


## API Reference

[Jobs](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/jobs/)

Expand Down
44 changes: 41 additions & 3 deletions docs/user_guides/projects/jobs/pyspark_job.md
@@ -8,7 +8,7 @@ description: Documentation on how to configure and execute a PySpark job on Hops

All members of a project in Hopsworks can launch the following types of applications through a project's Jobs service:

- Python (*Hopsworks Enterprise only*)
- Python
- Apache Spark

Launching a job of any type is very similar process, what mostly differs between job types is
@@ -179,7 +179,7 @@ In this snippet we get the `JobsApi` object to get the default job configuration

```python

jobs_api = project.get_jobs_api()
jobs_api = project.get_job_api()

spark_config = jobs_api.get_configuration("PYSPARK")

@@ -211,7 +211,45 @@

```

### API Reference
## Configuration
The following table describes the JSON payload returned by `jobs_api.get_configuration("PYSPARK")`:

| Field | Type | Description | Default |
| ------------------------------------------ | -------------- |-----------------------------------------------------| -------------------------- |
| `type` | string | Type of the job configuration | `"sparkJobConfiguration"` |
| `appPath`                                  | string         | Project path to script (e.g. `Resources/foo.py`)    | `null`                     |
| `environmentName` | string | Name of the project spark environment | `"spark-feature-pipeline"` |
| `spark.driver.cores` | number (float) | Number of CPU cores allocated for the driver | `1.0` |
| `spark.driver.memory` | number (int) | Memory allocated for the driver (in MB) | `2048` |
| `spark.executor.instances` | number (int) | Number of executor instances | `1` |
| `spark.executor.cores` | number (float) | Number of CPU cores per executor | `1.0` |
| `spark.executor.memory` | number (int) | Memory allocated per executor (in MB) | `4096` |
| `spark.dynamicAllocation.enabled` | boolean | Enable dynamic allocation of executors | `true` |
| `spark.dynamicAllocation.minExecutors` | number (int) | Minimum number of executors with dynamic allocation | `1` |
| `spark.dynamicAllocation.maxExecutors` | number (int) | Maximum number of executors with dynamic allocation | `2` |
| `spark.dynamicAllocation.initialExecutors` | number (int) | Initial number of executors with dynamic allocation | `1` |
| `spark.blacklist.enabled` | boolean | Whether executor/node blacklisting is enabled | `false` |
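
A similar sketch for PySpark (assuming the configuration dict uses the flat keys from the table above, and a hypothetical script at `Resources/foo.py`):

```python
jobs_api = project.get_job_api()

# Fetch the default PYSPARK configuration and override a few fields
spark_config = jobs_api.get_configuration("PYSPARK")
spark_config["appPath"] = "Resources/foo.py"
spark_config["spark.executor.memory"] = 8192             # MB per executor
spark_config["spark.dynamicAllocation.maxExecutors"] = 4

job = jobs_api.create_job("my_pyspark_job", spark_config)
execution = job.run(await_termination=True)
```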


## Accessing project data

### Read directly from the filesystem (recommended)

To read a dataset in your project using Spark, use the full filesystem path where the data is stored. For example, to read a CSV file named `data.csv` located in the `Resources` dataset of a project called `my_project`:

```python
df = spark.read.csv("/Projects/my_project/Resources/data.csv", header=True, inferSchema=True)
df.show()
```

### Additional files

Different file types can be attached to the Spark job and made available in the `/srv/hops/artifacts` folder when the PySpark job starts. This configuration is mainly useful when you need additional setup, such as JARs that need to be added to the CLASSPATH.

When reading data in your Spark job, it is recommended to use the Spark read API as demonstrated above, since it reads from the filesystem directly; the `Additional files` option downloads each file in its entirety, which does not scale to large files.
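
For instance, a file attached through `Additional files` can be read from the artifacts folder once the job has started (the file name here is hypothetical):

```python
# Files attached via `Additional files` are placed under /srv/hops/artifacts
with open("/srv/hops/artifacts/settings.json") as f:
    settings = f.read()
```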


## API Reference

[Jobs](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/jobs/)
