diff --git a/docs/assets/images/guides/mlops/serving/deployment_adv_form_batcher.png b/docs/assets/images/guides/mlops/serving/deployment_adv_form_batcher.png index a19dccbcc..65081d31e 100644 Binary files a/docs/assets/images/guides/mlops/serving/deployment_adv_form_batcher.png and b/docs/assets/images/guides/mlops/serving/deployment_adv_form_batcher.png differ diff --git a/docs/assets/images/guides/mlops/serving/deployment_adv_form_kserve.png b/docs/assets/images/guides/mlops/serving/deployment_adv_form_kserve.png index 002b52dd2..1b43ca454 100644 Binary files a/docs/assets/images/guides/mlops/serving/deployment_adv_form_kserve.png and b/docs/assets/images/guides/mlops/serving/deployment_adv_form_kserve.png differ diff --git a/docs/assets/images/guides/mlops/serving/deployment_adv_form_logger.png b/docs/assets/images/guides/mlops/serving/deployment_adv_form_logger.png index c79b2066e..6c8b1ca5d 100644 Binary files a/docs/assets/images/guides/mlops/serving/deployment_adv_form_logger.png and b/docs/assets/images/guides/mlops/serving/deployment_adv_form_logger.png differ diff --git a/docs/assets/images/guides/mlops/serving/deployment_adv_form_trans.png b/docs/assets/images/guides/mlops/serving/deployment_adv_form_trans.png index 8df91a038..6b41166ea 100644 Binary files a/docs/assets/images/guides/mlops/serving/deployment_adv_form_trans.png and b/docs/assets/images/guides/mlops/serving/deployment_adv_form_trans.png differ diff --git a/docs/assets/images/guides/mlops/serving/deployment_creating.png b/docs/assets/images/guides/mlops/serving/deployment_creating.png index 9e1f69d9f..4aa460cea 100644 Binary files a/docs/assets/images/guides/mlops/serving/deployment_creating.png and b/docs/assets/images/guides/mlops/serving/deployment_creating.png differ diff --git a/docs/assets/images/guides/mlops/serving/deployment_grpc_select.png b/docs/assets/images/guides/mlops/serving/deployment_grpc_select.png index d312df058..61825bdc6 100644 Binary files a/docs/assets/images/guides/mlops/serving/deployment_grpc_select.png and b/docs/assets/images/guides/mlops/serving/deployment_grpc_select.png differ diff --git a/docs/assets/images/guides/mlops/serving/deployment_simple_form_1.png b/docs/assets/images/guides/mlops/serving/deployment_simple_form_1.png index a24a18781..d99c40e83 100644 Binary files a/docs/assets/images/guides/mlops/serving/deployment_simple_form_1.png and b/docs/assets/images/guides/mlops/serving/deployment_simple_form_1.png differ diff --git a/docs/assets/images/guides/mlops/serving/deployment_simple_form_2.png b/docs/assets/images/guides/mlops/serving/deployment_simple_form_2.png index c1150001f..4a605ffda 100644 Binary files a/docs/assets/images/guides/mlops/serving/deployment_simple_form_2.png and b/docs/assets/images/guides/mlops/serving/deployment_simple_form_2.png differ diff --git a/docs/assets/images/guides/mlops/serving/deployment_simple_form_adv_options.png b/docs/assets/images/guides/mlops/serving/deployment_simple_form_adv_options.png index edba416c4..93da4c06e 100644 Binary files a/docs/assets/images/guides/mlops/serving/deployment_simple_form_adv_options.png and b/docs/assets/images/guides/mlops/serving/deployment_simple_form_adv_options.png differ diff --git a/docs/assets/images/guides/mlops/serving/deployment_simple_form_py_pred.png b/docs/assets/images/guides/mlops/serving/deployment_simple_form_py_pred.png index 9e198e03a..873406ab3 100644 Binary files a/docs/assets/images/guides/mlops/serving/deployment_simple_form_py_pred.png and 
b/docs/assets/images/guides/mlops/serving/deployment_simple_form_py_pred.png differ diff --git a/docs/assets/images/guides/mlops/serving/deployment_simple_form_py_pred_env.png b/docs/assets/images/guides/mlops/serving/deployment_simple_form_py_pred_env.png index d7ca92e2e..19c80b33e 100644 Binary files a/docs/assets/images/guides/mlops/serving/deployment_simple_form_py_pred_env.png and b/docs/assets/images/guides/mlops/serving/deployment_simple_form_py_pred_env.png differ diff --git a/docs/user_guides/mlops/registry/frameworks/llm.md b/docs/user_guides/mlops/registry/frameworks/llm.md new file mode 100644 index 000000000..986dee6bf --- /dev/null +++ b/docs/user_guides/mlops/registry/frameworks/llm.md @@ -0,0 +1,62 @@ +# How To Export an LLM Model + +## Introduction + +In this guide you will learn how to export a [Large Language Model (LLM)](https://www.hopsworks.ai/dictionary/llms-large-language-models) and register it in the Model Registry. + +## Code + +### Step 1: Connect to Hopsworks + +=== "Python" + ```python + import hopsworks + + project = hopsworks.login() + + # get Hopsworks Model Registry handle + mr = project.get_model_registry() + ``` + +### Step 2: Download the LLM + +Download your base or fine-tuned LLM. LLMs can typically be downloaded using the official frameworks provided by their creators (e.g., HuggingFace, Ollama, ...). + +=== "Python" + ```python + # Download LLM (e.g., using huggingface to download Llama-3.1-8B base model) + from huggingface_hub import snapshot_download + + model_dir = snapshot_download( + "meta-llama/Llama-3.1-8B", + ignore_patterns="original/*" + ) + ``` + +### Step 3: (Optional) Fine-tune LLM + +If necessary, fine-tune your LLM with an [instruction set](https://www.hopsworks.ai/dictionary/instruction-datasets-for-fine-tuning-llms). An LLM can be fine-tuned fully or using [Parameter Efficient Fine Tuning (PEFT)](https://www.hopsworks.ai/dictionary/parameter-efficient-fine-tuning-of-llms) methods such as LoRA or QLoRA. + +=== "Python" + ```python + # Fine-tune LLM using PEFT (LoRA, QLoRA) or other methods + model_dir = ... + ``` + +### Step 4: Register model in registry + +Use the `ModelRegistry.llm.create_model(..)` function to register a model as an LLM. Define a name, and attach optional metrics for your model, then invoke the `save()` function with the parameter being the path to the local directory where the model was exported to. + +=== "Python" + ```python + # Model evaluation metrics + metrics = {'f1-score': 0.8, 'perplexity': 31.62, 'bleu-score': 0.73} + + llm_model = mr.llm.create_model("llm_model", metrics=metrics) + + llm_model.save(model_dir) + ``` + +## Going Further + +You can attach an [Input Example](../input_example.md) and a [Model Schema](../model_schema.md) to your model to document the shape and type of the data the model was trained on. 
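The fine-tuning snippet in Step 3 of the guide above is only a placeholder (`model_dir = ...`). As a rough illustration of what that step could look like, the sketch below adapts the downloaded model with LoRA; it assumes the `transformers` and `peft` packages (which the guide itself does not prescribe), elides the actual training run, and the output directory name is made up.

```python
# Hypothetical LoRA fine-tuning sketch for Step 3 (assumes transformers + peft are installed)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained(model_dir)  # model_dir from Step 2
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# Wrap the base model with LoRA adapters (rank/alpha values are illustrative)
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
peft_model = get_peft_model(base_model, lora_config)

# ... run training on your instruction dataset here (e.g., with a Trainer/SFTTrainer) ...

# Merge the adapters into the base weights and export to a new local directory;
# that directory is then the model_dir passed to save() in Step 4
merged_model = peft_model.merge_and_unload()
model_dir = "./llama-3.1-8b-instruction-tuned"  # hypothetical output path
merged_model.save_pretrained(model_dir)
tokenizer.save_pretrained(model_dir)
```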
diff --git a/docs/user_guides/mlops/registry/frameworks/python.md b/docs/user_guides/mlops/registry/frameworks/python.md index 3895a804b..af4689c60 100644 --- a/docs/user_guides/mlops/registry/frameworks/python.md +++ b/docs/user_guides/mlops/registry/frameworks/python.md @@ -8,55 +8,54 @@ In this guide you will learn how to export a generic Python model and register i ### Step 1: Connect to Hopsworks -```python -import hopsworks +=== "Python" + ```python + import hopsworks -project = hopsworks.login() + project = hopsworks.login() -# get Hopsworks Model Registry handle -mr = project.get_model_registry() -``` + # get Hopsworks Model Registry handle + mr = project.get_model_registry() + ``` ### Step 2: Train Define your XGBoost model and run the training loop. -```python -# Define a model -model = XGBClassifier() +=== "Python" + ```python + # Define a model + model = XGBClassifier() -# Train model -model.fit(X_train, y_train) - -``` + # Train model + model.fit(X_train, y_train) + ``` ### Step 3: Export to local path Export the XGBoost model to a directory on the local filesystem. -```python - -model_file = "model.json" - -model.save_model(model_file) +=== "Python" + ```python + model_file = "model.json" -``` + model.save_model(model_file) + ``` ### Step 4: Register model in registry Use the `ModelRegistry.python.create_model(..)` function to register a model as a Python model. Define a name, and attach optional metrics for your model, then invoke the `save()` function with the parameter being the path to the local directory where the model was exported to. -```python - -# Model evaluation metrics -metrics = {'accuracy': 0.92} - -py_model = mr.python.create_model("py_model", metrics=metrics) +=== "Python" + ```python + # Model evaluation metrics + metrics = {'accuracy': 0.92} -py_model.save(model_dir) + py_model = mr.python.create_model("py_model", metrics=metrics) -``` + py_model.save(model_dir) + ``` ## Going Further -You can attach an [Input Example](../input_example.md) and a [Model Schema](../input_example.md) to your model to document the shape and type of the data the model was trained on. +You can attach an [Input Example](../input_example.md) and a [Model Schema](../model_schema.md) to your model to document the shape and type of the data the model was trained on. diff --git a/docs/user_guides/mlops/registry/frameworks/skl.md b/docs/user_guides/mlops/registry/frameworks/skl.md index 9dbcb8991..e364101a9 100644 --- a/docs/user_guides/mlops/registry/frameworks/skl.md +++ b/docs/user_guides/mlops/registry/frameworks/skl.md @@ -8,54 +8,53 @@ In this guide you will learn how to export a Scikit-learn model and register it ### Step 1: Connect to Hopsworks -```python -import hopsworks +=== "Python" + ```python + import hopsworks -project = hopsworks.login() + project = hopsworks.login() -# get Hopsworks Model Registry handle -mr = project.get_model_registry() -``` + # get Hopsworks Model Registry handle + mr = project.get_model_registry() + ``` ### Step 2: Train Define your Scikit-learn model and run the training loop. -```python -# Define a model -iris_knn = KNeighborsClassifier(..) +=== "Python" + ```python + # Define a model + iris_knn = KNeighborsClassifier(..) -iris_knn.fit(..) - -``` + iris_knn.fit(..) + ``` ### Step 3: Export to local path Export the Scikit-learn model to a directory on the local filesystem. 
-```python - -model_file = "skl_knn.pkl" - -joblib.dump(iris_knn, model_file) +=== "Python" + ```python + model_file = "skl_knn.pkl" -``` + joblib.dump(iris_knn, model_file) + ``` ### Step 4: Register model in registry Use the `ModelRegistry.sklearn.create_model(..)` function to register a model as a Scikit-learn model. Define a name, and attach optional metrics for your model, then invoke the `save()` function with the parameter being the path to the local directory where the model was exported to. -```python - -# Model evaluation metrics -metrics = {'accuracy': 0.92} - -skl_model = mr.sklearn.create_model("skl_model", metrics=metrics) +=== "Python" + ```python + # Model evaluation metrics + metrics = {'accuracy': 0.92} -skl_model.save(model_file) + skl_model = mr.sklearn.create_model("skl_model", metrics=metrics) -``` + skl_model.save(model_file) + ``` ## Going Further -You can attach an [Input Example](../input_example.md) and a [Model Schema](../input_example.md) to your model to document the shape and type of the data the model was trained on. +You can attach an [Input Example](../input_example.md) and a [Model Schema](../model_schema.md) to your model to document the shape and type of the data the model was trained on. diff --git a/docs/user_guides/mlops/registry/frameworks/tf.md b/docs/user_guides/mlops/registry/frameworks/tf.md index afcdacec6..d74630cbe 100644 --- a/docs/user_guides/mlops/registry/frameworks/tf.md +++ b/docs/user_guides/mlops/registry/frameworks/tf.md @@ -12,61 +12,60 @@ In this guide you will learn how to export a TensorFlow model and register it in ### Step 1: Connect to Hopsworks -```python -import hopsworks +=== "Python" + ```python + import hopsworks -project = hopsworks.login() + project = hopsworks.login() -# get Hopsworks Model Registry handle -mr = project.get_model_registry() -``` + # get Hopsworks Model Registry handle + mr = project.get_model_registry() + ``` ### Step 2: Train Define your TensorFlow model and run the training loop. -```python -# Define a model -model = tf.keras.Sequential() +=== "Python" + ```python + # Define a model + model = tf.keras.Sequential() -# Add layers -model.add(..) + # Add layers + model.add(..) -# Compile the model. -model.compile(..) - -# Train the model -model.fit(..) - -``` + # Compile the model. + model.compile(..) + + # Train the model + model.fit(..) + ``` ### Step 3: Export to local path Export the TensorFlow model to a directory on the local filesystem. -```python - -model_dir = "./model" - -tf.saved_model.save(model, model_dir) +=== "Python" + ```python + model_dir = "./model" -``` + tf.saved_model.save(model, model_dir) + ``` ### Step 4: Register model in registry Use the `ModelRegistry.tensorflow.create_model(..)` function to register a model as a TensorFlow model. Define a name, and attach optional metrics for your model, then invoke the `save()` function with the parameter being the path to the local directory where the model was exported to. -```python - -# Model evaluation metrics -metrics = {'accuracy': 0.92} - -tf_model = mr.tensorflow.create_model("tf_model", metrics=metrics) +=== "Python" + ```python + # Model evaluation metrics + metrics = {'accuracy': 0.92} -tf_model.save(model_dir) + tf_model = mr.tensorflow.create_model("tf_model", metrics=metrics) -``` + tf_model.save(model_dir) + ``` ## Going Further -You can attach an [Input Example](../input_example.md) and a [Model Schema](../input_example.md) to your model to document the shape and type of the data the model was trained on. 
+You can attach an [Input Example](../input_example.md) and a [Model Schema](../model_schema.md) to your model to document the shape and type of the data the model was trained on. diff --git a/docs/user_guides/mlops/registry/index.md b/docs/user_guides/mlops/registry/index.md index cb01aee24..0fe2f4088 100644 --- a/docs/user_guides/mlops/registry/index.md +++ b/docs/user_guides/mlops/registry/index.md @@ -1,6 +1,6 @@ # Model Registry Guides -Hopsworks Model Registry is a centralized repository, within an organization, to manage machine learning models. A model is the product of training a machine learning algorithm with training data. +**Hopsworks Model Registry** is a centralized repository, within an organization, to manage machine learning models. A model is the product of training a machine learning algorithm with training data. This section provides guides for creating models and publish them to the Model Registry to make them available for download for batch predictions, or deployed to serve realtime applications. @@ -13,6 +13,8 @@ Follow these framework-specific guides to export a Model to the Model Registry. * [Scikit-learn](frameworks/skl.md) +* [LLM](frameworks/llm.md) + * [Other frameworks](frameworks/python.md) diff --git a/docs/user_guides/mlops/registry/input_example.md b/docs/user_guides/mlops/registry/input_example.md index 008d73da9..18bf05e3a 100644 --- a/docs/user_guides/mlops/registry/input_example.md +++ b/docs/user_guides/mlops/registry/input_example.md @@ -12,33 +12,34 @@ In this guide you will learn how to attach an input example to a model. An input ### Step 1: Connect to Hopsworks -```python -import hopsworks +=== "Python" + ```python + import hopsworks -project = hopsworks.login() + project = hopsworks.login() -# get Hopsworks Model Registry handle -mr = project.get_model_registry() -``` + # get Hopsworks Model Registry handle + mr = project.get_model_registry() + ``` ### Step 2: Generate an input example Generate an input example which corresponds to a valid input to your model. Currently we support `pandas.DataFrame, pandas.Series, numpy.ndarray, list` to be passed as input example. -```python -import numpy as np +=== "Python" + ```python + import numpy as np -input_example = np.random.randint(0, high=256, size=784, dtype=np.uint8) - -``` + input_example = np.random.randint(0, high=256, size=784, dtype=np.uint8) + ``` ### Step 3: Set input_example parameter Set the `input_example` parameter in the `create_model` function and call `save()` to attaching it to the model and register it in the registry. 
-```python - -model = mr.tensorflow.create_model(name="mnist", - input_example=input_example) -model.save("./model") -``` +=== "Python" + ```python + model = mr.tensorflow.create_model(name="mnist", + input_example=input_example) + model.save("./model") + ``` diff --git a/docs/user_guides/mlops/registry/model_evaluation_images.md b/docs/user_guides/mlops/registry/model_evaluation_images.md index 4202f4630..3577d4ca5 100644 --- a/docs/user_guides/mlops/registry/model_evaluation_images.md +++ b/docs/user_guides/mlops/registry/model_evaluation_images.md @@ -12,64 +12,67 @@ In this guide, you will learn how to attach ==model evaluation images== to a mod ### Step 1: Connect to Hopsworks -```python -import hopsworks +=== "Python" + ```python + import hopsworks -project = hopsworks.login() + project = hopsworks.login() -# get Hopsworks Model Registry handle -mr = project.get_model_registry() -``` + # get Hopsworks Model Registry handle + mr = project.get_model_registry() + ``` ### Step 2: Generate model evaluation images Generate an image that visualizes model performance and evaluation metrics -```python -import seaborn -from sklearn.metrics import confusion_matrix +=== "Python" + ```python + import seaborn + from sklearn.metrics import confusion_matrix -# Predict the training data using the trained model -y_pred_train = model.predict(X_train) + # Predict the training data using the trained model + y_pred_train = model.predict(X_train) -# Predict the test data using the trained model -y_pred_test = model.predict(X_test) + # Predict the test data using the trained model + y_pred_test = model.predict(X_test) -# Calculate and print the confusion matrix for the test predictions -results = confusion_matrix(y_test, y_pred_test) + # Calculate and print the confusion matrix for the test predictions + results = confusion_matrix(y_test, y_pred_test) -# Create a DataFrame for the confusion matrix results -df_confusion_matrix = pd.DataFrame( - results, - ['True Normal', 'True Fraud'], - ['Pred Normal', 'Pred Fraud'], -) + # Create a DataFrame for the confusion matrix results + df_confusion_matrix = pd.DataFrame( + results, + ['True Normal', 'True Fraud'], + ['Pred Normal', 'Pred Fraud'], + ) -# Create a heatmap using seaborn with annotations -heatmap = seaborn.heatmap(df_confusion_matrix, annot=True) + # Create a heatmap using seaborn with annotations + heatmap = seaborn.heatmap(df_confusion_matrix, annot=True) -# Get the figure and display it -fig = heatmap.get_figure() -fig.show() -``` + # Get the figure and display it + fig = heatmap.get_figure() + fig.show() + ``` ### Step 3: Save the figure to a file inside the model directory Save the figure to a file with a common filename extension (for example, .png or .jpeg), and place it in a directory called `images` - a subdirectory of the model directory that is registered to Hopsworks. 
-```python -# Specify the directory name for saving the model and related artifacts -model_dir = "./model" +=== "Python" + ```python + # Specify the directory name for saving the model and related artifacts + model_dir = "./model" -# Create a subdirectory of model_dir called 'images' for saving the model evaluation images -model_images_dir = model_dir + "/images" -if not os.path.exists(model_images_dir): - os.mkdir(model_images_dir) + # Create a subdirectory of model_dir called 'images' for saving the model evaluation images + model_images_dir = model_dir + "/images" + if not os.path.exists(model_images_dir): + os.mkdir(model_images_dir) -# Save the figure to an image file in the images directory -fig.savefig(model_images_dir + "/confusion_matrix.png") + # Save the figure to an image file in the images directory + fig.savefig(model_images_dir + "/confusion_matrix.png") -# Register the model -py_model = mr.python.create_model(name="py_model") -py_model.save("./model") -``` + # Register the model + py_model = mr.python.create_model(name="py_model") + py_model.save("./model") + ``` diff --git a/docs/user_guides/mlops/registry/model_schema.md b/docs/user_guides/mlops/registry/model_schema.md index 389ff78ab..dfa2effa6 100644 --- a/docs/user_guides/mlops/registry/model_schema.md +++ b/docs/user_guides/mlops/registry/model_schema.md @@ -12,49 +12,49 @@ In this guide you will learn how to attach a model schema to your model. A model ### Step 1: Connect to Hopsworks -```python -import hopsworks +=== "Python" + ```python + import hopsworks -project = hopsworks.login() + project = hopsworks.login() -# get Hopsworks Model Registry handle -mr = project.get_model_registry() -``` + # get Hopsworks Model Registry handle + mr = project.get_model_registry() + ``` ### Step 2: Create ModelSchema Create a ModelSchema for your inputs and outputs by passing in an example that your model is trained on and a valid prediction. Currently, we support `pandas.DataFrame, pandas.Series, numpy.ndarray, list`. -```python +=== "Python" + ```python + # Import a Schema and ModelSchema definition + from hsml.utils.model_schema import ModelSchema + from hsml.utils.schema import Schema -# Import a Schema and ModelSchema definition -from hsml.utils.model_schema import ModelSchema -from hsml.utils.schema import Schema + # Model inputs for MNIST dataset + inputs = [{'type': 'uint8', 'shape': [28, 28, 1], 'description': 'grayscale representation of 28x28 MNIST images'}] -# Model inputs for MNIST dataset -inputs = [{'type': 'uint8', 'shape': [28, 28, 1], 'description': 'grayscale representation of 28x28 MNIST images'}] + # Build the input schema + input_schema = Schema(inputs) -# Build the input schema -input_schema = Schema(inputs) + # Model outputs + outputs = [{'type': 'float32', 'shape': [10]}] -# Model outputs -outputs = [{'type': 'float32', 'shape': [10]}] + # Build the output schema + output_schema = Schema(outputs) -# Build the output schema -output_schema = Schema(outputs) - -# Create ModelSchema object -model_schema = ModelSchema(input_schema=input_schema, output_schema=output_schema) - -``` + # Create ModelSchema object + model_schema = ModelSchema(input_schema=input_schema, output_schema=output_schema) + ``` ### Step 3: Set model_schema parameter Set the `model_schema` parameter in the `create_model` function and call `save()` to attaching it to the model and register it in the registry. 
-```python - -model = mr.tensorflow.create_model(name="mnist", - model_schema=model_schema) -model.save("./model") -``` +=== "Python" + ```python + model = mr.tensorflow.create_model(name="mnist", + model_schema=model_schema) + model.save("./model") + ``` diff --git a/docs/user_guides/mlops/serving/api-protocol.md b/docs/user_guides/mlops/serving/api-protocol.md index 4eae81f3c..684b84ad1 100644 --- a/docs/user_guides/mlops/serving/api-protocol.md +++ b/docs/user_guides/mlops/serving/api-protocol.md @@ -28,7 +28,7 @@ A simplified creation form will appear including the most common deployment fiel

- Advance options + Advance options
Advanced options. Go to advanced deployment creation form

@@ -39,16 +39,13 @@ Enabling gRPC as the API protocol for a model deployment requires KServe as the

- KServe enabled in advanced deployment form + KServe enabled in advanced deployment form
Enable KServe in the advanced deployment form

Then, you can select the API protocol to be enabled in your model deployment. -!!! info "Only one API protocol can be enabled for a model (they cannot support both gRPC and REST)" - Currently, KServe model deployments are limited to one API protocol at a time. Therefore, only one of REST or gRPC API protocols can be enabled at the same time on the same model deployment. You can change the API protocol of existing deployments. -

Select gRPC API protocol @@ -56,40 +53,44 @@ Then, you can select the API protocol to be enabled in your model deployment.

+!!! info "Only one API protocol can be enabled in a model deployment (they cannot support both gRPC and REST)" + Currently, KServe model deployments are limited to one API protocol at a time. Therefore, only one of REST or gRPC API protocols can be enabled at the same time on the same model deployment. You cannot change the API protocol of existing deployments. + Once you are done with the changes, click on `Create new deployment` at the bottom of the page to create the deployment for your model. ## Code ### Step 1: Connect to Hopsworks -```python -import hopsworks +=== "Python" + ```python + import hopsworks -project = hopsworks.login() + project = hopsworks.login() -# get Hopsworks Model Registry handle -mr = project.get_model_registry() + # get Hopsworks Model Registry handle + mr = project.get_model_registry() -# get Hopsworks Model Serving handle -ms = project.get_model_serving() -``` + # get Hopsworks Model Serving handle + ms = project.get_model_serving() + ``` ### Step 2: Create a deployment with a specific API protocol -```python - -my_model = mr.get_model("my_model", version=1) +=== "Python" + ```python + my_model = mr.get_model("my_model", version=1) -my_predictor = ms.create_predictor(my_model, - api_protocol="GRPC" # defaults to "REST" - ) -my_predictor.deploy() + my_predictor = ms.create_predictor(my_model, + api_protocol="GRPC" # defaults to "REST" + ) + my_predictor.deploy() -# or + # or -my_deployment = ms.create_deployment(my_predictor) -my_deployment.save() -``` + my_deployment = ms.create_deployment(my_predictor) + my_deployment.save() + ``` ### API Reference diff --git a/docs/user_guides/mlops/serving/deployment-state.md b/docs/user_guides/mlops/serving/deployment-state.md index 571cff6d2..35f037670 100644 --- a/docs/user_guides/mlops/serving/deployment-state.md +++ b/docs/user_guides/mlops/serving/deployment-state.md @@ -61,41 +61,42 @@ Additionally, you can find the nº of instances currently running by scrolling d ### Step 1: Connect to Hopsworks -```python -import hopsworks +=== "Python" + ```python + import hopsworks -project = hopsworks.login() + project = hopsworks.login() -# get Hopsworks Model Serving handle -ms = project.get_model_serving() -``` + # get Hopsworks Model Serving handle + ms = project.get_model_serving() + ``` ### Step 2: Retrieve an existing deployment -```python - -deployment = ms.get_deployment("mydeployment") -``` +=== "Python" + ```python + deployment = ms.get_deployment("mydeployment") + ``` ### Step 3: Inspect deployment state -```python - -state = deployment.get_state() +=== "Python" + ```python + state = deployment.get_state() -state.describe() -``` + state.describe() + ``` ### Step 4: Check nº of running instances -```python - -# nº of predictor instances -deployment.resources.describe() +=== "Python" + ```python + # nº of predictor instances + deployment.resources.describe() -# nº of transformer instances -deployment.transformer.resources.describe() -``` + # nº of transformer instances + deployment.transformer.resources.describe() + ``` ### API Reference diff --git a/docs/user_guides/mlops/serving/deployment.md b/docs/user_guides/mlops/serving/deployment.md index 84cd16243..cab182975 100644 --- a/docs/user_guides/mlops/serving/deployment.md +++ b/docs/user_guides/mlops/serving/deployment.md @@ -7,12 +7,13 @@ In this guide, you will learn how to create a new deployment for a trained model !!! warning This guide assumes that a model has already been trained and saved into the Model Registry. 
To learn how to create a model in the Model Registry, see [Model Registry Guide](../registry/frameworks/tf.md) -Deployments are used to unify the different components involved in making one or more trained models online and accessible to compute predictions on demand. In each deployment, there are three main components to consider: +Deployments are used to unify the different components involved in making one or more trained models online and accessible to compute predictions on demand. For each deployment, there are four concepts to consider: !!! info "" - 1. [Model artifact](#model-artifact) - 2. [Predictor](#predictor) - 3. [Transformer](#transformer) + 1. [Model files](#model-files) + 2. [Artifact files](#artifact-files) + 3. [Predictor](#predictor) + 4. [Transformer](#transformer) ## GUI @@ -40,21 +41,21 @@ After selecting the model, the rest of fields are filled automatically. We pick !!! notice "Deployment name validation rules" A valid deployment name can only contain characters a-z, A-Z and 0-9. -!!! info "Predictor script for Python models" - For Python models, you can select a custom [predictor script](#predictor) to load and run the trained model by clicking on `From project` or `Upload new file`, to choose an existing script in the project file system or upload a new script, respectively. +!!! info "Predictor script for Python models and LLMs" + For Python models and LLMs, you must select a custom [predictor script](#predictor) that loads and runs the trained model by clicking on `From project` or `Upload new file`, to choose an existing script in the project file system or upload a new script, respectively. If you prefer, change the name of the deployment, model version or [artifact version](#model-artifact). Then, click on `Create new deployment` to create the deployment for your model.

- Select the model framework + Select the model framework
Select the model framework

- Select the model + Select the model
Select the model

@@ -65,7 +66,7 @@ Optionally, you can access and adjust other parameters of the deployment configu

- Advance options + Advance options
Advanced options. Go to advanced deployment creation form

@@ -117,65 +118,85 @@ After that, click on the new deployment to access the overview page. ### Step 1: Connect to Hopsworks -```python -import hopsworks +=== "Python" + ```python + import hopsworks -project = hopsworks.login() + project = hopsworks.login() -# get Hopsworks Model Registry handle -mr = project.get_model_registry() -``` + # get Hopsworks Model Registry handle + mr = project.get_model_registry() + ``` ### Step 2: Create deployment Retrieve the trained model you want to deploy. -```python - -my_model = mr.get_model("my_model", version=1) -``` +=== "Python" + ```python + my_model = mr.get_model("my_model", version=1) + ``` #### Option A: Using the model object -```python - -my_deployment = my_model.deploy() -``` +=== "Python" + ```python + my_deployment = my_model.deploy() + ``` #### Option B: Using the Model Serving handle -```python - -# get Hopsworks Model Serving handle -ms = project.get_model_serving() +=== "Python" + ```python + # get Hopsworks Model Serving handle + ms = project.get_model_serving() -my_predictor = ms.create_predictor(my_model) -my_deployment = my_predictor.deploy() + my_predictor = ms.create_predictor(my_model) + my_deployment = my_predictor.deploy() -# or -my_deployment = ms.create_deployment(my_predictor) -my_deployment.save() -``` + # or + my_deployment = ms.create_deployment(my_predictor) + my_deployment.save() + ``` ### API Reference [Model Serving](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/model-serving/model_serving_api/) -## Model Artifact +## Model Files -A model artifact is a package containing all of the necessary files for the deployment of a model. It includes the model file(s) and/or custom scripts for loading the model (predictor script) or transforming the model inputs at inference time (the transformer script). +Model files are the files exported when a specific version of a model is saved to the model registry (see [Model Registry](../registry/index.md)). These files are ==unique for each model version, but shared across model deployments== created for the same version of the model. + +Inside a model deployment, the local path to the model files is stored in the `MODEL_FILES_PATH` environment variable (see [environment variables](../serving/predictor.md#environment-variables)). Moreover, you can explore the model files under the `/Models///Files` directory using the File Browser. + +!!! warning + All files under `/Models` are managed by Hopsworks. Changes to model files cannot be reverted and can have an impact on existing model deployments. -When a new deployment is created, a model artifact is generated in two cases: +## Artifact Files + +Artifact files are files involved in the correct startup and running of the model deployment. The most important files are the **predictor** and **transformer scripts**. The former is used to load and run the model for making predictions. The latter is typically used to transform model inputs at inference time. + +Every model deployment runs a specific version of the artifact files, commonly referred to as artifact version. ==One or more model deployments can use the same artifact version== (i.e., same predictor and transformer scripts). Artifact versions are unique for the same model version. + +When a new deployment is created, a new artifact version is generated in two cases: - the artifact version in the predictor is set to `CREATE` (see [Artifact Version](../predictor/#artifact_version)) - no model artifact with the same files has been created before. 
+Inside a model deployment, the local path to the artifact files is stored in the `ARTIFACT_FILES_PATH` environment variable (see [environment variables](../serving/predictor.md#environment-variables)). Moreover, you can explore the artifact files under the `/Models///Artifacts/` directory using the File Browser. + +!!! warning + All files under `/Models` are managed by Hopsworks. Changes to artifact files cannot be reverted and can have an impact on existing model deployments. + +!!! tip "Additional files" + Currently, the artifact files only include predictor and transformer scripts. Support for additional files (e.g., configuration files or other resources) is coming soon. + ## Predictor Predictors are responsible for running the model server that loads the trained model, listens to inference requests and returns prediction results. To learn more about predictors, see the [Predictor Guide](predictor.md) !!! note - Currently, only one predictor is supported in a deployment. Support for multiple predictors (the inference graphs) is coming soon. + Only one predictor is supported in a deployment. !!! info Model artifacts are assigned an incremental version number, being `0` the version reserved for model artifacts that do not contain predictor or transformer scripts (i.e., shared artifacts containing only the model files). @@ -186,19 +207,3 @@ Transformers are used to apply transformations on the model inputs before sendin !!! warning Transformers are only supported in KServe deployments. - -## Inference logger - -Inference loggers are deployment components that log inference requests into a Kafka topic for later analysis. To learn about the different logging modes, see the [Inference Logger Guide](inference-logger.md) - -## Inference batcher - -Inference batcher are deployment component that apply batching to the incoming inference requests for a better throughput-latency trade-off. To learn about the different configuration available for the inference batcher, see the [Inference Batcher Guide](inference-batcher.md). - -## Resources - -Resources include the number of replicas for the deployment as well as the resources (i.e., memory, CPU, GPU) to be allocated per replica. To learn about the different combinations available, see the [Resources Guide](resources.md). - -## API protocol - -Hopsworks supports both REST and gRPC as the API protocols to send inference requests to model deployments. In general, you use gRPC when you need lower latency inference requests. To learn more about the REST and gRPC API protocols for model deployments, see the [API Protocol Guide](api-protocol.md). diff --git a/docs/user_guides/mlops/serving/index.md b/docs/user_guides/mlops/serving/index.md index 4635f3816..1ab46e000 100644 --- a/docs/user_guides/mlops/serving/index.md +++ b/docs/user_guides/mlops/serving/index.md @@ -2,7 +2,7 @@ ## Deployment -Assuming you have already created a model in the Model Registry, a deployment can now be created to prepare a model artifact for this model and make it accessible for running predictions behind a REST or gRPC endpoint. Follow the [Deployment Creation Guide](deployment.md) to create a Deployment for your model. +Assuming you have already created a model in the [Model Registry](../registry/index.md), a deployment can now be created to prepare a model artifact for this model and make it accessible for running predictions behind a REST or gRPC endpoint. Follow the [Deployment Creation Guide](deployment.md) to create a Deployment for your model. 
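For orientation, the end-to-end flow covered by that guide can be sketched in a few lines from the Python API. The deployment name, the use of the registered input example as a test payload, and the explicit `start()` call below are illustrative assumptions rather than requirements; the individual guides remain the authoritative reference.

```python
import hopsworks

project = hopsworks.login()

# Retrieve a model from the Model Registry and deploy it behind a REST endpoint
mr = project.get_model_registry()
model = mr.get_model("my_model", version=1)

deployment = model.deploy(name="mydeployment")
deployment.start()  # start the deployment if it is not already running

# Send a test inference request (assumes an input example was attached to the model)
predictions = deployment.predict({"instances": [model.input_example]})
print(predictions)
```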
### Predictor @@ -12,6 +12,10 @@ Predictors are responsible for running a model server that loads a trained model Transformers are used to apply transformations on the model inputs before sending them to the predictor for making predictions using the model, see the [Transformer Guide](transformer.md). +### Resource Allocation + +Configure the resources to be allocated for predictor and transformer in a model deployment, see the [Resource Allocation Guide](resources.md). + ### Inference Batcher Configure the predictor to batch inference requests, see the [Inference Batcher Guide](inference-batcher.md). @@ -19,3 +23,7 @@ Configure the predictor to batch inference requests, see the [Inference Batcher ### Inference Logger Configure the predictor to log inference requests and predictions, see the [Inference Logger Guide](inference-logger.md). + +### Troubleshooting + +Inspect the model server logs to troubleshoot your model deployments, see the [Troubleshooting Guide](troubleshooting.md). \ No newline at end of file diff --git a/docs/user_guides/mlops/serving/inference-batcher.md b/docs/user_guides/mlops/serving/inference-batcher.md index d2994682c..665078d80 100644 --- a/docs/user_guides/mlops/serving/inference-batcher.md +++ b/docs/user_guides/mlops/serving/inference-batcher.md @@ -25,7 +25,7 @@ A simplified creation form will appear including the most common deployment fiel

- Advance options + Advance options
Advanced options. Go to advanced deployment creation form

@@ -49,48 +49,50 @@ Once you are done with the changes, click on `Create new deployment` at the bott ### Step 1: Connect to Hopsworks -```python -import hopsworks +=== "Python" + ```python + import hopsworks -project = hopsworks.login() + project = hopsworks.login() -# get Hopsworks Model Registry handle -mr = project.get_model_registry() + # get Hopsworks Model Registry handle + mr = project.get_model_registry() -# get Hopsworks Model Serving handle -ms = project.get_model_serving() -``` + # get Hopsworks Model Serving handle + ms = project.get_model_serving() + ``` ### Step 2: Define an inference logger -```python +=== "Python" + ```python + from hsml.inference_batcher import InferenceBatcher -from hsml.inference_batcher import InferenceBatcher - -my_batcher = InferenceBatcher(enabled=True, - # optional - max_batch_size=32, - max_latency=5000, # milliseconds - timeout=5 # seconds - ) -``` + my_batcher = InferenceBatcher(enabled=True, + # optional + max_batch_size=32, + max_latency=5000, # milliseconds + timeout=5 # seconds + ) + ``` ### Step 3: Create a deployment with the inference batcher -```python +=== "Python" + ```python -my_model = mr.get_model("my_model", version=1) + my_model = mr.get_model("my_model", version=1) -my_predictor = ms.create_predictor(my_model, - inference_batcher=my_batcher - ) -my_predictor.deploy() + my_predictor = ms.create_predictor(my_model, + inference_batcher=my_batcher + ) + my_predictor.deploy() -# or + # or -my_deployment = ms.create_deployment(my_predictor) -my_deployment.save() -``` + my_deployment = ms.create_deployment(my_predictor) + my_deployment.save() + ``` ### API Reference diff --git a/docs/user_guides/mlops/serving/inference-logger.md b/docs/user_guides/mlops/serving/inference-logger.md index 94116a688..809da1867 100644 --- a/docs/user_guides/mlops/serving/inference-logger.md +++ b/docs/user_guides/mlops/serving/inference-logger.md @@ -29,7 +29,7 @@ A simplified creation form will appear including the most common deployment fiel

- Advance options + Advance options
Advanced options. Go to advanced deployment creation form

@@ -42,7 +42,7 @@ If you decide to create a new topic, select the number of partitions and number

- Inference logger in advanced deployment form + Inference logger in advanced deployment form
Inference logging configuration with a new kafka topic

@@ -55,33 +55,35 @@ Once you are done with the changes, click on `Create new deployment` at the bott ### Step 1: Connect to Hopsworks -```python -import hopsworks +=== "Python" + ```python + import hopsworks -project = hopsworks.login() + project = hopsworks.login() -# get Hopsworks Model Registry handle -mr = project.get_model_registry() + # get Hopsworks Model Registry handle + mr = project.get_model_registry() -# get Hopsworks Model Serving handle -ms = project.get_model_serving() -``` + # get Hopsworks Model Serving handle + ms = project.get_model_serving() + ``` ### Step 2: Define an inference logger -```python +=== "Python" + ```python -from hsml.inference_logger import InferenceLogger -from hsml.kafka_topic import KafkaTopic + from hsml.inference_logger import InferenceLogger + from hsml.kafka_topic import KafkaTopic -new_topic = KafkaTopic(name="CREATE", - # optional - num_partitions=1, - num_replicas=1 - ) + new_topic = KafkaTopic(name="CREATE", + # optional + num_partitions=1, + num_replicas=1 + ) -my_logger = InferenceLogger(kafka_topic=new_topic, mode="ALL") -``` + my_logger = InferenceLogger(kafka_topic=new_topic, mode="ALL") + ``` !!! notice "Use dict for simpler code" Similarly, you can create the same logger with: @@ -93,20 +95,20 @@ my_logger = InferenceLogger(kafka_topic=new_topic, mode="ALL") ### Step 3: Create a deployment with the inference logger -```python +=== "Python" + ```python + my_model = mr.get_model("my_model", version=1) -my_model = mr.get_model("my_model", version=1) + my_predictor = ms.create_predictor(my_model, + inference_logger=my_logger + ) + my_predictor.deploy() -my_predictor = ms.create_predictor(my_model, - inference_logger=my_logger - ) -my_predictor.deploy() + # or -# or - -my_deployment = ms.create_deployment(my_predictor) -my_deployment.save() -``` + my_deployment = ms.create_deployment(my_predictor) + my_deployment.save() + ``` ### API Reference diff --git a/docs/user_guides/mlops/serving/predictor.md b/docs/user_guides/mlops/serving/predictor.md index 668a8abbf..2ad777ec5 100644 --- a/docs/user_guides/mlops/serving/predictor.md +++ b/docs/user_guides/mlops/serving/predictor.md @@ -12,12 +12,13 @@ Predictors are the main component of deployments. They are responsible for runni !!! info "" 1. [Model server](#model-server) 2. [Serving tool](#serving-tool) - 3. [Custom script](#custom-script) - 4. [Transformer](#transformer) - 5. [Inference Logger](#inference-logger) - 6. [Inference Batcher](#inference-batcher) - 7. [Resources](#resources) - 8. [API protocol](#api-protocol) + 3. [User-provided script](#user-provided-script) + 4. [Python environments](#python-environments) + 5. [Transformer](#transformer) + 6. [Inference Logger](#inference-logger) + 7. [Inference Batcher](#inference-batcher) + 8. [Resources](#resources) + 9. [API protocol](#api-protocol) ## GUI @@ -34,24 +35,24 @@ If you have at least one model already trained and saved in the Model Registry, Once in the deployments page, you can create a new deployment by either clicking on `New deployment` (if there are no existing deployments) or on `Create new deployment` it the top-right corner. Both options will open the deployment creation form. -### Step 2: Choose a framework +### Step 2: Choose a backend -A simplified creation form will appear, including the most common deployment fields from all available configurations. The first step is to select your model to serve, which is done by first selecting the framework the model was registered as in the model registry. 
+A simplified creation form will appear, including the most common deployment fields from all available configurations. The first step is to choose a ==backend== for your model deployment. The backend will filter the models shown below according to the framework that the model was registered with in the model registry. -For example if you registered the model as a TensorFlow model using `ModelRegistry.tensorflow.create_model(...)` you select `Tensorflow` in the dropdown. +For example, if you registered the model as a TensorFlow model using `ModelRegistry.tensorflow.create_model(...)`, you select `Tensorflow Serving` in the dropdown.

- Select the model framework -
Select the model framework
+ Select the model framework +
Select the backend

-All models registered for a specific framework will be listed in the model dropdown. +All models compatible with the selected backend will be listed in the model dropdown.

- Select the model + Select the model
Select the model

@@ -64,7 +65,7 @@ For python models, if you want to use your own [predictor script](#step-2-option

- Predictor script in the simplified deployment form + Predictor script in the simplified deployment form
Select a predictor script in the simplified deployment form

@@ -79,7 +80,7 @@ To create your own it is recommended to [clone](../../projects/python/python_env

- Predictor script in the simplified deployment form + Predictor script in the simplified deployment form
Select an environment for the predictor script

@@ -90,7 +91,7 @@ Other configuration such as the serving tool, is part of the advanced options of

- Advance options + Advance options
Advanced options. Go to advanced deployment creation form

@@ -99,7 +100,7 @@ Here, you change the [serving tool](#serving-tool) for your deployment by enabli

- KServe in advanced deployment form + KServe in advanced deployment form
KServe checkbox in the advanced deployment form

@@ -121,36 +122,72 @@ Once you are done with the changes, click on `Create new deployment` at the bott ### Step 1: Connect to Hopsworks -```python -import hopsworks - -project = hopsworks.login() +=== "Python" + ```python + import hopsworks -# get Hopsworks Model Registry handle -mr = project.get_model_registry() + project = hopsworks.login() -# get Hopsworks Model Serving handle -ms = project.get_model_serving() -``` + # get Hopsworks Model Registry handle + mr = project.get_model_registry() -### Step 2 (Optional): Implement predictor script + # get Hopsworks Model Serving handle + ms = project.get_model_serving() + ``` -=== "Python" +### Step 2 (Optional): Implement a predictor script +=== "Predictor" ``` python - class Predict(object): + class Predictor(): def __init__(self): - """ Initialization code goes here: - - Download the model artifact - - Load the model - """ + """ Initialization code goes here""" pass def predict(self, inputs): """ Serve predictions using the trained model""" pass ``` +=== "Generate (vLLM deployments only)" + ``` python + import os + from typing import Iterable, AsyncIterator, Union + + from vllm import LLM + + from kserve.protocol.rest.openai import ( + CompletionRequest, + ChatPrompt, + ChatCompletionRequestMessage, + ) + from kserve.protocol.rest.openai.types import Completion + + class Predictor(): + + def __init__(self): + """ Initialization code goes here""" + # initialize vLLM backend + self.llm = LLM(os.environ["MODEL_FILES_PATH"]) + + # initialize tokenizer if needed + # self.tokenizer = ... + + def apply_chat_template( + self, + messages: Iterable[ChatCompletionRequestMessage,], + ) -> ChatPrompt: + pass + + async def create_completion( + self, request: CompletionRequest + ) -> Union[Completion, AsyncIterator[Completion]]: + """Generate responses using the LLM""" + + # Completion: used for returning a single answer (batch) + # AsyncIterator[Completion]: used for returning a stream of answers + + pass + ``` !!! info "Jupyter magic" In a jupyter notebook, you can add `%%writefile my_predictor.py` at the top of the cell to save it as a local file. @@ -159,36 +196,36 @@ ms = project.get_model_serving() !!! info "You can also use the UI to upload your predictor script. 
See [above](#step-3-advanced-deployment-form)" -```python - -uploaded_file_path = dataset_api.upload("my_predictor.py", "Resources", overwrite=True) -predictor_script_path = os.path.join("/Projects", project.name, uploaded_file_path) -``` +=== "Python" + ```python + uploaded_file_path = dataset_api.upload("my_predictor.py", "Resources", overwrite=True) + predictor_script_path = os.path.join("/Projects", project.name, uploaded_file_path) + ``` ### Step 4: Define predictor -```python - -my_model = mr.get_model("my_model", version=1) +=== "Python" + ```python + my_model = mr.get_model("my_model", version=1) -my_predictor = ms.create_predictor(my_model, - # optional - model_server="PYTHON", - serving_tool="KSERVE", - script_file=predictor_script_path - ) -``` + my_predictor = ms.create_predictor(my_model, + # optional + model_server="PYTHON", + serving_tool="KSERVE", + script_file=predictor_script_path + ) + ``` ### Step 5: Create a deployment with the predictor -```python - -my_deployment = my_predictor.deploy() +=== "Python" + ```python + my_deployment = my_predictor.deploy() -# or -my_deployment = ms.create_deployment(my_predictor) -my_deployment.save() -``` + # or + my_deployment = ms.create_deployment(my_predictor) + my_deployment.save() + ``` ### API Reference @@ -196,49 +233,53 @@ my_deployment.save() ## Model Server -Hopsworks Model Serving currently supports deploying models with a Flask server for python-based models or TensorFlow Serving for TensorFlow / Keras models. Support for TorchServe for running PyTorch models is coming soon. Today, you can deploy PyTorch models as python-based models. +Hopsworks Model Serving supports deploying models with a Flask server for python-based models, TensorFlow Serving for TensorFlow / Keras models and vLLM for Large Language Models (LLMs). Today, you can deploy PyTorch models as python-based models. ??? info "Show supported model servers" - | Model Server | Supported | ML Frameworks | - | ------------------ | --------- | ------------------------------------------------ | - | Flask | ✅ | python-based (scikit-learn, xgboost, pytorch...) | - | TensorFlow Serving | ✅ | keras, tensorflow | - | TorchServe | ❌ | pytorch | + | Model Server | Supported | ML Models and Frameworks | + | ------------------ | --------- | ----------------------------------------------------------------------------------------------- | + | Flask | ✅ | python-based (scikit-learn, xgboost, pytorch...) | + | TensorFlow Serving | ✅ | keras, tensorflow | + | TorchServe | ❌ | pytorch | + | vLLM | ✅ | vLLM-supported models (see [list](https://docs.vllm.ai/en/latest/models/supported_models.html)) | ## Serving tool -In Hopsworks, model servers can be deployed in three different ways: directly on Docker, on Kubernetes deployments or using KServe inference services. -Although the same models can be deployed in either of our two serving tools (Python or KServe), the use of KServe is highly recommended. The following is a comparative table showing the features supported by each of them. +In Hopsworks, model servers are deployed on Kubernetes. There are two options for deploying models on Kubernetes: using [KServe](https://kserve.github.io/website/latest/) inference services or Kubernetes built-in deployments. ==KServe is the recommended way to deploy models in Hopsworks==. + +The following is a comparative table showing the features supported by each of them. ??? 
info "Show serving tools comparison" - | Feature / requirement | Docker | Kubernetes (enterprise) | KServe (enterprise) | - | ---------------------------------------------- | ------------ | ----------------------- | --------------------------- | - | Autoscaling (scale-out) | ❌ | ✅ | ✅ | - | Resource allocation | ➖ fixed | ➖ min. resources | ✅ min / max. resources | - | Inference logging | ➖ simple | ➖ simple | ✅ fine-grained | - | Inference batching | ➖ partially | ➖ partially | ✅ | - | Scale-to-zero | ❌ | ❌ | ✅ after 30s of inactivity) | - | Transformers | ❌ | ❌ | ✅ | - | Low-latency predictions | ❌ | ❌ | ✅ | - | Multiple models | ❌ | ❌ | ➖ (python-based) | - | Custom predictor required
(python-only) | ✅ | ✅ | ❌ | - -## Custom script - -Depending on the model server and serving tool used in the deployment, you can provide your own python script to load the model and make predictions. - -??? info "Show supported custom predictors" - - | Serving tool | Model server | Custom predictor script | - | ------------ | ------------------ | ----------------------- | - | Docker | Flask | ✅ (required) | - | | TensorFlow Serving | ❌ | - | Kubernetes | Flask | ✅ (required) | - | | TensorFlow Serving | ❌ | - | KServe | Flask | ✅ (only required for artifacts with multiple models) | - | | TensorFlow Serving | ❌ | + | Feature / requirement | Kubernetes (enterprise) | KServe (enterprise) | + | ----------------------------------------------------- | ----------------------- | ------------------------- | + | Autoscaling (scale-out) | ✅ | ✅ | + | Resource allocation | ➖ min. resources | ✅ min / max. resources | + | Inference logging | ➖ simple | ✅ fine-grained | + | Inference batching | ➖ partially | ✅ | + | Scale-to-zero | ❌ | ✅ after 30s of inactivity | + | Transformers | ❌ | ✅ | + | Low-latency predictions | ❌ | ✅ | + | Multiple models | ❌ | ➖ (python-based) | + | User-provided predictor required
(python-only) | ✅ | ❌ | + +## User-provided script + +Depending on the model server and serving platform used in the model deployment, you can (or need) to provide your own python script to load the model and make predictions. +This script is referred to as **predictor script**, and is included in the [artifact files](../serving/deployment.md#artifact-files) of the model deployment. + +The predictor script needs to implement a given template depending on the model server of the model deployment. See the templates in [Step 2](#step-2-optional-implement-a-predictor-script). + +??? info "Show supported user-provided predictors" + + | Serving tool | Model server | User-provided predictor script | + | ------------ | ------------------ | ---------------------------------------------------- | + | Kubernetes | Flask server | ✅ (required) | + | | TensorFlow Serving | ❌ | + | KServe | Fast API | ✅ (only required for artifacts with multiple models) | + | | TensorFlow Serving | ❌ | + | | vLLM | ✅ (required) | ### Environment variables @@ -246,19 +287,37 @@ A number of different environment variables is available in the predictor to eas ??? info "Show environment variables" - | Name | Description | - | ------------ | ------------------ | - | ARTIFACT_FILES_PATH | Local path to the model artifact files | - | DEPLOYMENT_NAME | Name of the current deployment | - | MODEL_NAME | Name of the model being served by the current deployment | - | MODEL_VERSION | Version of the model being served by the current deployment | - | ARTIFACT_VERSION | Version of the model artifact being served by the current deployment | + | Name | Description | + | ------------------- | -------------------------------------------------------------------- | + | MODEL_FILES_PATH | Local path to the model files | + | ARTIFACT_FILES_PATH | Local path to the artifact files | + | DEPLOYMENT_NAME | Name of the current deployment | + | MODEL_NAME | Name of the model being served by the current deployment | + | MODEL_VERSION | Version of the model being served by the current deployment | + | ARTIFACT_VERSION | Version of the model artifact being served by the current deployment | + +## Python environments + +Depending on the model server and serving tool used in the model deployment, you can select the Python environment where the predictor and transformer scripts will run. To create a new Python environment see [Python Environments](../../projects/python/python_env_overview.md). + +??? info "Show supported Python environments" + + | Serving tool | Model server | Editable | Predictor | Transformer | + | ------------ | ------------------ | -------- | ----------------------------------- | ------------------------------ | + | Kubernetes | Flask server | ❌ | `pandas-inference-pipeline` only | ❌ | + | | TensorFlow Serving | ❌ | (official) tensorflow serving image | ❌ | + | KServe | Fast API | ✅ | any `inference-pipeline` image | any `inference-pipeline` image | + | | TensorFlow Serving | ✅ | (official) tensorflow serving image | any `inference-pipeline` image | + | | vLLM | ✅ | `vllm-inference-pipeline` only | any `inference-pipeline` image | + +!!! note + The selected Python environment is used for both predictor and transformer. Support for selecting a different Python environment for the predictor and transformer is coming soon. ## Transformer Transformers are used to apply transformations on the model inputs before sending them to the predictor for making predictions using the model. 
To learn more about transformers, see the [Transformer Guide](transformer.md). -!!! warning +!!! note Transformers are only supported in KServe deployments. ## Inference logger diff --git a/docs/user_guides/mlops/serving/resources.md b/docs/user_guides/mlops/serving/resources.md index 34039fae5..d79bb7edb 100644 --- a/docs/user_guides/mlops/serving/resources.md +++ b/docs/user_guides/mlops/serving/resources.md @@ -25,7 +25,7 @@ A simplified creation form will appear including the most common deployment fiel

- Advance options + Advance options
Advanced options. Go to advanced deployment creation form

@@ -50,58 +50,61 @@ Once you are done with the changes, click on `Create new deployment` at the bott ### Step 1: Connect to Hopsworks -```python -import hopsworks +=== "Python" + ```python + import hopsworks -project = hopsworks.login() + project = hopsworks.login() -# get Hopsworks Model Registry handle -mr = project.get_model_registry() + # get Hopsworks Model Registry handle + mr = project.get_model_registry() -# get Hopsworks Model Serving handle -ms = project.get_model_serving() -``` + # get Hopsworks Model Serving handle + ms = project.get_model_serving() + ``` ### Step 2: Define the predictor resource configuration -```python -from hsml.resources import PredictorResources, Resources +=== "Python" + ```python + from hsml.resources import PredictorResources, Resources -minimum_res = Resources(cores=1, memory=128, gpus=1) -maximum_res = Resources(cores=2, memory=256, gpus=1) + minimum_res = Resources(cores=1, memory=128, gpus=1) + maximum_res = Resources(cores=2, memory=256, gpus=1) -predictor_res = PredictorResources(num_instances=1, requests=minimum_res, limits=maximum_res) -``` + predictor_res = PredictorResources(num_instances=1, requests=minimum_res, limits=maximum_res) + ``` ### Step 3 (Optional): Define the transformer resource configuration -```python -from hsml.resources import TransformerResources +=== "Python" + ```python + from hsml.resources import TransformerResources -minimum_res = Resources(cores=1, memory=128, gpus=1) -maximum_res = Resources(cores=2, memory=256, gpus=1) + minimum_res = Resources(cores=1, memory=128, gpus=1) + maximum_res = Resources(cores=2, memory=256, gpus=1) -transformer_res = TransformerResources(num_instances=2, requests=minimum_res, limits=maximum_res) -``` + transformer_res = TransformerResources(num_instances=2, requests=minimum_res, limits=maximum_res) + ``` ### Step 4: Create a deployment with the resource configuration -```python +=== "Python" + ```python + my_model = mr.get_model("my_model", version=1) -my_model = mr.get_model("my_model", version=1) + my_predictor = ms.create_predictor(my_model, + resources=predictor_res, + # transformer=Transformer(script_file, + # resources=transformer_res) + ) + my_predictor.deploy() -my_predictor = ms.create_predictor(my_model, - resources=predictor_res, - # transformer=Transformer(script_file, - # resources=transformer_res) - ) -my_predictor.deploy() + # or -# or - -my_deployment = ms.create_deployment(my_predictor) -my_deployment.save() -``` + my_deployment = ms.create_deployment(my_predictor) + my_deployment.save() + ``` ### API Reference diff --git a/docs/user_guides/mlops/serving/transformer.md b/docs/user_guides/mlops/serving/transformer.md index 31332bed7..bd851c304 100644 --- a/docs/user_guides/mlops/serving/transformer.md +++ b/docs/user_guides/mlops/serving/transformer.md @@ -4,7 +4,7 @@ In this guide, you will learn how to configure a transformer in a deployment. -Transformers are used to apply transformations on the model inputs before sending them to the predictor for making predictions using the model. They run on a built-in Flask server provided by Hopsworks and require a custom python script implementing the [Transformer class](#step-2-implement-transformer-script). +Transformers are used to apply transformations on the model inputs before sending them to the predictor for making predictions using the model. They run on a built-in Flask server provided by Hopsworks and require a user-provided python script implementing the [Transformer class](#step-2-implement-transformer-script). 
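As a rough illustration of what such a script can do, the sketch below standardizes numeric inputs before they reach the model and returns the model output unchanged. This is only an example, not the official template (see Step 2 below); it also assumes the default KServe v1 payload format, where the request body is a dictionary with an `instances` key.

=== "Python"
    ```python
    class Transformer():

        def __init__(self):
            # One-time setup, e.g. loading encoders or feature statistics
            # (the values below are placeholders for illustration)
            self.mean = 5.0
            self.std = 2.0

        def preprocess(self, inputs):
            # Called on every request before it is forwarded to the predictor.
            # Assumes a KServe v1 payload: {"instances": [[...], ...]}
            inputs["instances"] = [
                [(value - self.mean) / self.std for value in instance]
                for instance in inputs["instances"]
            ]
            return inputs

        def postprocess(self, outputs):
            # Called on the predictor's response before it is returned to the client
            return outputs
    ```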
???+ warning Transformers are only supported in deployments using KServe as the serving tool. @@ -12,7 +12,7 @@ Transformers are used to apply transformations on the model inputs before sendin A transformer has two configurable components: !!! info "" - 1. [Custom script](#step-2-implement-transformer-script) + 1. [User-provided script](#step-2-implement-transformer-script) 5. [Resources](#resources) See examples of transformer scripts in the serving [example notebooks](https://github.com/logicalclocks/hops-examples/blob/master/notebooks/ml/serving). @@ -38,7 +38,7 @@ A simplified creation form will appear including the most common deployment fiel

- Advance options + Advance options
Advanced options. Go to advanced deployment creation form

@@ -49,7 +49,7 @@ Transformers require KServe as the serving platform for the deployment. Make sur

- KServe enabled in advanced deployment form + KServe enabled in advanced deployment form
Enable KServe in the advanced deployment form

@@ -59,7 +59,7 @@ Otherwise, you can click on `Upload new file` to upload the transformer script n

- Transformer script in advanced deployment form + Transformer script in advanced deployment form
Choose a transformer script in the advanced deployment form

@@ -86,26 +86,26 @@ Once you are done with the changes, click on `Create new deployment` at the bott ### Step 1: Connect to Hopsworks -```python -import hopsworks +=== "Python" + ```python + import hopsworks -project = hopsworks.login() + project = hopsworks.login() -# get Dataset API instance -dataset_api = project.get_dataset_api() + # get Dataset API instance + dataset_api = project.get_dataset_api() -# get Hopsworks Model Serving handle -ms = project.get_model_serving() -``` + # get Hopsworks Model Serving handle + ms = project.get_model_serving() + ``` ### Step 2: Implement transformer script -=== "Python" - +=== "Transformer" ```python - class Transformer(object): + class Transformer(): def __init__(self): - """ Initialization code goes here """ + """ Initialization code goes here""" pass def preprocess(self, inputs): @@ -124,36 +124,36 @@ ms = project.get_model_serving() !!! info "You can also use the UI to upload your transformer script. See [above](#step-3-advanced-deployment-form)" -```python - -uploaded_file_path = dataset_api.upload("my_transformer.py", "Resources", overwrite=True) -transformer_script_path = os.path.join("/Projects", project.name, uploaded_file_path) -``` +=== "Python" + ```python + uploaded_file_path = dataset_api.upload("my_transformer.py", "Resources", overwrite=True) + transformer_script_path = os.path.join("/Projects", project.name, uploaded_file_path) + ``` ### Step 4: Define a transformer -```python - -my_transformer = ms.create_transformer(script_file=uploaded_file_path) +=== "Python" + ```python + my_transformer = ms.create_transformer(script_file=uploaded_file_path) -# or + # or -from hsml.transformer import Transformer + from hsml.transformer import Transformer -my_transformer = Transformer(script_file) -``` + my_transformer = Transformer(script_file) + ``` ### Step 5: Create a deployment with the transformer -```python - -my_predictor = ms.create_predictor(transformer=my_transformer) -my_deployment = my_predictor.deploy() +=== "Python" + ```python + my_predictor = ms.create_predictor(transformer=my_transformer) + my_deployment = my_predictor.deploy() -# or -my_deployment = ms.create_deployment(my_predictor, transformer=my_transformer) -my_deployment.save() -``` + # or + my_deployment = ms.create_deployment(my_predictor, transformer=my_transformer) + my_deployment.save() + ``` ### API Reference @@ -169,10 +169,10 @@ A number of different environment variables is available in the transformer to e ??? 
info "Show environment variables" - | Name | Description | - | ------------ | ------------------ | - | ARTIFACT_FILES_PATH | Local path to the model artifact files | - | DEPLOYMENT_NAME | Name of the current deployment | - | MODEL_NAME | Name of the model being served by the current deployment | - | MODEL_VERSION | Version of the model being served by the current deployment | - | ARTIFACT_VERSION | Version of the model artifact being served by the current deployment | + | Name | Description | + | ------------------- | -------------------------------------------------------------------- | + | ARTIFACT_FILES_PATH | Local path to the model artifact files | + | DEPLOYMENT_NAME | Name of the current deployment | + | MODEL_NAME | Name of the model being served by the current deployment | + | MODEL_VERSION | Version of the model being served by the current deployment | + | ARTIFACT_VERSION | Version of the model artifact being served by the current deployment | diff --git a/docs/user_guides/mlops/serving/troubleshooting.md b/docs/user_guides/mlops/serving/troubleshooting.md index 18833204d..78521e79b 100644 --- a/docs/user_guides/mlops/serving/troubleshooting.md +++ b/docs/user_guides/mlops/serving/troubleshooting.md @@ -99,38 +99,38 @@ Once in the OpenSearch Dashboards, you can search for keywords, apply multiple f ### Step 1: Connect to Hopsworks -```python -import hopsworks +=== "Python" + ```python + import hopsworks -project = hopsworks.login() + project = hopsworks.login() -# get Hopsworks Model Serving handle -ms = project.get_model_serving() -``` + # get Hopsworks Model Serving handle + ms = project.get_model_serving() + ``` ### Step 2: Retrieve an existing deployment -```python - -deployment = ms.get_deployment("mydeployment") -``` +=== "Python" + ```python + deployment = ms.get_deployment("mydeployment") + ``` ### Step 3: Get current deployment's predictor state -```python - -state = deployment.get_state() +=== "Python" + ```python + state = deployment.get_state() -state.describe() -``` + state.describe() + ``` ### Step 4: Explore transient logs -```python - -deployment.get_logs(component="predictor|transformer", tail=10) -``` - +=== "Python" + ```python + deployment.get_logs(component="predictor|transformer", tail=10) + ``` ### API Reference diff --git a/docs/user_guides/mlops/vector_database/index.md b/docs/user_guides/mlops/vector_database/index.md index 5e9de0455..ab8d0d2ab 100644 --- a/docs/user_guides/mlops/vector_database/index.md +++ b/docs/user_guides/mlops/vector_database/index.md @@ -15,118 +15,114 @@ In this guide, you will learn how to create a simple recommendation application, ### Step 1: Get the OpenSearch API -```python +=== "Python" + ```python + import hopsworks -import hopsworks + project = hopsworks.login() -project = hopsworks.login() - -opensearch_api = project.get_opensearch_api() - -``` + opensearch_api = project.get_opensearch_api() + ``` ### Step 2: Configure the opensearch-py client -```python +=== "Python" + ```python + from opensearchpy import OpenSearch -from opensearchpy import OpenSearch - -client = OpenSearch(**opensearch_api.get_default_py_config()) - -``` + client = OpenSearch(**opensearch_api.get_default_py_config()) + ``` ### Step 3: Create an index Create an index to use by calling `opensearch_api.get_project_index(..)`. 
-```python - -knn_index_name = opensearch_api.get_project_index("demo_knn_index") - -index_body = { - "settings": { - "knn": True, - "knn.algo_param.ef_search": 100, - }, - "mappings": { - "properties": { - "my_vector1": { - "type": "knn_vector", - "dimension": 2 +=== "Python" + ```python + knn_index_name = opensearch_api.get_project_index("demo_knn_index") + + index_body = { + "settings": { + "knn": True, + "knn.algo_param.ef_search": 100, + }, + "mappings": { + "properties": { + "my_vector1": { + "type": "knn_vector", + "dimension": 2 + } } } } -} -response = client.indices.create(knn_index_name, body=index_body) + response = client.indices.create(knn_index_name, body=index_body) -print(response) - -``` + print(response) + ``` ### Step 4: Bulk ingestion of vectors Ingest 10 vectors in a bulk fashion to the index. These vectors represent the list of vectors to calculate the similarity for. -```python - -from opensearchpy.helpers import bulk -import random - -actions = [ - { - "_index": knn_index_name, - "_id": count, - "_source": { - "my_vector1": [random.uniform(0, 10), random.uniform(0, 10)], +=== "Python" + ```python + from opensearchpy.helpers import bulk + import random + + actions = [ + { + "_index": knn_index_name, + "_id": count, + "_source": { + "my_vector1": [random.uniform(0, 10), random.uniform(0, 10)], + } } - } - for count in range(0, 10) -] + for count in range(0, 10) + ] -bulk( - client, - actions, -) - -``` + bulk( + client, + actions, + ) + ``` ### Step 5: Score vector similarity Score the vector `[2.5, 3]` and find the 3 most similar vectors. -```python - -# Define the search request -query = { - "size": 3, - "query": { - "knn": { - "my_vector1": { - "vector": [2.5, 3], - "k": 3 +=== "Python" + ```python + # Define the search request + query = { + "size": 3, + "query": { + "knn": { + "my_vector1": { + "vector": [2.5, 3], + "k": 3 + } } } } -} - -# Perform the similarity search -response = client.search( - body = query, - index = knn_index_name -) -# Pretty print response -import pprint -pp = pprint.PrettyPrinter() -pp.pprint(response) + # Perform the similarity search + response = client.search( + body = query, + index = knn_index_name + ) -``` + # Pretty print response + import pprint + pp = pprint.PrettyPrinter() + pp.pprint(response) + ``` `Output` from the above script shows the score for each of the three most similar vectors that have been indexed. `[4.798869166444522, 4.069064892468535]` is the most similar vector to `[2.5, 3]` with a score of `0.1346312`. +=== "Bash" ```bash 2022-05-30 09:55:50,529 INFO: POST https://10.0.2.15:9200/my_project_demo_knn_index/_search [status:200 request:0.017s] @@ -153,8 +149,6 @@ pp.pprint(response) 'total': {'relation': 'eq', 'value': 3}}, 'timed_out': False, 'took': 9} - - ``` ### API Reference diff --git a/mkdocs.yml b/mkdocs.yml index c6d234991..e8e786a11 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -178,6 +178,7 @@ nav: - Frameworks: - TensorFlow: user_guides/mlops/registry/frameworks/tf.md - Scikit-learn: user_guides/mlops/registry/frameworks/skl.md + - LLM: user_guides/mlops/registry/frameworks/llm.md - Python: user_guides/mlops/registry/frameworks/python.md - Model Schema: user_guides/mlops/registry/model_schema.md - Input Example: user_guides/mlops/registry/input_example.md