From 20e5e6039f708c983fcaa1df7a2367d126179bac Mon Sep 17 00:00:00 2001
From: kenneth
Date: Wed, 3 Jul 2024 15:19:53 +0200
Subject: [PATCH 1/8] feature logging

---
 .../fs/feature_view/feature_logging.md        | 206 ++++++++++++++++++
 1 file changed, 206 insertions(+)
 create mode 100644 docs/user_guides/fs/feature_view/feature_logging.md

diff --git a/docs/user_guides/fs/feature_view/feature_logging.md b/docs/user_guides/fs/feature_view/feature_logging.md
new file mode 100644
index 000000000..6892fc9e0
--- /dev/null
+++ b/docs/user_guides/fs/feature_view/feature_logging.md
@@ -0,0 +1,206 @@
+# User Guide: Feature Logging with Feature View
+
+Feature logging is essential for tracking and auditing the data your models use. This guide explains how to log features and predictions, and retrieve and manage these logs with feature view in Hopsworks.
+
+## Logging Features and Predictions
+
+After you have trained a model, logging the features it uses and the predictions it makes is crucial. This helps track what data was used during inference and allows for validation of predictions later. You can log either transformed or untransformed features or both.
+
+### Enabling Feature Logging
+
+To enable logging, set `enabled_logging=True` when creating the feature view. Two feature groups will be created for storing transformed and untransformed features. The logged features will be written to the offline feature store every hour by scheduled materialization jobs which are created automatically.
+
+```python
+feature_view = fs.create_feature_view("name", query, enabled_logging=True)
+```
+
+Alternatively, you can call `feature_view.enable_logging()` for an existing feature view. Or, calling `feature_view.log()` will implicitly enable logging if it is not already enabled.
+
+### Logging Features and Predictions
+
+You can log either transformed or untransformed features or both. Inference helper columns are returned and logged as untransformed features, but they are not logged as transformed features. Prediction can be optionally provided as a column in the feature DataFrame or separately in the `prediction` argument. This is useful for logging real-time features and predictions which are often of type `list`, avoiding the need to ensure feature order of the labels.
+
+The time of calling `feature_view.log` is automatically logged, enabling filtering by logging time when retrieving logs.
+
+You can also log predictions, and optionally the training dataset version and the model used for prediction. Training dataset version will also be logged if it is cached after you provide the training dataset version when calling `fv.init_serving(...)` or `fv.init_batch_scoring(...)`.
+
+#### Example 1: Log Features Only
+
+You have a DataFrame of features you want to log.
+
+```python
+import pandas as pd
+
+features = pd.DataFrame({
+    "feature1": [1.1, 2.2, 3.3],
+    "feature2": [4.4, 5.5, 6.6]
+})
+
+# Log features
+feature_view.log(features)
+```
+
+#### Example 2: Log Features, Predictions, and Model
+
+You can also log predictions, and optionally the training dataset and the model used for prediction.
+
+```python
+predictions = pd.DataFrame({
+    "prediction": [0, 1, 0]
+})
+
+# Log features and predictions
+feature_view.log(features,
+    prediction=predictions,
+    training_dataset_version=1,
+    hsml_model=Model(1, "model", version=1)
+)
+```
+
+#### Example 3: Log Both Transformed and Untransformed Features
+
+**Batch Features**
+```python
+untransformed_df = fv.get_batch_data(transform=False)
+# then apply the transformations after:
+transformed_df = fv.transform(untransformed_df)
+# Log untransformed features
+feature_view.log(untransformed_df)
+# Log transformed features
+feature_view.log(transformed_df, transformed=True)
+```
+
+**Real-time Features**
```python
+untransformed_vector = fv.get_feature_vector({"id": 1}, transform=False)
+# then apply the transformations after:
+transformed_vector = fv.transform(untransformed_vector)
+# Log untransformed features
+feature_view.log(untransformed_vector)
+# Log transformed features
+feature_view.log(transformed_vector, transformed=True)
+```
+
+## Retrieving the Log Timeline
+
+To audit and review the data logs, you might want to retrieve the timeline of log entries. This helps understand when data was logged and monitor the logging process.
+
+### Retrieve Log Timeline
+
+Get the latest 10 log entries.
+
+```python
+# Retrieve the latest 10 log entries
+log_timeline = feature_view.get_log_timeline(limit=10)
+print(log_timeline)
+```
+
+## Reading Log Entries
+
+You may need to read specific log entries for analysis, such as entries within a particular time range or for a specific model version and training dataset version.
+
+### Read All Log Entries
+
+Read all log entries for comprehensive analysis. The output will return all values of the same primary keys instead of just the latest value.
+
+```python
+# Read all log entries
+log_entries = feature_view.read_log()
+print(log_entries)
+```
+
+### Read Log Entries Within a Time Range
+
+Focus on logs within a specific time frame. You can specify `start_time` and `end_time` for filtering, but the time columns will not be returned in the DataFrame.
+
+```python
+# Read log entries from January 2022
+log_entries = feature_view.read_log(start_time="2022-01-01", end_time="2022-01-31")
+print(log_entries)
+```
+
+### Read Log Entries by Training Dataset Version
+
+Analyze logs from a particular version of the training dataset. The training dataset version column will be returned in the DataFrame.
+
+```python
+# Read log entries of training dataset version 1
+log_entries = feature_view.read_log(training_dataset_version=1)
+print(log_entries)
+```
+
+### Read Log Entries by HSML Model
+
+Analyze logs from a particular name and version of the HSML model. The HSML model column will be returned in the DataFrame.
+
+```python
+# Read log entries of a specific HSML model
+log_entries = feature_view.read_log(hsml_model=Model(1, "model", version=1))
+print(log_entries)
+```
+
+### Read Log Entries by Custom Filter
+
+Provide filters which work similarly to the filter method in the `Query` class. The filter should be part of the query in the feature view.
+
+```python
+# Read log entries where feature1 is greater than 0
+log_entries = feature_view.read_log(filter=fg.feature1 > 0)
+print(log_entries)
+```
+
+## Pausing and Resuming Logging
+
+During maintenance or updates, you might need to pause logging to save computation resources.
+
+### Pause Logging
+
+Pause the schedule of the materialization job for writing logs to the offline store.
+
+```python
+# Pause logging
+feature_view.pause_logging()
+```
+
+### Resume Logging
+
+Resume the schedule of the materialization job for writing logs to the offline store.
+
+```python
+# Resume logging
+feature_view.resume_logging()
+```
+
+## Materializing Logs
+
+Besides the scheduled materialization job, you can materialize logs from Kafka to the offline store on demand. This does not pause the scheduled job.
+
+### Materialize Logs
+
+Materialize logs and optionally wait for the process to complete.
+
+```python
+# Materialize logs and wait for completion
+materialization_result = feature_view.materialize_log(wait=True)
+print(materialization_result)
+```
+
+## Deleting Logs
+
+When log data is no longer needed, you might want to delete it to free up space and maintain data hygiene. This operation deletes the feature groups and recreate a new one. Scheduled materialization job and log timeline are reset as well.
+
+### Delete Logs
+
+Remove all log entries, optionally specifying whether to delete transformed/untransformed logs.
+
+```python
+# Delete all log entries
+feature_view.delete_log()
+
+# Delete only transformed log entries
+feature_view.delete_log(transformed=True)
+```
+
+## Summary
+
+Feature logging is a crucial part of maintaining and monitoring your machine learning workflows. By following these examples, you can effectively log, retrieve, and delete logs to keep your data pipeline robust and auditable.
\ No newline at end of file

From 968c3a48c24f423f3056140abc677cd85d113fa0 Mon Sep 17 00:00:00 2001
From: kenneth
Date: Tue, 9 Jul 2024 13:29:49 +0200
Subject: [PATCH 2/8] transformed feature

---
 .../fs/feature_view/feature_logging.md        | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/user_guides/fs/feature_view/feature_logging.md b/docs/user_guides/fs/feature_view/feature_logging.md
index 6892fc9e0..a69aa4a35 100644
--- a/docs/user_guides/fs/feature_view/feature_logging.md
+++ b/docs/user_guides/fs/feature_view/feature_logging.md
@@ -4,25 +4,25 @@ Feature logging is essential for tracking and auditing the data your models use.
 
 ## Logging Features and Predictions
 
-After you have trained a model, logging the features it uses and the predictions it makes is crucial. This helps track what data was used during inference and allows for validation of predictions later. You can log either transformed or untransformed features or both.
+After you have trained a model, logging the features it uses and the predictions it makes is crucial. This helps track what data was used during inference and allows for validation of predictions later. You can log either transformed and/or untransformed features.
 
 ### Enabling Feature Logging
 
-To enable logging, set `enabled_logging=True` when creating the feature view. Two feature groups will be created for storing transformed and untransformed features. The logged features will be written to the offline feature store every hour by scheduled materialization jobs which are created automatically.
+To enable logging, set `logging_enabled=True` when creating the feature view. Two feature groups will be created for storing transformed and untransformed features, but they are not visible in the UI. The logged features will be written to the offline feature store every hour by scheduled materialization jobs which are created automatically.
 
 ```python
-feature_view = fs.create_feature_view("name", query, enabled_logging=True)
+feature_view = fs.create_feature_view("name", query, logging_enabled=True)
 ```
 
 Alternatively, you can call `feature_view.enable_logging()` for an existing feature view. Or, calling `feature_view.log()` will implicitly enable logging if it is not already enabled.
 
 ### Logging Features and Predictions
 
-You can log either transformed or untransformed features or both. Inference helper columns are returned and logged as untransformed features, but they are not logged as transformed features. Prediction can be optionally provided as a column in the feature DataFrame or separately in the `prediction` argument. This is useful for logging real-time features and predictions which are often of type `list`, avoiding the need to ensure feature order of the labels.
+You can log either transformed and/or untransformed features. To get untransformed features, you can specify `transform=False` in `feature_view.get_batch_data` or `feature_view.get_feature_vector(s)`. Inference helper columns are returned along with the untransformed features. To get the transformed features, you can call `feature_view.transform_batch_data` or `feature_view.transform_feature_vector(s)`. Inference helper columns are not returned as transformed features. [link to transformed features]()
 
-The time of calling `feature_view.log` is automatically logged, enabling filtering by logging time when retrieving logs.
+You can also log predictions, and optionally the training dataset version and the model used for prediction. Prediction can be optionally provided as a column in the feature DataFrame or separately in the `prediction` argument. This is useful for logging real-time features and predictions which are often of type `list`, avoiding the need to ensure feature order of the labels. Training dataset version will also be logged if it is cached after you provide the training dataset version when calling `feature_view.init_serving(...)` or `feature_view.init_batch_scoring(...)`.
 
-You can also log predictions, and optionally the training dataset version and the model used for prediction. Training dataset version will also be logged if it is cached after you provide the training dataset version when calling `fv.init_serving(...)` or `fv.init_batch_scoring(...)`.
+The time of calling `feature_view.log` is automatically logged, enabling filtering by logging time when retrieving logs.
 
 #### Example 1: Log Features Only
 
@@ -63,7 +63,7 @@ feature_view.log(features,
 ```python
 untransformed_df = fv.get_batch_data(transform=False)
 # then apply the transformations after:
-transformed_df = fv.transform(untransformed_df)
+transformed_df = fv.transform_batch_data(untransformed_df)
 # Log untransformed features
 feature_view.log(untransformed_df)
 # Log transformed features
@@ -74,7 +74,7 @@ feature_view.log(transformed_df, transformed=True)
 ```python
 untransformed_vector = fv.get_feature_vector({"id": 1}, transform=False)
 # then apply the transformations after:
-transformed_vector = fv.transform(untransformed_vector)
+transformed_vector = fv.transform_feature_vector(untransformed_vector)
 # Log untransformed features
 feature_view.log(untransformed_vector)
 # Log transformed features
@@ -187,7 +187,7 @@ print(materialization_result)
 
 ## Deleting Logs
 
-When log data is no longer needed, you might want to delete it to free up space and maintain data hygiene. This operation deletes the feature groups and recreate a new one. Scheduled materialization job and log timeline are reset as well.
+When log data is no longer needed, you might want to delete it to free up space and maintain data hygiene. This operation deletes the feature groups and recreates new ones. Scheduled materialization job and log timeline are reset as well.
 
 ### Delete Logs

From a5db52b85e4774341d7e911049ea43f5be48e3c8 Mon Sep 17 00:00:00 2001
From: kenneth
Date: Tue, 9 Jul 2024 13:54:17 +0200
Subject: [PATCH 3/8] fix

---
 docs/user_guides/fs/feature_view/feature_logging.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/user_guides/fs/feature_view/feature_logging.md b/docs/user_guides/fs/feature_view/feature_logging.md
index a69aa4a35..bd5e5482f 100644
--- a/docs/user_guides/fs/feature_view/feature_logging.md
+++ b/docs/user_guides/fs/feature_view/feature_logging.md
@@ -18,6 +18,8 @@ Alternatively, you can call `feature_view.enable_logging()` for an existing feat
 
 ### Logging Features and Predictions
 
+You can log features and predictions by calling `feature_view.log`. The logged features are written periodically to the offline store. If you need it to be available immediately, call `feature_view.materialize_log`.
+
 You can log either transformed and/or untransformed features. To get untransformed features, you can specify `transform=False` in `feature_view.get_batch_data` or `feature_view.get_feature_vector(s)`. Inference helper columns are returned along with the untransformed features. To get the transformed features, you can call `feature_view.transform_batch_data` or `feature_view.transform_feature_vector(s)`. Inference helper columns are not returned as transformed features. [link to transformed features]()
 
 You can also log predictions, and optionally the training dataset version and the model used for prediction. Prediction can be optionally provided as a column in the feature DataFrame or separately in the `prediction` argument. This is useful for logging real-time features and predictions which are often of type `list`, avoiding the need to ensure feature order of the labels. Training dataset version will also be logged if it is cached after you provide the training dataset version when calling `feature_view.init_serving(...)` or `feature_view.init_batch_scoring(...)`.
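The point made in the patches above — that predictions may ride along as a column of the feature DataFrame or be passed separately (the keyword is `prediction` at this point in the series; later patches rename it to `predictions`) — can be sketched with plain pandas. This is illustrative only: no Hopsworks connection is needed, and `feature_view.log` appears only in comments.

```python
import pandas as pd

# Features and predictions shaped as in the guide's Example 2.
features = pd.DataFrame({
    "feature1": [1.1, 2.2, 3.3],
    "feature2": [4.4, 5.5, 6.6],
})
predictions = pd.DataFrame({"prediction": [0, 1, 0]})

# Option 1: keep them separate and pass the predictions explicitly,
# e.g. feature_view.log(features, prediction=predictions)

# Option 2: carry the predictions as an extra column of the same
# DataFrame and log that single frame instead.
combined = features.assign(prediction=predictions["prediction"])
print(list(combined.columns))  # ['feature1', 'feature2', 'prediction']
```

Both forms carry the same information; the separate-argument form is convenient when features arrive as plain lists rather than DataFrames.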
From 5e2e10750b72d17cbed15866ad4fcaafeeeeeba4 Mon Sep 17 00:00:00 2001
From: kenneth
Date: Tue, 9 Jul 2024 13:57:33 +0200
Subject: [PATCH 4/8] add to index

---
 mkdocs.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mkdocs.yml b/mkdocs.yml
index a75666539..5872a45b1 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -99,6 +99,7 @@ nav:
       - Feature Monitoring:
         - Getting started: user_guides/fs/feature_view/feature_monitoring.md
        - Advanced guide: user_guides/fs/feature_monitoring/feature_monitoring_advanced.md
+      - Feature Logging: user_guides/fs/feature_view/feature_logging.md
      - Vector Similarity Search: user_guides/fs/vector_similarity_search.md
    - Compute Engines: user_guides/fs/compute_engines.md
    - Client Integrations:

From 5d2a6a577ba07ffb5cc0c2780a81eecf8127711b Mon Sep 17 00:00:00 2001
From: kenneth
Date: Tue, 16 Jul 2024 17:28:53 +0200
Subject: [PATCH 5/8] update transformed features

---
 docs/user_guides/fs/feature_view/feature_logging.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/user_guides/fs/feature_view/feature_logging.md b/docs/user_guides/fs/feature_view/feature_logging.md
index bd5e5482f..0e90c1584 100644
--- a/docs/user_guides/fs/feature_view/feature_logging.md
+++ b/docs/user_guides/fs/feature_view/feature_logging.md
@@ -20,7 +20,7 @@ Alternatively, you can call `feature_view.enable_logging()` for an existing feat
 
 You can log features and predictions by calling `feature_view.log`. The logged features are written periodically to the offline store. If you need it to be available immediately, call `feature_view.materialize_log`.
 
-You can log either transformed and/or untransformed features. To get untransformed features, you can specify `transform=False` in `feature_view.get_batch_data` or `feature_view.get_feature_vector(s)`. Inference helper columns are returned along with the untransformed features. To get the transformed features, you can call `feature_view.transform_batch_data` or `feature_view.transform_feature_vector(s)`. Inference helper columns are not returned as transformed features. [link to transformed features]()
+You can log either transformed and/or untransformed features. To get untransformed features, you can specify `transform=False` in `feature_view.get_batch_data` or `feature_view.get_feature_vector(s)`. Inference helper columns are returned along with the untransformed features. If you have on-demand features as well, call `feature_view.compute_on_demand_features` to get the on-demand features before calling `feature_view.log`. To get the transformed features, you can call `feature_view.transform` and pass the untransformed features along with the on-demand features.
 
 You can also log predictions, and optionally the training dataset version and the model used for prediction. Prediction can be optionally provided as a column in the feature DataFrame or separately in the `prediction` argument. This is useful for logging real-time features and predictions which are often of type `list`, avoiding the need to ensure feature order of the labels. Training dataset version will also be logged if it is cached after you provide the training dataset version when calling `feature_view.init_serving(...)` or `feature_view.init_batch_scoring(...)`.
@@ -65,7 +65,7 @@ feature_view.log(features,
 ```python
 untransformed_df = fv.get_batch_data(transform=False)
 # then apply the transformations after:
-transformed_df = fv.transform_batch_data(untransformed_df)
+transformed_df = fv.transform(untransformed_df)
 # Log untransformed features
 feature_view.log(untransformed_df)
 # Log transformed features
@@ -76,7 +76,7 @@ feature_view.log(transformed_df, transformed=True)
 ```python
 untransformed_vector = fv.get_feature_vector({"id": 1}, transform=False)
 # then apply the transformations after:
-transformed_vector = fv.transform_feature_vector(untransformed_vector)
+transformed_vector = fv.transform(untransformed_vector)
 # Log untransformed features
 feature_view.log(untransformed_vector)
 # Log transformed features

From 950ed054046bb86f1240e18fdc3209eb8e7abfa6 Mon Sep 17 00:00:00 2001
From: kenneth
Date: Tue, 16 Jul 2024 18:07:51 +0200
Subject: [PATCH 6/8] address comments

---
 .../fs/feature_view/feature_logging.md        | 32 +++++++++----------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/docs/user_guides/fs/feature_view/feature_logging.md b/docs/user_guides/fs/feature_view/feature_logging.md
index 0e90c1584..0b00930f5 100644
--- a/docs/user_guides/fs/feature_view/feature_logging.md
+++ b/docs/user_guides/fs/feature_view/feature_logging.md
@@ -1,10 +1,10 @@
-# User Guide: Feature Logging with Feature View
+# User Guide: Feature and Prediction Logging with a Feature View
 
-Feature logging is essential for tracking and auditing the data your models use. This guide explains how to log features and predictions, and retrieve and manage these logs with feature view in Hopsworks.
+Feature logging is essential for debugging, monitoring, and auditing the data your models use. This guide explains how to log features and predictions, and retrieve and manage these logs with a feature view in Hopsworks.
 
 ## Logging Features and Predictions
 
-After you have trained a model, logging the features it uses and the predictions it makes is crucial. This helps track what data was used during inference and allows for validation of predictions later. You can log either transformed and/or untransformed features.
+After you have trained a model, you can log the features it uses and the predictions it makes with the feature view used to create the training data for the model. You can log either transformed and/or untransformed feature values.
 
 ### Enabling Feature Logging
@@ -14,7 +14,7 @@ feature_view = fs.create_feature_view("name", query, logging_enabled=True)
 ```
 
-Alternatively, you can call `feature_view.enable_logging()` for an existing feature view. Or, calling `feature_view.log()` will implicitly enable logging if it is not already enabled.
+Alternatively, you can enable logging on an existing feature view by calling `feature_view.enable_logging()`. Also, calling `feature_view.log()` will implicitly enable logging if it has not already been enabled.
 
 ### Logging Features and Predictions
 
@@ -22,9 +22,9 @@ You can log features and predictions by calling `feature_view.log`. The logged f
 You can log either transformed and/or untransformed features. To get untransformed features, you can specify `transform=False` in `feature_view.get_batch_data` or `feature_view.get_feature_vector(s)`. Inference helper columns are returned along with the untransformed features. If you have on-demand features as well, call `feature_view.compute_on_demand_features` to get the on-demand features before calling `feature_view.log`. To get the transformed features, you can call `feature_view.transform` and pass the untransformed features along with the on-demand features.
 
-You can also log predictions, and optionally the training dataset version and the model used for prediction. Prediction can be optionally provided as a column in the feature DataFrame or separately in the `prediction` argument. This is useful for logging real-time features and predictions which are often of type `list`, avoiding the need to ensure feature order of the labels. Training dataset version will also be logged if it is cached after you provide the training dataset version when calling `feature_view.init_serving(...)` or `feature_view.init_batch_scoring(...)`.
+Predictions can be optionally provided as one or more columns in the DataFrame containing the features, or separately in the `predictions` argument. There must be the same number of prediction columns as there are labels in the feature view. It is required to provide predictions in the `predictions` argument if you provide the features as a `list` instead of a pandas `DataFrame`. The training dataset version will also be logged if you have called either `feature_view.init_serving(...)` or `feature_view.init_batch_scoring(...)`.
 
-The time of calling `feature_view.log` is automatically logged, enabling filtering by logging time when retrieving logs.
+The wallclock time of calling `feature_view.log` is automatically logged, enabling filtering by logging time when retrieving logs.
 
 #### Example 1: Log Features Only
 
@@ -85,11 +85,11 @@ feature_view.log(transformed_vector, transformed=True)
 
 ## Retrieving the Log Timeline
 
-To audit and review the data logs, you might want to retrieve the timeline of log entries. This helps understand when data was logged and monitor the logging process.
+To audit and review the feature/prediction logs, you might want to retrieve the timeline of log entries. This helps understand when data was logged and monitor the logs.
 
 ### Retrieve Log Timeline
 
-Get the latest 10 log entries.
+A log timeline is the Hudi commit timeline of the logging feature group.
 ```python
 # Retrieve the latest 10 log entries
 log_timeline = feature_view.get_log_timeline(limit=10)
 print(log_timeline)
 ```
 
 ## Reading Log Entries
 
 You may need to read specific log entries for analysis, such as entries within a particular time range or for a specific model version and training dataset version.
 
-### Read All Log Entries
+### Read all Log Entries
 
 Read all log entries for comprehensive analysis. The output will return all values of the same primary keys instead of just the latest value.
 
 ```python
 # Read all log entries
 log_entries = feature_view.read_log()
 print(log_entries)
 ```
 
-### Read Log Entries Within a Time Range
+### Read Log Entries within a Time Range
 
-Focus on logs within a specific time frame. You can specify `start_time` and `end_time` for filtering, but the time columns will not be returned in the DataFrame.
+Focus on logs within a specific time range. You can specify `start_time` and `end_time` for filtering, but the time columns will not be returned in the DataFrame. You can provide the `start/end_time` as `datetime`, `date`, `int`, or `str` type. Accepted date formats are: `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`
 
 ```python
 # Read log entries from January 2022
 log_entries = feature_view.read_log(start_time="2022-01-01", end_time="2022-01-31")
 print(log_entries)
 ```
 
 ### Read Log Entries by Training Dataset Version
 
 Analyze logs from a particular version of the training dataset. The training dataset version column will be returned in the DataFrame.
 
 ```python
 # Read log entries of training dataset version 1
 log_entries = feature_view.read_log(training_dataset_version=1)
 print(log_entries)
 ```
 
-### Read Log Entries by HSML Model
+### Read Log Entries by Model in Hopsworks
 
 Analyze logs from a particular name and version of the HSML model. The HSML model column will be returned in the DataFrame.
 
 ```python
 # Read log entries of a specific HSML model
-log_entries = feature_view.read_log(hsml_model=Model(1, "model", version=1))
+log_entries = feature_view.read_log(hopsworks_model=Model(1, "model", version=1))
 print(log_entries)
 ```
 
-### Read Log Entries by Custom Filter
+### Read Log Entries using a Custom Filter
 
 Provide filters which work similarly to the filter method in the `Query` class. The filter should be part of the query in the feature view.
@@ -193,7 +193,7 @@ When log data is no longer needed, you might want to delete it to free up space ### Delete Logs -Remove all log entries, optionally specifying whether to delete transformed/untransformed logs. +Remove all log entries (both transformed and untransformed logs), optionally specifying whether to delete transformed (transformed=True) or untransformed (transformed=False) logs. ```python # Delete all log entries @@ -205,4 +205,4 @@ feature_view.delete_log(transformed=True) ## Summary -Feature logging is a crucial part of maintaining and monitoring your machine learning workflows. By following these examples, you can effectively log, retrieve, and delete logs to keep your data pipeline robust and auditable. \ No newline at end of file +Feature logging is a crucial part of maintaining and monitoring your machine learning workflows. By following these examples, you can effectively log, retrieve, and delete logs, as well as manage the lifecycle of log materialization jobs, adding observability for your AI system and making it auditable. \ No newline at end of file From 67b0bdd57093bd6e51a715aa8e0d0bbc30a047f3 Mon Sep 17 00:00:00 2001 From: kenneth Date: Mon, 29 Jul 2024 16:41:11 +0200 Subject: [PATCH 7/8] update --- docs/user_guides/fs/feature_view/feature_logging.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/docs/user_guides/fs/feature_view/feature_logging.md b/docs/user_guides/fs/feature_view/feature_logging.md index 0b00930f5..1433679aa 100644 --- a/docs/user_guides/fs/feature_view/feature_logging.md +++ b/docs/user_guides/fs/feature_view/feature_logging.md @@ -22,7 +22,7 @@ You can log features and predictions by calling `feature_view.log`. The logged f You can log either transformed or/and untransformed features. To get untransformed features, you can specify `transform=False` in `feature_view.get_batch_data` or `feature_view.get_feature_vector(s)`. 
Inference helper columns are returned along with the untransformed features. If you have On-Demand features as well, call `feature_view.compute_on_demand_features` to get the on demand features before calling `feature_view.log`.To get the transformed features, you can call `feature_view.transform` and pass the untransformed feature with the on-demand feature. -Predictions can be optionally provided as one or more columns in the DataFrame containing the features or separately in the `predictions` argument. There must be the same number of prediction columns as there are labels in the feature view. It is required to provide predictions in the `predictions` argument if you provide the features as `list` instead of pandas `dataframe`. The training dataset version will also be logged if you have called either `feature_view.init_serving(...)` or `feature_view.init_batch_scoring(...)`. +Predictions can be optionally provided as one or more columns in the DataFrame containing the features or separately in the `predictions` argument. There must be the same number of prediction columns as there are labels in the feature view. It is required to provide predictions in the `predictions` argument if you provide the features as `list` instead of pandas `dataframe`. The training dataset version will also be logged if you have called either `feature_view.init_serving(...)` or `feature_view.init_batch_scoring(...)` or if the provided model has a training dataset version. The wallclock time of calling `feature_view.log` is automatically logged, enabling filtering by logging time when retrieving logs. @@ -63,7 +63,7 @@ feature_view.log(features, **Batch Features** ```python -untransformed_df = fv.get_batch_data(transform=False) +untransformed_df = fv.get_batch_data(transformed=False) # then apply the transformations after: transformed_df = fv.transform(untransformed_df) # Log untransformed features @@ -137,7 +137,7 @@ Analyze logs from a particular name and version of the HSML model. 
The HSML mode ```python # Read log entries of a specific HSML model -log_entries = feature_view.read_log(hopsworks_model=Model(1, "model", version=1)) +log_entries = feature_view.read_log(model=Model(1, "model", version=1)) print(log_entries) ``` @@ -175,7 +175,7 @@ feature_view.resume_logging() ## Materializing Logs -Besides the scheduled materialization job, you can materialize logs from Kafka to the offline store on demand. This does not pause the scheduled job. +Besides the scheduled materialization job, you can materialize logs from Kafka to the offline store on demand. This does not pause the scheduled job. By default, it materializes both transformed and untransformed logs, optionally specifying whether to materialize transformed (transformed=True) or untransformed (transformed=False) logs. ### Materialize Logs @@ -184,7 +184,8 @@ Materialize logs and optionally wait for the process to complete. ```python # Materialize logs and wait for completion materialization_result = feature_view.materialize_log(wait=True) -print(materialization_result) +# Materialize only transformed log entries +feature_view.materialize_log(wait=True, transformed=True) ``` ## Deleting Logs From cbdd1f9cc44e017c69b11a6c32fd00226911192f Mon Sep 17 00:00:00 2001 From: kenneth Date: Tue, 30 Jul 2024 08:24:36 +0200 Subject: [PATCH 8/8] update code --- docs/user_guides/fs/feature_view/feature_logging.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/user_guides/fs/feature_view/feature_logging.md b/docs/user_guides/fs/feature_view/feature_logging.md index 1433679aa..b4afb4e5c 100644 --- a/docs/user_guides/fs/feature_view/feature_logging.md +++ b/docs/user_guides/fs/feature_view/feature_logging.md @@ -53,9 +53,9 @@ predictions = pd.DataFrame({ # Log features and predictions feature_view.log(features, - prediction=predictions, + predictions=predictions, training_dataset_version=1, - hsml_model=Model(1, "model", version=1) + model=Model(1, "model", version=1) ) ```