
Commit e1223d2

[FSTORE-1424] Feature logging (logicalclocks#396)

* feature logging
* transformed feature
* fix
* add to index
* update transformed features
* address comments
* update
* update code

1 parent 1b6b21b commit e1223d2

File tree: 2 files changed (+210 / -0 lines)

Lines changed: 209 additions & 0 deletions
@@ -0,0 +1,209 @@

# User Guide: Feature and Prediction Logging with a Feature View

Feature logging is essential for debugging, monitoring, and auditing the data your models use. This guide explains how to log features and predictions, and how to retrieve and manage these logs with a feature view in Hopsworks.

## Logging Features and Predictions

After you have trained a model, you can log the features it uses and its predictions with the feature view that was used to create the model's training data. You can log transformed and/or untransformed feature values.

### Enabling Feature Logging

To enable logging, set `logging_enabled=True` when creating the feature view. Two feature groups are created for storing transformed and untransformed features, but they are not visible in the UI. The logged features are written to the offline feature store every hour by scheduled materialization jobs, which are created automatically.

```python
feature_view = fs.create_feature_view("name", query, logging_enabled=True)
```

Alternatively, you can enable logging on an existing feature view by calling `feature_view.enable_logging()`. Calling `feature_view.log()` also implicitly enables logging if it has not been enabled already.
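
For example, a minimal sketch of enabling logging on a feature view that already exists (the feature view name and version here are illustrative):

```python
# Retrieve an existing feature view and enable logging for it
feature_view = fs.get_feature_view("name", version=1)
feature_view.enable_logging()
```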

### Logging Features and Predictions

You can log features and predictions by calling `feature_view.log`. The logged features are written periodically to the offline store. If you need the logs to be available immediately, call `feature_view.materialize_log`.

You can log transformed and/or untransformed features. To get untransformed features, specify `transformed=False` in `feature_view.get_batch_data` or `transform=False` in `feature_view.get_feature_vector(s)`. Inference helper columns are returned along with the untransformed features. If you also use on-demand features, call `feature_view.compute_on_demand_features` to compute them before calling `feature_view.log`. To get the transformed features, call `feature_view.transform` and pass the untransformed features along with the on-demand features, as shown in the sketch below.
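
For instance, a minimal sketch of this flow for a single feature vector might look as follows (assuming the feature view has on-demand features; the entry `{"id": 1}` is illustrative):

```python
# Get untransformed features (inference helper columns are included)
untransformed_vector = feature_view.get_feature_vector({"id": 1}, transform=False)
# Compute the on-demand features from the untransformed features
vector_with_on_demand = feature_view.compute_on_demand_features(untransformed_vector)
# Apply the model-dependent transformations to obtain the transformed features
transformed_vector = feature_view.transform(vector_with_on_demand)

# Log untransformed and transformed features
feature_view.log(vector_with_on_demand)
feature_view.log(transformed_vector, transformed=True)
```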

Predictions can optionally be provided as one or more columns in the DataFrame containing the features, or separately in the `predictions` argument. There must be as many prediction columns as there are labels in the feature view. If you provide the features as a `list` instead of a pandas `DataFrame`, the predictions must be provided in the `predictions` argument (see the sketch below). The training dataset version is also logged if you have called `feature_view.init_serving(...)` or `feature_view.init_batch_scoring(...)`, or if the provided model has a training dataset version.
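
A minimal sketch of the list case, assuming a feature view with a single label and that the predictions can likewise be passed as a list in the same order as the labels:

```python
# A feature vector returned as a list
feature_vector = feature_view.get_feature_vector({"id": 1})

# When features are passed as a list, predictions must go in the `predictions` argument
feature_view.log(feature_vector, predictions=[0])
```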

The wall-clock time at which `feature_view.log` is called is logged automatically, so you can filter by logging time when retrieving logs.

#### Example 1: Log Features Only

You have a DataFrame of features you want to log.

```python
import pandas as pd

features = pd.DataFrame({
    "feature1": [1.1, 2.2, 3.3],
    "feature2": [4.4, 5.5, 6.6]
})

# Log features
feature_view.log(features)
```

#### Example 2: Log Features, Predictions, and Model

You can also log predictions, and optionally the training dataset version and the model used for prediction.

```python
predictions = pd.DataFrame({
    "prediction": [0, 1, 0]
})

# Log features and predictions
feature_view.log(
    features,
    predictions=predictions,
    training_dataset_version=1,
    model=Model(1, "model", version=1)
)
```
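
Instead of constructing the `Model` object directly, you would typically retrieve it from the Hopsworks model registry; a minimal sketch, assuming a registered model named "model" with version 1:

```python
# Retrieve the model from the model registry of the current project
mr = project.get_model_registry()
model = mr.get_model("model", version=1)

feature_view.log(features, predictions=predictions, model=model)
```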

#### Example 3: Log Both Transformed and Untransformed Features

**Batch Features**

```python
untransformed_df = feature_view.get_batch_data(transformed=False)
# Apply the model-dependent transformations
transformed_df = feature_view.transform(untransformed_df)
# Log untransformed features
feature_view.log(untransformed_df)
# Log transformed features
feature_view.log(transformed_df, transformed=True)
```

**Real-time Features**

```python
untransformed_vector = feature_view.get_feature_vector({"id": 1}, transform=False)
# Apply the model-dependent transformations
transformed_vector = feature_view.transform(untransformed_vector)
# Log untransformed features
feature_view.log(untransformed_vector)
# Log transformed features
feature_view.log(transformed_vector, transformed=True)
```

## Retrieving the Log Timeline

To audit and review the feature/prediction logs, you may want to retrieve the timeline of log entries. This helps you understand when data was logged and monitor the logs.

### Retrieve Log Timeline

A log timeline is the Hudi commit timeline of the logging feature group.

```python
# Retrieve the latest 10 log entries
log_timeline = feature_view.get_log_timeline(limit=10)
print(log_timeline)
```

## Reading Log Entries

You may need to read specific log entries for analysis, such as entries within a particular time range or for a specific model version and training dataset version.

### Read all Log Entries

Read all log entries for comprehensive analysis. The output returns every logged value for a given primary key, not just the latest value.

```python
# Read all log entries
log_entries = feature_view.read_log()
print(log_entries)
```

### Read Log Entries within a Time Range

Focus on logs within a specific time range. You can specify `start_time` and `end_time` for filtering, but the time columns are not returned in the DataFrame. You can provide `start_time` and `end_time` as `datetime`, `date`, `int`, or `str` values. Accepted date formats are: `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`.

```python
# Read log entries from January 2022
log_entries = feature_view.read_log(start_time="2022-01-01", end_time="2022-01-31")
print(log_entries)
```
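
Since `start_time` and `end_time` also accept `datetime` objects, the same query can be written as follows (a minimal sketch of the January 2022 example above):

```python
from datetime import datetime

# Read log entries from January 2022 using datetime objects
log_entries = feature_view.read_log(
    start_time=datetime(2022, 1, 1),
    end_time=datetime(2022, 1, 31),
)
print(log_entries)
```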

### Read Log Entries by Training Dataset Version

Analyze logs from a particular version of the training dataset. The training dataset version column will be returned in the DataFrame.

```python
# Read log entries of training dataset version 1
log_entries = feature_view.read_log(training_dataset_version=1)
print(log_entries)
```

### Read Log Entries by Model in Hopsworks

Analyze logs from a particular name and version of the HSML model. The HSML model column will be returned in the DataFrame.

```python
# Read log entries of a specific HSML model
log_entries = feature_view.read_log(model=Model(1, "model", version=1))
print(log_entries)
```

### Read Log Entries using a Custom Filter

Provide a filter, which works in the same way as the filter method of the `Query` class. The feature used in the filter must be part of the feature view's query.

```python
# Read log entries where feature1 is greater than 0
# (`fg` is the feature group in the feature view's query that provides feature1)
log_entries = feature_view.read_log(filter=fg.feature1 > 0)
print(log_entries)
```

## Pausing and Resuming Logging

During maintenance or updates, you might need to pause logging to save computation resources.

### Pause Logging

Pause the schedule of the materialization job for writing logs to the offline store.

```python
# Pause logging
feature_view.pause_logging()
```

### Resume Logging

Resume the schedule of the materialization job for writing logs to the offline store.

```python
# Resume logging
feature_view.resume_logging()
```

## Materializing Logs

Besides the scheduled materialization job, you can materialize logs from Kafka to the offline store on demand. This does not pause the scheduled job. By default, both transformed and untransformed logs are materialized; you can optionally materialize only the transformed (`transformed=True`) or only the untransformed (`transformed=False`) logs.

### Materialize Logs

Materialize logs and optionally wait for the process to complete.

```python
# Materialize logs and wait for completion
materialization_result = feature_view.materialize_log(wait=True)
# Materialize only transformed log entries
feature_view.materialize_log(wait=True, transformed=True)
```

## Deleting Logs

When log data is no longer needed, you might want to delete it to free up space and maintain data hygiene. This operation deletes the logging feature groups and recreates new ones; the scheduled materialization jobs and the log timeline are reset as well.

### Delete Logs

Remove all log entries (both transformed and untransformed). You can optionally delete only the transformed (`transformed=True`) or only the untransformed (`transformed=False`) logs.

```python
# Delete all log entries
feature_view.delete_log()

# Delete only transformed log entries
feature_view.delete_log(transformed=True)
```

## Summary

Feature logging is a crucial part of maintaining and monitoring your machine learning workflows. By following these examples, you can effectively log, retrieve, and delete logs, as well as manage the lifecycle of the log materialization jobs, adding observability to your AI system and making it auditable.

mkdocs.yml

Lines changed: 1 addition & 0 deletions

@@ -99,6 +99,7 @@ nav:
       - Feature Monitoring:
         - Getting started: user_guides/fs/feature_view/feature_monitoring.md
         - Advanced guide: user_guides/fs/feature_monitoring/feature_monitoring_advanced.md
+        - Feature Logging: user_guides/fs/feature_view/feature_logging.md
       - Vector Similarity Search: user_guides/fs/vector_similarity_search.md
       - Compute Engines: user_guides/fs/compute_engines.md
       - Client Integrations:
