ads/feature_store/docs/source/feature_group.rst
16 additions & 13 deletions
@@ -128,19 +128,18 @@ Materialise Stream
You can call the ``materialise_stream() -> FeatureGroupJob`` method of the ``FeatureGroup`` instance to load streaming data into the feature group. To persist the feature group and save its data along with the metadata in the feature store, call the ``materialise_stream()`` method.
The ``.materialise_stream()`` method takes the following parameters:
-- ``input_dataframe``: Features in Streaming Dataframe to be saved.
-- ``query_name``: It is possible to optionally specify a name for the query to make it easier to recognise in the Spark UI. Defaults to ``None``.
-- ``ingestion_mode``: Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
--  - ``"append"``: Only the new rows in the streaming DataFrame/Dataset will be written to the sink. If the query doesn’t contain aggregations, it will be equivalent to
--    append mode. Defaults to ``"append"``.
--  - ``"complete"``: All the rows in the streaming DataFrame/Dataset will be written to the sink every time there is some update.
--  - ``"update"``: only the rows that were updated in the streaming DataFrame/Dataset will be written to the sink every time there are some updates.
-- ``await_termination``: Waits for the termination of this query, either by query.stop() or by an exception. If the query has terminated with an exception, then the exception will be thrown. If timeout is set, it returns whether the query has terminated or not within the timeout seconds. Defaults to ``False``.
-- ``timeout``: Only relevant in combination with ``await_termination=True``.
--  Defaults to ``None``.
-- ``checkpoint_dir``: Checkpoint directory location. This will be used to as a reference to from where to resume the streaming job. If ``None`` then hsfs will construct as "insert_stream_" + online_topic_name. Defaults to ``None``.
-- ``write_options``: Additional write options for Spark as key-value pairs.
--  Defaults to ``{}``.
+- ``input_dataframe``: Features in a streaming DataFrame to be saved.
+- ``query_name``: Optionally specify a name for the query to make it easier to recognise in the Spark UI. Defaults to ``None``.
+- ``ingestion_mode``: Specifies how data of a streaming DataFrame/Dataset is written to the streaming sink. Defaults to ``"append"``.
+
+  - ``append``: Only the new rows in the streaming DataFrame/Dataset are written to the sink.
+  - ``complete``: All the rows in the streaming DataFrame/Dataset are written to the sink every time there is an update.
+  - ``update``: Only the rows that were updated in the streaming DataFrame/Dataset are written to the sink every time there are updates. If the query doesn’t contain aggregations, this is equivalent to ``append`` mode.
+
+- ``await_termination``: Waits for the termination of the query, either by ``query.stop()`` or by an exception. If the query terminated with an exception, the exception is thrown. If ``timeout`` is set, it returns whether the query terminated within ``timeout`` seconds. Defaults to ``False``.
+- ``timeout``: Only relevant in combination with ``await_termination=True``. Defaults to ``None``.
+- ``checkpoint_dir``: Checkpoint directory location, used as the reference from which to resume the streaming job. Defaults to ``None``.
+- ``write_options``: Additional write options for Spark as key-value pairs. Defaults to ``{}``.
.. seealso::
    :ref:`Feature Group Job`
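
As a minimal usage sketch (an editor's illustration, not part of this diff), the parameters documented above might be combined as follows; ``feature_group``, ``streaming_df``, the query name, and the checkpoint path are placeholders, with ``streaming_df`` standing in for a Spark structured-streaming DataFrame:

.. code-block:: python3

    # Illustrative sketch only; all names and paths are placeholders.
    # `feature_group` is an existing FeatureGroup instance and
    # `streaming_df` a Spark structured-streaming DataFrame of features.
    feature_group_job = feature_group.materialise_stream(
        input_dataframe=streaming_df,
        query_name="transactions_stream",  # label shown in the Spark UI
        ingestion_mode="append",           # write only new rows to the sink
        await_termination=True,            # block until the query stops
        timeout=600,                       # stop waiting after 600 seconds
        checkpoint_dir="oci://bucket@namespace/checkpoints/transactions",
        write_options={},
    )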
@@ -200,6 +199,9 @@ With a ``FeatureGroup`` instance, You can save the expectation details using ``w
.. image:: figures/validation.png
.. code-block:: python3
+    from great_expectations.core import ExpectationSuite, ExpectationConfiguration
+    from ads.feature_store.common.enums import TransformationMode, ExpectationType
+    from ads.feature_store.feature_group import FeatureGroup

    expectation_suite = ExpectationSuite(
        expectation_suite_name="expectation_suite_name"
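
As a hedged sketch of how the truncated snippet above might continue: ``add_expectation`` and ``ExpectationConfiguration`` are standard Great Expectations APIs, while attaching the suite via ``with_expectation_suite`` with an ``ExpectationType`` is an assumption based on the imports and the truncated hunk header below.

.. code-block:: python3

    # Hypothetical continuation of the snippet above: add one expectation
    # to the suite (the column name is a placeholder).
    expectation_suite.add_expectation(
        ExpectationConfiguration(
            expectation_type="expect_column_values_to_not_be_null",
            kwargs={"column": "user_id"},
        )
    )

    # Assumed attachment step: STRICT would fail ingestion when
    # validation fails; the exact method name may differ.
    feature_group.with_expectation_suite(
        expectation_suite, expectation_type=ExpectationType.STRICT
    )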
@@ -248,6 +250,7 @@ feature group or it can be updated later as well.
.. code-block:: python3
    # Define statistics configuration for selected features
+    from ads.feature_store.statistics_config import StatisticsConfig
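
A brief sketch of how the imported ``StatisticsConfig`` might be used; the keyword arguments, column names, and builder method are assumptions for illustration, not confirmed by this diff:

.. code-block:: python3

    # Hypothetical: enable statistics collection for two placeholder columns.
    stats_config = StatisticsConfig(is_enabled=True, columns=["age", "income"])

    # Assumed builder step on the feature group definition.
    feature_group.with_statistics_config(stats_config)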