
Commit f229684

validation output changes (#255)

2 parents bce5a0c + d759bc0

13 files changed: +126 / -34 lines

ads/feature_store/dataset.py

Lines changed: 1 addition and 5 deletions

@@ -887,11 +887,7 @@ def get_validation_output(self, job_id: str = None) -> "ValidationOutput":
         validation_output = (
             output_details.get("validationOutput") if output_details else None
         )
-        validation_output_json = (
-            json.loads(validation_output) if validation_output else None
-        )
-
-        return ValidationOutput(validation_output_json)
+        return ValidationOutput(validation_output)
 
     @classmethod
     def list_df(cls, compartment_id: str = None, **kwargs) -> "pandas.DataFrame":
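The hunk above moves JSON decoding out of `get_validation_output()`: `ValidationOutput` now receives the raw `validationOutput` payload instead of a pre-parsed dict. A minimal stdlib-only sketch of the before/after behavior (the `ValidationOutput` stand-in below is hypothetical, not the actual ADS class):

```python
import json


class ValidationOutput:
    """Hypothetical stand-in for the ADS ValidationOutput wrapper.

    After this commit the raw ``validationOutput`` value is handed over
    as-is, so any parsing becomes the wrapper's (or caller's) concern.
    """

    def __init__(self, content):
        self.content = content


def get_validation_output_old(output_details):
    # Old behavior: eagerly json.loads the payload before wrapping it.
    validation_output = (
        output_details.get("validationOutput") if output_details else None
    )
    validation_output_json = (
        json.loads(validation_output) if validation_output else None
    )
    return ValidationOutput(validation_output_json)


def get_validation_output_new(output_details):
    # New behavior: pass the payload through unchanged.
    validation_output = (
        output_details.get("validationOutput") if output_details else None
    )
    return ValidationOutput(validation_output)


details = {"validationOutput": '{"success": true, "results": []}'}
print(type(get_validation_output_old(details).content).__name__)  # dict
print(type(get_validation_output_new(details).content).__name__)  # str
```

The identical change appears in `ads/feature_store/feature_group.py` further down; both getters now defer parsing to the wrapper.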

ads/feature_store/docs/source/dataset.rst

Lines changed: 44 additions and 12 deletions

@@ -119,28 +119,60 @@ With a Dataset instance, we can get the last dataset job details using ``get_las
 
 .. code-block:: python3
 
-    # Fetch validation results for a dataset
     dataset_job = dataset.get_last_job()
-    df = dataset_job.get_validation_output().to_dataframe()
-    df.show()
 
 Save expectation entity
 =======================
-Feature store allows you to define expectations on data being materialized into feature group instance. With a ``FeatureGroup`` instance, we can save the expectation entity using ``save_expectation()``
-
-
-.. image:: figures/validation.png
+Feature store allows you to define expectations on data being materialized into a dataset instance. With a ``Dataset`` instance, you can save the expectation details using ``with_expectation_suite()`` with the following parameters:
 
-The ``.save_expectation()`` method takes the following optional parameter:
-
-- ``expectation: Expectation``. Expectation of great expectation
+- ``expectation_suite: ExpectationSuite``. Expectation suite of Great Expectations
 - ``expectation_type: ExpectationType``. Type of expectation
   - ``ExpectationType.STRICT``: Fail the job if expectation not met
   - ``ExpectationType.LENIENT``: Pass the job even if expectation not met
 
+.. note::
+
+  Great Expectations is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
+
+.. image:: figures/validation.png
+
 .. code-block:: python3
 
-    feature_group.save_expectation(expectation_suite, expectation_type="STRICT")
+    expectation_suite = ExpectationSuite(
+        expectation_suite_name="expectation_suite_name"
+    )
+    expectation_suite.add_expectation(
+        ExpectationConfiguration(
+            expectation_type="expect_column_values_to_not_be_null",
+            kwargs={"column": "<column>"},
+        )
+    )
+
+    dataset_resource = (
+        Dataset()
+        .with_description("dataset description")
+        .with_compartment_id(<compartment_id>)
+        .with_name(<name>)
+        .with_entity_id(entity_id)
+        .with_feature_store_id(feature_store_id)
+        .with_query(f"SELECT * FROM `{entity_id}`.{feature_group_name}")
+        .with_expectation_suite(
+            expectation_suite=expectation_suite,
+            expectation_type=ExpectationType.STRICT,
+        )
+    )
+
+You can call the ``get_validation_output()`` method of the Dataset instance to fetch validation results for a specific ingestion job.
+The ``get_validation_output()`` method takes the following optional parameter:
+
+- ``job_id: string``. ID of the dataset job
+
+``get_validation_output().to_pandas()`` will output the validation results for each expectation as a pandas dataframe
+
+.. image:: figures/dataset_validation_results.png
+
+``get_validation_output().to_summary()`` will output the overall summary of validation as a pandas dataframe.
+
+.. image:: figures/dataset_validation_summary.png
 
 .. seealso::
 
@@ -186,7 +218,7 @@ The ``get_statistics()`` method takes the following optional parameter:
     # Fetch stats results for a dataset job
     df = dataset.get_statistics(job_id).to_pandas()
 
-.. image:: figures/stats_1.png
+.. image:: figures/dataset_statistics.png
 
 .. seealso::
 
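The new docs describe two views over a validation run: `to_pandas()` (one row per expectation) and `to_summary()` (an overall rollup). As a rough stdlib-only illustration of the distinction (not the ADS implementation; the payload below merely mimics the shape of a Great Expectations validation result):

```python
# Hypothetical validationOutput payload, loosely shaped like a
# Great Expectations validation result (illustrative only).
validation_output = {
    "success": False,
    "results": [
        {"expectation_type": "expect_column_values_to_not_be_null",
         "column": "user_id", "success": True},
        {"expectation_type": "expect_column_values_to_be_unique",
         "column": "user_id", "success": False},
    ],
}


def to_rows(output):
    # Per-expectation view: one tuple per expectation, like to_pandas().
    return [
        (r["expectation_type"], r["column"], r["success"])
        for r in output["results"]
    ]


def to_summary(output):
    # Rollup view, like to_summary(): overall flag plus pass/fail counts.
    results = output["results"]
    passed = sum(1 for r in results if r["success"])
    return {
        "success": output["success"],
        "evaluated_expectations": len(results),
        "successful_expectations": passed,
        "unsuccessful_expectations": len(results) - passed,
    }


print(to_summary(validation_output))
```

In the real API both methods return pandas DataFrames; the dict/list forms here just make the two granularities concrete.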

ads/feature_store/docs/source/feature_group.rst

Lines changed: 41 additions and 8 deletions

@@ -152,22 +152,55 @@ Feature store provides an API similar to Pandas to join feature groups together
 
 Save expectation entity
 =======================
-Feature store allows you to define expectations on data being materialized into feature group instance. With a ``FeatureGroup`` instance, we can save the expectation entity using ``save_expectation()``
+With a ``FeatureGroup`` instance, you can save the expectation details using ``with_expectation_suite()`` with the following parameters:
 
-
-.. image:: figures/validation.png
-
-The ``.save_expectation()`` method takes the following optional parameter:
-
-- ``expectation: Expectation``. Expectation of great expectation
+- ``expectation_suite: ExpectationSuite``. Expectation suite of Great Expectations
 - ``expectation_type: ExpectationType``. Type of expectation
   - ``ExpectationType.STRICT``: Fail the job if expectation not met
   - ``ExpectationType.LENIENT``: Pass the job even if expectation not met
 
+.. note::
+
+  Great Expectations is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
+
+.. image:: figures/validation.png
+
 .. code-block:: python3
 
-    feature_group.save_expectation(expectation_suite, expectation_type="STRICT")
+    expectation_suite = ExpectationSuite(
+        expectation_suite_name="expectation_suite_name"
+    )
+    expectation_suite.add_expectation(
+        ExpectationConfiguration(
+            expectation_type="expect_column_values_to_not_be_null",
+            kwargs={"column": "<column>"},
+        )
+    )
+
+    feature_group_resource = (
+        FeatureGroup()
+        .with_feature_store_id(feature_store.id)
+        .with_primary_keys(["<key>"])
+        .with_name("<name>")
+        .with_entity_id(entity.id)
+        .with_compartment_id(<compartment_id>)
+        .with_schema_details_from_dataframe(<dataframe>)
+        .with_expectation_suite(
+            expectation_suite=expectation_suite,
+            expectation_type=ExpectationType.STRICT,
+        )
+    )
+
+You can call the ``get_validation_output()`` method of the FeatureGroup instance to fetch validation results for a specific ingestion job.
+The ``get_validation_output()`` method takes the following optional parameter:
+
+- ``job_id: string``. ID of the feature group job
+
+``get_validation_output().to_pandas()`` will output the validation results for each expectation as a pandas dataframe
+
+.. image:: figures/validation_results.png
+
+``get_validation_output().to_summary()`` will output the overall summary of validation as a pandas dataframe.
 
+.. image:: figures/validation_summary.png
+
 .. seealso::
 
     :ref:`Feature Validation`
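The `with_*` chaining that both RST files now document is a fluent-builder pattern: each setter records a value and returns `self`, so a resource definition reads declaratively. A generic sketch under that assumption (the class and its fields are hypothetical, not the ADS `FeatureGroup`):

```python
class ResourceBuilder:
    """Hypothetical fluent builder mirroring the with_* style above."""

    def __init__(self):
        self.spec = {}

    def with_name(self, name):
        self.spec["name"] = name
        return self  # returning self is what enables chaining

    def with_compartment_id(self, ocid):
        self.spec["compartment_id"] = ocid
        return self

    def with_expectation_suite(self, expectation_suite, expectation_type):
        self.spec["expectation_suite"] = expectation_suite
        self.spec["expectation_type"] = expectation_type
        return self


resource = (
    ResourceBuilder()
    .with_name("transactions")
    .with_compartment_id("ocid1.compartment.oc1..example")
    .with_expectation_suite(
        expectation_suite={"name": "suite"},
        expectation_type="STRICT",
    )
)
print(resource.spec["name"])  # transactions
```

Because every setter returns the builder, the expectation suite becomes just another attribute of the resource definition, which is why `save_expectation()` could be dropped in favor of `with_expectation_suite()`.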

ads/feature_store/docs/source/feature_validation.rst

Lines changed: 5 additions and 3 deletions

@@ -7,7 +7,7 @@ Feature validation is the process of checking the quality and accuracy of the fe
 Feature store allows you to define expectation on the data which is being materialized into feature group and dataset. This is achieved using open source library Great Expectations.
 
 .. note::
-  `Great Expectations <https://docs.greatexpectations.io/docs/>`_ is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
+  `Great Expectations <https://docs.greatexpectations.io/docs/0.15.50/>`_ is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
 
 
 Expectations
@@ -50,5 +50,7 @@ Expectation Suite is a collection of verifiable assertions i.e. expectations abo
 .. code-block:: python3
 
     # Create an Expectation Suite
-    suite = context.add_expectation_suite(expectation_suite_name="example_suite")
-    suite.add_expectation(expect_config)
+    expectation_suite = ExpectationSuite(
+        expectation_suite_name=<expectation_suite_name>
+    )
+    expectation_suite.add_expectation(expect_config)
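For intuition about what a check like `expect_column_values_to_not_be_null` asserts, here is a toy plain-Python approximation; this is not Great Expectations itself, and the STRICT/LENIENT branch mirrors the `ExpectationType` semantics described in the docs above:

```python
def expect_column_values_to_not_be_null(rows, column):
    """Toy version of the Great Expectations check: every value in
    `column` must be non-null. Returns a GE-like result dict."""
    nulls = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {
        "expectation_type": "expect_column_values_to_not_be_null",
        "success": not nulls,
        "unexpected_count": len(nulls),
    }


def run_validation(rows, column, expectation_type="STRICT"):
    # STRICT fails the job on a failed expectation;
    # LENIENT only records the failure and lets the job pass.
    result = expect_column_values_to_not_be_null(rows, column)
    if expectation_type == "STRICT" and not result["success"]:
        raise ValueError(f"expectation failed: {result}")
    return result


rows = [{"user_id": 1}, {"user_id": None}, {"user_id": 3}]
print(run_validation(rows, "user_id", expectation_type="LENIENT"))
# With expectation_type="STRICT", the same data would raise ValueError.
```

The real library evaluates a whole `ExpectationSuite` at materialization time and records per-expectation results, which is what `get_validation_output()` then exposes.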

ads/feature_store/feature_group.py

Lines changed: 1 addition and 5 deletions

@@ -1288,11 +1288,7 @@ def get_validation_output(self, job_id: str = None) -> "ValidationOutput":
             output_details.get("validationOutput") if output_details else None
         )
 
-        validation_output_json = (
-            json.loads(validation_output) if validation_output else None
-        )
-
-        return ValidationOutput(validation_output_json)
+        return ValidationOutput(validation_output)
 
     def __getattr__(self, name):
         try:
