Commit 5b00331

stats and validation doc update
1 parent 5950ac1 commit 5b00331

2 files changed: 39 additions and 31 deletions


ads/feature_store/docs/source/feature_validation.rst

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ Feature Validation
 *************

 Feature validation is the process of checking the quality and accuracy of the features used in a machine learning model. This is important because features that are not accurate or reliable can lead to poor model performance.
-Feature store allows you to define expectation on the data which is being materialized into feature group & dataset. This is achieved using open source library Great Expectations.
+Feature store allows you to define expectation on the data which is being materialized into feature group and dataset. This is achieved using open source library Great Expectations.

 .. note::
 `Great Expectations <https://docs.greatexpectations.io/docs/>`_ is a Python-based open-source library for validating, documenting, and profiling your data. It helps you to maintain data quality and improve communication about data between teams. Software developers have long known that automated testing is essential for managing complex codebases.
@@ -51,4 +51,4 @@ Expectation Suite is a collection of verifiable assertions i.e. expectations abo

 # Create an Expectation Suite
 suite = context.add_expectation_suite(expectation_suite_name="example_suite")
-suite.add_expectation(expect_config)
+suite.add_expectation(expect_config)

ads/feature_store/docs/source/statistics.rst

Lines changed: 37 additions & 29 deletions
@@ -12,32 +12,40 @@ to derive insights about the data quality. These statistical metrics are compute

 The statistical metrics that are computed by feature store depend on the feature type.

-Metrics for categorical data
-
-- Count
-- TopKFrequentElements
-- TypeMetric
-- DuplicateCount
-- Mode
-- DistinctCount
-
-Metrics for numerical data
-
-- Skewness
-- StandardDeviation
-- Min
-- IsConstantFeature
-- IQR
-- Range
-- ProbabilityDistribution
-- Variance
-- TypeMetric
-- FrequencyDistribution
-- Count
-- Max
-- DistinctCount
-- Sum
-- IsQuasiConstantFeature
-- Quartiles
-- Mean
-- Kurtosis
++------------------------+-----------------------+
+| Numerical Metrics      | Categorical Metrics   |
++========================+=======================+
+| Skewness               | Count                 |
++------------------------+-----------------------+
+| StandardDeviation      | TopKFrequentElements  |
++------------------------+-----------------------+
+| Min                    | TypeMetric            |
++------------------------+-----------------------+
+| IsConstantFeature      | DuplicateCount        |
++------------------------+-----------------------+
+| IQR                    | Mode                  |
++------------------------+-----------------------+
+| Range                  | DistinctCount         |
++------------------------+-----------------------+
+| ProbabilityDistribution|                       |
++------------------------+-----------------------+
+| Variance               |                       |
++------------------------+-----------------------+
+| FrequencyDistribution  |                       |
++------------------------+-----------------------+
+| Count                  |                       |
++------------------------+-----------------------+
+| Max                    |                       |
++------------------------+-----------------------+
+| DistinctCount          |                       |
++------------------------+-----------------------+
+| Sum                    |                       |
++------------------------+-----------------------+
+| IsQuasiConstantFeature |                       |
++------------------------+-----------------------+
+| Quartiles              |                       |
++------------------------+-----------------------+
+| Mean                   |                       |
++------------------------+-----------------------+
+| Kurtosis               |                       |
++------------------------+-----------------------+
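
For readers unfamiliar with the metric names listed in this diff, the following is a minimal sketch of how a few of them could be computed with the Python standard library alone. It is not the feature store's implementation: the function names `numerical_metrics` and `categorical_metrics`, the parameter `k`, and the exact definitions used (for example, `DuplicateCount` as the number of values beyond the first occurrence of each distinct value, and sample rather than population variance) are assumptions made for illustration.

```python
from collections import Counter
import statistics


def numerical_metrics(values):
    """Illustrative versions of some numerical metrics (needs at least 2 values)."""
    # Quartile cut points; the "inclusive" method treats the data as a full population.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    distinct = set(values)
    return {
        "Count": len(values),
        "Min": min(values),
        "Max": max(values),
        "Range": max(values) - min(values),
        "Sum": sum(values),
        "Mean": statistics.fmean(values),
        "Variance": statistics.variance(values),        # sample variance (assumption)
        "StandardDeviation": statistics.stdev(values),  # sample standard deviation
        "Quartiles": (q1, q2, q3),
        "IQR": q3 - q1,
        "DistinctCount": len(distinct),
        "IsConstantFeature": len(distinct) == 1,
    }


def categorical_metrics(values, k=2):
    """Illustrative versions of some categorical metrics for a non-empty list."""
    counts = Counter(values)
    return {
        "Count": len(values),
        "DistinctCount": len(counts),
        # Assumed definition: values that repeat an already-seen value.
        "DuplicateCount": len(values) - len(counts),
        "Mode": counts.most_common(1)[0][0],
        "TopKFrequentElements": counts.most_common(k),
    }


if __name__ == "__main__":
    print(numerical_metrics([1, 2, 3, 4, 5]))
    print(categorical_metrics(["a", "b", "a", "c", "a", "b"]))
```

The two functions mirror the two columns of the table: numerical features get distribution-shape metrics, while categorical features get frequency-based ones. The actual statistics computed by the feature store may differ in definition and scale.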
