Commit 7198747

committed
added docs
1 parent 489a254 commit 7198747

File tree

2 files changed: +149 −0 lines


ads/feature_store/docs/source/release_notes.rst

Lines changed: 37 additions & 0 deletions
@@ -4,6 +4,43 @@
Release Notes
=============

1.0.4
-----

.. note::

  .. list-table::
    :header-rows: 1

    * - Package Name
      - Latest Version
      - Notes
    * - Conda pack
      - `fspyspark32_p38_cpu_v2`
      -
    * - SERVICE_VERSION
      - 0.1.256.master
      -
    * - ADS_VERSION
      - oracle-ads==2.9.0rc0
      - `https://github.com/oracle/accelerated-data-science/releases/tag/v2.9.0rc0`
    * - Terraform Stack
      - `link <https://objectstorage.us-ashburn-1.oraclecloud.com/p/vZogtXWwHqbkGLeqyKiqBmVxdbR4MK4nyOBqDsJNVE4sHGUY5KFi4T3mOFGA3FOy/n/idogsu2ylimg/b/oci-feature-store/o/beta/terraform/feature-store-terraform.zip>`__
      -

Release notes: November 15, 2023

* [FEATURE] Streaming data-frame support in the ``FeatureGroup`` and ``Dataset`` constructs
* [FEATURE] Support for custom Spark transformations using the ``Transformation`` construct
* [MAINTENANCE] Decouple the feature store client from the OCI client
* [MAINTENANCE] Upgrade the ``mlm`` version to 1.0.2
* [MAINTENANCE] Upgrade the ``great-expectations`` version to 0.17.19
* [UI] Addition of a dataset jobs UI tab
* [UI] Addition of a feature group jobs UI tab
* [UI] Addition of a feature store landing page
* [DOCS] Addition of class documentation for the feature store
* [CONDA] Release of the feature store v2 conda pack ``fspyspark32_p38_cpu_v2``

1.0.3
-----

.. note::

ads/feature_store/docs/source/transformation.rst

Lines changed: 112 additions & 0 deletions
@@ -3,6 +3,118 @@ Transformation

Transformations in a feature store refer to the operations and processes applied to raw data to create, modify, or derive new features that can be used as inputs to ML models. These transformations are crucial for improving the quality, relevance, and usefulness of features, which in turn can enhance the performance of ML models. A transformation is an object that represents an operation applied on a feature group; it can be a pandas, Spark SQL, or Spark transformation.

* ``TransformationMode.PANDAS``: allows users to perform the transformation using native pandas functionality.
* ``TransformationMode.SQL``: Spark SQL brings native support for SQL to Spark; users express the transformation they wish to perform as a Spark SQL query.
* ``TransformationMode.SPARK``: allows users to perform the transformation using native Spark functionality.

.. tabs::

  .. code-tab:: Python3
    :caption: TransformationMode.SQL

    from ads.feature_store.transformation import Transformation, TransformationMode

    def transactions_df(transactions_batch):
        sql_query = f"select id, cc_num, amount from {transactions_batch}"
        return sql_query

    transformation = (
        Transformation()
        .with_description("Feature store description")
        .with_compartment_id(os.environ["PROJECT_COMPARTMENT_OCID"])
        .with_display_name("FeatureStore")
        .with_feature_store_id(feature_store.id)
        .with_transformation_mode(TransformationMode.SQL)
        .with_source_code_function(transactions_df)
    )
    transformation.create()

  .. code-tab:: Python3
    :caption: TransformationMode.PANDAS

    def chained_transformation(patient_result_df, **transformation_args):
        def label_encoder_transformation(patient_result_df, **transformation_args):
            from sklearn.preprocessing import LabelEncoder
            # Create an instance of LabelEncoder
            labelencoder = LabelEncoder()
            result_df = patient_result_df.copy()
            column_labels = transformation_args.get("label_encode_column")
            if isinstance(column_labels, list):
                for col in column_labels:
                    result_df[col] = labelencoder.fit_transform(result_df[col])
            elif isinstance(column_labels, str):
                result_df[column_labels] = labelencoder.fit_transform(result_df[column_labels])
            else:
                return None
            return result_df

        def min_max_scaler(patient_result_df, **transformation_args):
            from sklearn.preprocessing import MinMaxScaler
            final_result_df = patient_result_df.copy()
            scaler = MinMaxScaler(feature_range=(0, 1))
            column_labels = transformation_args.get("scaling_column_labels")
            final_result_df[column_labels] = scaler.fit_transform(final_result_df[column_labels])
            return final_result_df

        def feature_removal(input_df, **transformation_args):
            output_df = input_df.copy()
            output_df.drop(transformation_args.get("redundant_feature_label"), axis=1, inplace=True)
            return output_df

        out1 = label_encoder_transformation(patient_result_df, **transformation_args)
        out2 = min_max_scaler(out1, **transformation_args)
        return feature_removal(out2, **transformation_args)

    transformation_args = {
        "label_encode_column": ["SOURCE"],
        "scaling_column_labels": [],
        "redundant_feature_label": ["MCH", "MCHC", "MCV"]
    }

    from ads.feature_store.transformation import Transformation, TransformationMode

    transformation = (
        Transformation()
        .with_display_name("chained_transformation")
        .with_feature_store_id(feature_store.id)
        .with_source_code_function(chained_transformation)
        .with_transformation_mode(TransformationMode.PANDAS)
        .with_description("transformation to perform feature engineering")
        .with_compartment_id(compartment_id)
    )

    transformation.create()

  .. code-tab:: Python3
    :caption: TransformationMode.SPARK

    def credit_score_transformation(credit_score):
        import pyspark.sql.functions as F

        # Create a new Spark DataFrame that contains the transformed credit score.
        transformed_credit_score = credit_score.select(
            "user_id",
            "date",
            F.when(F.col("credit_score").cast("int") > 500, 1).otherwise(0).alias("credit_score")
        )

        # Return the new Spark DataFrame.
        return transformed_credit_score

    from ads.feature_store.transformation import Transformation, TransformationMode

    transformation = (
        Transformation()
        .with_display_name("spark_transformation")
        .with_feature_store_id(feature_store.id)
        .with_source_code_function(credit_score_transformation)
        .with_transformation_mode(TransformationMode.SPARK)
        .with_description("transformation to perform feature engineering")
        .with_compartment_id(compartment_id)
    )

    transformation.create()

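Before registering a pandas-mode transformation, its logic can be checked locally on a small DataFrame. The following is a minimal sketch of the label-encode / min-max-scale / drop-columns chain shown in the ``TransformationMode.PANDAS`` example, using plain pandas only: the sample data is invented, and pandas ``category`` codes stand in for scikit-learn's ``LabelEncoder`` (both produce the same sorted 0..n-1 integer mapping).

```python
import pandas as pd

def chained_transformation(df, **args):
    out = df.copy()
    # Integer label encoding via pandas category codes (stand-in for
    # sklearn's LabelEncoder; values are mapped to 0..n-1 in sorted order).
    for col in args.get("label_encode_column", []):
        out[col] = out[col].astype("category").cat.codes
    # Min-max scaling to [0, 1], as MinMaxScaler(feature_range=(0, 1)) would do.
    for col in args.get("scaling_column_labels", []):
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo)
    # Drop redundant feature columns.
    return out.drop(columns=args.get("redundant_feature_label", []))

# Illustrative sample data; column names follow the docs example.
df = pd.DataFrame({
    "SOURCE": ["in", "out", "in"],
    "MCH": [27.1, 29.3, 30.0],
    "AGE": [30, 40, 50],
})
result = chained_transformation(
    df,
    label_encode_column=["SOURCE"],
    scaling_column_labels=["AGE"],
    redundant_feature_label=["MCH"],
)
print(result)
```

Running the function locally like this makes it easy to verify the encoding and scaling behavior before handing the same callable to ``with_source_code_function``.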
Define
======
