Commit cca3c72

ODSC-39261: adding quick start for AutoML model
1 parent 193311c commit cca3c72

8 files changed: +411 -406 lines changed

docs/source/ads.model_framework.rst

Lines changed: 0 additions & 8 deletions
@@ -4,14 +4,6 @@ ads.model.framework package
 Submodules
 ----------
 
-ads.model.framework.automl\_model module
-----------------------------------------
-
-.. automodule:: ads.model.framework.automl_model
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
 ads.model.framework.lightgbm\_model module
 ------------------------------------------
 

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
-You can call the ``.summary_status()`` method after a model serialization instance such as ``AutoMLModel``, ``GenericModel``, ``SklearnModel``, ``TensorFlowModel``, or ``PyTorchModel`` is created. The ``.summary_status()`` method returns a Pandas dataframe that guides you through the entire workflow. It shows which methods are available to call and which ones aren't. Plus it outlines what each method does. If extra actions are required, it also shows those actions.
+You can call the ``.summary_status()`` method after a model serialization instance such as ``GenericModel``, ``SklearnModel``, ``TensorFlowModel``, or ``PyTorchModel`` is created. The ``.summary_status()`` method returns a Pandas dataframe that guides you through the entire workflow. It shows which methods are available to call and which ones aren't. Plus it outlines what each method does. If extra actions are required, it also shows those actions.
 
 The following image displays an example summary status table created after a user initiates a model instance. The table's Step column displays a Status of Done for the initiate step. And the ``Details`` column explains what the initiate step did such as generating a ``score.py`` file. The Step column also displays the ``prepare()``, ``verify()``, ``save()``, ``deploy()``, and ``predict()`` methods for the model. The Status column displays which method is available next. After the initiate step, the ``prepare()`` method is available. The next step is to call the ``prepare()`` method.

docs/source/user_guide/model_registration/frameworks/automlmodel.rst

Lines changed: 202 additions & 87 deletions
@@ -3,79 +3,13 @@
 AutoMLModel
 ***********
 
-See `API Documentation <../../../ads.model_framework.html#ads.model.framework.automl_model.AutoMLModel>`__
+.. note::
 
-Overview
-========
+    The ``ads.model.framework.automl_model.AutoMLModel`` class is deprecated. See this :ref:link <_Oralce_AutoMlx>`` for more detailed information.
 
-The ``ads.model.framework.automl_model.AutoMLModel`` class in ADS is designed to rapidly get your AutoML model into production. The ``.prepare()`` method creates the model artifacts needed to deploy the model without you having to configure it or write code. The ``.prepare()`` method serializes the model and generates a ``runtime.yaml`` and a ``score.py`` file that you can later customize.
+To deploy an AutoMlx model, use `GenericModel <../../../ads.model.html#ads.model.generic_model.GenericModel>`__ class.
 
-.. include:: ../_template/overview.rst
-
-The following steps take your trained ``AutoML`` model and deploy it into production with a few lines of code.
-
-
-**Creating an Oracle Labs AutoML Model**
-
-Train a model using AutoMLx.
-
-.. code-block:: python3
-
-    import pandas as pd
-    import numpy as np
-    import tempfile
-    from sklearn.metrics import roc_auc_score, confusion_matrix, make_scorer, f1_score
-    from sklearn.linear_model import LogisticRegression
-    from sklearn.compose import make_column_selector as selector
-    from sklearn.impute import SimpleImputer
-    from sklearn.preprocessing import StandardScaler, OneHotEncoder
-    from sklearn.compose import ColumnTransformer
-    from sklearn.pipeline import Pipeline
-    from sklearn.datasets import fetch_openml
-    from sklearn.model_selection import train_test_split
-
-    import ads
-    import automl
-    from automl import init
-    from ads.model import AutoMLModel
-    from ads.common.model_metadata import UseCaseType
-    from ads.model.framework.automl_model import AutoMLModel
-
-    dataset = fetch_openml(name='adult', as_frame=True)
-    df, y = dataset.data, dataset.target
-
-    # Several of the columns are incorrectly labeled as category type in the original dataset
-    numeric_columns = ['age', 'capitalgain', 'capitalloss', 'hoursperweek']
-    for col in df.columns:
-        if col in numeric_columns:
-            df[col] = df[col].astype(int)
-
-
-    X_train, X_test, y_train, y_test = train_test_split(df,
-                                                        y.map({'>50K': 1, '<=50K': 0}).astype(int),
-                                                        train_size=0.7,
-                                                        random_state=0)
-
-    init(engine='local')
-    est = automl.Pipeline(task='classification')
-    est.fit(X_train, y_train)
-
-Initialize
-==========
-
-Instantiate an ``AutoMLModel()`` object with an ``AutoML`` model. Each instance accepts the following parameters:
-
-* ``artifact_dir: str``: Artifact directory to store the files needed for deployment.
-* ``auth: (Dict, optional)``: Defaults to ``None``. The default authentication is set using the ``ads.set_auth`` API. To override the default, use ``ads.common.auth.api_keys()`` or ``ads.common.auth.resource_principal()`` and create the appropriate authentication signer and the ``**kwargs`` required to instantiate the ``IdentityClient`` object.
-* ``estimator: (Callable)``: Trained AutoML model.
-* ``properties: (ModelProperties, optional)``: Defaults to ``None``. The ``ModelProperties`` object required to save and deploy a model.
-
-.. include:: ../_template/initialize.rst
-
-Summary Status
-==============
-
-.. include:: ../_template/summary_status.rst
+The following example take your trained ``AutoML`` model using ``GenericModel`` and deploy it into production with a few lines of code.
 
 
 Example
@@ -85,23 +19,14 @@ Example
 
     import pandas as pd
     import numpy as np
-    import tempfile
-    from sklearn.metrics import roc_auc_score, confusion_matrix, make_scorer, f1_score
-    from sklearn.linear_model import LogisticRegression
-    from sklearn.compose import make_column_selector as selector
-    from sklearn.impute import SimpleImputer
-    from sklearn.preprocessing import StandardScaler, OneHotEncoder
-    from sklearn.compose import ColumnTransformer
-    from sklearn.pipeline import Pipeline
     from sklearn.datasets import fetch_openml
     from sklearn.model_selection import train_test_split
 
     import ads
     import automl
     from automl import init
-    from ads.model import AutoMLModel
+    from ads.model import GenericModel
     from ads.common.model_metadata import UseCaseType
-    from ads.model.framework.automl_model import AutoMLModel
 
     dataset = fetch_openml(name='adult', as_frame=True)
     df, y = dataset.data, dataset.target
@@ -112,7 +37,6 @@ Example
         if col in numeric_columns:
             df[col] = df[col].astype(int)
 
-
     X_train, X_test, y_train, y_test = train_test_split(df,
                                                         y.map({'>50K': 1, '<=50K': 0}).astype(int),
                                                         train_size=0.7,
@@ -123,16 +47,207 @@ Example
     est.fit(X_train, y_train)
 
     ads.set_auth("resource_principal")
-    artifact_dir = tempfile.mkdtemp()
-    automl_model = AutoMLModel(estimator=model, artifact_dir=artifact_dir)
+    automl_model = GenericModel(estimator=est, artifact_dir="automl_model_artifact")
     automl_model.prepare(inference_conda_env="automlx_p38_cpu_v1",
                          training_conda_env="automlx_p38_cpu_v1",
                          use_case_type=UseCaseType.BINARY_CLASSIFICATION,
                          X_sample=X_test,
                          force_overwrite=True)
-    automl_model.verify(X_test.iloc[:2])
+
+
+Open ``automl_model_artifact/score.py`` and edit the code to instantiate the model class. The edits are highlighted -
+
+.. code-block:: python3
+    :emphasize-lines: 21,30
+
+    # score.py 1.0 generated by ADS 2.8.1 on 20230226_214703
+    import json
+    import os
+    import sys
+    import importlib
+    from cloudpickle import cloudpickle
+    from functools import lru_cache
+    from io import StringIO
+    import logging
+    import sys
+    import automl
+    import pandas as pd
+    import numpy as np
+
+    model_name = 'model.pkl'
+
+    """
+    Inference script. This script is used for prediction by scoring server when schema is known.
+    """
+
+    def init_automl_logger():
+        logger = logging.getLogger("automl")
+        handler = logging.StreamHandler(sys.stdout)
+        handler.setLevel(logging.ERROR)
+        formatter = logging.Formatter(
+            "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+        )
+        handler.setFormatter(formatter)
+        logger.addHandler(handler)
+        automl.init(engine="local", engine_opts={"n_jobs": 1}, logger=logger)
+
+
+    @lru_cache(maxsize=10)
+    def load_model(model_file_name=model_name):
+        """
+        Loads model from the serialized format
+
+        Returns
+        -------
+        model: a model instance on which predict API can be invoked
+        """
+        init_automl_logger()
+        model_dir = os.path.dirname(os.path.realpath(__file__))
+        if model_dir not in sys.path:
+            sys.path.insert(0, model_dir)
+        contents = os.listdir(model_dir)
+        if model_file_name in contents:
+            print(f'Start loading {model_file_name} from model directory {model_dir} ...')
+            with open(os.path.join(os.path.dirname(os.path.realpath(__file__)), model_file_name), "rb") as file:
+                loaded_model = cloudpickle.load(file)
+
+            print("Model is successfully loaded.")
+            return loaded_model
+        else:
+            raise Exception(f'{model_file_name} is not found in model directory {model_dir}')
+
+    @lru_cache(maxsize=1)
+    def fetch_data_type_from_schema(input_schema_path=os.path.join(os.path.dirname(os.path.realpath(__file__)), "input_schema.json")):
+        """
+        Returns data type information fetch from input_schema.json.
+
+        Parameters
+        ----------
+        input_schema_path: path of input schema.
+
+        Returns
+        -------
+        data_type: data type fetch from input_schema.json.
+
+        """
+        data_type = {}
+        if os.path.exists(input_schema_path):
+            schema = json.load(open(input_schema_path))
+            for col in schema['schema']:
+                data_type[col['name']] = col['dtype']
+        else:
+            print("input_schema has to be passed in in order to recover the same data type. pass `X_sample` in `ads.model.framework.automl_model.AutoMLModel.prepare` function to generate the input_schema. Otherwise, the data type might be changed after serialization/deserialization.")
+        return data_type
+
+    def deserialize(data, input_schema_path, task=None):
+        """
+        Deserialize json serialization data to data in original type when sent to predict.
+
+        Parameters
+        ----------
+        data: serialized input data.
+        input_schema_path: path of input schema.
+        task: Machine learning task, supported: classification, regression, anomaly_detection, forecasting. Defaults to None.
+
+        Returns
+        -------
+        data: deserialized input data.
+
+        """
+
+        if isinstance(data, bytes):
+            return pd.read_json(StringIO(data.decode("utf-8")))
+
+        data_type = data.get('data_type', '') if isinstance(data, dict) else ''
+        json_data = data.get('data', data) if isinstance(data, dict) else data
+
+        if task and task == "forecasting":
+            if data_type:
+                data_type = data_type.split("'")[1]
+            try:
+                module, spec = ".".join(data_type.split(".")[:-1]), data_type.split(".")[-1]
+                lib = importlib.import_module(name=module)
+                func = getattr(lib, spec)
+                return pd.DataFrame(index=func(json_data))
+            except:
+                logging.warning("Cannot autodetect the type of the input data. By default, convert input data to pd.DatetimeIndex and feed the model an empty pandas DataFrame with index as input data. If assumption is not correct, modify the score.py and check with .verify() before saving model with .save().")
+                return pd.DataFrame(index=pd.DatetimeIndex(json_data))
+        if "pandas.core.series.Series" in data_type:
+            return pd.Series(json_data)
+        if "pandas.core.frame.DataFrame" in data_type or isinstance(json_data, str):
+            return pd.read_json(json_data, dtype=fetch_data_type_from_schema(input_schema_path))
+        if isinstance(json_data, dict):
+            return pd.DataFrame.from_dict(json_data)
+
+        return json_data
+
+    def pre_inference(data, input_schema_path, task=None):
+        """
+        Preprocess data
+
+        Parameters
+        ----------
+        data: Data format as expected by the predict API of the core estimator.
+        input_schema_path: path of input schema.
+        task: Machine learning task, supported: classification, regression, anomaly_detection, forecasting. Defaults to None.
+
+        Returns
+        -------
+        data: Data format after any processing.
+
+        """
+        data = deserialize(data, input_schema_path, task)
+        return data
+
+    def post_inference(yhat):
+        """
+        Post-process the model results
+
+        Parameters
+        ----------
+        yhat: Data format after calling model.predict.
+
+        Returns
+        -------
+        yhat: Data format after any processing.
+
+        """
+        if isinstance(yhat, pd.core.frame.DataFrame):
+            yhat = yhat.values
+        return yhat.tolist()
+
+    def predict(data, model=load_model(), input_schema_path=os.path.join(os.path.dirname(os.path.realpath(__file__)), "input_schema.json")):
+        """
+        Returns prediction given the model and data to predict
+
+        Parameters
+        ----------
+        model: Model instance returned by load_model API
+        data: Data format as expected by the predict API of the core estimator. For eg. in case of sckit models it could be numpy array/List of list/Pandas DataFrame
+        input_schema_path: path of input schema.
+
+        Returns
+        -------
+        predictions: Output from scoring server
+            Format: {'prediction': output from model.predict method}
+
+        """
+        task = model.task if hasattr(model, "task") else None
+        features = pre_inference(data, input_schema_path, task)
+        yhat = post_inference(
+            model.predict(features)
+        )
+        return {'prediction': yhat}
+
+
+Verify score.py changes by running inference locally
+.. code-block:: python3
+
+    automl_model.verify(X_test.iloc[:2], auto_serialize_data=True)
+
+Save model and Deploy the model. After it is successfully deployed, invoke the endpoint by calling .predict() function.
+.. code-block:: python3
+
     model_id = automl_model.save(display_name='Demo AutoMLModel model')
     deploy = automl_model.deploy(display_name='Demo AutoMLModel deployment')
-    automl_model.predict(X_test.iloc[:2])
-    automl_model.delete_deployment(wait_for_completion=True)
-    ModelCatalog(compartment_id=os.environ['NB_SESSION_COMPARTMENT_OCID']).delete_model(model_id)
+    automl_model.predict(X_test.iloc[:2], auto_serialize_data=True)
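
Note on the ``score.py`` added in this commit: ``deserialize()`` passes the result of ``fetch_data_type_from_schema()`` to ``pd.read_json`` so that column dtypes survive the JSON round trip. The lookup itself can be sketched with the standard library alone. This is a minimal illustration, not ADS code: the helper name ``dtype_map_from_schema`` and the sample schema values are assumptions. Only the layout (a ``schema`` list whose entries carry ``name`` and ``dtype``) is taken from the diff.

```python
import json
import os
import tempfile


def dtype_map_from_schema(schema_path):
    """Return {column name: dtype} read from a schema file, or {} if it is missing.

    Hypothetical helper mirroring the lookup done in the score.py above.
    """
    if not os.path.exists(schema_path):
        return {}
    with open(schema_path) as f:
        schema = json.load(f)
    return {col["name"]: col["dtype"] for col in schema["schema"]}


# Illustrative schema (assumed values), shaped like the file score.py reads:
# a top-level "schema" list whose entries have "name" and "dtype" keys.
sample_schema = {
    "schema": [
        {"name": "age", "dtype": "int64"},
        {"name": "workclass", "dtype": "category"},
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input_schema.json")
    with open(path, "w") as f:
        json.dump(sample_schema, f)
    # Schema present: the mapping is recovered.
    dtypes = dtype_map_from_schema(path)
    # Schema absent: an empty mapping, so no dtype hints are available.
    missing = dtype_map_from_schema(os.path.join(tmp, "absent.json"))

print(dtypes)
print(missing)
```

When the schema file is absent the mapping is empty, which is the case the ``print`` warning in ``fetch_data_type_from_schema()`` describes: without ``X_sample`` at ``prepare()`` time, dtypes may change after serialization.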
