Commit 666baf1

committed
adding examples
1 parent 65cbcd1 commit 666baf1

File tree: 2 files changed (+299 -3 lines changed)


docs/source/user_guide/model_registration/frameworks/genericmodel.rst

Lines changed: 298 additions & 0 deletions
@@ -111,3 +111,301 @@ You can also use the shortcut ``.prepare_save_deploy()`` instead of calling ``.p
    # To delete the deployed endpoint uncomment the line below
    # model.delete_deployment(wait_for_completion=True)

Example -- CatBoost
===================

Here is a more realistic example using a CatBoost model.

.. code-block:: python3

    import os
    import tempfile

    import ads
    from ads.catalog.model import ModelCatalog
    from ads.model.generic_model import GenericModel
    from catboost import CatBoostRegressor

    ads.set_auth(auth="resource_principal")

    # Initialize data

    X_train = [[1, 4, 5, 6],
               [4, 5, 6, 7],
               [30, 40, 50, 60]]

    X_test = [[2, 4, 6, 8],
              [1, 4, 50, 60]]

    y_train = [10, 20, 30]

    # Initialize CatBoostRegressor
    catboost_estimator = CatBoostRegressor(iterations=2,
                                           learning_rate=1,
                                           depth=2)
    # Train a CatBoostRegressor model
    catboost_estimator.fit(X_train, y_train)

    # Get predictions
    preds = catboost_estimator.predict(X_test)

    # Instantiate ads.model.generic_model.GenericModel using the trained CatBoostRegressor model
    catboost_model = GenericModel(estimator=catboost_estimator,
                                  artifact_dir=tempfile.mkdtemp(),
                                  model_save_serializer="cloudpickle",
                                  model_input_serializer="json")

    # Autogenerate score.py, pickled model, runtime.yaml, input_schema.json and output_schema.json
    catboost_model.prepare(
        inference_conda_env="oci://bucket@namespace/path/to/your/conda/pack",
        inference_python_version="your_python_version",
        X_sample=X_train,
        y_sample=y_train,
    )

    # Verify generated artifacts. Payload looks like this: [[2, 4, 6, 8], [1, 4, 50, 60]]
    catboost_model.verify(X_test, auto_serialize_data=True)

    # Register CatBoostRegressor model
    model_id = catboost_model.save(display_name="CatBoost Model")
    catboost_model.deploy()
    catboost_model.predict(X_test)
    catboost_model.delete_deployment(wait_for_completion=True)
    ModelCatalog(compartment_id=os.environ['NB_SESSION_COMPARTMENT_OCID']).delete_model(model_id)

You can also use the shortcut ``.prepare_save_deploy()`` instead of calling ``.prepare()``, ``.save()`` and ``.deploy()`` separately.

.. code-block:: python3

    import os
    import tempfile

    from ads.catalog.model import ModelCatalog
    from ads.model.generic_model import GenericModel

    class Toy:
        def predict(self, x):
            return x ** 2

    estimator = Toy()

    model = GenericModel(estimator=estimator)
    model.summary_status()
    # If you are running the code inside a notebook session and using a service pack, `inference_conda_env` can be omitted.
    model.prepare_save_deploy(inference_conda_env="dataexpl_p37_cpu_v3")
    model.verify(2)
    model.predict(2)
    model.delete_deployment(wait_for_completion=True)
    ModelCatalog(compartment_id=os.environ['NB_SESSION_COMPARTMENT_OCID']).delete_model(model.model_id)


Example -- Save Your Own Model
==============================

By default, ``serialize`` in the ``GenericModel`` class is ``True`` and the model is serialized using cloudpickle. However, you can set ``serialize=False`` to disable this and serialize the model on your own; you then only need to copy the serialized model into ``.artifact_dir``. This example shows, step by step, how to do that using an AutoMLx model.

.. code-block:: python3

    import automl
    import ads
    from automl import init
    from sklearn.datasets import fetch_openml
    from sklearn.model_selection import train_test_split
    from ads.model import GenericModel

    dataset = fetch_openml(name='adult', as_frame=True)
    df, y = dataset.data, dataset.target

    # Several of the columns are incorrectly labeled as category type in the original dataset
    numeric_columns = ['age', 'capitalgain', 'capitalloss', 'hoursperweek']
    for col in df.columns:
        if col in numeric_columns:
            df[col] = df[col].astype(int)

    X_train, X_test, y_train, y_test = train_test_split(df,
                                                        y.map({'>50K': 1, '<=50K': 0}).astype(int),
                                                        train_size=0.7,
                                                        random_state=0)

    X_train.shape, X_test.shape

    # Create an AutoMLx model
    init(engine='local')

    est = automl.Pipeline(task='classification')
    est.fit(X_train, y_train)

    # Authentication
    ads.set_auth(auth="resource_principal")

    # Serialize your model. You can choose your own way to serialize your model.
    import cloudpickle
    with open("./model.pkl", "wb") as f:
        cloudpickle.dump(est, f)

    model = GenericModel(est, artifact_dir="model_artifact_folder", serialize=False)
    model.prepare(inference_conda_env="automlx_p38_cpu_v1", force_overwrite=True, model_file_name="model.pkl", X_sample=X_test)

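Now copy the model.pkl file into the ``model_artifact_folder`` folder. A minimal sketch of this copy step, using only the standard library (the paths match the ``cloudpickle.dump`` call and the ``artifact_dir`` used above):

.. code-block:: python3

    import shutil

    # Copy the locally serialized model into the artifact directory
    # created by .prepare() above.
    shutil.copy("./model.pkl", "model_artifact_folder/model.pkl")
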
Then open the score.py in the ``model_artifact_folder`` folder and implement the ``load_model`` function. You can also edit the ``pre_inference`` and ``post_inference`` functions. Below is an example implementation of score.py; replace your score.py with this code.

.. code-block:: python3
    :emphasize-lines: 28, 29, 30, 31, 123

    # score.py 1.0 generated by ADS 2.8.2 on 20230301_065458
    import os
    import sys
    import json
    from functools import lru_cache

    model_name = 'model.pkl'


    """
    Inference script. This script is used for prediction by the scoring server when the schema is known.
    """

    @lru_cache(maxsize=10)
    def load_model(model_file_name=model_name):
        """
        Loads model from the serialized format

        Returns
        -------
        model: a model instance on which predict API can be invoked
        """
        model_dir = os.path.dirname(os.path.realpath(__file__))
        if model_dir not in sys.path:
            sys.path.insert(0, model_dir)
        contents = os.listdir(model_dir)
        if model_file_name in contents:
            import cloudpickle
            with open(os.path.join(model_dir, model_file_name), "rb") as f:
                model = cloudpickle.load(f)
            return model
        else:
            raise Exception(f'{model_file_name} is not found in model directory {model_dir}')

    @lru_cache(maxsize=1)
    def fetch_data_type_from_schema(input_schema_path=os.path.join(os.path.dirname(os.path.realpath(__file__)), "input_schema.json")):
        """
        Returns data type information fetched from input_schema.json.

        Parameters
        ----------
        input_schema_path: path of input schema.

        Returns
        -------
        data_type: data type fetched from input_schema.json.

        """
        data_type = {}
        if os.path.exists(input_schema_path):
            schema = json.load(open(input_schema_path))
            for col in schema['schema']:
                data_type[col['name']] = col['dtype']
        else:
            print("input_schema has to be passed in in order to recover the same data type. Pass `X_sample` in the `prepare` function to generate the input_schema. Otherwise, the data type might be changed after serialization/deserialization.")
        return data_type

    def deserialize(data, input_schema_path):
        """
        Deserialize json serialization data to data in original type when sent to predict.

        Parameters
        ----------
        data: serialized input data.
        input_schema_path: path of input schema.

        Returns
        -------
        data: deserialized input data.

        """

        import pandas as pd
        import numpy as np
        import base64
        from io import BytesIO
        if isinstance(data, bytes):
            return data

        data_type = data.get('data_type', '') if isinstance(data, dict) else ''
        json_data = data.get('data', data) if isinstance(data, dict) else data

        if "numpy.ndarray" in data_type:
            load_bytes = BytesIO(base64.b64decode(json_data.encode('utf-8')))
            return np.load(load_bytes, allow_pickle=True)
        if "pandas.core.series.Series" in data_type:
            return pd.Series(json_data)
        if "pandas.core.frame.DataFrame" in data_type or isinstance(json_data, str):
            return pd.read_json(json_data, dtype=fetch_data_type_from_schema(input_schema_path))
        if isinstance(json_data, dict):
            return pd.DataFrame.from_dict(json_data)
        return json_data

    def pre_inference(data, input_schema_path):
        """
        Preprocess data

        Parameters
        ----------
        data: Data format as expected by the predict API of the core estimator.
        input_schema_path: path of input schema.

        Returns
        -------
        data: Data format after any processing.

        """
        return deserialize(data, input_schema_path)

    def post_inference(yhat):
        """
        Post-process the model results

        Parameters
        ----------
        yhat: Data format after calling model.predict.

        Returns
        -------
        yhat: Data format after any processing.

        """
        return yhat.tolist()

    def predict(data, model=load_model(), input_schema_path=os.path.join(os.path.dirname(os.path.realpath(__file__)), "input_schema.json")):
        """
        Returns prediction given the model and data to predict

        Parameters
        ----------
        model: Model instance returned by load_model API.
        data: Data format as expected by the predict API of the core estimator. For example, in the case of scikit-learn models it could be a numpy array, a list of lists, or a pandas DataFrame.
        input_schema_path: path of input schema.

        Returns
        -------
        predictions: Output from scoring server
            Format: {'prediction': output from model.predict method}

        """
        features = pre_inference(data, input_schema_path)
        yhat = post_inference(
            model.predict(features)
        )
        return {'prediction': yhat}

Save the score.py file, and call ``verify()`` to check that it works locally.

.. code-block:: python3

    model.verify(X_test.iloc[:2], auto_serialize_data=True)

After ``verify()`` runs successfully, you can save the model to the model catalog, deploy it, and call ``predict()`` to invoke the endpoint.

.. code-block:: python3

    model_id = model.save(display_name='Demo AutoMLModel model')
    deploy = model.deploy(display_name='Demo AutoMLModel deployment')
    model.predict(X_test.iloc[:2].to_json())
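
As in the earlier examples, you can clean up afterwards by deleting the deployment and the model catalog entry (a sketch, assuming the code runs in a notebook session where ``NB_SESSION_COMPARTMENT_OCID`` is set):

.. code-block:: python3

    import os

    from ads.catalog.model import ModelCatalog

    # Tear down the deployment, then remove the model from the catalog.
    model.delete_deployment(wait_for_completion=True)
    ModelCatalog(compartment_id=os.environ['NB_SESSION_COMPARTMENT_OCID']).delete_model(model_id)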

docs/source/user_guide/model_serialization/genericmodel.rst

Lines changed: 1 addition & 3 deletions
@@ -139,12 +139,10 @@ By default, the ``GenericModel`` serializes to a pickle file. The following exam
     .. code-block:: python3

         import tempfile
-
         import ads
         from ads.model.generic_model import GenericModel
         from catboost import CatBoostRegressor

-
         ads.set_auth(auth="resource_principal")

         # Initialize data
@@ -182,7 +180,7 @@ By default, the ``GenericModel`` serializes to a pickle file. The following exam
             y_sample=y_train,
         )

-        # Verify generated artifacts
+        # Verify generated artifacts. Payload looks like this: [[2, 4, 6, 8], [1, 4, 50, 60]]
         catboost_model.verify(X_test, auto_serialize_data=True)

         # Register CatBoostRegressor model