
Commit 4975b2a

Merge branch '2.8.2' of github.com:oracle/accelerated-data-science into TC-23836/release_note
2 parents 92bfdf5 + 692b45d commit 4975b2a

19 files changed (+850 / -51 lines)

docs/source/ads.model.framework.rst

Lines changed: 8 additions & 0 deletions
@@ -60,6 +60,14 @@ ads.model.framework.xgboost\_model module
    :undoc-members:
    :show-inheritance:

+ads.model.framework.huggingface\_model module
+---------------------------------------------
+
+.. automodule:: ads.model.framework.huggingface_model
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
 Module contents
 ---------------

docs/source/user_guide/model_deployment/predict.rst

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ Model framework serialization
     pytorch_model.predict(byte_im)['prediction'][0][:10]

     pytorch_model.delete_deployment(wait_for_completion=True)
-    ModelCatalog(compartment_id=os.environ['NB_SESSION_COMPARTMENT_OCID']).delete_model(model_id)
+    pytorch_model.delete()


 The change needed in `score.py`:

docs/source/user_guide/model_registration/framework_specific_instruction.rst

Lines changed: 2 additions & 1 deletion
@@ -9,6 +9,7 @@
     frameworks/sparkpipelinemodel
     frameworks/lightgbmmodel
     frameworks/xgboostmodel
+    frameworks/huggingfacemodel
     frameworks/automlmodel
     frameworks/genericmodel
-
+

docs/source/user_guide/model_registration/frameworks/genericmodel.rst

Lines changed: 283 additions & 0 deletions
@@ -111,3 +111,286 @@ You can also use the shortcut ``.prepare_save_deploy()`` instead of calling ``.p

     # To delete the deployed endpoint uncomment the line below
     # model.delete_deployment(wait_for_completion=True)
+
+
+Example -- CatBoost
+===================
+
+Here is a more realistic example using a CatBoost model.
+
+.. code-block:: python3
+
+    import tempfile
+    import ads
+    from ads.model.generic_model import GenericModel
+    from catboost import CatBoostRegressor
+
+    ads.set_auth(auth="resource_principal")
+
+    # Initialize data
+
+    X_train = [[1, 4, 5, 6],
+               [4, 5, 6, 7],
+               [30, 40, 50, 60]]
+
+    X_test = [[2, 4, 6, 8],
+              [1, 4, 50, 60]]
+
+    y_train = [10, 20, 30]
+
+    # Initialize CatBoostRegressor
+    catboost_estimator = CatBoostRegressor(iterations=2,
+                                           learning_rate=1,
+                                           depth=2)
+    # Train a CatBoostRegressor model
+    catboost_estimator.fit(X_train, y_train)
+
+    # Get predictions
+    preds = catboost_estimator.predict(X_test)
+
+    # Instantiate ads.model.generic_model.GenericModel using the trained CatBoostRegressor model
+    catboost_model = GenericModel(estimator=catboost_estimator,
+                                  artifact_dir=tempfile.mkdtemp(),
+                                  model_save_serializer="cloudpickle",
+                                  model_input_serializer="json")
+
+    # Autogenerate score.py, pickled model, runtime.yaml, input_schema.json and output_schema.json
+    catboost_model.prepare(
+        inference_conda_env="oci://bucket@namespace/path/to/your/conda/pack",
+        inference_python_version="your_python_version",
+        X_sample=X_train,
+        y_sample=y_train,
+    )
+
+    # Verify generated artifacts. Payload looks like this: [[2, 4, 6, 8], [1, 4, 50, 60]]
+    catboost_model.verify(X_test, auto_serialize_data=True)
+
+    # Register CatBoostRegressor model
+    model_id = catboost_model.save(display_name="CatBoost Model")
+    catboost_model.deploy()
+    catboost_model.predict(X_test)
+    catboost_model.delete_deployment(wait_for_completion=True)
+    catboost_model.delete()  # delete the model
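The two serializer arguments in the CatBoost example split the work: the estimator is pickled into the artifact directory (`model_save_serializer="cloudpickle"`), and request payloads travel as JSON (`model_input_serializer="json"`). Below is a minimal, standard-library-only sketch of that round trip, not the ADS implementation: `pickle` stands in for cloudpickle, and the `ToyRegressor` class and `model.pkl` file name are hypothetical stand-ins so the sketch runs without CatBoost or OCI.

```python
import json
import os
import pickle
import tempfile


class ToyRegressor:
    """Hypothetical stand-in for CatBoostRegressor: predicts the mean of each row."""

    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]


X_test = [[2, 4, 6, 8], [1, 4, 50, 60]]
artifact_dir = tempfile.mkdtemp()
model_path = os.path.join(artifact_dir, "model.pkl")  # hypothetical file name

# Save side: serialize the estimator object into the artifact directory.
with open(model_path, "wb") as f:
    pickle.dump(ToyRegressor(), f)

# Client side: the input is sent as a JSON string.
payload = json.dumps(X_test)

# Server side: restore the estimator, decode the JSON payload, then predict.
with open(model_path, "rb") as f:
    restored = pickle.load(f)
result = {"prediction": restored.predict(json.loads(payload))}
print(result)
```

Nested lists of numbers survive the JSON round trip unchanged, which is why the payload shown in the `verify` comment is just the plain list.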
+
+
+Example -- Save Your Own Model
+==============================
+
+By default, the ``serialize`` parameter of the ``GenericModel`` class is ``True``, and the model is serialized with cloudpickle. You can set ``serialize=False`` to disable this and serialize the model yourself; you then only need to copy the serialized model into the ``.artifact_dir``. This example shows how to do that step by step.
+The example is illustrated using a scikit-learn ``LogisticRegression`` model.
+
+.. code-block:: python3
+
+    import tempfile
+    from ads import set_auth
+    from ads.model import GenericModel
+    from sklearn.datasets import load_iris
+    from sklearn.linear_model import LogisticRegression
+    from sklearn.model_selection import train_test_split
+
+    set_auth(auth="resource_principal")
+
+    # Load dataset and prepare train and test split
+    iris = load_iris()
+    X, y = iris.data, iris.target
+    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
+
+    # Train a LogisticRegression model
+    sklearn_estimator = LogisticRegression()
+    sklearn_estimator.fit(X_train, y_train)
+
+    # Serialize your model. You can choose your own way to serialize your model.
+    import cloudpickle
+    with open("./model.pkl", "wb") as f:
+        cloudpickle.dump(sklearn_estimator, f)
+
+    model = GenericModel(sklearn_estimator, artifact_dir="model_artifact_folder", serialize=False)
+    model.prepare(inference_conda_env="generalml_p38_cpu_v1", force_overwrite=True, model_file_name="model.pkl", X_sample=X_test)
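With ``serialize=False``, getting the serialized file into the artifact folder is up to you. A minimal standard-library sketch of those manual steps follows. It uses `pickle` instead of cloudpickle, a hypothetical `ThresholdClassifier` instead of the scikit-learn model, and temporary directories in place of `./` and `model_artifact_folder`, so it runs anywhere; it only covers the file handling, not what `GenericModel.prepare()` generates.

```python
import os
import pickle
import shutil
import tempfile


class ThresholdClassifier:
    """Hypothetical stand-in estimator: label 1 if the first feature exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, rows):
        return [int(row[0] > self.threshold) for row in rows]


work_dir = tempfile.mkdtemp()      # stands in for the current directory ("./")
artifact_dir = tempfile.mkdtemp()  # stands in for "model_artifact_folder"

# 1. Serialize the model yourself.
local_file = os.path.join(work_dir, "model.pkl")
with open(local_file, "wb") as f:
    pickle.dump(ThresholdClassifier(threshold=5.0), f)

# 2. Copy the serialized file into the artifact directory, keeping the file
#    name that score.py looks for (model.pkl in this example).
copied_file = shutil.copy(local_file, os.path.join(artifact_dir, "model.pkl"))

# 3. Sanity check: the copy inside the artifact directory loads and predicts.
with open(copied_file, "rb") as f:
    restored = pickle.load(f)
print(restored.predict([[5.1, 3.5], [4.9, 3.0]]))
```

The file name passed as ``model_file_name`` to ``prepare()`` and the name the copied file ends up with must match, since ``score.py`` looks the file up by that name.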
+
+Now copy the ``model.pkl`` file into the ``model_artifact_folder`` folder, then open ``score.py`` in that folder and implement the ``load_model`` function. You can also add preprocessing steps to the ``pre_inference`` function and postprocessing steps to the ``post_inference`` function. Below is an example implementation of ``score.py``.
+Replace your ``score.py`` with the code below.
+
+.. code-block:: python3
+    :emphasize-lines: 28, 29, 30, 31, 123
+
+    # score.py 1.0 generated by ADS 2.8.2 on 20230301_065458
+    import os
+    import sys
+    import json
+    from functools import lru_cache
+
+    model_name = 'model.pkl'
+
+
+    """
+       Inference script. This script is used for prediction by scoring server when schema is known.
+    """
+
+    @lru_cache(maxsize=10)
+    def load_model(model_file_name=model_name):
+        """
+        Loads model from the serialized format
+
+        Returns
+        -------
+        model:  a model instance on which predict API can be invoked
+        """
+        model_dir = os.path.dirname(os.path.realpath(__file__))
+        if model_dir not in sys.path:
+            sys.path.insert(0, model_dir)
+        contents = os.listdir(model_dir)
+        if model_file_name in contents:
+            import cloudpickle
+            with open(os.path.join(model_dir, model_file_name), "rb") as f:
+                model = cloudpickle.load(f)
+            return model
+        else:
+            raise Exception(f'{model_file_name} is not found in model directory {model_dir}')
+
+    @lru_cache(maxsize=1)
+    def fetch_data_type_from_schema(input_schema_path=os.path.join(os.path.dirname(os.path.realpath(__file__)), "input_schema.json")):
+        """
+        Returns data type information fetched from input_schema.json.
+
+        Parameters
+        ----------
+        input_schema_path: path of input schema.
+
+        Returns
+        -------
+        data_type: data type fetched from input_schema.json.
+
+        """
+        data_type = {}
+        if os.path.exists(input_schema_path):
+            schema = json.load(open(input_schema_path))
+            for col in schema['schema']:
+                data_type[col['name']] = col['dtype']
+        else:
+            print("input_schema has to be passed in order to recover the same data type. Pass `X_sample` to the `ads.model.generic_model.GenericModel.prepare` function to generate the input_schema. Otherwise, the data type might be changed after serialization/deserialization.")
+        return data_type
+
+    def deserialize(data, input_schema_path):
+        """
+        Deserialize JSON-serialized data back to its original type when sent to predict.
+
+        Parameters
+        ----------
+        data: serialized input data.
+        input_schema_path: path of input schema.
+
+        Returns
+        -------
+        data: deserialized input data.
+
+        """
+
+        import pandas as pd
+        import numpy as np
+        import base64
+        from io import BytesIO
+        if isinstance(data, bytes):
+            return data
+
+        data_type = data.get('data_type', '') if isinstance(data, dict) else ''
+        json_data = data.get('data', data) if isinstance(data, dict) else data
+
+        if "numpy.ndarray" in data_type:
+            load_bytes = BytesIO(base64.b64decode(json_data.encode('utf-8')))
+            return np.load(load_bytes, allow_pickle=True)
+        if "pandas.core.series.Series" in data_type:
+            return pd.Series(json_data)
+        if "pandas.core.frame.DataFrame" in data_type or isinstance(json_data, str):
+            return pd.read_json(json_data, dtype=fetch_data_type_from_schema(input_schema_path))
+        if isinstance(json_data, dict):
+            return pd.DataFrame.from_dict(json_data)
+        return json_data
+
+    def pre_inference(data, input_schema_path):
+        """
+        Preprocess data
+
+        Parameters
+        ----------
+        data: Data format as expected by the predict API of the core estimator.
+        input_schema_path: path of input schema.
+
+        Returns
+        -------
+        data: Data format after any processing.
+
+        """
+        return deserialize(data, input_schema_path)
+
+    def post_inference(yhat):
+        """
+        Post-process the model results
+
+        Parameters
+        ----------
+        yhat: Data format after calling model.predict.
+
+        Returns
+        -------
+        yhat: Data format after any processing.
+
+        """
+        return yhat.tolist()
+
+    def predict(data, model=load_model(), input_schema_path=os.path.join(os.path.dirname(os.path.realpath(__file__)), "input_schema.json")):
+        """
+        Returns prediction given the model and data to predict
+
+        Parameters
+        ----------
+        model: Model instance returned by load_model API.
+        data: Data format as expected by the predict API of the core estimator. For example, in the case of scikit-learn models it could be a numpy array, a list of lists, or a pandas DataFrame.
+        input_schema_path: path of input schema.
+
+        Returns
+        -------
+        predictions: Output from scoring server
+            Format: {'prediction': output from model.predict method}
+
+        """
+        features = pre_inference(data, input_schema_path)
+        yhat = post_inference(
+            model.predict(features)
+        )
+        return {'prediction': yhat}
+
+Save ``score.py`` and call ``.verify()`` to check that it works locally.
+
+.. code-block:: python3
+
+    model.verify(X_test[:2], auto_serialize_data=True)
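`.verify()` is the supported way to test the artifacts locally. While editing `score.py`, it can also help to import the file by path with `importlib` and call its `predict()` directly. The sketch below writes a tiny, hypothetical stand-in `score.py` (much simpler than the one above and free of pandas, NumPy, and cloudpickle) into a temporary directory so it runs on its own; it does not reproduce the serialization that `auto_serialize_data=True` performs.

```python
import importlib.util
import os
import pickle
import tempfile

# A hypothetical, minimal score.py that follows the same load_model/predict layout.
SCORE_PY = '''
import os
import pickle
from functools import lru_cache

@lru_cache(maxsize=10)
def load_model(model_file_name="model.pkl"):
    model_dir = os.path.dirname(os.path.realpath(__file__))
    with open(os.path.join(model_dir, model_file_name), "rb") as f:
        return pickle.load(f)

def predict(data, model=load_model()):
    return {"prediction": [model["scale"] * x for x in data]}
'''

artifact_dir = tempfile.mkdtemp()
# model.pkl must exist before import, because predict's default calls load_model().
with open(os.path.join(artifact_dir, "model.pkl"), "wb") as f:
    pickle.dump({"scale": 2}, f)  # a picklable stand-in for a real estimator
with open(os.path.join(artifact_dir, "score.py"), "w") as f:
    f.write(SCORE_PY)

# Import score.py from the artifact directory by path and call predict() directly.
spec = importlib.util.spec_from_file_location("score", os.path.join(artifact_dir, "score.py"))
score = importlib.util.module_from_spec(spec)
spec.loader.exec_module(score)
print(score.predict([1, 2, 3]))
```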
+
+After ``.verify()`` runs successfully, you can save the model to the model catalog, deploy it, and call ``.predict()`` to invoke the endpoint.
+
+.. code-block:: python3
+
+    model_id = model.save(display_name='Demo AutoMLModel model')
+    deploy = model.deploy(display_name='Demo AutoMLModel deployment')
+    model.predict(X_test[:2].tolist())
+
+You can also use the shortcut ``.prepare_save_deploy()`` instead of calling ``.prepare()``, ``.save()`` and ``.deploy()`` separately.
+
+.. code-block:: python3
+
+    import os
+    from ads.catalog.model import ModelCatalog
+    from ads.model.generic_model import GenericModel
+
+    class Toy:
+        def predict(self, x):
+            return x ** 2
+    estimator = Toy()
+
+    model = GenericModel(estimator=estimator)
+    model.summary_status()
+    # If you are running the code inside a notebook session and using a service pack, `inference_conda_env` can be omitted.
+    model.prepare_save_deploy(inference_conda_env="dataexpl_p37_cpu_v3")
+    model.verify(2)
+    model.predict(2)
+    model.delete_deployment(wait_for_completion=True)
+    ModelCatalog(compartment_id=os.environ['NB_SESSION_COMPARTMENT_OCID']).delete_model(model.model_id)
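The `Toy` class can serve as an estimator because it exposes a `predict` method, and the generated `score.py` calls `model.predict(features)`. Assuming `predict` is all the default scoring path needs from the estimator, a `typing.Protocol` can make that duck-typed requirement explicit and let you check an object before wrapping it. `SupportsPredict` and `NotAModel` are hypothetical names, not part of ADS.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SupportsPredict(Protocol):
    """Hypothetical protocol: any object with a predict(x) method."""

    def predict(self, x: Any) -> Any:
        ...


class Toy:
    def predict(self, x):
        return x ** 2


class NotAModel:
    pass


print(isinstance(Toy(), SupportsPredict))        # Toy exposes predict()
print(isinstance(NotAModel(), SupportsPredict))  # no predict() method
print(Toy().predict(2))
```

`runtime_checkable` only checks that a `predict` attribute exists, not its signature or return type, so this is a cheap pre-check rather than a guarantee that deployment will succeed.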
