
Commit 1712c60

add mlflow model
1 parent a2cd6bd commit 1712c60

13 files changed: 2155 additions, 3 deletions

Chapter5/machine_learning.ipynb

Lines changed: 257 additions & 1 deletion
@@ -2570,6 +2570,262 @@
"source": [
"[Link to AutoGluon](https://bit.ly/45ljoOd)."
]
+},
+{
+"cell_type": "markdown",
+"id": "d8376b0e",
+"metadata": {},
+"source": [
+"### Model Logging Made Easy: MLflow vs. Pickle"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Here is why using MLflow to log a model is superior to using pickle to save a model:\n",
+"\n",
+"1. Managing Library Versions:\n",
+"- Problem: Different models may require different versions of the same library, which can lead to conflicts. Manually tracking and setting up the correct environment for each model is time-consuming and error-prone.\n",
+"- Solution: By automatically logging dependencies, MLflow ensures that anyone can recreate the exact environment needed to run the model.\n",
+"\n",
+"2. Documenting Inputs and Outputs:\n",
+"- Problem: Often, the expected inputs and outputs of a model are not well-documented, making it difficult for others to use the model correctly.\n",
+"- Solution: By defining a clear schema for inputs and outputs, MLflow ensures that anyone using the model knows exactly what data to provide and what to expect in return.\n",
+"\n",
+"To demonstrate the advantages of MLflow, let’s implement a simple logistic regression model and log it."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 20,
+"id": "644b25c0",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"Saving data to runs:/f8b0fc900aa14cf0ade8d0165c5a9f11/model\n"
+]
+}
+],
+"source": [
+"import mlflow\n",
+"from mlflow.models import infer_signature\n",
+"import numpy as np\n",
+"from sklearn.linear_model import LogisticRegression\n",
+"\n",
+"with mlflow.start_run():\n",
+"    X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)\n",
+"    y = np.array([0, 0, 1, 1, 1, 0])\n",
+"    lr = LogisticRegression()\n",
+"    lr.fit(X, y)\n",
+"    signature = infer_signature(X, lr.predict(X))\n",
+"\n",
+"    model_info = mlflow.sklearn.log_model(\n",
+"        sk_model=lr, artifact_path=\"model\", signature=signature\n",
+"    )\n",
+"\n",
+"    print(f\"Saving data to {model_info.model_uri}\")"
+]
+},
+{
+"cell_type": "markdown",
+"id": "28c9ff92",
+"metadata": {},
+"source": [
+"The output indicates where the model has been saved. To use the logged model later, you can load it with the `model_uri`:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 21,
+"id": "f88b0415",
+"metadata": {},
+"outputs": [],
+"source": [
+"import mlflow\n",
+"import numpy as np\n",
+"\n",
+"model_uri = \"runs:/1e20d72afccf450faa3b8a9806a97e83/model\"\n",
+"sklearn_pyfunc = mlflow.pyfunc.load_model(model_uri=model_uri)\n",
+"\n",
+"data = np.array([-4, 1, 0, 10, -2, 1]).reshape(-1, 1)\n",
+"\n",
+"predictions = sklearn_pyfunc.predict(data)"
+]
+},
+{
+"cell_type": "markdown",
+"id": "58ebe221",
+"metadata": {},
+"source": [
+"Let's inspect the artifacts saved with the model:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 22,
+"id": "8acda9d6",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"/Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5/mlruns/0/1e20d72afccf450faa3b8a9806a97e83/artifacts/model\n",
+"MLmodel model.pkl requirements.txt\n",
+"conda.yaml python_env.yaml\n"
+]
+}
+],
+"source": [
+"%cd mlruns/0/1e20d72afccf450faa3b8a9806a97e83/artifacts/model\n",
+"%ls"
+]
+},
+{
+"cell_type": "markdown",
+"id": "ced58a8d",
+"metadata": {},
+"source": [
+"The `MLmodel` file provides essential information about the model, including dependencies and input/output specifications:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 23,
+"id": "dc30e383",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"artifact_path: model\n",
+"flavors:\n",
+"  python_function:\n",
+"    env:\n",
+"      conda: conda.yaml\n",
+"      virtualenv: python_env.yaml\n",
+"    loader_module: mlflow.sklearn\n",
+"    model_path: model.pkl\n",
+"    predict_fn: predict\n",
+"    python_version: 3.11.6\n",
+"  sklearn:\n",
+"    code: null\n",
+"    pickled_model: model.pkl\n",
+"    serialization_format: cloudpickle\n",
+"    sklearn_version: 1.4.1.post1\n",
+"mlflow_version: 2.15.0\n",
+"model_size_bytes: 722\n",
+"model_uuid: e7487bc3c4ab417c965144efcecaca2f\n",
+"run_id: 1e20d72afccf450faa3b8a9806a97e83\n",
+"signature:\n",
+"  inputs: '[{\"type\": \"tensor\", \"tensor-spec\": {\"dtype\": \"int64\", \"shape\": [-1, 1]}}]'\n",
+"  outputs: '[{\"type\": \"tensor\", \"tensor-spec\": {\"dtype\": \"int64\", \"shape\": [-1]}}]'\n",
+"  params: null\n",
+"utc_time_created: '2024-08-02 20:58:16.516963'\n"
+]
+}
+],
+"source": [
+"%cat MLmodel"
+]
+},
+{
+"cell_type": "markdown",
+"id": "01ef7c12",
+"metadata": {},
+"source": [
+"The `conda.yaml` and `python_env.yaml` files outline the environment dependencies, ensuring that the model runs in a consistent setup:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 24,
+"id": "1dce0181",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"channels:\n",
+"- conda-forge\n",
+"dependencies:\n",
+"- python=3.11.6\n",
+"- pip<=24.2\n",
+"- pip:\n",
+"  - mlflow==2.15.0\n",
+"  - cloudpickle==2.2.1\n",
+"  - numpy==1.23.5\n",
+"  - psutil==5.9.6\n",
+"  - scikit-learn==1.4.1.post1\n",
+"  - scipy==1.11.3\n",
+"name: mlflow-env\n"
+]
+}
+],
+"source": [
+"%cat conda.yaml"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 25,
+"id": "16c2d3bc",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"python: 3.11.6\n",
+"build_dependencies:\n",
+"- pip==24.2\n",
+"- setuptools\n",
+"- wheel==0.40.0\n",
+"dependencies:\n",
+"- -r requirements.txt\n"
+]
+}
+],
+"source": [
+"%cat python_env.yaml\n"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 26,
+"id": "b16b2916",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"mlflow==2.15.0\n",
+"cloudpickle==2.2.1\n",
+"numpy==1.23.5\n",
+"psutil==5.9.6\n",
+"scikit-learn==1.4.1.post1\n",
+"scipy==1.11.3"
+]
+}
+],
+"source": [
+"%cat requirements.txt"
+]
+},
+{
+"cell_type": "markdown",
+"id": "00f7bbf0",
+"metadata": {},
+"source": [
+"[Learn more about MLflow Models](https://bit.ly/46y6gpF)."
+]
}
],
"metadata": {
@@ -2590,7 +2846,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.11.4"
+"version": "3.11.6"
},
"toc": {
"base_numbering": 1,

Chapter5/mlflow_sklearn_load_data.py

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+import mlflow
+import numpy as np
+
+model_uri = "runs:/1e20d72afccf450faa3b8a9806a97e83/model"
+sklearn_pyfunc = mlflow.pyfunc.load_model(model_uri=model_uri)
+
+data = np.array([-4, 1, 0, 10, -2, 1]).reshape(-1, 1)
+
+predictions = sklearn_pyfunc.predict(data)
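
A side note on this loading script: mlflow.pyfunc.load_model returns a generic wrapper that only exposes predict. The minimal sketch below, which is not part of the commit and reuses the same run ID, loads the model through the sklearn flavor instead, so the raw estimator comes back and methods such as predict_proba remain available.

import mlflow
import numpy as np

# Same run ID as in the script above; replace with your own run's URI.
model_uri = "runs:/1e20d72afccf450faa3b8a9806a97e83/model"

# The sklearn flavor returns the original LogisticRegression object.
sklearn_model = mlflow.sklearn.load_model(model_uri)

data = np.array([-4, 1, 0, 10, -2, 1]).reshape(-1, 1)
print(sklearn_model.predict_proba(data))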

Chapter5/mlflow_sklearn_log_data.py

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+import mlflow
+from mlflow.models import infer_signature
+import numpy as np
+from sklearn.linear_model import LogisticRegression
+
+with mlflow.start_run():
+    X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
+    y = np.array([0, 0, 1, 1, 1, 0])
+    lr = LogisticRegression()
+    lr.fit(X, y)
+    signature = infer_signature(X, lr.predict(X))
+
+    model_info = mlflow.sklearn.log_model(
+        sk_model=lr, artifact_path="model", signature=signature
+    )
+
+    print(f"Saving data to {model_info.model_uri}")
