Commit 244d9d0

Add in new generate_requirements_json.ipynb example.
1 parent bec73a7 commit 244d9d0

2 files changed: +202 −6 lines

Lines changed: 6 additions & 6 deletions
@@ -1,14 +1,14 @@
 [
     {
-        "step": "install pandas",
-        "command": "pip install pandas==1.2.4"
+        "step": "install scikit-learn",
+        "command": "pip install scikit-learn==1.2.0"
     },
     {
         "step": "install numpy",
-        "command": "pip install numpy==1.20.1"
+        "command": "pip install numpy==1.23.5"
     },
     {
-        "step": "install sklearn",
-        "command": "pip install sklearn"
+        "step": "install pandas",
+        "command": "pip install pandas==1.5.3"
     }
-]
+]
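
Each entry in the updated requirements.json pairs a human-readable "step" with a pip "command". As a rough sketch only of how such a file could be consumed (this is not SAS Model Manager's actual install mechanism, and the file path is an assumption), a container build step might simply run each command in order:

# Rough illustration only: iterate over requirements.json and run each
# "command" entry. The file location here is an assumption for this sketch.
import json
import subprocess
from pathlib import Path

req_path = Path("requirements.json")
for entry in json.loads(req_path.read_text()):
    print(f"Running step: {entry['step']}")
    # Each command is a pip install statement, e.g. "pip install pandas==1.5.3"
    subprocess.run(entry["command"].split(), check=True)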
Lines changed: 196 additions & 0 deletions
@@ -0,0 +1,196 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "09361f03-99d3-4cbf-a3ba-a75ca2c74b35",
   "metadata": {},
   "source": [
    "Copyright © 2023, SAS Institute Inc., Cary, NC, USA. All Rights Reserved.\n",
    "SPDX-License-Identifier: Apache-2.0"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9b8cb7c-1974-4af5-8992-d51f90fcfe5b",
   "metadata": {},
   "source": [
"# Automatic Generation of the requirements.json File\n",
18+
"In order to validate Python models within a container publishing destination, the Python packages which contain the modules that are used in the Python score code file and its score resource files must be installed in the run-time container. You can install the packages when you publish a Python model or decision that contains a Python model to a container publishing destination by adding a `requirements.json` file that includes the package install statements to your model.\n",
19+
"\n",
20+
"This notebook provides an example execution and assessment of the create_requirements_json() function added in python-sasctl v1.8.0. The aim of this function is help to create the instructions (aka the `requirements.json` file) for a lightweight Python container in SAS Model Manager. Lightweight here meaning that the container will only install the packages found in the model's pickle files and python scripts.\n",
21+
"\n",
22+
"### **User Warnings**\n",
23+
"The methods utilized in this function can determine package dependencies and versions from provided scripts and pickle files, but there are some stipulations that need to be considered:\n",
24+
"\n",
25+
"1. If run outside of the development environment that the model was created in, the create_requirements_json() function **CANNOT** determine the required package _versions_ accurately. \n",
26+
"2. Not all Python packages have matching import and install names and as such some of the packages added to the requirements.json file may be incorrectly named (i.e. `import sklearn` vs `pip install scikit-learn`).\n",
27+
"\n",
28+
"As such, it is recommended that the user check over the requirements.json file for package name and version accuracy before deploying to a run-time container in SAS Model Manager."
29+
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef68334e-7fa3-481a-bc39-9aa6c389f925",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4613074a-a138-4d93-810a-1bbfca79e957",
   "metadata": {},
   "source": [
"As an example, let's create the requirements.json file for the HMEQ Decision Tree Classification model created and uploaded in pzmmModelImportExample.ipynb. Simply import the function and aim it at the model directory."
45+
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "654a1382-9576-4215-bf47-ac7fc69428e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "from sasctl import pzmm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "1df8e5d3-c62e-4c35-993c-765a48d25444",
   "metadata": {},
   "outputs": [],
   "source": [
    "model_dir = Path.cwd() / \"data/hmeqModels/DecisionTreeClassifier\"\n",
    "requirements_json = pzmm.JSONFiles.create_requirements_json(model_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ced96ece-8221-413f-a5b5-a03fa93be8fd",
   "metadata": {},
   "source": [
    "Let's take a quick look at what packages were determined for the Decision Tree Classifier model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "2e3b29e6-aef5-4a02-a54b-57bf7e853cf0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[\n",
      "    {\n",
      "        \"command\": \"pip install sklearn\",\n",
      "        \"step\": \"install sklearn\"\n",
      "    },\n",
      "    {\n",
      "        \"command\": \"pip install numpy==1.23.5\",\n",
      "        \"step\": \"install numpy\"\n",
      "    },\n",
      "    {\n",
      "        \"command\": \"pip install pandas==1.5.3\",\n",
      "        \"step\": \"install pandas\"\n",
      "    }\n",
      "]\n"
     ]
    }
   ],
   "source": [
    "import json\n",
    "print(json.dumps(requirements_json, sort_keys=True, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0b11bc8-a1f3-46ff-a232-90b93b1bdabc",
   "metadata": {},
   "source": [
"Note how we have returned the `sklearn` import, which is attempting to refer to the scikit-learn package, but would fail to install the correct package via `pip install sklearn` and also could not collect a package version.\n",
115+
"\n",
116+
"Let's modify the name and add the version in Python and rewrite the requirements.json file to match."
117+
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "49721dc9-38e2-4d63-86e1-6555b364f4d6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[\n",
      "    {\n",
      "        \"command\": \"pip install scikit-learn==1.2.0\",\n",
      "        \"step\": \"install scikit-learn\"\n",
      "    },\n",
      "    {\n",
      "        \"command\": \"pip install numpy==1.23.5\",\n",
      "        \"step\": \"install numpy\"\n",
      "    },\n",
      "    {\n",
      "        \"command\": \"pip install pandas==1.5.3\",\n",
      "        \"step\": \"install pandas\"\n",
      "    }\n",
      "]\n"
     ]
    }
   ],
   "source": [
    "scikit_learn_install = {\n",
    "    \"command\": \"pip install scikit-learn==1.2.0\",\n",
    "    \"step\": \"install scikit-learn\"\n",
    "}\n",
    "requirements_json[0].update(scikit_learn_install)\n",
    "print(json.dumps(requirements_json, sort_keys=True, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "90da05c4-cd05-423d-8626-97125937f72b",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(Path(model_dir) / \"requirements.json\", \"w\") as req_file:\n",
    "    req_file.write(json.dumps(requirements_json, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53e5f3ca-f990-4ca7-9e92-2505087ff985",
   "metadata": {},
   "source": [
"Now we have a complete and accurate requirements.json file for deploying models to containers in SAS Model Manager!"
172+
]
173+
}
174+
],
175+
"metadata": {
176+
"kernelspec": {
177+
"display_name": "dev-py38",
178+
"language": "python",
179+
"name": "dev-py38"
180+
},
181+
"language_info": {
182+
"codemirror_mode": {
183+
"name": "ipython",
184+
"version": 3
185+
},
186+
"file_extension": ".py",
187+
"mimetype": "text/x-python",
188+
"name": "python",
189+
"nbconvert_exporter": "python",
190+
"pygments_lexer": "ipython3",
191+
"version": "3.8.16"
192+
}
193+
},
194+
"nbformat": 4,
195+
"nbformat_minor": 5
196+
}
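
For readers who want to reproduce the notebook's workflow outside Jupyter, here is a condensed sketch of the same steps (generate requirements.json, correct the sklearn entry, and write the file). It assumes the same model directory as the notebook and python-sasctl 1.8.0 or later; the loop that matches on the "install sklearn" step and the pinned scikit-learn version are illustrative assumptions, so check the generated entries for your own model.

# Condensed sketch of the notebook's workflow (assumptions noted above).
import json
from pathlib import Path

from sasctl import pzmm

model_dir = Path.cwd() / "data/hmeqModels/DecisionTreeClassifier"
requirements_json = pzmm.JSONFiles.create_requirements_json(model_dir)

# Correct entries whose import name differs from the PyPI install name,
# e.g. `sklearn` -> `scikit-learn`; the pinned version is an assumption.
for entry in requirements_json:
    if entry["step"] == "install sklearn":
        entry["step"] = "install scikit-learn"
        entry["command"] = "pip install scikit-learn==1.2.0"

with open(model_dir / "requirements.json", "w") as req_file:
    req_file.write(json.dumps(requirements_json, indent=4))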
