Commit 5fbe929

added readme, testcases and config files
1 parent 68c0821 commit 5fbe929

File tree

6 files changed: +290 -9 lines changed

ads/opctl/operator/lowcode/anomaly/README.md

Lines changed: 3 additions & 3 deletions

@@ -58,7 +58,7 @@ The operator will run in your local environment without requiring any additional

 ## 4. Running anomaly detection on the local container

-To run the anomaly detection detection operator within a local container, follow these steps:
+To run the anomaly detection operator within a local container, follow these steps:

 Use the command below to build the anomaly detection container.

@@ -106,7 +106,7 @@ ads operator run -f ~/anomaly/anomaly.yaml --backend-config ~/anomaly/backend_op

 ## 5. Running anomaly detection in the Data Science job within container runtime

-To execute the anomaly detection detection operator within a Data Science job using container runtime, please follow the steps outlined below:
+To execute the anomaly detection operator within a Data Science job using container runtime, please follow the steps outlined below:

 You can use the following command to build the anomaly detection container. This step can be skipped if you have already done this for running the operator within a local container.

@@ -155,7 +155,7 @@ ads opctl watch <OCID>

 ## 6. Running anomaly detection in the Data Science job within conda runtime

-To execute the anomaly detection detection operator within a Data Science job using conda runtime, please follow the steps outlined below:
+To execute the anomaly detection operator within a Data Science job using conda runtime, please follow the steps outlined below:

 You can use the following command to build the anomaly detection conda environment.

Lines changed: 16 additions & 0 deletions

type: recommender
version: v1
conda_type: service
name: Recommender Operator
gpu: no
keywords:
  - Recommender
backends:
  - job
  - operator.local
description: |
  Recommender Systems are designed to suggest relevant items, products, or content to users based on their
  preferences and behaviors. These systems are widely used in various industries such as e-commerce, entertainment,
  and social media to enhance user experience by providing personalized recommendations. They help in increasing user
  engagement, satisfaction, and sales by predicting what users might like or need based on their past interactions
  and the preferences of similar users.

Lines changed: 206 additions & 0 deletions

# Recommender Operator

Recommender Systems are designed to suggest relevant items, products, or content to users based on their preferences and behaviors. These systems are widely used in various industries such as e-commerce, entertainment, and social media to enhance user experience by providing personalized recommendations. They help in increasing user engagement, satisfaction, and sales by predicting what users might like or need based on their past interactions and the preferences of similar users.

Below are the steps to configure and run the Recommender Operator on different resources.

## 1. Prerequisites

Follow the [CLI Configuration](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/cli/opctl/configure.html) steps from the ADS documentation. This step is mandatory, as it sets up default values for the options used when running the Recommender Operator on OCI Data Science jobs or OCI Data Flow applications. If you have previously done this and used a flexible shape, make sure to adjust `ml_job_config.ini` with the shape config details and `docker_registry` information, for example:

- ocpus = 1
- memory_in_gbs = 16
- docker_registry = `<iad.ocir.io/namespace/>`

## 2. Generating configs

To generate starter configs, run the command below. This will create a list of YAML configs and place them in the `output` folder.

```bash
ads operator init -t recommender --overwrite --output ~/recommender/
```

The most important files expected to be generated are:

- `recommender.yaml`: Contains the recommender-related configuration.
- `backend_operator_local_python_config.yaml`: Contains a local backend configuration for running the recommender in a local environment. The environment should be set up manually before running the operator.
- `backend_operator_local_container_config.yaml`: Contains a local backend configuration for running the recommender within a local container. The container should be built before running the operator. Please refer to the instructions below for details on how to accomplish this.
- `backend_job_container_config.yaml`: Contains Data Science job-related config to run the recommender in a Data Science job within a container (BYOC) runtime. The container should be built and published before running the operator. Please refer to the instructions below for details on how to accomplish this.
- `backend_job_python_config.yaml`: Contains Data Science job-related config to run the recommender in a Data Science job within a conda runtime. The conda environment should be built and published before running the operator.

All generated configurations should be ready to use without the need for any additional adjustments. However, they are provided as starter kit configurations that can be customized as needed.

## 3. Running recommender on the local conda environment

To run the recommender locally, create and activate a new conda environment (`ads-recommender`). Install all the required libraries listed in the `environment.yaml` file.

```yaml
- report-creator
- cerberus
- "git+https://github.com/oracle/accelerated-data-science.git@feature/recommender#egg=oracle-ads"
```
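
If you prefer a single self-contained file, a conda-style `environment.yaml` wrapping those dependencies might look like the sketch below. This is only an illustration: it assumes a standard conda environment-file layout and a Python version, so treat the file generated by `init` as the source of truth.

```yaml
name: ads-recommender
channels:
  - conda-forge
dependencies:
  - python=3.9   # assumed version; align with your generated environment.yaml
  - pip
  - pip:
      - report-creator
      - cerberus
      - "git+https://github.com/oracle/accelerated-data-science.git@feature/recommender#egg=oracle-ads"
```

From such a file, `conda env create -f environment.yaml` followed by `conda activate ads-recommender` prepares the environment described above.
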
Please review the `recommender.yaml` file generated previously by the `init` command, and make any necessary adjustments to the input and output file locations. By default, it assumes that the files are located in the same folder from which the `init` command was executed.
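
For orientation, a filled-in `recommender.yaml` combines the data locations with a few model options. The sketch below uses the field names from the operator schema (`user_data`, `item_data`, `interactions_data`, `output_directory`); the paths, column names, and option values are illustrative, so adjust them to match your own data.

```yaml
kind: operator
type: recommender
version: v1
spec:
  user_data:
    url: data/users.csv              # illustrative local path
  item_data:
    url: data/items.csv              # illustrative local path
  interactions_data:
    url: data/interactions.csv       # illustrative local path
  output_directory:
    url: result/
  top_k: 4                           # number of recommendations per user
  model_name: svd                    # example model choice
  user_column: user_id               # column names in your interactions data
  item_column: movie_id
  interaction_column: rating
```

The same `url` fields are the ones you will later point at Object Storage when running in a Data Science job.
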
Use the command below to verify the recommender config.

```bash
ads operator verify -f ~/recommender/recommender.yaml
```

Use the following command to run the recommender within the `ads-recommender` conda environment.

```bash
ads operator run -f ~/recommender/recommender.yaml -b local
```

The operator will run in your local environment without requiring any additional modifications.

## 4. Running recommender on the local container

To run the recommender operator within a local container, follow these steps:

Use the command below to build the recommender container.

```bash
ads operator build-image -t recommender
```

This will create a new `recommender:v1` image, with `/etc/operator` as the designated working directory within the container.

Check the `backend_operator_local_container_config.yaml` config file. By default, it should have a `volume` section with the `.oci` configs folder mounted.

```yaml
volume:
  - "/Users/<user>/.oci:/root/.oci"
```

Mounting the OCI configs folder is only required if an OCI Object Storage bucket will be used to store the input recommender data or output recommender result. The input/output folders can also be mounted to the container.

```yaml
volume:
  - /Users/<user>/.oci:/root/.oci
  - /Users/<user>/recommender/data:/etc/operator/data
  - /Users/<user>/recommender/result:/etc/operator/result
```

The full config can look like:

```yaml
kind: operator.local
spec:
  image: recommender:v1
  volume:
    - /Users/<user>/.oci:/root/.oci
    - /Users/<user>/recommender/data:/etc/operator/data
    - /Users/<user>/recommender/result:/etc/operator/result
type: container
version: v1
```

Run the recommender within a container using the command below:

```bash
ads operator run -f ~/recommender/recommender.yaml --backend-config ~/recommender/backend_operator_local_container_config.yaml
```

## 5. Running recommender in the Data Science job within container runtime

To execute the recommender operator within a Data Science job using container runtime, please follow the steps outlined below:

You can use the following command to build the recommender container. This step can be skipped if you have already done this for running the operator within a local container.

```bash
ads operator build-image -t recommender
```

This will create a new `recommender:v1` image, with `/etc/operator` as the designated working directory within the container.

Publish the `recommender:v1` container to the [Oracle Container Registry](https://docs.public.oneportal.content.oci.oraclecloud.com/en-us/iaas/Content/Registry/home.htm). To become familiar with OCI, read the documentation links posted below.

- [Access Container Registry](https://docs.public.oneportal.content.oci.oraclecloud.com/en-us/iaas/Content/Registry/Concepts/registryoverview.htm#access)
- [Create repositories](https://docs.public.oneportal.content.oci.oraclecloud.com/en-us/iaas/Content/Registry/Tasks/registrycreatingarepository.htm#top)
- [Push images](https://docs.public.oneportal.content.oci.oraclecloud.com/en-us/iaas/Content/Registry/Tasks/registrypushingimagesusingthedockercli.htm#Pushing_Images_Using_the_Docker_CLI)

To publish `recommender:v1` to OCR, use the command posted below:

```bash
ads operator publish-image recommender:v1 --registry <iad.ocir.io/tenancy/>
```

After the container is published to OCR, it can be used within Data Science jobs service. Check the `backend_job_container_config.yaml` config file. It should contain pre-populated infrastructure and runtime sections. The runtime section should contain an image property, something like `image: iad.ocir.io/<tenancy>/recommender:v1`. More details about supported options can be found in the ADS Jobs documentation - [Run a Container](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/jobs/run_container.html).
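
As a rough sketch, the pre-populated sections of `backend_job_container_config.yaml` typically resemble the fragment below. The exact keys in your generated file may differ, so treat this as illustrative only; `<tenancy>` and the image tag are placeholders.

```yaml
kind: job
spec:
  infrastructure:
    kind: infrastructure
    type: dataScienceJob
    # infrastructure details are pre-populated from your `ads opctl configure` defaults
  runtime:
    kind: runtime
    type: container
    spec:
      image: iad.ocir.io/<tenancy>/recommender:v1
```
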
Adjust the `recommender.yaml` config with proper input/output folders. When the recommender is run in the Data Science job, it will not have access to local folders. Therefore, input data and output folders should be placed in the Object Storage bucket. Open the `recommender.yaml` and adjust the following fields:

```yaml
input_data:
  url: oci://bucket@namespace/recommender/input_data/data.csv
output_directory:
  url: oci://bucket@namespace/recommender/result/
```

Run the recommender on the Data Science jobs using the command posted below:

```bash
ads operator run -f ~/recommender/recommender.yaml --backend-config ~/recommender/backend_job_container_config.yaml
```

The logs can be monitored using the `ads opctl watch` command.

```bash
ads opctl watch <OCID>
```

## 6. Running recommender in the Data Science job within conda runtime

To execute the recommender operator within a Data Science job using conda runtime, please follow the steps outlined below:

You can use the following command to build the recommender conda environment.

```bash
ads operator build-conda -t recommender
```

This will create a new `recommender_v1` conda environment and place it in the folder specified in the `ads opctl configure` command.

Use the command below to publish the `recommender_v1` conda environment to the Object Storage bucket.

```bash
ads operator publish-conda -t recommender
```

More details about configuring the CLI can be found here - [Configuring CLI](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/cli/opctl/configure.html)

After the conda environment is published to Object Storage, it can be used within the Data Science jobs service. Check the `backend_job_python_config.yaml` config file. It should contain pre-populated infrastructure and runtime sections. The runtime section should contain a `conda` section.

```yaml
conda:
  type: published
  uri: oci://bucket@namespace/conda_environments/cpu/recommender/1/recommender_v1
```
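
In the generated `backend_job_python_config.yaml`, that `conda` block normally sits inside a runtime section; a rough sketch (key names may differ slightly in your generated file) looks like:

```yaml
runtime:
  kind: runtime
  type: python
  spec:
    conda:
      type: published
      uri: oci://bucket@namespace/conda_environments/cpu/recommender/1/recommender_v1
```
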
More details about supported options can be found in the ADS Jobs documentation - [Run a Python Workload](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/jobs/run_python.html).

Adjust the `recommender.yaml` config with proper input/output folders. When the recommender is run in the Data Science job, it will not have access to local folders. Therefore, input data and output folders should be placed in the Object Storage bucket. Open the `recommender.yaml` and adjust the following fields:

```yaml
input_data:
  url: oci://bucket@namespace/recommender/input_data/data.csv
output_directory:
  url: oci://bucket@namespace/recommender/result/
test_data:
  url: oci://bucket@namespace/recommender/input_data/test.csv
```

Run the recommender on the Data Science jobs using the command posted below:

```bash
ads operator run -f ~/recommender/recommender.yaml --backend-config ~/recommender/backend_job_python_config.yaml
```

The logs can be monitored using the `ads opctl watch` command.

```bash
ads opctl watch <OCID>
```

ads/opctl/operator/lowcode/recommender/cmd.py

Lines changed: 1 addition & 5 deletions

@@ -28,10 +28,6 @@ def init(**kwargs: Dict) -> str:
         The YAML specification generated based on the schema.
     """

-    default_detector = [{"name": "<type>.<entity>", "action": "mask"}]
-
     return YamlGenerator(
         schema=_load_yaml_from_uri(__file__.replace("cmd.py", "schema.yaml"))
-    ).generate_example_dict(
-        values={"type": kwargs.get("type"), "detectors": default_detector}
-    )
+    ).generate_example_dict(values={"type": kwargs.get("type")})

ads/opctl/operator/lowcode/recommender/schema.yaml

Lines changed: 4 additions & 1 deletion

@@ -21,11 +21,12 @@ type:
   type: string
   default: recommender
   meta:
-    description: "Type should always be `pii` when using a pii operator"
+    description: "Type should always be `recommender` when using a recommender operator"


 spec:
   required: true
+  type: dict
   schema:
     user_data:
       required: true
@@ -83,6 +84,7 @@ spec:
     item_data:
       required: true
       type: dict
+      default: {"url": "item_data.csv"}
       meta:
         description: "This should contain item related attribute"
       schema:
@@ -134,6 +136,7 @@ spec:

     interactions_data:
       required: true
+      default: {"url": "interactions_data.csv"}
       meta:
         description: "This should include interactions between items and users"
       schema:

Lines changed: 60 additions & 0 deletions

#!/usr/bin/env python

# Copyright (c) 2023, 2024 Oracle and/or its affiliates.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/

import os
import subprocess
import tempfile
from time import sleep

import yaml

DATASET_PREFIX = f"{os.path.dirname(os.path.abspath(__file__))}/../data/recommendation/"


def test_recommender():
    user_file = f"{DATASET_PREFIX}users.csv"
    item_file = f"{DATASET_PREFIX}items.csv"
    interaction_file = f"{DATASET_PREFIX}interactions.csv"

    yaml_params = {
        "kind": "operator",
        "type": "recommender",
        "version": "v1",
        "spec": {
            "user_data": {
                "url": user_file,
            },
            "item_data": {
                "url": item_file,
            },
            "interactions_data": {
                "url": interaction_file,
            },
            "output_directory": {
                "url": "results",
            },
            "top_k": 4,
            "model_name": "svd",
            "user_column": "user_id",
            "item_column": "movie_id",
            "interaction_column": "rating",
            "recommendations_filename": "recommendations.csv",
            "generate_report": True,
            "report_filename": "report.html",
        },
    }

    with tempfile.TemporaryDirectory() as tmpdirname:
        recommender_yaml_filename = f"{tmpdirname}/recommender.yaml"
        output_dirname = f"{tmpdirname}/results"
        yaml_params["spec"]["output_directory"]["url"] = output_dirname
        with open(recommender_yaml_filename, "w") as f:
            f.write(yaml.dump(yaml_params))
        sleep(0.5)
        subprocess.run(
            f"ads operator run -f {recommender_yaml_filename} --debug", shell=True
        )
        sleep(0.1)
        assert os.path.exists(f"{output_dirname}/report.html"), "Report not generated."
