Commit 5509fa6

Addition of ML Registry Functionality (#110)
* Adding ML Registry functionality
* Cleanup whitespace in automl.q
* Removing typo in components section of README
* Fix typo in README
* Fix typo in README
* Remove 'Status' section in registry README
* Cleanup docs folder in mlops, update registry api and examples docs
1 parent 7657d99 commit 5509fa6

85 files changed, with 7805 additions and 4 deletions

README.md

Lines changed: 3 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -104,8 +104,9 @@ This library contains functions that cover the following areas:
104104
- Utility functions relating to areas including statistical analysis, data preprocessing and array manipulation.
105105
- A multi-processing framework to parallelize work across many cores or nodes.
106106
- Functions for seamless integration with PyKX or EmbedPy, which ensure seamless interoperability between Python and kdb+/q in either environment.
107+
- A location for the on-prem storage and versioning of ML models, along with a common model retrieval API that allows models to be retrieved and used on kdb+ data regardless of their underlying requirements. Centralising work in a common storage location improves team collaboration and management oversight.
107108

108-
These sections are explained in greater depth within the [FRESH](ml/docs/fresh.md), [cross validation](ml/docs/xval.md), [clustering](ml/docs/clustering/algos.md), [timeseries](ml/docs/timeseries/README.md), [optimization](ml/docs/optimize.md), [graph/pipeline](ml/docs/graph/README.md) and [utilities](ml/docs/utilities/metric.md) documentation.
109+
These sections are explained in greater depth within the [FRESH](ml/docs/fresh.md), [cross validation](ml/docs/xval.md), [clustering](ml/docs/clustering/algos.md), [timeseries](ml/docs/timeseries/README.md), [optimization](ml/docs/optimize.md), [graph/pipeline](ml/docs/graph/README.md), [utilities](ml/docs/utilities/metric.md) and [registry](ml/docs/registry/README.md) documentation.
109110

110111

111112
### nlp
@@ -171,3 +172,4 @@ The Machine Learning Toolkit is provided here under an Apache 2.0 license.
171172
If you find issues with the interface or have feature requests, please [raise an issue](https://github.com/KxSystems/ml/issues).
172173

173174
To contribute to this project, please follow the [contributing guide](CONTRIBUTING.md).
175+

automl/automl.q

Lines changed: 0 additions & 1 deletion
@@ -45,4 +45,3 @@ if[all`config`run in commandLineArguments;
4545
testRun:`test in commandLineArguments;
4646
runCommandLine[testRun];
4747
exit 0]
48-

docker/Dockerfile

Lines changed: 5 additions & 0 deletions
@@ -2,6 +2,11 @@
22

33
FROM registry.gitlab.com/kxdev/kxinsights/data-science/ml-tools/automl:embedpy-gcc-deb12
44

5+
# Java and jq packages required for registry tests
6+
RUN apt-get update && apt-get install -y openjdk-17-jdk && rm -rf /var/lib/apt/lists/*
7+
8+
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64/
9+
510
COPY requirements_pinned.txt /opt/kx/automl/
611

712
USER kx

ml/docs/registry/README.md

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
1+
# ML Registry
2+
3+
The ML Model Registry defines a centralised location for the storage of the following versioned entities:
4+
5+
1. Machine Learning Models
6+
2. Model parameters
7+
3. Performance metrics
8+
4. Model configuration
9+
5. Model monitoring information
10+
11+
The ML Registry is intended to allow models and all important metadata associated with them to be stored locally.
12+
13+
In the context of an MLOps offering, the model registry is a collaborative location allowing teams to work together on different stages of a machine learning workflow, from model experimentation to publishing a model to production. It is designed to aid in this in the following ways:
14+
15+
1. Provide a solution with which users can store models generated in q/Python to a centralised location on-prem.
16+
2. Provide a common model retrieval API, allowing models to be retrieved and used on kdb+ data regardless of their underlying requirements.
17+
3. Provide the ability to store information related to model training/monitoring requirements, allowing sysadmins to control the promotion of models to production environments.
18+
4. Enhance team collaboration opportunities and management oversight by centralising team work in a common storage location.
19+
20+
## Contents
21+
22+
- [Quick start](#quick-start)
23+
- [Documentation](#documentation)
24+
- [Testing](#testing)
25+
- [Status](#status)
26+
27+
28+
## Quick start
29+
30+
Start by following the installation steps in the main [README](../../../README.md), or alternatively start a q session from the `ml` folder using the code below:
31+
32+
```
33+
$ q init.q
34+
q)
35+
```
36+
37+
Generate a model registry in the current directory and display the contents
38+
39+
```
40+
q).ml.registry.new.registry[::;::];
41+
q)\ls
42+
"CODEOWNERS"
43+
"CONTRIBUTING.md"
44+
"KX_ML_REGISTRY"
45+
...
46+
q)\ls KX_ML_REGISTRY
47+
"modelStore"
48+
"namedExperiments"
49+
"unnamedExperiments"
50+
```
51+
52+
Add an experiment folder to the registry
53+
54+
```
55+
q).ml.registry.new.experiment[::;"test";::];
56+
q)\ls KX_ML_REGISTRY/namedExperiments/
57+
"test"
58+
```
59+
60+
Add a basic q model associated with the experiment
61+
62+
```
63+
q).ml.registry.set.model[::;{x+1};"mymodel";"q";enlist[`experimentName]!enlist "test"]
64+
```
65+
66+
Check that the model has been added to the modelStore
67+
68+
```
69+
q)modelStore
70+
registrationTime experimentName modelName uniqueID ..
71+
-----------------------------------------------------------------------------..
72+
2021.08.02D10:27:04.863096000 "test" "mymodel" 66f12a71-175b-cd56-7d0..
73+
```
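
As a small illustration, the `modelStore` table shown above can be narrowed down with ordinary qSQL. The sketch below assumes the quick-start session, in which `modelStore` is available as a table and `modelName` holds strings, as the displayed output suggests. It is plain q rather than part of the registry API.

```q
// Assumes the quick-start session above, in which modelStore is a table
// whose modelName column holds strings (as the displayed output suggests)
select registrationTime,experimentName,modelName from modelStore where modelName like "mymodel"
```

Selecting only a few columns keeps the result readable at the console, since the full table is wider than the default display.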
74+
75+
Retrieve the model and model information based on the model name and version
76+
77+
```
78+
q).ml.registry.get.model[::;::;"mymodel";1 0]
79+
modelInfo| `major`description`experimentName`folderPath`registryPath`modelSto..
80+
model | {x+1}
81+
```
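
The dictionary returned above holds the stored function under the `model` key, so it can be applied directly to kdb+ data. The following is a minimal sketch, assuming the registry and `mymodel` created in the previous steps; the names `res` and `predict` are illustrative only.

```q
// Retrieve version 1.0 of "mymodel" from the registry in the current directory
res:.ml.registry.get.model[::;::;"mymodel";1 0]
// res is keyed by `modelInfo and `model, as in the output above
predict:res`model
// The stored model is {x+1}, so each input is incremented by one
predict 1 2 3
```

Because the model registered earlier is `{x+1}`, the final line should return `2 3 4`.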
82+
83+
## Documentation
84+
85+
### Static Documentation
86+
87+
Further information on the breakdown of the API for interacting with the ML-Registry and extended examples can be found in [Registry API](api/setting.md) and [Registry Examples](examples/basic.md).
88+
89+
This provides users with:
90+
91+
1. A breakdown of the API for interacting with the ML-Registry
92+
2. Examples of interacting with a registry
93+
94+
## Testing
95+
96+
Unit tests are provided for testing the operation of this interface as a local service. To run them, users must have embedPy or PyKX installed alongside the additional Python requirements below. It is also advisable to install the Python packages listed in `requirements_pinned.txt` before running the tests.
97+
98+
```
99+
$ pip install pyspark xgboost
100+
```
101+
102+
The local tests are run using a bespoke q script and can be run standalone using the instructions outlined below.
103+
104+
### Local testing
105+
106+
The tests below are run from the `ml` directory, and test results are output to the console.
107+
108+
```bash
109+
$ q ../test.q registry/tests/registry.t
110+
```
111+
112+
This should present a summary of results of the unit tests.
