
Commit 05e0600

Initial version (#1)
- Use wrapper functions to define the scope of a UDF and to extract the source code of the UDF
- Isolate the execution of the UDF from the remaining Python state (use new globals and locals)
- Support SET, SCALAR, RETURNS and EMIT
- pandas DataFrame support
- Support for connections and metadata
- Input and output data type checks
- Support multiple groups and multiple rows per group
- Poetry for managing dependencies and build
- Generate setup.py via dephell to allow pip install from GitHub.com
- GitHub Actions workflow for running tests and checking setup.py
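The first two points describe the core mechanism: the UDF is written inside a wrapper function, its source is extracted, and it runs in a fresh namespace so nothing from the test module leaks in. The following is a minimal sketch of that idea, not the mock's actual implementation; `udf_source_from_wrapper` and `run_isolated` are hypothetical names, and the sketch assumes an undecorated wrapper whose `def` line fits on one line.

```python
import inspect
import textwrap
from types import SimpleNamespace
from typing import Any, Callable, Dict


def udf_source_from_wrapper(wrapper: Callable[[], None]) -> str:
    """Return the body of a wrapper function, dedented, as UDF source code."""
    lines, _ = inspect.getsourcelines(wrapper)
    # Assumes an undecorated wrapper whose "def ...():" header is one line;
    # everything after that header is treated as the UDF script.
    return textwrap.dedent("".join(lines[1:]))


def run_isolated(source: str, ctx: Any, exa: Any = None) -> Any:
    """Execute UDF source in a fresh namespace and call its run(ctx)."""
    # Fresh globals: the UDF sees only builtins (added by exec) and `exa`,
    # not the test module's imports, variables, functions or classes.
    udf_globals: Dict[str, Any] = {"exa": exa}
    exec(source, udf_globals)
    return udf_globals["run"](ctx)


def udf_wrapper():
    def run(ctx):
        return ctx.t1 + 1, ctx.t3 + "1"


source = udf_source_from_wrapper(udf_wrapper)
print(run_isolated(source, SimpleNamespace(t1=1, t3="1")))  # (2, '11')
```

This sketch passes a single dict to `exec` instead of separate globals and locals: with a separate locals dict, top-level definitions of the script land in locals, and a function defined there could not call another top-level function, because function bodies resolve names through globals.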
1 parent 1b450a5 commit 05e0600

26 files changed with 1520 additions and 0 deletions.

.github/workflows/check_setup_py.yaml

Lines changed: 28 additions & 0 deletions
```yaml
name: Check if setup.py is up to date

on: [push, pull_request]

jobs:
  check_setup_py:
    strategy:
      fail-fast: false
      matrix:
        python-version: [3.6, 3.7, 3.8]
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Run convert
        uses: dephell/dephell_action@master
        with:
          dephell-env: convert
      - name: Show changes on working copy
        run: git status --porcelain=v1 -uno
      - name: Show diff on working copy
        run: git diff
      - name: Check if setup.py changed
        run: |
          [ -z "$(git status --porcelain=v1 -uno 2>/dev/null)" ]
```
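The last step fails the job whenever `git status --porcelain=v1 -uno` prints anything. After `dephell` regenerates `setup.py`, a modified tracked file means the committed `setup.py` was stale. A minimal Python sketch of the same check, for running it locally; `changed_tracked_files` and `setup_py_is_up_to_date` are hypothetical helpers, not part of the repository, and the parser does not unquote paths that git quotes because of special characters.

```python
import subprocess
from typing import List


def changed_tracked_files(porcelain: str) -> List[str]:
    """Parse `git status --porcelain=v1` output into the paths it lists."""
    paths = []
    for line in porcelain.splitlines():
        if not line.strip():
            continue
        # Each entry is "XY <path>": two status characters, a space, the path.
        path = line[3:]
        # Renames are reported as "old -> new"; keep the new path.
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def setup_py_is_up_to_date(repo_dir: str = ".") -> bool:
    """Mirror the workflow step: True only if no tracked file was modified."""
    result = subprocess.run(
        ["git", "status", "--porcelain=v1", "-uno"],
        cwd=repo_dir,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # unlike the shell step, a failing git call raises here
    )
    return not changed_tracked_files(result.stdout)
```

`-uno` hides untracked files, so only changes to files already under version control (such as a regenerated `setup.py`) count.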

.github/workflows/pytest.yaml

Lines changed: 21 additions & 0 deletions
```yaml
name: Run pytest

on: [push, pull_request]

jobs:
  pytest:
    strategy:
      fail-fast: false
      matrix:
        python-version: [3.6, 3.7, 3.8]
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Run pytest
        uses: dephell/dephell_action@master
        with:
          dephell-env: pytest
```

.gitignore

Lines changed: 3 additions & 0 deletions
```diff
@@ -127,3 +127,6 @@ dmypy.json
 
 # Pyre type checker
 .pyre/
+
+# Poetry
+poetry.lock
```

README.md

Lines changed: 87 additions & 0 deletions
````markdown
# UDF Mock for Python

This project provides a mock runner for Python3 UDFs which allows you
to test your UDFs locally without a database.

**Note:** This project is in a very early development phase.
Please be aware that the behavior of the mock runner doesn't perfectly
reflect the behavior of UDFs inside the database and that the interface can still change.
In any case, you need to verify your UDFs with integration tests inside the database.

## Getting started

### Installing via pip
```
pip install git+https://github.com/exasol/exasol-udf-mock-python.git@master
```

### Installing via poetry
Add it to your `tool.poetry.dependencies` or `tool.poetry.dev-dependencies`:

```
[tool.poetry.dev-dependencies]
exasol-udf-mock-python = { git = "https://github.com/exasol/exasol-udf-mock-python.git", branch = "master" }
...
```

### How to use the Mock

The mock runner runs your Python UDF in a Python environment in which
no external variables, functions or classes are visible.
In practice, this means you can only use what you define inside your
UDF and what the UDF framework provides,
such as `exa.meta` and the context passed to the `run` function.
This applies to imports, variables, functions, classes and so on.

You define a UDF in this framework within a wrapper function.
This wrapper function contains all necessary imports, functions,
variables and classes.
You then hand over the wrapper function to the `UDFMockExecutor`,
which runs the UDF inside the isolated Python environment.
The following example shows the general setup for a test with the Mock:

```python
from exasol_udf_mock_python.column import Column
from exasol_udf_mock_python.group import Group
from exasol_udf_mock_python.mock_exa_environment import MockExaEnvironment
from exasol_udf_mock_python.mock_meta_data import MockMetaData
from exasol_udf_mock_python.udf_mock_executor import UDFMockExecutor


def udf_wrapper():

    def run(ctx):
        return ctx.t1+1, ctx.t2+1.1, ctx.t3+"1"


executor = UDFMockExecutor()
meta = MockMetaData(
    script_code_wrapper_function=udf_wrapper,
    input_type="SCALAR",
    input_columns=[Column("t1", int, "INTEGER"),
                   Column("t2", float, "FLOAT"),
                   Column("t3", str, "VARCHAR(20000)")],
    output_type="RETURNS",
    output_columns=[Column("t1", int, "INTEGER"),
                    Column("t2", float, "FLOAT"),
                    Column("t3", str, "VARCHAR(20000)")]
)
exa = MockExaEnvironment(meta)
result = executor.run([Group([(1,1.0,"1"), (5,5.0,"5"), (6,6.0,"6")])], exa)
```

**Check out the [tests](tests) for more information about how to use the Mock.**

## Limitations or missing features

Some of the following limitations are fundamental; others are missing
features that might be removed in later releases:

- Data type checks for outputs are stricter than in real UDFs
- No support for import or export specifications or Virtual Schema adapters
- No support for dynamic input and output parameters
- No support for exa.import_script
- No BucketFS
- Execution is not isolated in a container
  - Can access and manipulate the file system of the system running the Mock
    - UDFs inside the database can only write to /tmp and
      only see the file system of the script-language container and the mounted BucketFS
  - Can use all Python packages available in the system running the Mock
    - If you use packages which are currently not available in the script-language containers,
      you need to create your own container for testing inside the database
- Does not emulate the resource limitations which get applied in the database
- Only one instance of the UDF gets executed
- No support for Python2, because Python2 is officially End of Life
````

README.rst

Lines changed: 103 additions & 0 deletions
```rst

UDF Mock for Python
===================

This project provides a mock runner for Python3 UDFs which allows you
to test your UDFs locally without a database.

**Note:** This project is in a very early development phase.
Please be aware that the behavior of the mock runner doesn't perfectly
reflect the behavior of UDFs inside the database and that the interface can still change.
In any case, you need to verify your UDFs with integration tests inside the database.

Getting started
---------------

Installing via pip
^^^^^^^^^^^^^^^^^^

.. code-block::

   pip install git+https://github.com/exasol/exasol-udf-mock-python.git@master

Installing via poetry
^^^^^^^^^^^^^^^^^^^^^

Add it to your ``tool.poetry.dependencies`` or ``tool.poetry.dev-dependencies``:

.. code-block::

   [tool.poetry.dev-dependencies]
   exasol-udf-mock-python = { git = "https://github.com/exasol/exasol-udf-mock-python.git", branch = "master" }
   ...

How to use the Mock
^^^^^^^^^^^^^^^^^^^

The mock runner runs your Python UDF in a Python environment in which
no external variables, functions or classes are visible.
In practice, this means you can only use what you define inside your
UDF and what the UDF framework provides,
such as ``exa.meta`` and the context passed to the ``run`` function.
This applies to imports, variables, functions, classes and so on.

You define a UDF in this framework within a wrapper function.
This wrapper function contains all necessary imports, functions,
variables and classes.
You then hand over the wrapper function to the ``UDFMockExecutor``,
which runs the UDF inside the isolated Python environment.
The following example shows the general setup for a test with the Mock:

.. code-block::

   from exasol_udf_mock_python.column import Column
   from exasol_udf_mock_python.group import Group
   from exasol_udf_mock_python.mock_exa_environment import MockExaEnvironment
   from exasol_udf_mock_python.mock_meta_data import MockMetaData
   from exasol_udf_mock_python.udf_mock_executor import UDFMockExecutor


   def udf_wrapper():

       def run(ctx):
           return ctx.t1+1, ctx.t2+1.1, ctx.t3+"1"


   executor = UDFMockExecutor()
   meta = MockMetaData(
       script_code_wrapper_function=udf_wrapper,
       input_type="SCALAR",
       input_columns=[Column("t1", int, "INTEGER"),
                      Column("t2", float, "FLOAT"),
                      Column("t3", str, "VARCHAR(20000)")],
       output_type="RETURNS",
       output_columns=[Column("t1", int, "INTEGER"),
                       Column("t2", float, "FLOAT"),
                       Column("t3", str, "VARCHAR(20000)")]
   )
   exa = MockExaEnvironment(meta)
   result = executor.run([Group([(1,1.0,"1"), (5,5.0,"5"), (6,6.0,"6")])], exa)

Check out the `tests <tests>`_ for more information about how to use the Mock.

Limitations or missing features
-------------------------------

Some of the following limitations are fundamental; others are missing
features that might be removed in later releases:


* Data type checks for outputs are stricter than in real UDFs
* No support for import or export specifications or Virtual Schema adapters
* No support for dynamic input and output parameters
* No support for exa.import_script
* No BucketFS
* Execution is not isolated in a container

  * Can access and manipulate the file system of the system running the Mock

    * UDFs inside the database can only write to /tmp and
      only see the file system of the script-language container and the mounted BucketFS

  * Can use all Python packages available in the system running the Mock

    * If you use packages which are currently not available in the script-language containers,
      you need to create your own container for testing inside the database

* Does not emulate the resource limitations which get applied in the database

* Only one instance of the UDF gets executed
* No support for Python2, because Python2 is officially End of Life
```

exasol_udf_mock_python/__init__.py

Whitespace-only changes.

exasol_udf_mock_python/column.py

Lines changed: 11 additions & 0 deletions
```python
class Column:
    def __init__(self, name, type, sql_type, precision=None, scale=None, length=None):
        self.name = name
        self.type = type
        self.sql_type = sql_type
        self.precision = precision
        self.scale = scale
        self.length = length

    def __repr__(self):
        return str(self.__class__) + ": " + str(self.__dict__)
```
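The commit message lists input and output data type checks, and the README notes that output checks are stricter than in the database. A minimal sketch of how `Column.type` could drive such a check; `check_row` is a hypothetical helper, not the mock's actual implementation, and the block repeats `Column` so it runs on its own.

```python
from typing import Any, List, Sequence


class Column:
    # Copy of exasol_udf_mock_python.column.Column from this commit.
    def __init__(self, name, type, sql_type, precision=None, scale=None, length=None):
        self.name = name
        self.type = type
        self.sql_type = sql_type
        self.precision = precision
        self.scale = scale
        self.length = length

    def __repr__(self):
        return str(self.__class__) + ": " + str(self.__dict__)


def check_row(row: Sequence[Any], columns: List[Column]) -> None:
    """Raise TypeError unless each value matches its column's Python type.

    Hypothetical strict check: None (SQL NULL) is accepted for any column;
    every other value must be an instance of Column.type.
    """
    if len(row) != len(columns):
        raise TypeError(f"expected {len(columns)} values, got {len(row)}")
    for value, column in zip(row, columns):
        if value is not None and not isinstance(value, column.type):
            raise TypeError(
                f"column {column.name} ({column.sql_type}) expects "
                f"{column.type.__name__}, got {type(value).__name__}"
            )


columns = [Column("t1", int, "INTEGER"), Column("t3", str, "VARCHAR(20000)")]
check_row((2, "11"), columns)  # passes silently
```

Checking only the Python type ignores `precision`, `scale` and `length`; a stricter sketch could also compare string lengths against `length`.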

exasol_udf_mock_python/connection.py

Lines changed: 17 additions & 0 deletions
```python
from enum import Enum


class ConnectionType(Enum):
    PASSWORD = 1


class Connection:
    def __init__(self, address: str, user: str = None, password: str = None,
                 type: ConnectionType = ConnectionType.PASSWORD):
        self.type = type
        self.password = password
        self.user = user
        self.address = address

    def __repr__(self):
        return str(self.__class__) + ": " + str(self.__dict__)
```
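In the database, a Python UDF looks up connection objects through `exa.get_connection(name)` and reads fields such as `address`, `user` and `password`, which is what `Connection` models. A minimal sketch of a name-to-`Connection` lookup; `MiniExa` is a hypothetical stand-in, not the mock's `MockExaEnvironment`, and raising `ValueError` for an unknown name is this sketch's own choice.

```python
from enum import Enum
from typing import Dict, Optional


class ConnectionType(Enum):
    PASSWORD = 1


class Connection:
    # Copy of exasol_udf_mock_python.connection.Connection (repr omitted).
    def __init__(self, address: str, user: Optional[str] = None,
                 password: Optional[str] = None,
                 type: ConnectionType = ConnectionType.PASSWORD):
        self.type = type
        self.password = password
        self.user = user
        self.address = address


class MiniExa:
    """Hypothetical stand-in for the `exa` object a UDF sees."""

    def __init__(self, connections: Dict[str, Connection]):
        self._connections = dict(connections)

    def get_connection(self, name: str) -> Connection:
        # This sketch raises ValueError when no connection has that name.
        try:
            return self._connections[name]
        except KeyError:
            raise ValueError(f"connection {name!r} does not exist") from None


exa = MiniExa({"MY_FTP": Connection("ftp://example.com", "alice", "secret")})
conn = exa.get_connection("MY_FTP")
print(conn.address, conn.user)  # ftp://example.com alice
```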

exasol_udf_mock_python/group.py

Lines changed: 16 additions & 0 deletions
```python
from typing import List, Tuple


class Group:
    """
    Represents a Group (SET Function) or a Batch (Scalar Function) of rows
    """

    def __init__(self, rows: List[Tuple]):
        self.rows = rows

    def __repr__(self):
        return str(self.__class__) + ": " + str(self.__dict__)

    def __eq__(self, other):
        return self.__dict__ == other.__dict__
```
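For a SET UDF, `run(ctx)` is called once per group, starts on the group's first row, and walks the remaining rows with `ctx.next()`, which returns `False` once the group is exhausted; results go out through `ctx.emit(...)`. A minimal sketch of that protocol over `Group` objects; `SetContext` and `run_set_udf` are hypothetical names, not the mock's own context classes, and the block includes a copy of `Group` so it runs on its own.

```python
from typing import Any, Callable, List, Sequence, Tuple


class Group:
    # Copy of exasol_udf_mock_python.group.Group from this commit.
    def __init__(self, rows: List[Tuple]):
        self.rows = rows

    def __repr__(self):
        return str(self.__class__) + ": " + str(self.__dict__)

    def __eq__(self, other):
        return self.__dict__ == other.__dict__


class SetContext:
    """Hypothetical SET/EMIT context over the rows of one Group."""

    def __init__(self, group: Group, column_names: Sequence[str]):
        self._rows = group.rows
        self._names = list(column_names)
        self._pos = 0
        self.emitted: List[Tuple] = []

    def __getattr__(self, name: str) -> Any:
        # Column access such as ctx.t1 reads the value in the current row.
        names = self.__dict__.get("_names", [])
        if name in names:
            return self._rows[self._pos][names.index(name)]
        raise AttributeError(name)

    def next(self) -> bool:
        # Advance to the next row; return False once the group is exhausted.
        if self._pos + 1 < len(self._rows):
            self._pos += 1
            return True
        return False

    def size(self) -> int:
        return len(self._rows)

    def emit(self, *values: Any) -> None:
        self.emitted.append(tuple(values))


def run_set_udf(run: Callable[[SetContext], None], groups: List[Group],
                column_names: Sequence[str]) -> List[Group]:
    # run() is called once per input group; its emits form one output group.
    results = []
    for group in groups:
        ctx = SetContext(group, column_names)
        run(ctx)
        results.append(Group(ctx.emitted))
    return results


def run(ctx):
    # Sum column t1 over the group using the usual do-while next() loop.
    total = 0
    while True:
        total += ctx.t1
        if not ctx.next():
            break
    ctx.emit(total)


print(run_set_udf(run, [Group([(1,), (2,)]), Group([(5,)])], ["t1"]))
```

Returning one output `Group` per input `Group` keeps results comparable with `==`, which relies on `Group.__eq__` above.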
