|`<lambda_expression>`| The lambda expression or code snippet defining the function's behavior. |
|`DESC='<description>'`| Description of the UDF.|
|`<input_param_names>`| A comma-separated list of input parameter names.|
|`<input_param_types>`| A comma-separated list of input parameter types.|
|`<return_type>`| The return type of the function. |
|`LANGUAGE`| Specifies the language used to write the function. Available values: `python`. |
|`HANDLER = '<handler_name>'`| Specifies the name of the function's handler. |
|`ADDRESS = '<udf_server_address>'`| Specifies the address of the UDF server. |

## Examples

### Creating UDF with Lambda Expression

```sql
CREATE FUNCTION a_plus_3 AS (a) -> a+3;

@@ -53,3 +77,89 @@ DROP FUNCTION get_v2;

DROP TABLE json_table;
```

### Creating UDF with UDF Server (Python)

This example demonstrates how to enable and configure a UDF server in Python:

1. Enable UDF server support by adding the following parameters to the [query] section in the [databend-query.toml](https://github.com/datafuselabs/databend/blob/main/scripts/distribution/configs/databend-query.toml) configuration file.

```toml title='databend-query.toml'
[query]
...
enable_udf_server = true
# List the allowed UDF server addresses, separating multiple addresses with commas.
# For example, ['http://0.0.0.0:8815', 'http://example.com']
udf_server_allow_list = ['http://0.0.0.0:8815']
...
```

2. Define your function. This code defines and runs a UDF server in Python, which exposes a custom function *gcd* for calculating the greatest common divisor of two integers and allows remote execution of this function:

:::note
The SDK package is not yet available. Prior to its release, please download the 'udf.py' file from https://github.com/datafuselabs/databend/blob/main/tests/udf-server/udf.py and ensure it is saved in the same directory as this Python script. This step is essential for the code to function correctly.
:::

```python title='udf_server.py'
from udf import *

@udf(
    input_types=["INT", "INT"],
    result_type="INT",
    skip_null=True,
)
def gcd(x: int, y: int) -> int:
    while y != 0:
        (x, y) = (y, x % y)
    return x

if __name__ == '__main__':
    # create a UDF server listening at '0.0.0.0:8815'
    server = UdfServer("0.0.0.0:8815")
    # add defined functions
    server.add_function(gcd)
    # start the UDF server
    server.serve()
```

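Before wiring the function into a UDF server, it can help to sanity-check the core logic on its own. The sketch below repeats the Euclidean loop from *gcd* as a plain Python function (no `@udf` decorator, so 'udf.py' is not needed) and compares it against the standard library's `math.gcd`. The name `gcd_plain` is made up for this illustration and is not part of the SDK.

```python
import math


def gcd_plain(x: int, y: int) -> int:
    """Same Euclidean loop as the UDF body, without the @udf decorator."""
    while y != 0:
        (x, y) = (y, x % y)
    return x


# Compare against math.gcd on a few non-negative inputs.
for a, b in [(48, 18), (17, 5), (0, 9), (12, 0)]:
    assert gcd_plain(a, b) == math.gcd(a, b)

print(gcd_plain(48, 18))  # 6
```
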
`@udf` is a decorator used for defining UDFs in Databend, supporting the following parameters:

| Parameter   | Description |
|-------------|-------------|
| input_types | A list of strings or Arrow data types that specify the input data types. |
| result_type | A string or an Arrow data type that specifies the return value type. |
| name        | An optional string specifying the function name. If not provided, the original name will be used. |
| io_threads  | Number of I/O threads used per data chunk for I/O bound functions. |
| skip_null   | A boolean value specifying whether to skip NULL values. If set to True, NULL values will not be passed to the function, and the corresponding return value is set to NULL. Default is False. |

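The `skip_null` behavior described above can be pictured with a small pure-Python sketch. This is only an illustration of the stated semantics (a NULL argument short-circuits to a NULL result), not the SDK's implementation; `skip_null_sketch` and `add` are hypothetical names, and Python's `None` stands in for SQL NULL.

```python
from functools import wraps
from typing import Any, Callable, Optional


def skip_null_sketch(func: Callable[..., Any]) -> Callable[..., Optional[Any]]:
    """Hypothetical helper: return None (NULL) if any argument is None."""

    @wraps(func)
    def wrapper(*args: Any) -> Optional[Any]:
        if any(arg is None for arg in args):
            return None  # func is never called for this row
        return func(*args)

    return wrapper


@skip_null_sketch
def add(x: int, y: int) -> int:
    return x + y


print(add(2, 3))     # 5
print(add(None, 3))  # None
```
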
This table shows how Databend data types map to Python types:

| Databend Type         | Python Type           |
|-----------------------|-----------------------|
| BOOLEAN               | bool                  |
| TINYINT (UNSIGNED)    | int                   |
| SMALLINT (UNSIGNED)   | int                   |
| INT (UNSIGNED)        | int                   |
| BIGINT (UNSIGNED)     | int                   |
| FLOAT                 | float                 |
| DOUBLE                | float                 |
| DECIMAL               | decimal.Decimal       |
| DATE                  | datetime.date         |
| TIMESTAMP             | datetime.datetime     |
| VARCHAR               | str                   |
| VARIANT               | any                   |
| MAP(K,V)              | dict                  |
| ARRAY(T)              | list[T]               |
| TUPLE(T...)           | tuple(T...)           |

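As a rough aid when writing handlers, the scalar rows of the mapping can be copied into a lookup table and used to check test values locally. The sketch below assumes nothing about the SDK; `PYTHON_TYPE_FOR` and `matches` are hypothetical names, and only a subset of the types is included.

```python
import datetime
import decimal

# Assumed lookup table copied from the mapping above (scalar types only).
PYTHON_TYPE_FOR = {
    "BOOLEAN": bool,
    "INT": int,
    "BIGINT": int,
    "FLOAT": float,
    "DOUBLE": float,
    "DECIMAL": decimal.Decimal,
    "DATE": datetime.date,
    "TIMESTAMP": datetime.datetime,
    "VARCHAR": str,
}


def matches(databend_type: str, value: object) -> bool:
    """Return True if value is an instance of the mapped Python type."""
    # Note: isinstance follows Python subclassing, so a bool also passes as int.
    return isinstance(value, PYTHON_TYPE_FOR[databend_type])


print(matches("INT", 42))                           # True
print(matches("VARCHAR", 42))                       # False
print(matches("DECIMAL", decimal.Decimal("1.50")))  # True
```
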
3. Run the Python file to start the UDF server:

```shell
python3 udf_server.py
```

4. Register the function *gcd* in Databend with the [CREATE FUNCTION](ddl-create-function.md) command:

```sql
CREATE FUNCTION gcd (INT, INT) RETURNS INT LANGUAGE python HANDLER = 'gcd' ADDRESS = 'http://0.0.0.0:8815';
```

import IndexOverviewList from '@site/src/components/IndexOverviewList';

## What are UDFs?

User-Defined Functions (UDFs) enable you to define your own custom operations to process data within Databend. They are typically written using lambda expressions or implemented via a UDF server with programming languages such as Python, and are executed as part of Databend's query processing pipeline. Advantages of using UDFs include:

- Customized Data Transformations: UDFs let you perform data transformations that may not be achievable through built-in Databend functions alone. This customization is particularly valuable for handling unique data formats or business logic.

- Performance Optimization: UDFs give you the flexibility to define and fine-tune your own functions, so you can tailor the code to your workload's performance requirements.

- Code Reusability: UDFs can be reused across multiple queries, saving time and effort in coding and maintaining data processing logic.

## Managing UDFs

To manage UDFs in Databend, use the following commands:

<IndexOverviewList />

## Usage Examples

This section demonstrates two UDF implementation methods within Databend: one by creating UDFs with lambda expressions and the other by utilizing UDF servers in conjunction with Python. For additional examples of defining UDFs in various programming languages, see [CREATE FUNCTION](ddl-create-function.md).

### UDF Implementation with Lambda Expression

This example implements a UDF named *a_plus_3* using a lambda expression:

```sql
CREATE FUNCTION a_plus_3 AS (a) -> a+3;

SELECT a_plus_3(2);
+---------+
| (2+3)   |
+---------+
|       5 |
+---------+
```

### UDF Implementation via UDF Server

This example demonstrates how to enable and configure a UDF server in Python:

1. Enable UDF server support by adding the following parameters to the [query] section in the [databend-query.toml](https://github.com/datafuselabs/databend/blob/main/scripts/distribution/configs/databend-query.toml) configuration file.

```toml title='databend-query.toml'
[query]
...
enable_udf_server = true
# List the allowed UDF server addresses, separating multiple addresses with commas.
# For example, ['http://0.0.0.0:8815', 'http://example.com']
udf_server_allow_list = ['http://0.0.0.0:8815']
...
```

2. Define your function. This code defines and runs a UDF server in Python, which exposes a custom function *gcd* for calculating the greatest common divisor of two integers and allows remote execution of this function:

:::note
The SDK package is not yet available. Prior to its release, please download the 'udf.py' file from https://github.com/datafuselabs/databend/blob/main/tests/udf-server/udf.py and ensure it is saved in the same directory as this Python script. This step is essential for the code to function correctly.
:::

```python title='udf_server.py'
from udf import *

@udf(
    input_types=["INT", "INT"],
    result_type="INT",
    skip_null=True,
)
def gcd(x: int, y: int) -> int:
    while y != 0:
        (x, y) = (y, x % y)
    return x

if __name__ == '__main__':
    # create a UDF server listening at '0.0.0.0:8815'
    server = UdfServer("0.0.0.0:8815")
    # add defined functions
    server.add_function(gcd)
    # start the UDF server
    server.serve()
```

`@udf` is a decorator used for defining UDFs in Databend, supporting the following parameters:

| Parameter   | Description |
|-------------|-------------|
| input_types | A list of strings or Arrow data types that specify the input data types. |
| result_type | A string or an Arrow data type that specifies the return value type. |
| name        | An optional string specifying the function name. If not provided, the original name will be used. |
| io_threads  | Number of I/O threads used per data chunk for I/O bound functions. |
| skip_null   | A boolean value specifying whether to skip NULL values. If set to True, NULL values will not be passed to the function, and the corresponding return value is set to NULL. Default is False. |

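The `io_threads` parameter is described as the number of I/O threads used per data chunk. As a rough mental model only (this is not the SDK's implementation), applying an I/O-bound function to one chunk of rows with a thread pool could look like the following. `apply_to_chunk` and `slow_lookup` are hypothetical names, and the `time.sleep` call is a stand-in for a network or disk wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def apply_to_chunk(
    func: Callable[[int], int], chunk: List[int], io_threads: int
) -> List[int]:
    """Call func on every row of the chunk using io_threads worker threads."""
    with ThreadPoolExecutor(max_workers=io_threads) as pool:
        # Executor.map() returns results in the same order as the inputs.
        return list(pool.map(func, chunk))


def slow_lookup(x: int) -> int:
    time.sleep(0.01)  # stand-in for a network or disk wait
    return x * 2


print(apply_to_chunk(slow_lookup, [1, 2, 3, 4], io_threads=4))  # [2, 4, 6, 8]
```

With more threads than one, the waits overlap, which is why a setting like this matters mainly for functions that spend their time blocked on I/O rather than computing.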
This table shows how Databend data types map to Python types:

| Databend Type         | Python Type           |
|-----------------------|-----------------------|
| BOOLEAN               | bool                  |
| TINYINT (UNSIGNED)    | int                   |
| SMALLINT (UNSIGNED)   | int                   |
| INT (UNSIGNED)        | int                   |
| BIGINT (UNSIGNED)     | int                   |
| FLOAT                 | float                 |
| DOUBLE                | float                 |
| DECIMAL               | decimal.Decimal       |
| DATE                  | datetime.date         |
| TIMESTAMP             | datetime.datetime     |
| VARCHAR               | str                   |
| VARIANT               | any                   |
| MAP(K,V)              | dict                  |
| ARRAY(T)              | list[T]               |
| TUPLE(T...)           | tuple(T...)           |

3. Run the Python file to start the UDF server:

```shell
python3 udf_server.py
```

4. Register the function *gcd* in Databend with the [CREATE FUNCTION](ddl-create-function.md) command:

```sql
CREATE FUNCTION gcd (INT, INT) RETURNS INT LANGUAGE python HANDLER = 'gcd' ADDRESS = 'http://0.0.0.0:8815';
```