Commit fdbbcea

docs: udf (#12938)
* added * added * added * Update ddl-create-function.md * format
1 parent fa7be80 commit fdbbcea

5 files changed

+262
-20
lines changed

Lines changed: 1 addition & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,3 @@
11
{
2-
"label": "User-Defined Function",
3-
"link": {
4-
"type": "generated-index",
5-
"slug": "/sql-commands/ddl/udf"
6-
}
2+
"label": "User-Defined Function"
73
}

docs/doc/14-sql-commands/00-ddl/50-udf/ddl-alter-function.md

Lines changed: 19 additions & 8 deletions
@@ -3,22 +3,33 @@ title: ALTER FUNCTION
33
description:
44
Modifies the properties for an existing user-defined function.
55
---
6+
import FunctionDescription from '@site/src/components/FunctionDescription';
7+
8+
<FunctionDescription description="Introduced or updated: v1.2.116"/>
9+
10+
Alters a user-defined function.
611

712
## Syntax
813

914
```sql
10-
CREATE FUNCTION <name> AS ([ argname ]) -> '<function_definition>'
15+
-- Alter UDF created with lambda expression
16+
ALTER FUNCTION [IF NOT EXISTS] <function_name>
17+
AS (<input_param_names>) -> <lambda_expression>
18+
[DESC='<description>']
19+
20+
-- Alter UDF created with UDF server
21+
ALTER FUNCTION [IF NOT EXISTS] <function_name>
22+
AS (<input_param_types>) RETURNS <return_type> LANGUAGE <language_name>
23+
HANDLER = '<handler_name>' ADDRESS = '<udf_server_address>'
24+
[DESC='<description>']
1125
```
1226

1327
## Examples
1428

1529
```sql
30+
CREATE FUNCTION a_plus_3 AS (a) -> a+3+3;
1631
ALTER FUNCTION a_plus_3 AS (a) -> a+3;
1732

18-
SELECT a_plus_3(2);
19-
+---------+
20-
| (2 + 3) |
21-
+---------+
22-
| 5 |
23-
+---------+
24-
```
33+
CREATE FUNCTION gcd (INT, INT) RETURNS INT LANGUAGE python HANDLER = 'gcd' ADDRESS = 'http://0.0.0.0:8815';
34+
ALTER FUNCTION gcd (INT, INT) RETURNS INT LANGUAGE python HANDLER = 'gcd_new' ADDRESS = 'http://0.0.0.0:8815';
35+
```

docs/doc/14-sql-commands/00-ddl/50-udf/ddl-create-function.md

Lines changed: 114 additions & 4 deletions
@@ -3,20 +3,44 @@ title: CREATE FUNCTION
33
description:
44
Create a new user-defined scalar function.
55
---
6+
import FunctionDescription from '@site/src/components/FunctionDescription';
67

8+
<FunctionDescription description="Introduced or updated: v1.2.116"/>
79

8-
## CREATE FUNCTION
9-
10-
Creates a new UDF (user-defined function), the UDF can contain an SQL expression.
10+
Creates a user-defined function.
1111

1212
## Syntax
1313

1414
```sql
15-
CREATE FUNCTION [ IF NOT EXISTS ] <name> AS ([ argname ]) -> '<function_definition>'
15+
-- Create with lambda expression
16+
CREATE FUNCTION [IF NOT EXISTS] <function_name>
17+
AS (<input_param_names>) -> <lambda_expression>
18+
[DESC='<description>']
19+
20+
21+
-- Create with UDF server
22+
CREATE FUNCTION [IF NOT EXISTS] <function_name>
23+
AS (<input_param_types>) RETURNS <return_type> LANGUAGE <language_name>
24+
HANDLER = '<handler_name>' ADDRESS = '<udf_server_address>'
25+
[DESC='<description>']
1626
```
1727

28+
| Parameter                          | Description                                                                    |
|------------------------------------|--------------------------------------------------------------------------------|
| `<function_name>`                  | The name of the function.                                                      |
| `<lambda_expression>`              | The lambda expression or code snippet defining the function's behavior.        |
| `DESC='<description>'`             | Description of the UDF.                                                        |
| `<input_param_names>`              | A comma-separated list of input parameter names.                               |
| `<input_param_types>`              | A comma-separated list of input parameter types.                               |
| `<return_type>`                    | The return type of the function.                                               |
| `LANGUAGE`                         | Specifies the language used to write the function. Available values: `python`. |
| `HANDLER = '<handler_name>'`       | Specifies the name of the function's handler.                                  |
| `ADDRESS = '<udf_server_address>'` | Specifies the address of the UDF server.                                       |
39+
1840
## Examples
1941

42+
### Creating UDF with Lambda Expression
43+
2044
```sql
2145
CREATE FUNCTION a_plus_3 AS (a) -> a+3;
2246

@@ -53,3 +77,89 @@ DROP FUNCTION get_v2;
5377

5478
DROP TABLE json_table;
5579
```
80+
81+
### Creating UDF with UDF Server (Python)
82+
83+
This example demonstrates how to enable and configure a UDF server in Python:
84+
85+
1. Enable UDF server support by adding the following parameters to the [query] section in the [databend-query.toml](https://github.com/datafuselabs/databend/blob/main/scripts/distribution/configs/databend-query.toml) configuration file.
86+
87+
```toml title='databend-query.toml'
88+
[query]
89+
...
90+
enable_udf_server = true
91+
# List the allowed UDF server addresses, separating multiple addresses with commas.
92+
# For example, ['http://0.0.0.0:8815', 'http://example.com']
93+
udf_server_allow_list = ['http://0.0.0.0:8815']
94+
...
95+
```
96+
97+
2. Define your function. This code defines and runs a UDF server in Python, which exposes a custom function *gcd* for calculating the greatest common divisor of two integers and allows remote execution of this function:
98+
99+
:::note
100+
The SDK package is not yet available. Until it is released, download the `udf.py` file from https://github.com/datafuselabs/databend/blob/main/tests/udf-server/udf.py and save it in the same directory as this Python script. The script imports from it and will not run without it.
101+
:::
102+
103+
```python title='udf_server.py'
from udf import *

@udf(
    input_types=["INT", "INT"],
    result_type="INT",
    skip_null=True,
)
def gcd(x: int, y: int) -> int:
    while y != 0:
        (x, y) = (y, x % y)
    return x

if __name__ == '__main__':
    # create a UDF server listening at '0.0.0.0:8815'
    server = UdfServer("0.0.0.0:8815")
    # add defined functions
    server.add_function(gcd)
    # start the UDF server
    server.serve()
```
124+
125+
`@udf` is a decorator used for defining UDFs in Databend, supporting the following parameters:
126+
127+
| Parameter   | Description                                                                                        |
|-------------|----------------------------------------------------------------------------------------------------|
| input_types | A list of strings or Arrow data types that specify the input data types.                           |
| result_type | A string or an Arrow data type that specifies the return value type.                               |
| name        | An optional string specifying the function name. If not provided, the original name will be used.  |
| io_threads  | Number of I/O threads used per data chunk for I/O bound functions.                                 |
| skip_null   | A boolean value specifying whether to skip NULL values. If set to True, NULL values will not be passed to the function, and the corresponding return value is set to NULL. Default is False. |
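
The `skip_null` behavior described in the table can be pictured with a small plain-Python sketch. It runs without `udf.py`; `with_skip_null` is a hypothetical helper written for illustration and is not part of the Databend SDK. It only models the per-row rule stated above: if any argument is NULL (`None` in Python), the function is not called and the result is NULL.

```python
from typing import Callable, Optional


def with_skip_null(func: Callable[..., int]) -> Callable[..., Optional[int]]:
    """Hypothetical helper: mimic skip_null=True for a single row of arguments."""
    def wrapper(*args):
        # A NULL argument short-circuits: func is never called, result is NULL.
        if any(arg is None for arg in args):
            return None
        return func(*args)
    return wrapper


def gcd(x: int, y: int) -> int:
    # Same Euclidean loop as the handler in udf_server.py.
    while y != 0:
        x, y = y, x % y
    return x


safe_gcd = with_skip_null(gcd)
print(safe_gcd(12, 18))    # 6
print(safe_gcd(None, 18))  # None
```

With `skip_null` left at its default of False, the handler itself would have to deal with `None` arguments.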
134+
135+
This table shows how Databend data types map to Python types:
136+
137+
| Databend Type         | Python Type           |
|-----------------------|-----------------------|
| BOOLEAN               | bool                  |
| TINYINT (UNSIGNED)    | int                   |
| SMALLINT (UNSIGNED)   | int                   |
| INT (UNSIGNED)        | int                   |
| BIGINT (UNSIGNED)     | int                   |
| FLOAT                 | float                 |
| DOUBLE                | float                 |
| DECIMAL               | decimal.Decimal       |
| DATE                  | datetime.date         |
| TIMESTAMP             | datetime.datetime     |
| VARCHAR               | str                   |
| VARIANT               | any                   |
| MAP(K,V)              | dict                  |
| ARRAY(T)              | list[T]               |
| TUPLE(T...)           | tuple(T...)           |
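
As a rough illustration of the mapping above, the handler bodies below are written against the Python types a UDF server function would receive for `DATE`, `INT`, and `DECIMAL` arguments. The function names `add_days` and `round_price` are hypothetical, and the functions are left undecorated so the sketch runs without `udf.py`; wrapping them with `@udf` and matching `input_types` / `result_type` values is assumed to be the remaining step.

```python
import datetime
from decimal import Decimal, ROUND_HALF_UP


def add_days(d: datetime.date, n: int) -> datetime.date:
    # A DATE argument arrives as datetime.date, an INT as int.
    return d + datetime.timedelta(days=n)


def round_price(price: Decimal) -> Decimal:
    # A DECIMAL argument arrives as decimal.Decimal, so exact rounding is available.
    return price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(add_days(datetime.date(2024, 1, 31), 1))  # 2024-02-01
print(round_price(Decimal("19.999")))           # 20.00
```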
154+
155+
3. Run the Python file to start the UDF server:
156+
157+
```shell
158+
python3 udf_server.py
159+
```
160+
161+
4. Register the function *gcd* with the [CREATE FUNCTION](ddl-create-function.md) command in Databend:
162+
163+
```sql
164+
CREATE FUNCTION gcd (INT, INT) RETURNS INT LANGUAGE python HANDLER = 'gcd' ADDRESS = 'http://0.0.0.0:8815';
165+
```

docs/doc/14-sql-commands/00-ddl/50-udf/ddl-drop-function.md

Lines changed: 3 additions & 3 deletions
@@ -4,12 +4,12 @@ description:
44
Drop an existing user-defined function.
55
---
66

7-
Drop an existing user-defined function.
7+
Drops a user-defined function.
88

99
## Syntax
1010

1111
```sql
12-
DROP FUNCTION [IF EXISTS] <name>
12+
DROP FUNCTION [IF EXISTS] <function_name>
1313
```
1414

1515
## Examples
@@ -19,4 +19,4 @@ DROP FUNCTION a_plus_3;
1919

2020
SELECT a_plus_3(2);
2121
ERROR 1105 (HY000): Code: 2602, Text = Unknown Function a_plus_3 (while in analyze select projection).
22-
```
22+
```
Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
1+
---
2+
title: User-Defined Function
3+
---
4+
import IndexOverviewList from '@site/src/components/IndexOverviewList';
5+
6+
## What are UDFs?
7+
8+
User-Defined Functions (UDFs) enable you to define your own custom operations to process data within Databend. They are typically written as lambda expressions or implemented via a UDF server in a programming language such as Python, and they are executed as part of Databend's query processing pipeline. Advantages of using UDFs include:
9+
10+
- Customized Data Transformations: UDFs empower you to perform data transformations that may not be achievable through built-in Databend functions alone. This customization is particularly valuable for handling unique data formats or business logic.
11+
12+
- Performance Optimization: UDFs let you define and tune your own functions, so you can tailor the processing code to specific performance requirements.
13+
14+
- Code Reusability: UDFs can be reused across multiple queries, saving time and effort in coding and maintaining data processing logic.
15+
16+
## Managing UDFs
17+
18+
To manage UDFs in Databend, use the following commands:
19+
20+
<IndexOverviewList />
21+
22+
## Usage Examples
23+
24+
This section demonstrates two ways to implement UDFs in Databend: with lambda expressions and with a UDF server written in Python. For additional examples of defining UDFs in various programming languages, see [CREATE FUNCTION](ddl-create-function.md).
25+
26+
### UDF Implementation with Lambda Expression
27+
28+
This example implements a UDF named *a_plus_3* using a lambda expression:
29+
30+
```sql
31+
CREATE FUNCTION a_plus_3 AS (a) -> a+3;
32+
33+
SELECT a_plus_3(2);
34+
+---------+
35+
| (2 + 3) |
36+
+---------+
37+
| 5 |
38+
+---------+
39+
```
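
The result header `(2 + 3)` in the output above suggests that the call is replaced by the lambda body with the argument substituted for the parameter. The following Python sketch is only a conceptual model of that substitution, not Databend's implementation; `expand_lambda` is a hypothetical name, and it works on plain strings rather than a parsed expression.

```python
import re


def expand_lambda(params, body, args):
    """Hypothetical model: substitute argument text for parameter names in a lambda body."""
    if len(params) != len(args):
        raise ValueError("argument count does not match parameter count")
    mapping = dict(zip(params, args))
    # Replace each identifier that names a parameter; leave other tokens untouched.
    return re.sub(
        r"\b[A-Za-z_]\w*\b",
        lambda m: mapping.get(m.group(0), m.group(0)),
        body,
    )


expanded = expand_lambda(["a"], "a + 3", ["2"])
print(expanded)        # 2 + 3
print(eval(expanded))  # 5
```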
40+
41+
### UDF Implementation via UDF Server
42+
43+
This example demonstrates how to enable and configure a UDF server in Python:
44+
45+
1. Enable UDF server support by adding the following parameters to the [query] section in the [databend-query.toml](https://github.com/datafuselabs/databend/blob/main/scripts/distribution/configs/databend-query.toml) configuration file.
46+
47+
```toml title='databend-query.toml'
48+
[query]
49+
...
50+
enable_udf_server = true
51+
# List the allowed UDF server addresses, separating multiple addresses with commas.
52+
# For example, ['http://0.0.0.0:8815', 'http://example.com']
53+
udf_server_allow_list = ['http://0.0.0.0:8815']
54+
...
55+
```
56+
57+
2. Define your function. This code defines and runs a UDF server in Python, which exposes a custom function *gcd* for calculating the greatest common divisor of two integers and allows remote execution of this function:
58+
59+
:::note
60+
The SDK package is not yet available. Until it is released, download the `udf.py` file from https://github.com/datafuselabs/databend/blob/main/tests/udf-server/udf.py and save it in the same directory as this Python script. The script imports from it and will not run without it.
61+
:::
62+
63+
```python title='udf_server.py'
from udf import *

@udf(
    input_types=["INT", "INT"],
    result_type="INT",
    skip_null=True,
)
def gcd(x: int, y: int) -> int:
    while y != 0:
        (x, y) = (y, x % y)
    return x

if __name__ == '__main__':
    # create a UDF server listening at '0.0.0.0:8815'
    server = UdfServer("0.0.0.0:8815")
    # add defined functions
    server.add_function(gcd)
    # start the UDF server
    server.serve()
```
84+
85+
`@udf` is a decorator used for defining UDFs in Databend, supporting the following parameters:
86+
87+
| Parameter   | Description                                                                                        |
|-------------|----------------------------------------------------------------------------------------------------|
| input_types | A list of strings or Arrow data types that specify the input data types.                           |
| result_type | A string or an Arrow data type that specifies the return value type.                               |
| name        | An optional string specifying the function name. If not provided, the original name will be used.  |
| io_threads  | Number of I/O threads used per data chunk for I/O bound functions.                                 |
| skip_null   | A boolean value specifying whether to skip NULL values. If set to True, NULL values will not be passed to the function, and the corresponding return value is set to NULL. Default is False. |
94+
95+
This table shows how Databend data types map to Python types:
96+
97+
| Databend Type         | Python Type           |
|-----------------------|-----------------------|
| BOOLEAN               | bool                  |
| TINYINT (UNSIGNED)    | int                   |
| SMALLINT (UNSIGNED)   | int                   |
| INT (UNSIGNED)        | int                   |
| BIGINT (UNSIGNED)     | int                   |
| FLOAT                 | float                 |
| DOUBLE                | float                 |
| DECIMAL               | decimal.Decimal       |
| DATE                  | datetime.date         |
| TIMESTAMP             | datetime.datetime     |
| VARCHAR               | str                   |
| VARIANT               | any                   |
| MAP(K,V)              | dict                  |
| ARRAY(T)              | list[T]               |
| TUPLE(T...)           | tuple(T...)           |
114+
115+
3. Run the Python file to start the UDF server:
116+
117+
```shell
118+
python3 udf_server.py
119+
```
120+
121+
4. Register the function *gcd* with the [CREATE FUNCTION](ddl-create-function.md) command in Databend:
122+
123+
```sql
124+
CREATE FUNCTION gcd (INT, INT) RETURNS INT LANGUAGE python HANDLER = 'gcd' ADDRESS = 'http://0.0.0.0:8815';
125+
```
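
Before relying on the registered function, the handler logic from step 2 can be checked locally. The sketch below copies the Euclidean loop without the `@udf` decorator, so it runs without `udf.py` or a running server, and compares it against Python's standard `math.gcd` on a few sample pairs chosen for illustration.

```python
import math


def gcd(x: int, y: int) -> int:
    # Same loop as the handler in udf_server.py, minus the @udf decorator.
    while y != 0:
        x, y = y, x % y
    return x


# Non-negative inputs agree with the standard library.
for a, b in [(48, 18), (17, 5), (0, 5), (21, 14)]:
    assert gcd(a, b) == math.gcd(a, b)

print(gcd(48, 18))  # 6
print(gcd(4, -6))   # -2
```

Note the last line: because Python's `%` takes the sign of the divisor, this loop can return a negative value for negative inputs, whereas `math.gcd` always returns a non-negative result. Whether that matters depends on the data passed to the UDF.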
