
Commit 312c3c6

update streaming load docs and tests
1 parent 93ffeb0 commit 312c3c6

File tree

3 files changed: +18 -26 lines changed


docs/doc/11-integrations/00-api/03-streaming-load.md

Lines changed: 10 additions & 10 deletions
@@ -14,21 +14,21 @@ The Streaming Load API is used to read data from your local files and load it in
To create a request with the Streaming Load API, follow the format below:

```bash
-curl -H "<parameter>:<value>" [-H "<parameter>:<value>"...] -F "upload=@<file_location>" [-F "upload=@<file_location>"] -XPUT http://<user_name>:[password]@<http_handler_host>:<http_handler_port>/v1/streaming_load
+curl -H "insert_sql:<value>" -F "upload=@<file_location>" [-F "upload=@<file_location>"] -XPUT http://<user_name>:[password]@<http_handler_host>:<http_handler_port>/v1/streaming_load
```
+eg: `curl -H "insert_sql:insert into ontime_streaming_load file_format = (type = 'CSV' skip_header = 1 compression = 'bz2')" -F "upload=@/tmp/ontime_200.csv.bz2" -u root: -XPUT "http://localhost:8000/v1/streaming_load"`
## Explaining Argument `-H`

-The request usually includes many occurrences of the argument `-H` and each is followed by one of the following parameters to tell Databend how to handle the file you're loading data from. Please note that `insert_sql` is required, and other parameters are optional.
+The request usually includes many occurrences of the argument `-H` and each is followed by one of the following parameters to tell Databend how to handle the file you're loading data from. Please note that `insert_sql` is required.

-| Parameter | Values | Supported Formats | Examples |
-|-------------------------|--------|-------------------|----------|
-| insert_sql | [INSERT_statement] + format [file_format] | All | -H "insert_sql: insert into ontime format CSV" |
-| format_skip_header | Tells Databend how many lines at the beginning of the file to skip for header.<br /> 0 (default): No lines to skip;<br /> 1: Skip the first line;<br /> N: Skip the first N lines. | CSV / TSV / NDJSON | -H "format_skip_header: 1" |
-| format_compression | Tells Databend the compression format of the file.<br /> NONE (default): Do NOT decompress the file;<br /> AUTO: Automatically decompress the file by suffix;<br /> You can also use one of these values to explicitly specify the compression format: GZIP \| BZ2 \| BROTLI \| ZSTD \| DEFALTE \| RAW_DEFLATE. | CSV / TSV / NDJSON | -H "format_compression:auto" |
-| format_field_delimiter | Tells Databend the characters used in the file to separate fields.<br /> Default for CSV files: `,`.<br /> Default for TSV files: `\t`.<br /> Hive output files using [SOH control character (\x01)](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#SOH) as the field delimiter. | CSV / TSV | -H "format_field_delimiter:," |
-| format_record_delimiter | Tells Databend the new line characters used in the file to separate records.<br /> Default: `\n`. | CSV / TSV | -H "format_recorder_delimiter:\n" |
-| format_quote | Tells Databend the quote characters for strings in CSV file.<br /> Default: ""(Double quotes). | CSV | |
+| Parameter | Values | Supported Formats | Examples |
+|-------------------------|------------------------------------|-------------------|------------------------------------------------|
+| insert_sql | [INSERT_statement] + [FILE_FORMAT] | All | -H "insert_sql: insert into ontime format CSV" |

+
+> FILE_FORMAT = ( TYPE = { CSV | TSV | NDJSON | PARQUET | XML } [ formatTypeOptions ] )
+>
+> The `formatTypeOptions` is the same as [COPY_INTO](../../14-sql-commands/10-dml/dml-copy-into-table.md)'s `formatTypeOptions`.
## Alternatives to Streaming Load API

The [COPY INTO](../../14-sql-commands/10-dml/dml-copy-into-table.md) command enables you to load data from files using insecure protocols, such as HTTP. This simplifies the data loading in some specific scenarios, for example, Databend is installed on-premises with MinIO. In such cases, you can load data from local files with the COPY INTO command.
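
For quick reference, here is a minimal end-to-end sketch of the `file_format` syntax documented above. The host and port (`localhost:8000`), the table `books`, and the sample file `/tmp/books.csv` are illustrative assumptions, not part of this commit; adjust them to your deployment.

```bash
#!/usr/bin/env bash
# Sketch only: assumes a Databend query node with the HTTP handler on localhost:8000
# and an existing table `books(title VARCHAR, author VARCHAR, year INT)` (hypothetical).

# Prepare a sample CSV whose header row should be skipped on load.
cat > /tmp/books.csv <<'CSV'
title,author,year
Example Title,Example Author,2022
CSV

# Upload it through the Streaming Load API using the FILE_FORMAT clause.
curl -u root: -XPUT \
  -H "insert_sql:insert into books file_format = (type = 'CSV' skip_header = 1)" \
  -F "upload=@/tmp/books.csv" \
  "http://localhost:8000/v1/streaming_load"
```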

tests/suites/1_stateful/01_load_v2/01_0000_streaming_load.result

Lines changed: 0 additions & 2 deletions
@@ -10,8 +10,6 @@
199 2020.0 769
--ndjson
198 2020.0 767
---csv using file_format
-199 2020.0 769
--parquet less
199 2020.0 769
--parquet mismatch schema

tests/suites/1_stateful/01_load_v2/01_0000_streaming_load.sh

Lines changed: 8 additions & 14 deletions
@@ -38,43 +38,37 @@ fi

# load csv
echo "--csv"
-curl -H "insert_sql:insert into ontime_streaming_load format Csv" -H "format_skip_header:1" -F "upload=@/tmp/ontime_200.csv" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
+curl -H "insert_sql:insert into ontime_streaming_load file_format = (type = 'CSV' skip_header = 1)" -F "upload=@/tmp/ontime_200.csv" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
echo "select count(1), avg(Year), sum(DayOfWeek) from ontime_streaming_load;" | $MYSQL_CLIENT_CONNECT
echo "truncate table ontime_streaming_load" | $MYSQL_CLIENT_CONNECT

echo "--csv.gz"
# load csv gz
-curl -H "insert_sql:insert into ontime_streaming_load format Csv" -H "format_skip_header:1" -H "format_compression:gzip" -F "upload=@/tmp/ontime_200.csv.gz" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
+curl -H "insert_sql:insert into ontime_streaming_load file_format = (type = 'CSV' skip_header = 1 compression = 'gzip')" -F "upload=@/tmp/ontime_200.csv.gz" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
echo "select count(1), avg(Year), sum(DayOfWeek) from ontime_streaming_load;" | $MYSQL_CLIENT_CONNECT
echo "truncate table ontime_streaming_load" | $MYSQL_CLIENT_CONNECT

# load csv zstd
echo "--csv.zstd"
-curl -H "insert_sql:insert into ontime_streaming_load format Csv" -H "format_skip_header:1" -H "format_compression:zstd" -F "upload=@/tmp/ontime_200.csv.zst" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
+curl -H "insert_sql:insert into ontime_streaming_load file_format = (type = 'CSV' skip_header = 1 compression = 'zstd')" -F "upload=@/tmp/ontime_200.csv.zst" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
echo "select count(1), avg(Year), sum(DayOfWeek) from ontime_streaming_load;" | $MYSQL_CLIENT_CONNECT
echo "truncate table ontime_streaming_load" | $MYSQL_CLIENT_CONNECT

# load csv bz2
echo "--csv.bz2"
-curl -H "insert_sql:insert into ontime_streaming_load format Csv" -H "format_skip_header:1" -H "format_compression:bz2" -F "upload=@/tmp/ontime_200.csv.bz2" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
+curl -H "insert_sql:insert into ontime_streaming_load file_format = (type = 'CSV' skip_header = 1 compression = 'bz2')" -F "upload=@/tmp/ontime_200.csv.bz2" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
echo "select count(1), avg(Year), sum(DayOfWeek) from ontime_streaming_load;" | $MYSQL_CLIENT_CONNECT
echo "truncate table ontime_streaming_load" | $MYSQL_CLIENT_CONNECT

# load parquet
echo "--parquet"
-curl -H "insert_sql:insert into ontime_streaming_load format Parquet" -F "upload=@/tmp/ontime_200.parquet" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
+curl -H "insert_sql:insert into ontime_streaming_load file_format = (type = 'Parquet')" -F "upload=@/tmp/ontime_200.parquet" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
echo "select count(1), avg(Year), sum(DayOfWeek) from ontime_streaming_load;" | $MYSQL_CLIENT_CONNECT
echo "truncate table ontime_streaming_load" | $MYSQL_CLIENT_CONNECT

# load ndjson
echo "--ndjson"
-curl -H "insert_sql:insert into ontime_streaming_load format NdJson" -H "format_skip_header:1" -F "upload=@/tmp/ontime_200.ndjson" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
-echo "select count(1), avg(Year), sum(DayOfWeek) from ontime_streaming_load;" | $MYSQL_CLIENT_CONNECT
-echo "truncate table ontime_streaming_load" | $MYSQL_CLIENT_CONNECT
-
-# load csv using file_format syntax
-echo "--csv using file_format"
-curl -H "insert_sql:insert into ontime_streaming_load file_format = (type = 'CSV' skip_header = 1)" -F "upload=@/tmp/ontime_200.csv" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
+curl -H "insert_sql:insert into ontime_streaming_load file_format = (type = 'NdJson' skip_header = 1)" -F "upload=@/tmp/ontime_200.ndjson" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
echo "select count(1), avg(Year), sum(DayOfWeek) from ontime_streaming_load;" | $MYSQL_CLIENT_CONNECT
echo "truncate table ontime_streaming_load" | $MYSQL_CLIENT_CONNECT

@@ -90,13 +84,13 @@ echo 'CREATE TABLE ontime_less


echo "--parquet less"
-curl -s -H "insert_sql:insert into ontime_less format Parquet" -F "upload=@/tmp/ontime_200.parquet" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
+curl -s -H "insert_sql:insert into ontime_less file_format = (type = 'Parquet')" -F "upload=@/tmp/ontime_200.parquet" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" > /dev/null 2>&1
echo "select count(1), avg(Year), sum(DayOfWeek) from ontime_less;" | $MYSQL_CLIENT_CONNECT

# load parquet with mismatch schema
echo "--parquet mismatch schema"
cat $CURDIR/../ddl/ontime.sql | sed 's/ontime/ontime_test_mismatch/g' | sed 's/DATE/VARCHAR/g' | $MYSQL_CLIENT_CONNECT
-curl -s -H "insert_sql:insert into ontime_test_mismatch format Parquet" -F "upload=@/tmp/ontime_200.parquet" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" | grep -c 'parquet schema mismatch'
+curl -s -H "insert_sql:insert into ontime_test_mismatch file_format = (type = 'Parquet')" -F "upload=@/tmp/ontime_200.parquet" -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" | grep -c 'parquet schema mismatch'


echo "drop table ontime_streaming_load;" | $MYSQL_CLIENT_CONNECT
