Skip to content

Improve performance of trim for string view (10%) #12395

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Sep 25, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
736eb11
draft.
Rachelint Sep 9, 2024
a0da2d0
add unit tests for xTrim.
Rachelint Sep 9, 2024
3c8b035
fix fmt.
Rachelint Sep 9, 2024
06d104d
tmp copy for ci.
Rachelint Sep 9, 2024
48cb4db
move `make_and_append_view` to common.
Rachelint Sep 11, 2024
863e9b7
fix sting view trim about the process of empty string.
Rachelint Sep 11, 2024
36a8125
fix compile.
Rachelint Sep 11, 2024
aa2c131
eliminate some repeated codes.
Rachelint Sep 11, 2024
e3e9b53
add sql test case about string view trim.
Rachelint Sep 17, 2024
dbd0f25
Merge branch 'main' into string-view-trim
Rachelint Sep 17, 2024
6d5660f
remove unused imports.
Rachelint Sep 17, 2024
65ef988
Merge branch 'main' into string-view-trim
Rachelint Sep 21, 2024
4129a43
Merge branch 'main' into string-view-trim
Rachelint Sep 22, 2024
522c87f
fix tests.
Rachelint Sep 22, 2024
840ec46
remove stale file.
Rachelint Sep 22, 2024
307850a
Merge remote-tracking branch 'apache/main' into string-view-trim
alamb Sep 23, 2024
064450f
Avoid unecessary unsafe
alamb Sep 23, 2024
c2510de
add unit test cases with a unlined string view output.
Rachelint Sep 23, 2024
38790b2
fix tests.
Rachelint Sep 23, 2024
20197d9
improve comments.
Rachelint Sep 23, 2024
2112bc5
add todo and the related issue.
Rachelint Sep 23, 2024
790f7a9
use the safe way to get `start_offset` after trimming.
Rachelint Sep 25, 2024
f817462
fix comments.
Rachelint Sep 25, 2024
148a991
Merge branch 'main' into string-view-trim
Rachelint Sep 25, 2024
f9c1543
Remove redundant test
alamb Sep 25, 2024
e39f916
Merge remote-tracking branch 'apache/main' into string-view-trim
alamb Sep 25, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
152 changes: 151 additions & 1 deletion datafusion/functions/src/string/btrim.rs
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,11 @@ impl ScalarUDFImpl for BTrimFunc {
}

fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
utf8_to_str_type(&arg_types[0], "btrim")
if arg_types[0] == DataType::Utf8View {
Ok(DataType::Utf8View)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Also eventually it would also be possible to return Utf8View when the input was Utf8 and save a copy as well

} else {
utf8_to_str_type(&arg_types[0], "btrim")
}
}

fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
Expand All @@ -106,3 +110,149 @@ impl ScalarUDFImpl for BTrimFunc {
&self.aliases
}
}

#[cfg(test)]
mod tests {
use arrow::array::{Array, StringArray, StringViewArray};
use arrow::datatypes::DataType::{Utf8, Utf8View};

use datafusion_common::{Result, ScalarValue};
use datafusion_expr::{ColumnarValue, ScalarUDFImpl};

use crate::string::btrim::BTrimFunc;
use crate::utils::test::test_function;

#[test]
fn test_functions() {
// String view cases for checking normal logic
test_function!(
BTrimFunc::new(),
&[ColumnarValue::Scalar(ScalarValue::Utf8View(Some(
String::from("alphabet ")
))),],
Ok(Some("alphabet")),
&str,
Utf8View,
StringViewArray
);
test_function!(
BTrimFunc::new(),
&[ColumnarValue::Scalar(ScalarValue::Utf8View(Some(
String::from(" alphabet ")
))),],
Ok(Some("alphabet")),
&str,
Utf8View,
StringViewArray
);
test_function!(
BTrimFunc::new(),
&[
ColumnarValue::Scalar(ScalarValue::Utf8View(Some(String::from(
"alphabet"
)))),
ColumnarValue::Scalar(ScalarValue::Utf8View(Some(String::from("t")))),
],
Ok(Some("alphabe")),
&str,
Utf8View,
StringViewArray
);
test_function!(
BTrimFunc::new(),
&[
ColumnarValue::Scalar(ScalarValue::Utf8View(Some(String::from(
"alphabet"
)))),
ColumnarValue::Scalar(ScalarValue::Utf8View(Some(String::from(
"alphabe"
)))),
],
Ok(Some("t")),
&str,
Utf8View,
StringViewArray
);
test_function!(
BTrimFunc::new(),
&[
ColumnarValue::Scalar(ScalarValue::Utf8View(Some(String::from(
"alphabet"
)))),
ColumnarValue::Scalar(ScalarValue::Utf8View(None)),
],
Ok(None),
&str,
Utf8View,
StringViewArray
);
// Special string view case for checking unlined output(len > 12)
test_function!(
BTrimFunc::new(),
&[
ColumnarValue::Scalar(ScalarValue::Utf8View(Some(String::from(
"xxxalphabetalphabetxxx"
)))),
ColumnarValue::Scalar(ScalarValue::Utf8View(Some(String::from("x")))),
],
Ok(Some("alphabetalphabet")),
&str,
Utf8View,
StringViewArray
);
// String cases
test_function!(
BTrimFunc::new(),
&[ColumnarValue::Scalar(ScalarValue::Utf8(Some(
String::from("alphabet ")
))),],
Ok(Some("alphabet")),
&str,
Utf8,
StringArray
);
test_function!(
BTrimFunc::new(),
&[ColumnarValue::Scalar(ScalarValue::Utf8(Some(
String::from("alphabet ")
))),],
Ok(Some("alphabet")),
&str,
Utf8,
StringArray
);
test_function!(
BTrimFunc::new(),
&[
ColumnarValue::Scalar(ScalarValue::Utf8(Some(String::from("alphabet")))),
ColumnarValue::Scalar(ScalarValue::Utf8(Some(String::from("t")))),
],
Ok(Some("alphabe")),
&str,
Utf8,
StringArray
);
test_function!(
BTrimFunc::new(),
&[
ColumnarValue::Scalar(ScalarValue::Utf8(Some(String::from("alphabet")))),
ColumnarValue::Scalar(ScalarValue::Utf8(Some(String::from("alphabe")))),
],
Ok(Some("t")),
&str,
Utf8,
StringArray
);
test_function!(
BTrimFunc::new(),
&[
ColumnarValue::Scalar(ScalarValue::Utf8(Some(String::from("alphabet")))),
ColumnarValue::Scalar(ScalarValue::Utf8(None)),
],
Ok(None),
&str,
Utf8,
StringArray
);
}
}
Loading