String Transform Examples #18616

lamarrr · 2025-05-01T13:17:05Z

Description

Depends on: #18490
Follows up on #18023

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.


cpp/examples/string_transforms/CMakeLists.txt

cpp/examples/string_transforms/preallocated.cpp

bdice · 2025-05-15T20:10:54Z

cpp/examples/string_transforms/preallocated.cpp

+  auto push = [&](cudf::string_view str) {
+    auto const size = str.size_bytes();
+
+    if ((it + size) > end) { return; }


Would it be possible to set an "error" flag for this row as a second output, so we could know if the scratch space was insufficient?

I'd rather not suggest error handling. It would require an extra column to hold the errors and I think would get in the way of what we are trying to show here.

I agree with @davidwendt: this example assumes the buffer is sufficient for the entire operation, which I think is ideal, especially for database/Parquet-style structured operations where we know the upper bound on memory usage.

Sorry, I wasn’t clear in my question. I agree it is not suitable for this example from a teaching standpoint. I was trying to ask if it is possible, like if we have support for multiple column outputs.

No, it is not currently possible. But if there is consensus or a user requirement for it, I'll be happy to implement it. It should be easy to extend; I had that usage in mind when writing it but wanted to avoid feature creep.

I assume this should also be the same as a struct output?
Is there a way to slice a struct column into its components without copying?

This is perhaps outside the scope of this PR. It would not be something to solve in the examples. I can think of many different ways to solve this with the current API.
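To make the idea in this thread concrete, here is a minimal host-side sketch in plain C++ of a bounded `push` that records when the scratch space was insufficient. It is an illustration only: `bounded_writer` and its `overflowed` member are hypothetical names, it uses `std::string_view` rather than `cudf::string_view`, and it does not reflect anything the transform API currently supports (as noted above, a second output column is not possible today).

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of cuDF: appends strings into a fixed-size
// scratch buffer and remembers whether any append had to be dropped.
struct bounded_writer {
  char* it;                 // next write position
  char* end;                // one past the last usable byte
  bool overflowed = false;  // set when an append did not fit

  // Copies `str` into the buffer if it fits; otherwise records the overflow
  // and leaves the buffer and write position untouched.
  void push(std::string_view str)
  {
    auto const size      = str.size();
    auto const remaining = static_cast<std::size_t>(end - it);
    if (size > remaining) {
      overflowed = true;
      return;
    }
    std::copy_n(str.data(), size, it);
    it += size;
  }
};
```

In a per-row setting, the `overflowed` value is what would have to be written to a second, boolean output for that row, which is exactly the part the discussion identifies as missing.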

davidwendt · 2025-05-16T11:44:56Z

cpp/examples/string_transforms/README.md

+
+The example source code loads a csv file and produces a transformed column from the table using the values from the tables.
+
+Four examples are included:


What happened to the other ones? I think it would be good to show an example just returning a fixed-width integer.

I added it back. I was informed it wasn't that important for the release blog, but it's a valuable example anyway.


davidwendt · 2025-05-19T12:51:35Z

cpp/examples/string_transforms/output.cpp

+  auto alt = cudf::make_column_from_scalar(
+    cudf::string_scalar(cudf::string_view{"(unknown)", 9}, true, stream), 1, stream);


This is great. How about a comment here about how a single-row column is treated as a constant scalar for all rows? Something worded better than that perhaps.
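The convention the comment asks to document can be pictured with a small plain C++ analogy: when a column has exactly one row, every output row reads that single value. This is a sketch under stated assumptions, not cuDF code. The `column` alias, `row_value`, and `fill_empty` are hypothetical, and replacing empty strings with the fallback is only an illustrative operation, not necessarily what `output.cpp` does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a string column: one value per row.
using column = std::vector<std::string>;

// Reads the value for `row`. A single-row column acts as a constant:
// its only value is returned for every row.
inline std::string const& row_value(column const& col, std::size_t row)
{
  return col.size() == 1 ? col[0] : col[row];
}

// Illustrative transform: keeps non-empty names and substitutes the value
// from `alt` (possibly a single-row column) for empty ones.
inline std::vector<std::string> fill_empty(column const& names, column const& alt)
{
  std::vector<std::string> out;
  out.reserve(names.size());
  for (std::size_t i = 0; i < names.size(); ++i) {
    out.push_back(names[i].empty() ? row_value(alt, i) : names[i]);
  }
  return out;
}
```

Passing a one-element `alt` such as `{"(unknown)"}` alongside a multi-row `names` mirrors the shape of the call quoted above, where the scalar is wrapped in a column of size 1.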

davidwendt · 2025-05-19T12:52:55Z

cpp/examples/string_transforms/preallocated.cpp

+  rmm::device_uvector<char> scratch(maximum_size * num_rows, stream, mr);
+
+  auto size =
+    cudf::make_column_from_scalar(cudf::numeric_scalar<int32_t>(maximum_size, true, stream, mr), 1);


Suggested change

-    cudf::make_column_from_scalar(cudf::numeric_scalar<int32_t>(maximum_size, true, stream, mr), 1);
+    cudf::make_column_from_scalar(cudf::numeric_scalar<int32_t>(maximum_size, true, stream), 1, stream);
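The quoted snippet sizes the scratch buffer as `maximum_size * num_rows`. One plausible reading, which is an assumption here and not confirmed by the quoted lines, is that each row writes into its own fixed-width slice of that buffer. A minimal plain C++ sketch of that partitioning, with a hypothetical `row_slice` type and `slice_for_row` helper and a `std::vector<char>` standing in for `rmm::device_uvector<char>`:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical view of the bytes reserved for one row.
struct row_slice {
  char* begin;  // first byte of the row's slice
  char* end;    // one past the last byte of the row's slice
};

// Assumption: row `row` owns bytes [row * maximum_size, (row + 1) * maximum_size)
// of a scratch buffer holding maximum_size * num_rows bytes.
inline row_slice slice_for_row(std::vector<char>& scratch,
                               std::size_t maximum_size,
                               std::size_t row)
{
  char* const begin = scratch.data() + row * maximum_size;
  return {begin, begin + maximum_size};
}
```

Under this layout the slices never overlap, so rows can be written independently, at the cost of reserving the worst-case size for every row.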

lamarrr added 30 commits March 24, 2025 20:38

initial changes

211e9a2

removed jit::column_device_view and renamed CUDF_JIT_UDF to CUDF_RUNTIME_JIT

4b02b11

updated tests

12f4a46

Merge branch 'branch-25.06' into column-device-view-refactor

9c0ab88

added todo

88e5c87

Merge branch 'branch-25.06' into column-device-view-refactor

2025110

Merge branch 'branch-25.06' into column-device-view-refactor

9964234

Merge branch 'branch-25.06' into column-device-view-refactor

53062b0

added string find test

c00444d

Merge branch 'branch-25.06' into column-device-view-refactor

61dcd4d

Merge branch 'branch-25.06' into column-device-view-refactor

8ed0029

Merge branch 'branch-25.06' into column-device-view-refactor

2c59065

initial prototype

f1a11d8

update

ae4a47a

added raw_device_column_view

4a4ef09

update

4b2c18c

update

0eff98c

Merge branch 'branch-25.06' into column-device-view-refactor

8008826

updated doc

aa5684f

Merge branch 'column-device-view-refactor' of https://github.com/lamarrr/cudf into column-device-view-refactor

7f02f32

Merge branch 'column-device-view-refactor' into string-output

f6d764b

update

f9dc7ce

moved and renamed raw_column_device_view

315545d

formatting and doc update

1163d7c

moved get_mask_offset_word

785a415

Merge branch 'column-device-view-refactor' into string-output

fb01e05

update

065d19e

renamed column offsets index

3539622

Merge remote-tracking branch 'upstream/branch-25.06' into column-device-view-refactor

e7b5585

Merge branch 'branch-25.06' into column-device-view-refactor

046d8be

lamarrr added 3 commits May 14, 2025 23:35

removed null tests in java transforms test

05250bf

Merge branch 'string-output' of https://github.com/lamarrr/cudf into string-output

649863e

Merge branch 'string-output' into string-jit-examples

0ba2305

github-actions bot added the Java label (Affects Java cuDF API) May 15, 2025

Merge branch 'branch-25.06' into string-jit-examples

92fc387

github-actions bot removed the Java label (Affects Java cuDF API) May 15, 2025

lamarrr added 3 commits May 15, 2025 16:41

removed int_output and updated examples

5c676a8

Merge branch 'string-jit-examples' of https://github.com/lamarrr/cudf into string-jit-examples

7ac0022

Merge branch 'branch-25.06' into string-jit-examples

faf5573

lamarrr added the feature request (New feature or request) and non-breaking (Non-breaking change) labels May 15, 2025

lamarrr marked this pull request as ready for review May 15, 2025 16:43

lamarrr requested review from a team as code owners May 15, 2025 16:43

lamarrr requested review from vyasr and shrshi May 15, 2025 16:43

lamarrr added 2 commits May 15, 2025 16:45

formatting

46d1d25

Merge branch 'string-jit-examples' of https://github.com/lamarrr/cudf into string-jit-examples

221443b

bdice reviewed May 15, 2025

davidwendt reviewed May 16, 2025

lamarrr added 2 commits May 17, 2025 01:50

updated examples

d5a351a

pre-commit

2ea0bf7

lamarrr requested review from davidwendt and bdice May 19, 2025 12:37

davidwendt reviewed May 19, 2025

cpp/examples/string_transforms/README.md (outdated, resolved)

davidwendt reviewed May 19, 2025

cpp/examples/string_transforms/common.hpp (outdated, resolved)

lamarrr and others added 3 commits May 19, 2025 13:48

Update cpp/examples/string_transforms/README.md

4541851

Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>

Merge branch 'branch-25.06' into string-jit-examples

668b25b

Update cpp/examples/string_transforms/common.hpp

2b1b4cb

Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>

davidwendt requested changes May 19, 2025



bdice May 15, 2025

davidwendt May 16, 2025

lamarrr May 17, 2025

bdice May 17, 2025

lamarrr May 17, 2025

davidwendt May 19, 2025

davidwendt May 16, 2025

lamarrr May 17, 2025

davidwendt May 19, 2025

davidwendt May 19, 2025

