Skip to content

Add Arena option to BulkUpsert, add hidden --send-format option #20061

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 41 commits into from
Jun 24, 2025

Conversation

pnv1
Copy link
Collaborator

@pnv1 pnv1 commented Jun 23, 2025

Changelog entry

Changelog category

  • Not for changelog (changelog entry is not required)

Description for reviewers

Collaboration based on PRs:
#19667
and
#20005

Because `TValue&& rows` is provided as rvalue reference, we can
move its value into the request rather than making a copy of the data.

Speed boost: ~130MiB/s -> ~168-200MiB/s.
…sertUnretryable (leaving BulkUpsert unchanged)
Implement additional APIs that work with arena-allocated proto messages.
The implementation creates a new protobuf::Arena per batch request and builds Ydb::Value
inside this arena inside TCsvParser::BuildListOnArena.

Ydb::Value gets created once and then moved into Ydb::Table::BulkUpsertRequest, which
is allocated on the same arena (allocation in the same arena prevents copying).

Left undone/requires modifications:
1. TODO: I had to copy-paste RunDeferred/Run methods (see: RunDeferredOnArena/RunOnArena)
         because arena-allocated messages are returned as pointers, but the API expected
         rvalue/lvalue references on the request message. If the pointer is deferefenced,
         there will be a single copy of the message inside TGRpcConnectionsImpl::Run when
         WithServiceConnection is called and request is moved into capture parameter of
         a callback. In the best case, there must be no code duplication (the copy-pasted
         methods differ only in the type of accepted request: they accept TRequest*).

2. TODO: ensure correctness.

Speed boost: ~130MiB/s -> ~250MiB/s
Notes:
1. Creates custom converter in `csv_parser.cpp` to build `Ydb::Value` on the arena.
2. Creates new implementer of `TValueBuilderBase` to build `TArenaAllocatedValue` in `value.cpp`/`value.h`.
3. Introduces `TValueHolder` interface with two implementations:
   - `StackAllocatedValueHolder` with stack-allocated `Ydb::Value`.
   - `TArenaAllocatedValueHolder` with arena-allocated `Ydb::Value`.

Current drawbacks:
- `TValueHolder` has virtual calls, which is costy; try to employ static polymorphism.
- `FieldToValueOnArena` creates `TCsvToYdbOnArenaConverter`, which is a copy of `TCsvToYdbConverter`.

Verdict: didn't produce any performance improvements (likely due to virtual calls in `TValueHolder`).
…ildListOnArena`

This commit fixes the issue with virtual calls in `TValueHolder`,
integrated in the commit 396d98f.

Average speed: 254-257 MiB/s (33.1-33.6 sec spent)

There seems to be no performance difference in terms of upload speed.
The only difference is that the percental fraction of CPU load
that `TCsvParser::BuildListOnArena` method takes:
1. this commit: 42.7% of the total CPU time.
2. commit 396d98f: 43.1% of the total CPU time.
3. without both commits: 49.3% of the total CPU time.
…hars counting

Benchmarking results (see commit 6201814 for details):

1. Previous solution (at commit 6201814):
Elapsed: 17.7 sec. Total read size: 8.33GiB. Average processing speed: 482MiB/s.

2. Current solution:
Elapsed: 10.6 sec. Total read size: 8.33GiB. Average processing speed: 804MiB/s (up to 817MiB/s).

Cherry-picked the commit: `773a3a5`
See: Vladislav0Art@773a3a5
Requires testing via a real instance of YDBD to ensure that the refactoring is correct.
@pnv1 pnv1 requested review from a team as code owners June 23, 2025 15:29
Copy link

🟢 2025-06-23 15:32:43 UTC The validation of the Pull Request description is successful.

Copy link

github-actions bot commented Jun 23, 2025

2025-06-23 15:33:32 UTC Pre-commit check linux-x86_64-release-asan for 018d58a has started.
2025-06-23 15:33:44 UTC Artifacts will be uploaded here
2025-06-23 15:37:16 UTC ya make is running...
2025-06-23 16:56:38 UTC Check cancelled

Copy link

github-actions bot commented Jun 23, 2025

2025-06-23 15:33:32 UTC Pre-commit check linux-x86_64-relwithdebinfo for 018d58a has started.
2025-06-23 15:33:44 UTC Artifacts will be uploaded here
2025-06-23 15:37:19 UTC ya make is running...
2025-06-23 16:56:40 UTC Check cancelled

Copy link

github-actions bot commented Jun 23, 2025

2025-06-23 17:06:49 UTC Pre-commit check linux-x86_64-relwithdebinfo for f533f39 has started.
2025-06-23 17:07:00 UTC Artifacts will be uploaded here
2025-06-23 17:09:25 UTC Check cancelled

Copy link

github-actions bot commented Jun 23, 2025

2025-06-23 17:06:57 UTC Pre-commit check linux-x86_64-release-asan for f533f39 has started.
2025-06-23 17:07:08 UTC Artifacts will be uploaded here
2025-06-23 17:09:25 UTC Check cancelled

Copy link

github-actions bot commented Jun 23, 2025

2025-06-23 17:10:43 UTC Pre-commit check linux-x86_64-relwithdebinfo for 8821deb has started.
2025-06-23 17:10:47 UTC Artifacts will be uploaded here
2025-06-23 17:14:24 UTC ya make is running...
2025-06-23 17:42:29 UTC Check cancelled

Copy link

github-actions bot commented Jun 23, 2025

2025-06-23 17:10:51 UTC Pre-commit check linux-x86_64-release-asan for 8821deb has started.
2025-06-23 17:11:02 UTC Artifacts will be uploaded here
2025-06-23 17:14:41 UTC ya make is running...
2025-06-23 17:42:30 UTC Check cancelled

const TCallMeta& meta,
IQueueClientContext* context)
{
if (IsPointer_) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsPointer_ кажется не нужен, можно std::get_if использовать

Copy link

github-actions bot commented Jun 23, 2025

2025-06-23 17:43:47 UTC Pre-commit check linux-x86_64-relwithdebinfo for a4d614e has started.
2025-06-23 17:44:05 UTC Artifacts will be uploaded here
2025-06-23 17:47:43 UTC ya make is running...
🟡 2025-06-23 19:01:14 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
38450 35726 0 5 2665 54

2025-06-23 19:04:35 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-06-23 19:16:57 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
649 (only retried tests) 612 0 1 0 36

2025-06-23 19:17:08 UTC ya make is running... (failed tests rerun, try 3)
🔴 2025-06-23 19:27:19 UTC Some tests failed, follow the links below.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
471 (only retried tests) 440 0 1 0 30

🟢 2025-06-23 19:27:27 UTC Build successful.
🔴 2025-06-23 19:27:49 UTC ydbd size 2.2 GiB changed* by +2.2 MiB, which is >= 2.0 MiB vs main: Alert

ydbd size dash main: 09373f6 merge: a4d614e diff diff %
ydbd size 2 379 862 488 Bytes 2 382 163 184 Bytes +2.2 MiB +0.097%
ydbd stripped size 498 524 296 Bytes 498 698 056 Bytes +169.7 KiB +0.035%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

Copy link

github-actions bot commented Jun 23, 2025

2025-06-23 17:44:26 UTC Pre-commit check linux-x86_64-release-asan for a4d614e has started.
2025-06-23 17:44:36 UTC Artifacts will be uploaded here
2025-06-23 17:48:04 UTC ya make is running...
🟡 2025-06-23 19:35:50 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
16182 15795 0 111 247 29

2025-06-23 19:37:15 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-06-23 20:12:39 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
1844 (only retried tests) 1529 0 81 206 28

2025-06-23 20:13:00 UTC ya make is running... (failed tests rerun, try 3)
🟡 2025-06-23 20:47:15 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
1656 (only retried tests) 1398 0 54 179 25

🟢 2025-06-23 20:47:32 UTC Build successful.
🔴 2025-06-23 20:48:06 UTC ydbd size 3.9 GiB changed* by +3.7 MiB, which is >= 2.0 MiB vs main: Alert

ydbd size dash main: 09373f6 merge: a4d614e diff diff %
ydbd size 4 186 899 496 Bytes 4 190 792 544 Bytes +3.7 MiB +0.093%
ydbd stripped size 1 451 440 440 Bytes 1 452 768 952 Bytes +1.3 MiB +0.092%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@pnv1 pnv1 added the rebase-and-check Rebase PR with the current base branch and check label Jun 24, 2025
@github-actions github-actions bot removed the rebase-and-check Rebase PR with the current base branch and check label Jun 24, 2025
Copy link

github-actions bot commented Jun 24, 2025

2025-06-24 06:52:18 UTC Pre-commit check linux-x86_64-relwithdebinfo for b553e5d has started.
2025-06-24 06:52:29 UTC Artifacts will be uploaded here
2025-06-24 06:56:09 UTC ya make is running...
🟡 2025-06-24 08:42:21 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
38461 35729 0 9 2681 42

2025-06-24 08:45:49 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-06-24 08:59:26 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
772 (only retried tests) 674 0 4 48 46

2025-06-24 08:59:37 UTC ya make is running... (failed tests rerun, try 3)
🔴 2025-06-24 09:09:42 UTC Some tests failed, follow the links below.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
496 (only retried tests) 462 0 1 5 28

🟢 2025-06-24 09:09:50 UTC Build successful.
🔴 2025-06-24 09:10:14 UTC ydbd size 2.2 GiB changed* by +2.2 MiB, which is >= 2.0 MiB vs main: Alert

ydbd size dash main: e250218 merge: b553e5d diff diff %
ydbd size 2 380 256 632 Bytes 2 382 557 328 Bytes +2.2 MiB +0.097%
ydbd stripped size 498 608 904 Bytes 498 782 664 Bytes +169.7 KiB +0.035%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

Copy link

github-actions bot commented Jun 24, 2025

2025-06-24 06:52:18 UTC Pre-commit check linux-x86_64-release-asan for b553e5d has started.
2025-06-24 06:52:29 UTC Artifacts will be uploaded here
2025-06-24 06:55:57 UTC ya make is running...
🔴 2025-06-24 07:17:47 UTC Build failed, see the logs. Also see fail summary

@pnv1 pnv1 added the rebase-and-check Rebase PR with the current base branch and check label Jun 24, 2025
@github-actions github-actions bot removed the rebase-and-check Rebase PR with the current base branch and check label Jun 24, 2025
Copy link

github-actions bot commented Jun 24, 2025

2025-06-24 11:17:31 UTC Pre-commit check linux-x86_64-release-asan for 7ccf905 has started.
2025-06-24 11:17:42 UTC Artifacts will be uploaded here
2025-06-24 11:21:16 UTC ya make is running...
🟡 2025-06-24 13:33:05 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
16208 15862 0 117 203 26

🟢 2025-06-24 13:34:27 UTC Build successful.
🔴 2025-06-24 13:35:12 UTC ydbd size 3.9 GiB changed* by +3.6 MiB, which is >= 2.0 MiB vs main: Alert

ydbd size dash main: 4541608 merge: 7ccf905 diff diff %
ydbd size 4 188 013 648 Bytes 4 191 785 640 Bytes +3.6 MiB +0.090%
ydbd stripped size 1 451 827 544 Bytes 1 453 078 616 Bytes +1.2 MiB +0.086%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

Copy link

github-actions bot commented Jun 24, 2025

2025-06-24 11:24:25 UTC Pre-commit check linux-x86_64-relwithdebinfo for 7ccf905 has started.
2025-06-24 11:24:31 UTC Artifacts will be uploaded here
2025-06-24 11:28:20 UTC ya make is running...
🟡 2025-06-24 13:03:54 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
38491 35786 0 2 2665 38

2025-06-24 13:07:12 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-06-24 13:16:57 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
490 (only retried tests) 457 0 2 2 29

2025-06-24 13:17:07 UTC ya make is running... (failed tests rerun, try 3)
🟢 2025-06-24 13:25:35 UTC Tests successful.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
301 (only retried tests) 269 0 0 2 30

🟢 2025-06-24 13:25:45 UTC Build successful.
🔴 2025-06-24 13:26:06 UTC ydbd size 2.2 GiB changed* by +2.3 MiB, which is >= 2.0 MiB vs main: Alert

ydbd size dash main: 4541608 merge: 7ccf905 diff diff %
ydbd size 2 380 319 544 Bytes 2 382 715 720 Bytes +2.3 MiB +0.101%
ydbd stripped size 498 618 024 Bytes 498 832 936 Bytes +209.9 KiB +0.043%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@pnv1 pnv1 merged commit d6db40f into ydb-platform:main Jun 24, 2025
19 of 22 checks passed
@pnv1 pnv1 deleted the cli-import-improvements branch June 24, 2025 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants