Skip to content

GH-46224: [C++][Acero] Fix the hang in asof join #46300

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 14, 2025

Conversation

zanmato1984
Copy link
Contributor

@zanmato1984 zanmato1984 commented May 3, 2025

Rationale for this change

A hang of asof join is reported in #46224 . To explain the cause of the hang, I shall first brief the basic processing of asof join.

There is an outstanding processing thread for the asof join, whose lifespan is bound to the asof node. This thread will wait on a command queue. Once a command is pushed into the queue, the thread pops it and do one round of processing. The command is issued as the batches are received from either left or right side input of the asof join node. Each round of processing will advance the accumulated left and right inputs as much as possible to see if it is sufficient to output a batch (if yes, i.e., either side of the asof join can be concluded by the latest input timestamp, then emit it, otherwise adjust the internal state according to the latest input timestamp).

Notably, the processing is by design to advance the inputs "as much as possible". However there is a bug (seemingly since day 1) when advancing the right side input in a tricky condition: if all right side rows are at the times before the least time of the left rows, then advancing the right side input won't cross batches in one round of processing. This is OK (i.e., no hang) as long as there are enough commands issued into the queue to trigger enough rounds of processing. However if lucky enough, say, the left input batches come late, then the processing with consume some commands issued by the right side input but advance nothing (because the left side input is empty), resulting in insufficient commands to fully advance both the input.

What changes are included in this PR?

Fix the issue that advancing the right side input won't cross batches when the minimal left time is bigger than all the right times.

Are these changes tested?

Yes, added a dedicated case.

Are there any user-facing changes?

None.

Copy link

github-actions bot commented May 3, 2025

⚠️ GitHub issue #46224 has been automatically assigned in GitHub to PR creator.

@zanmato1984
Copy link
Contributor Author

@github-actions crossbow submit -g cpp -g r

Copy link

github-actions bot commented May 3, 2025

Revision: aa5a9be

Submitted crossbow builds: ursacomputing/crossbow @ actions-d733af0ab3

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
r-binary-packages GitHub Actions
r-recheck-most GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-meson GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-r-arrow-backwards-compatibility GitHub Actions
test-r-clang-sanitizer GitHub Actions
test-r-depsource-bundled Azure
test-r-depsource-system GitHub Actions
test-r-dev-duckdb GitHub Actions
test-r-devdocs GitHub Actions
test-r-extra-packages GitHub Actions
test-r-gcc-11 GitHub Actions
test-r-gcc-12 GitHub Actions
test-r-install-local GitHub Actions
test-r-install-local-minsizerel GitHub Actions
test-r-linux-as-cran GitHub Actions
test-r-linux-rchk GitHub Actions
test-r-linux-sanitizer GitHub Actions
test-r-linux-valgrind GitHub Actions
test-r-macos-as-cran GitHub Actions
test-r-minimal-build Azure
test-r-offline-maximal GitHub Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-debian-gcc-release-custom-ccache Azure
test-r-rhub-ubuntu-release-latest Azure
test-r-rocker-r-ver-latest Azure
test-r-rstudio-r-base-4.1-opensuse155 Azure
test-r-rstudio-r-base-4.2-focal Azure
test-r-ubuntu-22.04 GitHub Actions
test-r-versions GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@zanmato1984
Copy link
Contributor Author

@github-actions crossbow submit -g cpp -g r

Copy link

github-actions bot commented May 3, 2025

Revision: 6ea4624

Submitted crossbow builds: ursacomputing/crossbow @ actions-a2fa793a58

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
r-binary-packages GitHub Actions
r-recheck-most GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-meson GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-r-arrow-backwards-compatibility GitHub Actions
test-r-clang-sanitizer GitHub Actions
test-r-depsource-bundled Azure
test-r-depsource-system GitHub Actions
test-r-dev-duckdb GitHub Actions
test-r-devdocs GitHub Actions
test-r-extra-packages GitHub Actions
test-r-gcc-11 GitHub Actions
test-r-gcc-12 GitHub Actions
test-r-install-local GitHub Actions
test-r-install-local-minsizerel GitHub Actions
test-r-linux-as-cran GitHub Actions
test-r-linux-rchk GitHub Actions
test-r-linux-sanitizer GitHub Actions
test-r-linux-valgrind GitHub Actions
test-r-macos-as-cran GitHub Actions
test-r-minimal-build Azure
test-r-offline-maximal GitHub Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-debian-gcc-release-custom-ccache Azure
test-r-rhub-ubuntu-release-latest Azure
test-r-rocker-r-ver-latest Azure
test-r-rstudio-r-base-4.1-opensuse155 Azure
test-r-rstudio-r-base-4.2-focal Azure
test-r-ubuntu-22.04 GitHub Actions
test-r-versions GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@@ -639,12 +639,8 @@ class InputState : public util::SerialSequencingQueue::Processor {
// hit the end of the batch, need to get the next batch if possible.
++batches_processed_;
latest_ref_row_ = 0;
have_active_batch &= !queue_.TryPop();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this code is apparently NOT doing what it is supposed to do: by here the have_active_batch must be true, meaning queue_ must be non-empty. Then popping the queue always gets a valid std::optional and effectively setting have_active_batch to false. The if branch below won't be executed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed there are two fixes made before in the following if branch #36094 and #36499. Given that this have_active_batch &= !queue_.TryPop() exists since #13028 where it was introduced, and the lack of corresponding tests, I'm not sure how they fixed anything and what the bugs even were. So I just remove them.

@zanmato1984 zanmato1984 marked this pull request as ready for review May 6, 2025 23:14
@zanmato1984 zanmato1984 requested a review from westonpace as a code owner May 6, 2025 23:14
@zanmato1984 zanmato1984 requested a review from icexelloss May 6, 2025 23:14
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels May 6, 2025
@zanmato1984
Copy link
Contributor Author

Hi, @westonpace @icexelloss @pitrou would you help to review? Thanks.

@zanmato1984
Copy link
Contributor Author

@github-actions crossbow submit -g cpp -g python -g r

Copy link

github-actions bot commented May 7, 2025

Revision: af70071

Submitted crossbow builds: ursacomputing/crossbow @ actions-ccf7708cfd

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
r-binary-packages GitHub Actions
r-recheck-most GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-meson GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-1.1.3-numpy-1.19.5 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-fedora-39-python-3 GitHub Actions
test-r-arrow-backwards-compatibility GitHub Actions
test-r-clang-asan GitHub Actions
test-r-clang-ubsan GitHub Actions
test-r-depsource-bundled Azure
test-r-depsource-system GitHub Actions
test-r-dev-duckdb GitHub Actions
test-r-devdocs GitHub Actions
test-r-extra-packages GitHub Actions
test-r-gcc-11 GitHub Actions
test-r-gcc-12 GitHub Actions
test-r-install-local GitHub Actions
test-r-install-local-minsizerel GitHub Actions
test-r-linux-as-cran GitHub Actions
test-r-linux-rchk GitHub Actions
test-r-linux-sanitizer GitHub Actions
test-r-linux-valgrind GitHub Actions
test-r-m1-san GitHub Actions
test-r-macos-as-cran GitHub Actions
test-r-minimal-build Azure
test-r-offline-maximal GitHub Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-debian-gcc-release-custom-ccache Azure
test-r-rhub-ubuntu-release-latest Azure
test-r-rocker-r-ver-latest Azure
test-r-rstudio-r-base-4.1-opensuse155 Azure
test-r-rstudio-r-base-4.2-focal Azure
test-r-ubuntu-22.04 GitHub Actions
test-r-versions GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@pitrou
Copy link
Member

pitrou commented May 7, 2025

I'm curious, this does only happen when the right side is concerned? The update logic is not symmetric?

@zanmato1984
Copy link
Contributor Author

I'm curious, this does only happen when the right side is concerned? The update logic is not symmetric?

Right, not symmetric as one might would naturally expect (the same situation for hash join as well - right side to build the hash table and left side to probe the hash table).

The left VS. right differs in the following aspects:

  1. There is only one left side input but possibly multiple right side inputs. The right side inputs are flattened and equally iterated during the whole processing (as opposed to traversed in a tree-like fashion).
  2. Several processing logic, and in addition some methods are right side inputs only, e.g.:
    Result<RhsUpdateState> UpdateRhs() {

    and
    rhs.AdvanceAndMemoize(lhs_latest_time, rhs_empty));

So it's normal that some issues are right side only.

But it does remind me that the method I was fixing, namely ‎InputState::Advance, also applies to the left side input, so I think I should see if the issue exists for the left side as well, probably by adding a symmetric case you suggested.

Thanks for the question!

Co-authored-by: Antoine Pitrou <pitrou@free.fr>
@zanmato1984
Copy link
Contributor Author

zanmato1984 commented May 7, 2025

Hi @pitrou , comments all addressed. Would you take a look again? Thanks.

@zanmato1984
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

Copy link

Revision: dd7a66b

Submitted crossbow builds: ursacomputing/crossbow @ actions-7d095c8a25

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-meson GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@zanmato1984
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

Copy link

Revision: 5f5b895

Submitted crossbow builds: ursacomputing/crossbow @ actions-973e5e92ed

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-meson GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@zanmato1984
Copy link
Contributor Author

Thanks @pitrou for reviewing. Is there anything else to address?

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zanmato1984 zanmato1984 merged commit fe29b7d into apache:main May 14, 2025
39 of 40 checks passed
@zanmato1984 zanmato1984 removed the awaiting committer review Awaiting committer review label May 14, 2025
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit fe29b7d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 19 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants