refactor: TableScan file plan generation now implemented purely in streams rather than channels #1486

sdd · 2025-07-04T08:01:41Z

The current implementation of scan planning leaves a lot to be desired. The channel-based approach is hard to follow, error-prone, and does not properly support backpressure.

Not only that, but in real-life usage I've been experiencing intermittent deadlocks with it and haven't been able to track down the cause.

I have also spotted one source of deadlock in the existing implementation. `fetch_manifest_and_stream_manifest_entries` pushes to bounded channels, so it blocks whenever the channel it is pushing to is full. Suppose the head-of-line data file context is awaiting the `DeleteFileIndex`, but the delete files it depends on are in a manifest that has not yet entered the delete file manifest context channel. Nothing drains the full channel while the head-of-line context waits, so the producer never reaches that delete manifest, the index never completes, and the pipeline deadlocks.
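To make the cycle concrete, here is a minimal single-threaded model of it built on std's bounded `sync_channel`. `Manifest`, `deadlocks`, and the rule that processing the delete manifest immediately completes the index are hypothetical simplifications, not code from iceberg-rust. The consumer pulls its head-of-line context and then waits on the index; the producer uses `try_send` so the model can report the point where a real `send` would block forever, instead of actually hanging.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Hypothetical stand-in for a manifest in scan order: either one holding
/// data files, or the one holding the delete files the index needs.
#[derive(Debug, Clone, Copy)]
enum Manifest {
    Data(u32),
    Delete,
}

/// Single-threaded model of the channel-based planner. Returns `true` if the
/// producer ends up blocked on a full channel while the consumer is waiting
/// on an index that only a not-yet-reached manifest can complete.
fn deadlocks(scan_order: &[Manifest], capacity: usize) -> bool {
    // capacity 0 would be a rendezvous channel, which this model does not handle.
    assert!(capacity > 0, "model assumes a buffered channel");
    let (data_tx, data_rx) = sync_channel::<u32>(capacity);
    let mut index_ready = false; // has the delete manifest been processed?
    let mut head_taken = false; // has the consumer pulled its head-of-line item?

    for manifest in scan_order {
        match *manifest {
            // Simplification: seeing the delete manifest completes the index.
            Manifest::Delete => index_ready = true,
            Manifest::Data(id) => loop {
                match data_tx.try_send(id) {
                    Ok(()) => break,
                    Err(TrySendError::Full(_)) => {
                        if !head_taken {
                            // Consumer pulls the head-of-line context, then awaits the index.
                            data_rx.recv().expect("channel is full, so non-empty");
                            head_taken = true;
                        } else if index_ready {
                            // Index is complete, so the consumer keeps draining.
                            data_rx.recv().expect("channel is full, so non-empty");
                        } else {
                            // Producer blocked on send, consumer blocked on the index.
                            return true;
                        }
                    }
                    Err(TrySendError::Disconnected(_)) => unreachable!("receiver is alive"),
                }
            },
        }
    }
    false
}
```

In this model, whether the pipeline hangs depends on how many data manifests precede the delete manifest relative to the channel capacity: with a capacity of 2, four data manifests ahead of the delete manifest deadlock, while three do not. That sensitivity to ordering and buffer size is consistent with the deadlock showing up only intermittently.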

@Xuanwo's refactor of the Arrow reader showed that a stream-based approach can be more elegant, address the lack of backpressure, and be more reliable. This refactor brings those same advantages to the plan phase.
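The sketch below illustrates why a pull-based design gets backpressure for free. To stay dependency-free it uses std's lazy iterators as a synchronous stand-in for an async `Stream`; both share the property that matters here: no work happens until the consumer asks for the next item, so there is no buffer to fill and no producer to block. It also assumes, purely for illustration, that delete manifests are read into an index before any data manifest is pulled, which removes the cycle described earlier; this is not necessarily how the PR orders its work. `plan_files`, `Entry`, `FileScanTask`, and the partition-only delete matching are hypothetical simplifications, not the iceberg-rust types or methods of the same names.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical manifest entry, reduced to what the sketch needs.
#[derive(Debug, Clone)]
enum Entry {
    Data { path: &'static str, partition: u32 },
    Delete { path: &'static str, partition: u32 },
}

/// Hypothetical scan task: a data file plus the delete files that apply to it.
#[derive(Debug, PartialEq)]
struct FileScanTask {
    data_file: &'static str,
    deletes: Vec<&'static str>,
}

/// Returns a lazy iterator of tasks. `entries_read` counts data entries
/// actually consumed, to show that work tracks demand.
fn plan_files<'a>(
    delete_manifests: &'a [Vec<Entry>],
    data_manifests: &'a [Vec<Entry>],
    entries_read: &'a Cell<usize>,
) -> impl Iterator<Item = FileScanTask> + 'a {
    // Phase 1 (eager): the index needs every delete entry, so build it first.
    let mut index: HashMap<u32, Vec<&'static str>> = HashMap::new();
    for entry in delete_manifests.iter().flatten() {
        if let Entry::Delete { path, partition } = entry {
            index.entry(*partition).or_default().push(*path);
        }
    }
    // Phase 2 (lazy): a data entry is read only when the caller pulls a task.
    data_manifests.iter().flatten().filter_map(move |entry| {
        entries_read.set(entries_read.get() + 1);
        match entry {
            Entry::Data { path, partition } => Some(FileScanTask {
                data_file: *path,
                deletes: index.get(partition).cloned().unwrap_or_default(),
            }),
            Entry::Delete { .. } => None,
        }
    })
}
```

Pulling a single task from `plan_files` reads exactly one data entry; a consumer that stops pulling simply stops all upstream work, rather than leaving a producer parked on a full channel.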


sdd force-pushed the refactor-scan-plan branch 2 times, most recently from 45099ba to 9946641 (July 4, 2025 08:05)

refactor: scan plan now pure streams rather than channels, eliminating a source of deadlock (b1a4447)

sdd force-pushed the refactor-scan-plan branch from 9946641 to b1a4447 (July 4, 2025 18:30)
