Skip to content

perf: speed up iter, range, scan methods with Rc<Node> #356

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 1, 2025
Merged

Conversation

maksymar
Copy link
Contributor

@maksymar maksymar commented Jun 30, 2025

This PR improves iteration performance by storing nodes as Rc<Node>.

The change is inspired by the Entry API PoC in #251, but applied separately to avoid breaking changes.

btreeset:

Summary:
  instructions:
    status:   Regressions and improvements 🔴🟢
    counts:   [total 100 | regressed 7 | improved 40 | new 0 | unchanged 53]
    change:   [max +2.21K | p75 -179.95K | median -207.33K | p25 -620.79K | min -2.49M]
    change %: [max +5.01% | p75 -0.04% | median -1.11% | p25 -5.50% | min -11.48%]

btreemap:

Summary:
  instructions:
    status:   Regressions and improvements 🔴🟢
    counts:   [total 285 | regressed 3 | improved 8 | new 0 | unchanged 274]
    change:   [max +5.34M | p75 +411.31K | median +146.65K | p25 +97.32K | min -1.49M]
    change %: [max +3.36% | p75 +0.06% | median +0.03% | p25 +0.01% | min -10.92%]

Most of the improved benchmarks related to iter, range, scan methods.
Some regressions are small and limited to benchmarks not involving traversal.

UPD(from comment):
At first glance using Rc<Node<K>> should not help since we don't explicitly clone Node anywhere.
The improvement comes from avoiding repeated moves of large value.
Node is fairly big (multiple fields and vectors) comparing to Rc, so moving it around during iteration adds overhead.
Switching to Rc makes those moves much cheaper, it's just a pointer copy.

Copy link

github-actions bot commented Jun 30, 2025

canbench 🏋 (dir: ./benchmarks/memory_manager) a52bff3 2025-07-01 11:20:30 UTC

./benchmarks/memory_manager/canbench_results.yml is up to date
📦 canbench_results_memory-manager.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   No significant changes 👍
    counts:   [total 3 | regressed 0 | improved 0 | new 0 | unchanged 3]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 3 | regressed 0 | improved 0 | new 0 | unchanged 3]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 3 | regressed 0 | improved 0 | new 0 | unchanged 3]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------
CSV results saved to canbench_results.csv

Copy link

github-actions bot commented Jun 30, 2025

canbench 🏋 (dir: ./benchmarks/btreeset) a52bff3 2025-07-01 11:20:33 UTC

./benchmarks/btreeset/canbench_results.yml is up to date
📦 canbench_results_btreeset.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   Regressions and improvements 🔴🟢
    counts:   [total 100 | regressed 7 | improved 40 | new 0 | unchanged 53]
    change:   [max +2.21K | p75 -179.95K | median -207.33K | p25 -620.79K | min -2.49M]
    change %: [max +5.01% | p75 -0.04% | median -1.11% | p25 -5.50% | min -11.48%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 100 | regressed 0 | improved 0 | new 0 | unchanged 100]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 100 | regressed 0 | improved 0 | new 0 | unchanged 100]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------

Only significant changes:
| status | name                                  | calls |    ins |  ins Δ% | HI |  HI Δ% | SMI |  SMI Δ% |
|--------|---------------------------------------|-------|--------|---------|----|--------|-----|---------|
|   +    | btreeset_is_subset_u32                |       | 46.30K |  +5.01% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreeset_is_subset_u64                |       | 46.76K |  +4.51% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreeset_is_subset_blob_32            |       | 48.62K |  +3.92% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreeset_is_subset_blob_8             |       | 55.42K |  +3.55% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreeset_is_subset_blob_16            |       | 56.78K |  +3.31% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreeset_is_subset_blob_64            |       | 59.90K |  +3.27% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreeset_is_subset_blob_128           |       | 91.46K |  +2.22% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_is_superset_blob_64          |       |  8.69M |  -2.34% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_is_disjoint_blob_64          |       |  5.31M |  -2.71% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_range_blob_256               |       | 77.84M |  -3.10% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_iter_blob_64                 |       | 41.01M |  -3.30% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_range_blob_64                |       | 25.47M |  -4.10% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_union_blob_32                |       |  4.71M |  -4.26% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_symmetric_difference_blob_32 |       |  4.71M |  -4.26% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_intersection_blob_32         |       |  4.71M |  -4.26% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_is_disjoint_blob_32          |       |  2.87M |  -4.51% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_is_superset_blob_32          |       |  4.56M |  -4.91% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_is_disjoint_blob_8           |       |  2.14M |  -5.14% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_union_blob_16                |       |  3.45M |  -5.42% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_symmetric_difference_blob_16 |       |  3.45M |  -5.42% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_intersection_blob_16         |       |  3.45M |  -5.43% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_is_disjoint_blob_16          |       |  2.21M |  -5.48% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_union_blob_8                 |       |  3.21M |  -5.55% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_symmetric_difference_blob_8  |       |  3.20M |  -5.55% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_intersection_blob_8          |       |  3.20M |  -5.55% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_is_superset_blob_16          |       |  3.37M |  -5.55% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_iter_blob_32                 |       | 22.57M |  -5.82% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_is_superset_blob_8           |       |  3.14M |  -5.85% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_is_disjoint_u64              |       |  1.59M |  -6.53% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_range_blob_32                |       | 14.36M |  -6.74% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_is_disjoint_u32              |       |  1.58M |  -6.76% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_is_superset_u32              |       |  2.32M |  -7.07% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_symmetric_difference_u64     |       |  2.34M |  -7.20% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_union_u64                    |       |  2.33M |  -7.21% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_intersection_u64             |       |  2.33M |  -7.21% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_is_superset_u64              |       |  2.33M |  -7.37% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_symmetric_difference_u32     |       |  2.32M |  -7.47% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_union_u32                    |       |  2.32M |  -7.48% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_intersection_u32             |       |  2.31M |  -7.52% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_iter_blob_16                 |       | 16.14M |  -7.80% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_iter_blob_8                  |       | 15.52M |  -7.90% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_range_blob_16                |       | 10.54M |  -8.78% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_range_blob_8                 |       | 10.16M |  -8.97% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_iter_u64                     |       | 12.34M |  -9.46% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_iter_u32                     |       | 12.27M |  -9.50% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_range_u64                    |       |  7.60M | -11.40% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreeset_range_u32                    |       |  7.55M | -11.48% |  0 |  0.00% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

---------------------------------------------------
CSV results saved to canbench_results.csv

Copy link

github-actions bot commented Jun 30, 2025

canbench 🏋 (dir: ./benchmarks/vec) a52bff3 2025-07-01 11:20:24 UTC

./benchmarks/vec/canbench_results.yml is up to date
📦 canbench_results_vec.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   No significant changes 👍
    counts:   [total 16 | regressed 0 | improved 0 | new 0 | unchanged 16]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 16 | regressed 0 | improved 0 | new 0 | unchanged 16]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 16 | regressed 0 | improved 0 | new 0 | unchanged 16]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------
CSV results saved to canbench_results.csv

Copy link

github-actions bot commented Jun 30, 2025

canbench 🏋 (dir: ./benchmarks/compare) a52bff3 2025-07-01 11:21:18 UTC

./benchmarks/compare/canbench_results.yml is up to date
📦 canbench_results_compare.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   No significant changes 👍
    counts:   [total 18 | regressed 0 | improved 0 | new 0 | unchanged 18]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 18 | regressed 0 | improved 0 | new 0 | unchanged 18]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 18 | regressed 0 | improved 0 | new 0 | unchanged 18]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------
CSV results saved to canbench_results.csv

Copy link

github-actions bot commented Jun 30, 2025

canbench 🏋 (dir: ./benchmarks/btreemap) a52bff3 2025-07-01 11:22:06 UTC

./benchmarks/btreemap/canbench_results.yml is up to date
📦 canbench_results_btreemap.csv available in artifacts

---------------------------------------------------

Summary:
  instructions:
    status:   Regressions and improvements 🔴🟢
    counts:   [total 285 | regressed 3 | improved 8 | new 0 | unchanged 274]
    change:   [max +5.34M | p75 +411.31K | median +146.65K | p25 +97.32K | min -1.49M]
    change %: [max +3.36% | p75 +0.06% | median +0.03% | p25 +0.01% | min -10.92%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 285 | regressed 0 | improved 0 | new 0 | unchanged 285]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 285 | regressed 0 | improved 0 | new 0 | unchanged 285]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------

Only significant changes:
| status | name                               | calls |     ins |  ins Δ% | HI |  HI Δ% | SMI |  SMI Δ% |
|--------|------------------------------------|-------|---------|---------|----|--------|-----|---------|
|   +    | btreemap_v2_contains_10mib_values  |       | 147.00M |  +3.36% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_scan_keys_20_10mib     |       |  19.09M |  +3.36% |  0 |  0.00% |   0 |   0.00% |
|   +    | btreemap_v2_scan_keys_rev_20_10mib |       |  19.09M |  +3.36% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_range_count_1k_10kib   |       |   2.47M |  -4.24% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_range_count_20_10mib   |       |  19.09M |  -7.22% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_scan_iter_1k_0b        |       |   1.26M |  -8.49% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_scan_values_1k_0b      |       |   1.24M |  -8.63% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_scan_iter_rev_1k_0b    |       |   1.26M |  -8.77% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_scan_values_rev_1k_0b  |       |   1.24M |  -9.06% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_scan_keys_1k_0b        |       | 983.89K | -10.91% |  0 |  0.00% |   0 |   0.00% |
|   -    | btreemap_v2_scan_keys_rev_1k_0b    |       | 985.20K | -10.92% |  0 |  0.00% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

---------------------------------------------------
CSV results saved to canbench_results.csv

@maksymar maksymar requested a review from Copilot June 30, 2025 11:53
Copilot

This comment was marked as outdated.

@maksymar maksymar changed the title perf: use Rc<node> perf: speed up iter, range, scan methods with Rc<Node> Jun 30, 2025
@maksymar maksymar marked this pull request as ready for review June 30, 2025 12:02
@maksymar maksymar requested a review from a team as a code owner June 30, 2025 12:02
@maksymar
Copy link
Contributor Author

FYI @frankdavid

Copy link
Contributor

@mraszyk mraszyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this PR supposed to improve performance? It doesn't seem like we are cloning node (in which case using Rc<Node<K>> could improve performance).

@maksymar maksymar changed the title perf: speed up iter, range, scan methods with Rc<Node> perf: speed up iter, range, scan methods with Rc Jun 30, 2025
@maksymar maksymar changed the title perf: speed up iter, range, scan methods with Rc perf: speed up iter, range, scan methods with Rc<Node> Jul 1, 2025
@maksymar
Copy link
Contributor Author

maksymar commented Jul 1, 2025

Why is this PR supposed to improve performance? It doesn't seem like we are cloning node (in which case using Rc<Node<K>> could improve performance).

you're right -- at first glance using Rc<Node<K>> should not help since we don't explicitly clone Node anywhere.

but my guess is the improvement comes from avoiding repeated moves of large value.
Node is fairly big (multiple fields and vectors) comparing to Rc, so moving it around during iteration adds overhead.
switching to Rc makes those moves much cheaper, it's just a pointer copy.

as benchmarks show this change provides a measurable impact:

  • some regressions of +3...+5% (due to Rc overhead)
  • but also improvements up to -11% (due to cheap move)
  • with median in the range of +0.03%...-1.11%

updated github PR description with this explanation.

does it make sense?

@maksymar maksymar requested review from mraszyk and Copilot July 1, 2025 11:41
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves iteration performance for BTreeMap (and BTreeSet benchmarks) by switching from moving entire Node structs to storing nodes behind reference‐counted pointers (Rc<Node>). Key changes include modifying the Cursor enum in src/btreemap/iter.rs to wrap nodes in Rc, and updating numerous iteration methods to create Rc instances, along with adjusting benchmark values to reflect performance improvements.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/btreemap/iter.rs Updated Cursor and related iteration logic to wrap nodes in Rc.
benchmarks/btreeset/canbench_results.yml Benchmark instruction counts adjusted to reflect performance changes.
benchmarks/btreemap/canbench_results.yml Benchmark instruction counts adjusted to reflect performance changes.

@maksymar maksymar marked this pull request as draft July 1, 2025 11:44
@maksymar maksymar marked this pull request as ready for review July 1, 2025 11:44
Copy link
Contributor

@mraszyk mraszyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation!

@maksymar maksymar merged commit 8d97803 into main Jul 1, 2025
21 checks passed
@maksymar maksymar deleted the maksym/rc branch July 1, 2025 14:19
maksymar added a commit that referenced this pull request Jul 7, 2025
This PR replaces the use of reference-counted `Rc<Node>` pointers with
unique `Box<Node>` allocations in the `BTreeMap` iterator to reduce
overhead.

btreemap
```
  instructions:
    status:   Improvements detected 🟢
    counts:   [total 285 | regressed 0 | improved 9 | new 0 | unchanged 276]
    change:   [max 0 | p75 -97.33K | median -147.22K | p25 -481.26K | min -5.34M]
    change %: [max 0.00% | p75 -0.02% | median -0.03% | p25 -0.06% | min -3.25%]
```

btreeset
```
  instructions:
    status:   Improvements detected 🟢
    counts:   [total 100 | regressed 0 | improved 2 | new 0 | unchanged 98]
    change:   [max +625.36K | p75 -288 | median -27.44K | p25 -80.94K | min -377.91K]
    change %: [max +0.05% | p75 -0.07% | median -0.30% | p25 -1.17% | min -2.35%]
```

Previous change #356
was adding `Rc<Node>` (vs simple moving) which added some improvements
as well as regressions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants