ARROW-11108: [Rust] Fixed performance issue in mutableBuffer. #9076


Closed · wants to merge 7 commits

Conversation

@jorgecarleitao (Member) commented Jan 2, 2021

This PR refactors MutableBuffer::extend_from_slice to remove the need to call to_byte_slice on every call, thereby removing a level of indirection that prevented the compiler from optimizing out some code.

This is the second performance improvement originally presented in #8796 and, together with #9027, brings the performance of MutableBuffer to the same level as Vec<u8>, in particular for building buffers on the fly.

Basically, when converting to a byte slice &[u8], the compiler loses the type's size information, and thus needs to perform extra checks and can't just optimize the code away.

This PR adopts the same API as Vec<T>::extend_from_slice, but since our buffers are u8-based (i.e. à la Vec<u8>), I made the signatures

```
pub fn extend_from_slice<T: ToByteSlice>(&mut self, items: &[T])
pub fn push<T: ToByteSlice>(&mut self, item: &T)
```

i.e. each consumes something that can be converted to a byte slice, but performs the conversion to bytes internally (as to_byte_slice was doing).
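
For intuition, a minimal self-contained sketch of the idea, assuming a `Vec<u8>`-backed buffer (the real `MutableBuffer` manages its own 64-byte-aligned allocation, and `SketchBuffer` is an invented name):

```rust
/// Minimal sketch, not the Arrow implementation: the point is that the
/// method is generic, so each monomorphized copy knows size_of::<T>()
/// at compile time and no type-erased &[u8] crosses a call boundary.
struct SketchBuffer {
    data: Vec<u8>,
}

impl SketchBuffer {
    fn extend_from_slice<T: Copy>(&mut self, items: &[T]) {
        let len = items.len() * std::mem::size_of::<T>();
        // Safety: `T: Copy` stands in for Arrow's `ToByteSlice` bound,
        // i.e. plain-old-data that is valid to reinterpret as raw bytes.
        let bytes = unsafe {
            std::slice::from_raw_parts(items.as_ptr() as *const u8, len)
        };
        self.data.extend_from_slice(bytes);
    }
}
```

With this shape, `buf.extend_from_slice(&[1u32, 2, 3])` appends exactly 12 bytes, and the compiler can lower the copy to a straight memcpy.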

Credits for the root cause analysis that led to this PR go to @Dandandan, originally fielded in #9016.

[...] current conversion to a byte slice may add some overhead? - @Dandandan

Benches (against master, so including both this PR and #9044):

```
Switched to branch 'perf_buffer'
Your branch and 'origin/perf_buffer' have diverged,
and have 6 and 1 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)
   Compiling arrow v3.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow)
    Finished bench [optimized] target(s) in 1m 00s
     Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/buffer_create-915da5f1abaf0471
Gnuplot not found, using plotters backend
mutable                 time:   [463.11 us 463.57 us 464.07 us]                    
                        change: [-19.508% -18.571% -17.526%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe

mutable prepared        time:   [527.84 us 528.46 us 529.14 us]                             
                        change: [-13.356% -12.522% -11.790%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

Benchmarking from_slice: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.1s, enable flat sampling, or reduce sample count to 60.
from_slice              time:   [1.1968 ms 1.1979 ms 1.1991 ms]                        
                        change: [-6.8697% -6.2029% -5.5812%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

from_slice prepared     time:   [917.49 us 918.89 us 920.60 us]                                
                        change: [-6.5111% -5.9102% -5.3038%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe
```

github-actions bot commented Jan 2, 2021

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename the pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

codecov-io commented Jan 2, 2021

Codecov Report

Merging #9076 (ecd2769) into master (fdf5e88) will decrease coverage by 0.05%.
The diff coverage is 94.00%.

@@            Coverage Diff             @@
##           master    #9076      +/-   ##
==========================================
- Coverage   82.57%   82.52%   -0.06%     
==========================================
  Files         204      204              
  Lines       50327    50247      -80     
==========================================
- Hits        41560    41467      -93     
- Misses       8767     8780      +13     
Impacted Files Coverage Δ
rust/arrow/src/array/equal/utils.rs 74.50% <0.00%> (+0.72%) ⬆️
rust/arrow/src/bytes.rs 53.12% <ø> (-5.21%) ⬇️
rust/arrow/src/compute/kernels/aggregate.rs 74.93% <ø> (-0.07%) ⬇️
rust/arrow/src/datatypes.rs 77.66% <ø> (ø)
rust/arrow/src/array/transform/list.rs 86.20% <66.66%> (+2.33%) ⬆️
rust/arrow/src/array/builder.rs 85.23% <86.79%> (-0.67%) ⬇️
rust/arrow/src/buffer.rs 95.95% <88.67%> (-1.77%) ⬇️
rust/arrow/src/array/array_binary.rs 90.98% <100.00%> (ø)
rust/arrow/src/array/array_list.rs 92.66% <100.00%> (-0.45%) ⬇️
rust/arrow/src/array/array_primitive.rs 92.28% <100.00%> (-0.04%) ⬇️
... and 24 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fdf5e88...ecd2769.

@Dandandan (Contributor) commented Jan 2, 2021

Great! Besides being a speed-up, being able to push something of a certain type is nice.

I did some quick benchmarking of your branch on DataFusion; I see a speed-up of ~5% on q1/q12.

@@ -56,6 +57,23 @@ pub struct Buffer {
}

impl Buffer {
/// Initializes a [Buffer] from a slice of items.
pub fn from_slice_ref<U: ArrowNativeType, T: AsRef<[U]>>(items: &T) -> Self {
jorgecarleitao (Member Author):

This was necessary because it is not possible to have these generics on the impl block; see https://users.rust-lang.org/t/how-to-implement-from-asref-u-for-generic-u/53533/5
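
As a hedged illustration of the constraint, with simplified stand-in types (`Buffer` here is just a `Vec<u8>` wrapper, and `Copy` stands in for `ArrowNativeType`):

```rust
struct Buffer {
    data: Vec<u8>,
}

impl Buffer {
    // `U` and `T` are declared on the method: `Buffer` itself is not
    // generic, so the parameters cannot live on the impl block without
    // making the whole type generic over its item type.
    fn from_slice_ref<U: Copy, T: AsRef<[U]>>(items: &T) -> Self {
        let slice = items.as_ref();
        let len = slice.len() * std::mem::size_of::<U>();
        // Safety: `U: Copy` stands in for plain-old-data native types,
        // which are valid to reinterpret as raw bytes.
        let bytes = unsafe {
            std::slice::from_raw_parts(slice.as_ptr() as *const u8, len)
        };
        Buffer { data: bytes.to_vec() }
    }
}
```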

@jorgecarleitao (Member Author) commented:

@Dandandan, could you run this again? I rebased it on top of the PR that removes the zeroed allocation. I see a 20% improvement on the buffer_create bench, but the DataFusion numbers are the real deal :)

@Dandandan (Contributor) commented:

This PR:

```
Query 1 iteration 0 took 608.4 ms
Query 1 iteration 1 took 602.8 ms
Query 1 iteration 2 took 601.4 ms
Query 1 iteration 3 took 604.6 ms
Query 1 iteration 4 took 604.0 ms
Query 1 iteration 5 took 603.6 ms
Query 1 iteration 6 took 605.5 ms
Query 1 iteration 7 took 610.7 ms
Query 1 iteration 8 took 607.5 ms
Query 1 iteration 9 took 607.1 ms
Query 1 avg time: 605.55 ms
```

Master:

```
Query 1 iteration 0 took 642.9 ms
Query 1 iteration 1 took 640.5 ms
Query 1 iteration 2 took 638.1 ms
Query 1 iteration 3 took 639.8 ms
Query 1 iteration 4 took 641.9 ms
Query 1 iteration 5 took 646.4 ms
Query 1 iteration 6 took 638.6 ms
Query 1 iteration 7 took 640.6 ms
Query 1 iteration 8 took 653.0 ms
Query 1 iteration 9 took 648.6 ms
Query 1 avg time: 643.04 ms
```

@jorgecarleitao marked this pull request as ready for review January 2, 2021 12:58
let len = slice.len() * std::mem::size_of::<U>();
let capacity = bit_util::round_upto_multiple_of_64(len);
let buffer = memory::allocate_aligned(capacity);
unsafe {
Contributor:

Maybe document a justification for the unsafe here? :)

Contributor:

I think it was there in the previous version of the code
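
For illustration only, a self-contained sketch of how such a justification could read, using `std::alloc` instead of Arrow's own aligned allocator (and deliberately leaking the allocation, since it is only a sketch):

```rust
use std::alloc::{alloc, Layout};

fn copy_to_new_allocation(src: &[u8]) -> *mut u8 {
    let len = src.len();
    // round the capacity up to a multiple of 64, as the excerpt above does
    let capacity = (len + 63) & !63;
    let layout = Layout::from_size_align(capacity.max(64), 64).unwrap();
    unsafe {
        let dst = alloc(layout);
        assert!(!dst.is_null(), "allocation failed");
        // Safety: `dst` was just allocated with capacity >= len bytes,
        // `src` is valid for len bytes, and the two regions cannot overlap.
        std::ptr::copy_nonoverlapping(src.as_ptr(), dst, len);
        dst
    }
}
```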

@@ -48,9 +48,8 @@ macro_rules! compare_op {
combine_option_bitmap($left.data_ref(), $right.data_ref(), $left.len())?;

let byte_capacity = bit_util::ceil($left.len(), 8);
let actual_capacity = bit_util::round_upto_multiple_of_64(byte_capacity);
Contributor:

👍 I was just looking at this; it was added by me without any need.

pub fn push<T: ToByteSlice>(&mut self, item: &T) {
let additional = std::mem::size_of::<T>();
let new_len = self.len + additional;
if new_len > self.capacity {
Contributor:

I am wondering if, in a follow-up, we can make len / capacity similar to Vec's, i.e. counted in number of items instead of number of bytes?

This makes some implementations easier, e.g. looking at Vec https://doc.rust-lang.org/src/alloc/vec.rs.html#1206
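
Hypothetically, that direction could look like the sketch below (`TypedMutableBuffer` is an invented name, and `Vec<T>` stands in for the raw aligned allocation the real buffer manages):

```rust
// Sketch of the suggestion: a buffer generic over T whose len/capacity
// count items, like Vec<T>, so push needs no byte arithmetic per call.
struct TypedMutableBuffer<T> {
    data: Vec<T>,
}

impl<T> TypedMutableBuffer<T> {
    fn push(&mut self, item: T) {
        self.data.push(item); // len and capacity are in items, not bytes
    }

    fn len(&self) -> usize {
        self.data.len()
    }
}
```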

jorgecarleitao (Member Author):

I thought about that too, but it requires more investigation. We would need to make MutableBuffer generic over T, and that has implications for our ability to build generic structures.

For example, builders::FieldData relies on a generic MutableBuffer, and so does MutableArrayData.

Note that growing and freezing a MutableBuffer is now faster than using a Vec<u32> to store u32. We do still have a performance quirk whereby growing a MutableBuffer is faster than pre-reserving it.

@jorgecarleitao changed the title from "[Rust] Fixed performance issue in mutableBuffer." to "ARROW-11108: [Rust] Fixed performance issue in mutableBuffer." Jan 2, 2021

@alamb added the needs-rebase label (a PR that needs to be rebased by the author) Jan 5, 2021
@jorgecarleitao removed the needs-rebase label Jan 6, 2021
@jorgecarleitao (Member Author) commented:

Ok, I have the benchmarks of this PR, which IMO is ready for review. FYI @alamb @nevi-me

```
critcmp master-08cccd68802c9ddc3ca0a5d4bad6e4ba382d74b4 perf_buffer-2dca3c85920c951d92992d343b69a5384178c6fd -t 10
group                                         master-08cccd68802c9ddc3ca0a5d4bad6e4ba382d74b4    perf_buffer-2dca3c85920c951d92992d343b69a5384178c6fd
-----                                         -----------------------------------------------    ----------------------------------------------------
array_slice 512                               1.17   453.4±37.97ns        ? B/sec                1.00    387.0±8.80ns        ? B/sec
array_string_from_vec 128                     1.28      4.2±0.22µs        ? B/sec                1.00      3.3±0.09µs        ? B/sec
array_string_from_vec 256                     1.45      5.9±0.14µs        ? B/sec                1.00      4.1±0.12µs        ? B/sec
array_string_from_vec 512                     1.66      9.6±0.28µs        ? B/sec                1.00      5.8±0.18µs        ? B/sec
buffer_bit_ops and                            1.23  715.9±278.78ns        ? B/sec                1.00   581.4±16.32ns        ? B/sec
cast date64 to date32 512                     1.11      7.6±0.40µs        ? B/sec                1.00      6.9±0.32µs        ? B/sec
cast float64 to float32 512                   1.21      3.2±0.10µs        ? B/sec                1.00      2.7±0.06µs        ? B/sec
cast float64 to uint64 512                    1.00      3.9±0.41µs        ? B/sec                1.11      4.4±0.12µs        ? B/sec
cast int32 to float32 512                     1.12      3.2±0.11µs        ? B/sec                1.00      2.8±0.08µs        ? B/sec
cast int32 to int64 512                       1.19      3.2±0.08µs        ? B/sec                1.00      2.7±0.07µs        ? B/sec
cast int32 to uint32 512                      1.42      3.9±0.10µs        ? B/sec                1.00      2.8±0.15µs        ? B/sec
cast int64 to int32 512                       1.96      5.1±0.27µs        ? B/sec                1.00      2.6±0.07µs        ? B/sec
cast time32s to time64us 512                  1.11      5.4±0.23µs        ? B/sec                1.00      4.9±0.15µs        ? B/sec
concat i32 1024                               1.00      4.2±0.09µs        ? B/sec                1.12      4.7±0.17µs        ? B/sec
concat str nulls 1024                         1.00     21.3±0.74µs        ? B/sec                1.14     24.3±0.73µs        ? B/sec
eq scalar Float32                             1.26     76.4±5.24µs        ? B/sec                1.00     60.6±3.56µs        ? B/sec
filter context f32 high selectivity           1.00    309.7±1.96µs        ? B/sec                1.12   346.4±10.64µs        ? B/sec
filter context u8 high selectivity            1.27      4.8±0.11µs        ? B/sec                1.00      3.8±0.14µs        ? B/sec
filter context u8 w NULLs high selectivity    1.00    299.2±2.82µs        ? B/sec                1.12    336.0±7.21µs        ? B/sec
filter u8 high selectivity                    1.11     12.2±0.18µs        ? B/sec                1.00     11.0±0.13µs        ? B/sec
from_slice prepared                           1.00   834.5±26.98µs        ? B/sec                1.15   963.4±36.11µs        ? B/sec
gt scalar Float32                             1.13     42.9±1.52µs        ? B/sec                1.00     37.9±1.83µs        ? B/sec
length                                        1.58     57.8±0.58µs        ? B/sec                1.00     36.5±9.01µs        ? B/sec
like_utf8 scalar complex                      1.00  1064.0±91.20µs        ? B/sec                1.17  1241.6±320.92µs        ? B/sec
like_utf8 scalar ends with                    1.14    317.0±8.58µs        ? B/sec                1.00    277.5±5.80µs        ? B/sec
like_utf8 scalar equals                       2.43    142.6±6.62µs        ? B/sec                1.00     58.6±2.81µs        ? B/sec
like_utf8 scalar starts with                  1.41   294.5±16.24µs        ? B/sec                1.00    208.9±7.83µs        ? B/sec
min string 512                                1.16      6.2±0.14µs        ? B/sec                1.00      5.3±0.12µs        ? B/sec
mutable                                       1.19   587.7±15.96µs        ? B/sec                1.00   492.5±17.55µs        ? B/sec
mutable prepared                              1.17   640.6±26.75µs        ? B/sec                1.00   549.0±21.62µs        ? B/sec
mutable str nulls 1024                        1.00      4.0±0.06ms        ? B/sec                1.34      5.3±0.05ms        ? B/sec
neq scalar Float32                            1.23     76.4±3.41µs        ? B/sec                1.00     61.9±6.27µs        ? B/sec
nlike_utf8 scalar starts with                 1.00   321.1±12.95µs        ? B/sec                1.27    407.8±9.19µs        ? B/sec
struct_array_from_vec 1024                    1.53     20.8±0.55µs        ? B/sec                1.00     13.6±0.66µs        ? B/sec
struct_array_from_vec 128                     1.16      7.2±0.21µs        ? B/sec                1.00      6.2±0.23µs        ? B/sec
struct_array_from_vec 256                     1.27      9.1±0.29µs        ? B/sec                1.00      7.1±0.20µs        ? B/sec
struct_array_from_vec 512                     1.40     13.0±0.27µs        ? B/sec                1.00      9.3±0.32µs        ? B/sec
take i32 1024                                 1.26      2.4±0.96µs        ? B/sec                1.00  1883.9±35.60ns        ? B/sec
```

@alamb (Contributor) left a comment:

I went through this PR line by line, and I think I understand what is going on; it looks good to me, though I am not sure I would catch any subtle errors here.

I also ran the tests under valgrind (both in release and debug), which didn't detect any errors.

👍

```
==24169==
==24169== HEAP SUMMARY:
==24169==     in use at exit: 813,390 bytes in 410 blocks
==24169==   total heap usage: 120,636 allocs, 120,226 frees, 25,950,900 bytes allocated
==24169== 
==24169== LEAK SUMMARY:
==24169==    definitely lost: 160 bytes in 5 blocks
==24169==    indirectly lost: 0 bytes in 0 blocks
==24169==      possibly lost: 1,980 bytes in 5 blocks
==24169==    still reachable: 811,250 bytes in 400 blocks
==24169==         suppressed: 0 bytes in 0 blocks
==24169== Rerun with --leak-check=full to see details of leaked memory
==24169== 
==24169== For lists of detected and suppressed errors, rerun with: -s
==24169== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
```


 /// extends the `buffer` to be able to hold `len` bits, setting all bits of the new size to zero.
 #[inline]
-pub(super) fn reserve_for_bits(buffer: &mut MutableBuffer, len: usize) {
+pub(super) fn resize_for_bits(buffer: &mut MutableBuffer, len: usize) {
Contributor:

this is just a rename to make the name match more closely what the function does, right?

/// ```
// For performance reasons, this must be inlined so that the `if` is executed inside the caller, and not as an extra call that just
// exits.
#[inline(always)]
Contributor:

I would bet that the use of inline(always) is what causes this PR to slow down some kernels (as the extra code either disables some additional optimization or blows some cache level).

@@ -184,23 +180,33 @@ fn is_like_pattern(c: char) -> bool {

pub fn like_utf8_scalar(left: &StringArray, right: &str) -> Result<BooleanArray> {
let null_bit_buffer = left.data().null_buffer().cloned();
let mut result = BooleanBufferBuilder::new(left.len());
let bytes = bit_util::ceil(left.len(), 8);
Contributor:

this looks like it was an additional performance improvement?

@alamb (Contributor) commented Jan 11, 2021

I would also like to note that this is EPIC work @jorgecarleitao -- nicely done.

@alamb (Contributor) commented Jan 11, 2021

FWIW, given the "touch most of the codepaths" property of this PR, it might be good to wait to merge it until after 3.0 is shipped (in the next few days); in case it introduces any subtle new bugs, we would have longer to shake them out.

@alamb (Contributor) commented Jan 17, 2021

I apologize for the delay in merging Rust PRs; the 3.0 release is being finalized now, and we are planning to minimize entropy by postponing merging changes not critical for the release until the process is complete. I hope the process completes in the next few days. There is more discussion on the mailing list.

@alamb (Contributor) commented Jan 19, 2021

🎉

@jorgecarleitao (Member Author) commented:

Well, it broke master... 🤣 @Dandandan already has a fix #9269 💯

@alamb (Contributor) commented Jan 19, 2021

The fix is merged, so hopefully master is back 🟢 ✅

 pub mod compute;
 pub mod csv;
 pub mod datatypes;
 pub mod error;
 pub mod ffi;
 pub mod ipc;
 pub mod json;
-pub mod memory;
+mod memory;
Contributor:

@jorgecarleitao This change has broken a bunch of our code downstream, as we build buffers directly. Can you make the memory module public again?

alamb pushed a commit that referenced this pull request Jan 23, 2021
… -97% for length)

# Rationale

Rust forbids safely accessing uninitialized memory because it is undefined behavior. However, when building `Buffer`s, it is important to be able to _write_ to uninitialized memory regions, thereby avoiding the need to write _something_ to a region before using it.

Currently, all our initializations are zeroed, which is expensive. #9076 modifies our allocator to allocate uninitialized regions. However, by itself, this is not useful if we do not offer any methods to write to those (uninitialized) regions.

# This PR

This PR is built on top of #9076 and introduces methods to extend a `MutableBuffer` from an iterator, thereby offering an API to efficiently grow `MutableBuffer` without having to initialize memory regions with zeros (i.e. without `with_bitset` and the like).

This PR also introduces methods to create a `Buffer` from an iterator and a `trusted_len` iterator.

The design is inspired by `Vec`, with the catch that we use stable Rust (i.e. no trait specialization, no `TrustedLen`), and thus have to expose a few more methods than `Vec` does. This means that we can't use the (nicer) `collect()`-based API for trusted-length iterators.
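
As a hedged, self-contained sketch of the trusted-length idea, assuming a `Vec<u8>`-backed buffer (the real `MutableBuffer` manages its own aligned allocation):

```rust
use std::mem::size_of;

/// Sketch only. The "trusted length" contract: `iter` must yield exactly
/// `len` items, otherwise `set_len` would expose uninitialized bytes.
unsafe fn extend_trusted_len<T: Copy, I: Iterator<Item = T>>(
    buf: &mut Vec<u8>,
    iter: I,
    len: usize,
) {
    let additional = len * size_of::<T>();
    buf.reserve(additional); // reserve once, without zero-initializing
    let mut dst = buf.as_mut_ptr().add(buf.len()) as *mut T;
    for item in iter {
        // write directly into the reserved (uninitialized) region
        std::ptr::write_unaligned(dst, item);
        dst = dst.add(1);
    }
    buf.set_len(buf.len() + additional);
}
```

Because the length is promised up front, the buffer reserves once and never zeroes the region it is about to overwrite.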

Note that, as before, there are still `unsafe` uses of `MutableBuffer`, stemming from the fact that it is not generic over a type `T` (and thus people can mix types and grow the buffer in unsound ways).

Special thanks to @mbrubeck for all the help on this, originally discussed [here](https://users.rust-lang.org/t/collect-for-exactsizediterator/54367/6).

```bash
git checkout master
cargo bench --bench arithmetic_kernels
git checkout length_faster
cargo bench --bench arithmetic_kernels

git checkout 16bc720
cargo bench --bench length_kernel
git checkout length_faster
```

```
Switched to branch 'length_faster'
  (use "git pull" to merge the remote branch into yours)
   Compiling arrow v3.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow)
    Finished bench [optimized] target(s) in 1m 02s
     Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/arithmetic_kernels-ec2cc20ce07d9b83
Gnuplot not found, using plotters backend
add 512                 time:   [522.24 ns 523.67 ns 525.26 ns]
                        change: [-21.738% -20.960% -20.233%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  9 (9.00%) high mild
  3 (3.00%) high severe

subtract 512            time:   [503.18 ns 504.93 ns 506.81 ns]
                        change: [-21.741% -21.075% -20.308%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

multiply 512            time:   [508.25 ns 512.04 ns 516.06 ns]
                        change: [-22.569% -21.946% -21.305%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

divide 512              time:   [1.4711 us 1.4753 us 1.4799 us]
                        change: [-24.783% -23.305% -22.176%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

limit 512, 512          time:   [373.47 ns 377.76 ns 382.21 ns]
                        change: [+3.3055% +4.4193% +5.5923%] (p = 0.00 < 0.05)
                        Performance has regressed.

add_nulls_512           time:   [502.94 ns 504.51 ns 506.28 ns]
                        change: [-24.876% -24.299% -23.709%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

divide_nulls_512        time:   [1.4843 us 1.4931 us 1.5053 us]
                        change: [-22.968% -22.243% -21.420%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 24 outliers among 100 measurements (24.00%)
  15 (15.00%) low mild
  1 (1.00%) high mild
  8 (8.00%) high severe
```

Length (against the commit that fixes the bench, `16bc7200f3baa6e526aea7135c60dcc949c9b592`, not master):

```
length                  time:   [1.5379 us 1.5408 us 1.5437 us]
                        change: [-97.311% -97.295% -97.278%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe
```

Closes #9235 from jorgecarleitao/length_faster

Lead-authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Co-authored-by: Matt Brubeck <mbrubeck@limpet.net>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
kszucs pushed a commit that referenced this pull request Jan 25, 2021

Closes #9076 from jorgecarleitao/perf_buffer

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
kszucs pushed a commit that referenced this pull request Jan 25, 2021
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021