ARROW-11108: [Rust] Fixed performance issue in mutableBuffer. #9076
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename the pull request title in the following format?
See also:
Codecov Report
```
@@            Coverage Diff             @@
##           master    #9076      +/-   ##
==========================================
- Coverage   82.57%   82.52%    -0.06%
==========================================
  Files         204      204
  Lines       50327    50247       -80
==========================================
- Hits        41560    41467       -93
- Misses       8767     8780       +13
```
Continue to review full report at Codecov.
Great! I did some quick benchmarking of your branch on DataFusion; I see a speed up of ~5% on q1 / q12.
```
@@ -56,6 +57,23 @@ pub struct Buffer {
}

impl Buffer {
    /// Initializes a [Buffer] from a slice of items.
    pub fn from_slice_ref<U: ArrowNativeType, T: AsRef<[U]>>(items: &T) -> Self {
```
This was necessary because it is not possible to have this generic on the `impl` block; see https://users.rust-lang.org/t/how-to-implement-from-asref-u-for-generic-u/53533/5
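For readers unfamiliar with the restriction, here is a minimal, self-contained sketch (a hypothetical `Bytes` type, not the Arrow API) of why the generics live on the method rather than on the `impl` block: if `U` were a parameter of the `impl`, the buffer type itself would have to carry it, whereas method-level generics let one untyped byte container accept slices of any element type.

```rust
struct Bytes(Vec<u8>);

impl Bytes {
    // `U` and `T` are free on the method, so a single non-generic `Bytes`
    // can be built from `&[u32]`, `&[f64]`, `Vec<i16>`, etc.
    // (assumes `U` has no padding bytes, as Arrow's native types do)
    fn from_slice_ref<U: Copy, T: AsRef<[U]>>(items: &T) -> Self {
        let slice = items.as_ref();
        let len = slice.len() * std::mem::size_of::<U>();
        // SAFETY: `slice` is valid for `len` bytes and `u8` has alignment 1.
        let bytes =
            unsafe { std::slice::from_raw_parts(slice.as_ptr() as *const u8, len) };
        Bytes(bytes.to_vec())
    }
}

fn main() {
    let b = Bytes::from_slice_ref(&[1u32, 2, 3]);
    assert_eq!(b.0.len(), 12); // 3 items * 4 bytes
}
```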
@Dandandan, could you run this again? I rebased it against the PR that removes the zeroed allocation. I see a 20% improvement on the buffer_create bench, but DataFusion is the real deal :)
This PR: (benchmark output omitted)
Master: (benchmark output omitted)
```
let len = slice.len() * std::mem::size_of::<U>();
let capacity = bit_util::round_upto_multiple_of_64(len);
let buffer = memory::allocate_aligned(capacity);
unsafe {
```
Maybe document a justification for the unsafe here? :)
I think it was there in the previous version of the code.
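Spelled out, such a justification might read like the sketch below, which uses std's allocator in place of Arrow's `memory::allocate_aligned`. The 64-byte alignment is an assumption carried over from the rounding in the snippet; this is illustrative, not the Arrow implementation.

```rust
use std::alloc::{alloc, Layout};
use std::ptr::copy_nonoverlapping;

// Allocate a 64-byte-aligned region and copy `src` into it.
// (The caller is responsible for freeing the result with `dealloc` and the
// same layout; this sketch leaks it.)
fn copy_to_aligned(src: &[u8]) -> *mut u8 {
    // Round the length up to the next multiple of 64, as in the snippet above.
    let capacity = (src.len() + 63) & !63;
    let layout = Layout::from_size_align(capacity.max(64), 64).unwrap();
    unsafe {
        let dst = alloc(layout);
        assert!(!dst.is_null(), "allocation failed");
        // SAFETY: `dst` was just allocated with `capacity >= src.len()` bytes,
        // `src` is a valid slice, and the two regions cannot overlap.
        copy_nonoverlapping(src.as_ptr(), dst, src.len());
        dst
    }
}

fn main() {
    let p = copy_to_aligned(&[1, 2, 3]);
    assert_eq!(p as usize % 64, 0); // aligned as requested
}
```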
```
@@ -48,9 +48,8 @@ macro_rules! compare_op {
combine_option_bitmap($left.data_ref(), $right.data_ref(), $left.len())?;

let byte_capacity = bit_util::ceil($left.len(), 8);
let actual_capacity = bit_util::round_upto_multiple_of_64(byte_capacity);
```
👍 I was just looking at this; it was added by me without any need.
rust/arrow/src/buffer.rs
Outdated
```
pub fn push<T: ToByteSlice>(&mut self, item: &T) {
    let additional = std::mem::size_of::<T>();
    let new_len = self.len + additional;
    if new_len > self.capacity {
```
I am wondering if, in a follow-up, we can make `len` / `capacity` count items instead of bytes, like `Vec` does? That would make some implementations easier; see e.g. `Vec`'s source: https://doc.rust-lang.org/src/alloc/vec.rs.html#1206
I thought about that too, but it requires more investigation. We would need to make `MutableBuffer` generic over `T`, and that has implications for our ability to build generic structures. For example, `builders::FieldData` relies on a generic `MutableBuffer`, and so does `MutableArrayData`.

Note that growing and freezing a `MutableBuffer` is now faster than using a `Vec<u32>` to store `u32`. We do still have a performance quirk whereby growing a `MutableBuffer` is faster than pre-reserving it.
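To make the constraint concrete, a hypothetical sketch (the names follow the discussion, not the actual Arrow code) of why a generic `MutableBuffer<T>` complicates structures like `builders::FieldData`, which must hold buffers of mixed element types:

```rust
use std::mem::size_of;

// If `MutableBuffer` gained a type parameter, containers of mixed-type
// buffers would need type erasure, e.g. via a trait object.
trait AnyBuffer {
    fn byte_len(&self) -> usize;
}

struct TypedBuffer<T> {
    items: Vec<T>, // len/capacity counted in items, like `Vec`
}

impl<T> AnyBuffer for TypedBuffer<T> {
    fn byte_len(&self) -> usize {
        self.items.len() * size_of::<T>()
    }
}

// A stand-in for `builders::FieldData`: one buffer per child field, each with
// a different runtime element type, so they must live behind `dyn AnyBuffer`.
struct FieldData {
    buffers: Vec<Box<dyn AnyBuffer>>,
}

fn main() {
    let fields = FieldData {
        buffers: vec![
            Box::new(TypedBuffer { items: vec![1u32, 2] }),
            Box::new(TypedBuffer { items: vec![1.0f64] }),
        ],
    };
    let total: usize = fields.buffers.iter().map(|b| b.byte_len()).sum();
    assert_eq!(total, 16); // 2 * 4 bytes + 1 * 8 bytes
}
```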
Ok, I have the benchmarks of this PR, which IMO is ready for review. FYI @alamb @nevi-me

```
critcmp master-08cccd68802c9ddc3ca0a5d4bad6e4ba382d74b4 perf_buffer-2dca3c85920c951d92992d343b69a5384178c6fd -t 10
```
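For reference, a plausible recipe for producing and comparing such baselines, assuming Criterion's `--save-baseline` flag and the `critcmp` tool (the benchmark name and shortened hashes are illustrative):

```bash
# on master, save a baseline named after the commit
git checkout 08cccd6 && cargo bench --bench buffer_create -- --save-baseline master-08cccd6
# on the PR branch, save a second baseline
git checkout perf_buffer && cargo bench --bench buffer_create -- --save-baseline perf_buffer-2dca3c8
# compare the two, hiding differences below a 10% threshold
critcmp master-08cccd6 perf_buffer-2dca3c8 -t 10
```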
I went through this PR line by line and I think I understand what is going on and it looks good to me, though I am not sure I would catch any subtle errors here.
I also ran the tests under valgrind (both in release and debug) which didn't detect any errors.
👍
```
==24169==
==24169== HEAP SUMMARY:
==24169==     in use at exit: 813,390 bytes in 410 blocks
==24169==   total heap usage: 120,636 allocs, 120,226 frees, 25,950,900 bytes allocated
==24169==
==24169== LEAK SUMMARY:
==24169==    definitely lost: 160 bytes in 5 blocks
==24169==    indirectly lost: 0 bytes in 0 blocks
==24169==      possibly lost: 1,980 bytes in 5 blocks
==24169==    still reachable: 811,250 bytes in 400 blocks
==24169==         suppressed: 0 bytes in 0 blocks
==24169== Rerun with --leak-check=full to see details of leaked memory
==24169==
==24169== For lists of detected and suppressed errors, rerun with: -s
==24169== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
```
```diff
 /// extends the `buffer` to be able to hold `len` bits, setting all bits of the new size to zero.
 #[inline]
-pub(super) fn reserve_for_bits(buffer: &mut MutableBuffer, len: usize) {
+pub(super) fn resize_for_bits(buffer: &mut MutableBuffer, len: usize) {
```
This is just a rename to make the name match more closely what the function does, right?
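A minimal sketch of the renamed function's semantics, per its doc comment above (`Vec<u8>` stands in for `MutableBuffer`; this is illustrative, not the Arrow implementation):

```rust
// Grow `buffer` so it can hold at least `len` bits, zeroing any new bytes;
// a no-op when the buffer is already large enough (hence "resize", not "reserve").
fn resize_for_bits(buffer: &mut Vec<u8>, len: usize) {
    let bytes = (len + 7) / 8; // ceil(len / 8)
    if bytes > buffer.len() {
        buffer.resize(bytes, 0);
    }
}

fn main() {
    let mut buf = Vec::new();
    resize_for_bits(&mut buf, 12); // 12 bits -> 2 zeroed bytes
    assert_eq!(buf, vec![0u8, 0]);
}
```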
````
/// ```
// For performance reasons, this must be inlined so that the `if` is executed inside the caller,
// and not as an extra call that just exits.
#[inline(always)]
````
I would bet that the use of `inline(always)` is what causes this PR to slow down some kernels (the extra code either disables some additional optimization or blows some cache level).
```
@@ -184,23 +180,33 @@ fn is_like_pattern(c: char) -> bool {

pub fn like_utf8_scalar(left: &StringArray, right: &str) -> Result<BooleanArray> {
    let null_bit_buffer = left.data().null_buffer().cloned();
    let mut result = BooleanBufferBuilder::new(left.len());
    let bytes = bit_util::ceil(left.len(), 8);
```
This looks like it was an additional performance improvement?
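The likely win is in the sizing: a `BooleanArray` result is bit-packed, so the builder needs only one byte per eight values. A small sketch of the arithmetic (`boolean_byte_capacity` is a hypothetical name for what `bit_util::ceil(left.len(), 8)` computes):

```rust
// ceil(len / 8): bytes needed to hold `len` bits.
fn boolean_byte_capacity(len: usize) -> usize {
    (len + 7) / 8
}

fn main() {
    assert_eq!(boolean_byte_capacity(512), 64); // 512 values -> 64 bytes
    assert_eq!(boolean_byte_capacity(3), 1);
}
```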
I would also like to note that this is EPIC work @jorgecarleitao -- nicely done.

FWIW, given the "touches most of the codepaths" property of this PR, it might be good to wait to merge it until after 3.0 is shipped (in the next few days); in case it introduces any subtle new bugs, we would have longer to shake them out.

I apologize for the delay in merging Rust PRs -- the 3.0 release is being finalized now, and we are planning to minimize entropy by postponing the merging of changes not critical for the release until the process is complete. I hope that happens in the next few days. There is more discussion on the mailing list.
🎉

Well, it broke master... 🤣 @Dandandan already has a fix #9269 💯

The fix is merged, so hopefully master is back 🟢 ✅
```diff
 pub mod compute;
 pub mod csv;
 pub mod datatypes;
 pub mod error;
 pub mod ffi;
 pub mod ipc;
 pub mod json;
-pub mod memory;
+mod memory;
```
This change has broken a bunch of our code downstream as we build buffers directly. Can you make the memory module public again?
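One way the module could be re-exposed without making everything public again, as a hypothetical sketch (only `allocate_aligned` appears in this thread; the actual resolution may differ):

```rust
// in lib.rs: keep the module private but re-export the entry points
// downstream crates rely on.
mod memory;

pub use memory::allocate_aligned;
```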
… -97% for length)

# Rationale

Rust forbids safely accessing uninitialized memory because it is undefined behavior. However, when building `Buffer`s, it is important to be able to _write_ to uninitialized memory regions, thereby avoiding the need to write _something_ to them before using them. Currently, all our initializations are zeroed, which is expensive. #9076 modifies our allocator to allocate uninitialized regions. However, by itself, this is not useful if we do not offer any methods to write to those (uninitialized) regions.

# This PR

This PR is built on top of #9076 and introduces methods to extend a `MutableBuffer` from an iterator, thereby offering an API to efficiently grow `MutableBuffer` without having to initialize memory regions with zeros (i.e. without `with_bitset` and the like). This PR also introduces methods to create a `Buffer` from an iterator and from a `trusted_len` iterator.

The design is inspired by `Vec`, with the catch that we use stable Rust (i.e. no trait specialization, no `TrustedLen`), and thus have to expose a few more methods than `Vec` does. This also means that we can't offer the (nicer) `collect()`-based API for trusted-len iterators.

Note that, as before, there are still `unsafe` uses of `MutableBuffer`, derived from the fact that it is not generic over a type `T` (and thus people can mix types and grow the buffer in unsound ways).

Special thanks to @mbrubeck for all the help on this, originally discussed [here](https://users.rust-lang.org/t/collect-for-exactsizediterator/54367/6).

```bash
git checkout master
cargo bench --bench arithmetic_kernels
git checkout length_faster
cargo bench --bench arithmetic_kernels
git checkout 16bc720
cargo bench --bench length_kernel
git checkout length_faster
```

```
Switched to branch 'length_faster'
(use "git pull" to merge the remote branch into yours)
   Compiling arrow v3.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow)
    Finished bench [optimized] target(s) in 1m 02s
     Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/arithmetic_kernels-ec2cc20ce07d9b83
Gnuplot not found, using plotters backend

add 512             time:   [522.24 ns 523.67 ns 525.26 ns]
                    change: [-21.738% -20.960% -20.233%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  9 (9.00%) high mild
  3 (3.00%) high severe

subtract 512        time:   [503.18 ns 504.93 ns 506.81 ns]
                    change: [-21.741% -21.075% -20.308%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

multiply 512        time:   [508.25 ns 512.04 ns 516.06 ns]
                    change: [-22.569% -21.946% -21.305%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

divide 512          time:   [1.4711 us 1.4753 us 1.4799 us]
                    change: [-24.783% -23.305% -22.176%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

limit 512, 512      time:   [373.47 ns 377.76 ns 382.21 ns]
                    change: [+3.3055% +4.4193% +5.5923%] (p = 0.00 < 0.05)
                    Performance has regressed.

add_nulls_512       time:   [502.94 ns 504.51 ns 506.28 ns]
                    change: [-24.876% -24.299% -23.709%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

divide_nulls_512    time:   [1.4843 us 1.4931 us 1.5053 us]
                    change: [-22.968% -22.243% -21.420%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 24 outliers among 100 measurements (24.00%)
  15 (15.00%) low mild
  1 (1.00%) high mild
  8 (8.00%) high severe
```

Length (against the commit that fixes the bench, `16bc7200f3baa6e526aea7135c60dcc949c9b592`, not master):

```
length              time:   [1.5379 us 1.5408 us 1.5437 us]
                    change: [-97.311% -97.295% -97.278%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe
```

Closes #9235 from jorgecarleitao/length_faster

Lead-authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Co-authored-by: Matt Brubeck <mbrubeck@limpet.net>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
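A hedged sketch of the core idea in that commit message (a stand-in `MutableBytes` type, not the Arrow API): with a caller-guaranteed ("trusted") length, each item can be written directly into reserved, uninitialized capacity, skipping the zero-fill entirely.

```rust
struct MutableBytes {
    data: Vec<u8>,
}

impl MutableBytes {
    /// # Safety
    /// `iter` must yield exactly `len` items; this is the "trusted length"
    /// contract that stable Rust cannot express via the unstable `TrustedLen`.
    unsafe fn extend_trusted_len(&mut self, iter: impl Iterator<Item = u32>, len: usize) {
        let byte_len = len * std::mem::size_of::<u32>();
        let old_len = self.data.len();
        self.data.reserve(byte_len);
        let mut dst = self.data.as_mut_ptr().add(old_len);
        for item in iter {
            // Write into uninitialized capacity: no zeroing beforehand.
            std::ptr::write_unaligned(dst as *mut u32, item);
            dst = dst.add(std::mem::size_of::<u32>());
        }
        self.data.set_len(old_len + byte_len);
    }
}

fn main() {
    let mut buf = MutableBytes { data: Vec::new() };
    // SAFETY: the range yields exactly 4 items.
    unsafe { buf.extend_trusted_len(0u32..4, 4) };
    assert_eq!(buf.data.len(), 16);
}
```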
This PR refactors `MutableBuffer::extend_from_slice` to remove the need to use `to_byte_slice` on every call, thereby removing a level of indirection that prevents the compiler from optimizing out some code.

This is the second performance improvement originally presented in #8796 and, together with #9027, brings the performance of `MutableBuffer` to the same level as `Vec<u8>`, in particular for building buffers on the fly.

Basically, when converting to a byte slice `&[u8]`, the compiler loses the type size information, and thus needs to perform extra checks and can't just optimize out the code.

This PR adopts the same API as `Vec<T>::extend_from_slice`, but since our buffers are in `u8` (i.e. a la `Vec<u8>`), I made the signature

```
pub fn extend_from_slice<T: ToByteSlice>(&mut self, items: &[T])
pub fn push<T: ToByteSlice>(&mut self, item: &T)
```

i.e. it consumes something that can be converted to a byte slice, but internally makes the conversion to bytes (as `to_byte_slice` was doing).

Credits for the root cause analysis that led to this PR go to @Dandandan, [originally fielded here](#9016 (comment)):

> [...] current conversion to a byte slice may add some overhead? - @Dandandan

Benches (against master, so covering both this PR and #9044):

```
Switched to branch 'perf_buffer'
Your branch and 'origin/perf_buffer' have diverged,
and have 6 and 1 different commits each, respectively.
(use "git pull" to merge the remote branch into yours)
   Compiling arrow v3.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow)
    Finished bench [optimized] target(s) in 1m 00s
     Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/buffer_create-915da5f1abaf0471
Gnuplot not found, using plotters backend

mutable                 time:   [463.11 us 463.57 us 464.07 us]
                        change: [-19.508% -18.571% -17.526%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe

mutable prepared        time:   [527.84 us 528.46 us 529.14 us]
                        change: [-13.356% -12.522% -11.790%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

Benchmarking from_slice: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.1s, enable flat sampling, or reduce sample count to 60.
from_slice              time:   [1.1968 ms 1.1979 ms 1.1991 ms]
                        change: [-6.8697% -6.2029% -5.5812%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

from_slice prepared     time:   [917.49 us 918.89 us 920.60 us]
                        change: [-6.5111% -5.9102% -5.3038%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe
```

Closes #9076 from jorgecarleitao/perf_buffer

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
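A sketch of the root cause described above (illustrative, using `Vec<u8>` for the buffer): routing everything through `&[u8]` erases the element size from the type, while a typed `&[T]` entry point keeps `size_of::<T>()` visible to the compiler as a compile-time constant and converts to bytes internally, which is the API shape this PR adopts.

```rust
// Typed entry point: the compiler sees size_of::<T>() as a constant at each
// call site and can unroll/vectorize accordingly.
fn extend_from_slice<T: Copy>(dst: &mut Vec<u8>, items: &[T]) {
    let len = items.len() * std::mem::size_of::<T>();
    // SAFETY: `items` is valid for `len` bytes and `u8` has alignment 1.
    // (assumes `T` has no padding bytes, as `ToByteSlice` types do)
    let bytes = unsafe { std::slice::from_raw_parts(items.as_ptr() as *const u8, len) };
    dst.extend_from_slice(bytes);
}

fn main() {
    let mut buf = Vec::new();
    extend_from_slice(&mut buf, &[1u16, 2, 3]);
    assert_eq!(buf.len(), 6); // 3 items * 2 bytes
}
```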