feat: replace sha1 and tiny-keccak with sha-1 and sha3 #52


Merged
merged 4 commits into multiformats:master from koushiro:update on Feb 26, 2020
Conversation

@koushiro (Contributor) commented Feb 25, 2020

  • Add benchmark for digest (see the sketch below)
  • Replace sha1 and tiny-keccak with sha-1 and sha3
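For illustration only, a minimal sketch of a Criterion benchmark over one digest function. The multihash::Sha2_256 type, its static digest function, the bench name, and the input size are assumptions for this sketch, not the exact benchmark added in the PR:

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use multihash::Sha2_256; // assumed hasher type exported by this crate

fn bench_digest(c: &mut Criterion) {
    // 256 bytes of arbitrary input; the real benchmark may use other sizes.
    let data = vec![0xffu8; 256];
    c.bench_function("encode/sha2_256", |b| {
        // black_box keeps the compiler from optimizing the input away.
        b.iter(|| Sha2_256::digest(black_box(&data)))
    });
}

criterion_group!(benches, bench_digest);
criterion_main!(benches);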

Signed-off-by: koushiro <koushiro.cqx@gmail.com>
@koushiro (Contributor, Author) commented:

Benchmark

The results are not significantly different on my machine.

before

encode/identity         time:   [132.38 ns 132.90 ns 133.41 ns]                            
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
encode/sha1             time:   [2.0473 us 2.0528 us 2.0603 us]                         
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
encode/sha2_256         time:   [4.7345 us 4.7444 us 4.7558 us]                             
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  7 (7.00%) high mild
  3 (3.00%) high severe
encode/sha2_512         time:   [3.1024 us 3.1159 us 3.1335 us]                             
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe
encode/sha3_224         time:   [3.9798 us 3.9823 us 3.9854 us]                             
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
encode/sha3_256         time:   [3.9744 us 3.9776 us 3.9812 us]                             
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
encode/sha3_384         time:   [5.0447 us 5.0615 us 5.0808 us]                             
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high severe
encode/keccak_224       time:   [3.9861 us 3.9904 us 3.9956 us]                               
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
encode/keccak_256       time:   [3.9549 us 3.9580 us 3.9618 us]                               
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
encode/keccak_384       time:   [4.9629 us 4.9729 us 4.9843 us]                                
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
encode/keccak_512       time:   [7.4919 us 7.4974 us 7.5035 us]                               
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
encode/blake2b_256      time:   [1.3418 us 1.3426 us 1.3434 us]                                
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe
encode/blake2b_512      time:   [1.2973 us 1.2994 us 1.3014 us]                                
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
encode/blake2s_128      time:   [1.9659 us 1.9675 us 1.9692 us]                                
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
encode/blake2s_256      time:   [2.0079 us 2.0097 us 2.0117 us]                                
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild

after

encode/identity         time:   [126.36 ns 126.48 ns 126.58 ns]                            
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) low mild
encode/sha1             time:   [2.0177 us 2.0274 us 2.0377 us]                         
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
encode/sha2_256         time:   [4.8577 us 4.8865 us 4.9173 us]                             
encode/sha2_512         time:   [3.1260 us 3.1373 us 3.1510 us]                             
encode/sha3_224         time:   [4.0874 us 4.1135 us 4.1395 us]                             
encode/sha3_256         time:   [4.0373 us 4.0502 us 4.0663 us]                             
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
encode/sha3_384         time:   [5.0627 us 5.0666 us 5.0712 us]                             
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
encode/keccak_224       time:   [4.0856 us 4.1092 us 4.1340 us]                               
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
encode/keccak_256       time:   [4.1095 us 4.1300 us 4.1487 us]                               
encode/keccak_384       time:   [5.0831 us 5.0890 us 5.0961 us]                               
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
encode/keccak_512       time:   [7.5094 us 7.5458 us 7.5827 us]                               
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
encode/blake2b_256      time:   [1.3323 us 1.3336 us 1.3351 us]                                
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe
encode/blake2b_512      time:   [1.2899 us 1.2919 us 1.2941 us]                                
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
encode/blake2s_128      time:   [2.0425 us 2.0441 us 2.0460 us]                                
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low severe
  1 (1.00%) high mild
  1 (1.00%) high severe
encode/blake2s_256      time:   [2.0341 us 2.0633 us 2.0953 us]                                
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Signed-off-by: koushiro <koushiro.cqx@gmail.com>
@vmx (Member) commented Feb 25, 2020

Thanks for adding benchmarks. With the merge of #45, the codebase changed quite a lot. Could you please rebase the benchmarks?

Why switch crates if there isn't really a perf difference?

Signed-off-by: koushiro <koushiro.cqx@gmail.com>
@koushiro (Contributor, Author) commented:

> Thanks for adding benchmarks. With the merge of #45, the codebase changed quite a lot. Could you please rebase the benchmarks?
>
> Why switch crates if there isn't really a perf difference?

Because sha-1, sha2, and sha3 are a family of crates that share a consistent API.
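
As a small illustration (not code from this PR) of that consistent API: all three crates implement the same digest::Digest trait, so switching between hash functions only changes the type name:

use digest::Digest;
use sha1::Sha1;
use sha2::Sha256;
use sha3::Sha3_256;

fn main() {
    let data = b"hello world";
    // The same Digest::digest associated function works for every hasher.
    println!("{:x}", Sha1::digest(data));
    println!("{:x}", Sha256::digest(data));
    println!("{:x}", Sha3_256::digest(data));
}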

@koushiro (Contributor, Author) commented Feb 25, 2020

Question:

/// The Identity hasher
#[derive(Clone, Debug)]
pub struct Identity;
impl MultihashDigest for Identity {
    fn code(&self) -> Code {
        Self::CODE
    }
    fn digest(&self, data: &[u8]) -> Multihash {
        Self::digest(data)
    }
}
/* Why is this impl block needed?
impl Identity {
    /// The code of Identity hasher, 0x00.
    pub const CODE: Code = Code::Identity;
    /// Hash some input and return the raw binary digest.
    pub fn digest(data: &[u8]) -> Multihash {
        if (data.len() as u64) >= u64::from(std::u32::MAX) {
            panic!("Input data for identity hash is too large, it needs to be less than 2^32.")
        }
        wrap(Self::CODE, &data)
    }
}
*/

Why not do it like this instead?

pub trait MultihashDigest {
    const CODE: Code;
    fn digest(&self, data: &[u8]) -> Multihash;
}
#[derive(Clone, Debug)]
pub struct Identity;
impl MultihashDigest for Identity {
    const CODE: Code = Code::Identity;
    fn digest(&self, data: &[u8]) -> Multihash {
        if (data.len() as u64) >= u64::from(std::u32::MAX) {
            panic!("Input data for identity hash is too large, it needs to be less than 2^32.")
        }
        wrap(Self::CODE, &data)
    }
}

@vmx (Member) left a review:

I want to run the benchmarks myself locally (also with bigger data sizes), but until I've done that, I have one more comment (the rest looks good). I thought I'd publish it now, before running the benchmarks, so you can comment/react on it (or not :)

@@ -1,12 +1,10 @@
use blake2b_simd::Params as Blake2b;
use blake2s_simd::Params as Blake2s;
use digest::Digest;
use sha1::Sha1 as Sha1Hasher;
@vmx (Member) commented on the diff:

This is a nitpick and not a reason to block this PR. I just want to mention that I put those imports at the top so that things are easy to grep for. Absolute paths inline in the code make it harder to scan a file when you want to manually refactor something or just get an overview.

@vmx (Member) commented Feb 25, 2020

So it's about API ergonomics. The reason for having both an impl and a trait is that you often don't need the trait at all.

The simplest case is:

use multihash::Sha2_256;
Sha2_256::digest(b"some data");

If you always implement the trait, you also always need to import it. Hence it's

use multihash::{MultihashDigest, Sha2_256};
Sha2_256.digest(b"some data");

Which is not as nice.

Also, the trait uses instance methods (self as the first argument) so that it can be used for trait objects. For hashing, though, you don't actually need an instance; a clearer API, truer to the actual implementation, is a static method (Sha2_256::digest()).
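
A minimal sketch of what the instance methods buy, assuming the MultihashDigest trait shown earlier in this thread and unit-struct hashers named Sha2_256 and Sha3_256 (the exact names here are assumptions):

use multihash::{Multihash, MultihashDigest, Sha2_256, Sha3_256};

// Dispatch goes through the vtable; a static method could not be
// called through a &dyn reference like this.
fn hash_with(hasher: &dyn MultihashDigest, data: &[u8]) -> Multihash {
    hasher.digest(data)
}

fn main() {
    // Choose hashers at runtime and treat them uniformly.
    let hashers: Vec<Box<dyn MultihashDigest>> =
        vec![Box::new(Sha2_256), Box::new(Sha3_256)];
    for h in &hashers {
        let _hash = hash_with(h.as_ref(), b"some data");
    }
}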

That said, I'm happy to hear cleaner/better ideas for the API. I find the current API better than the old one, though once it is used more widely I expect it will still change based on experience using it.

@koushiro (Contributor, Author) commented:

> So it's about API ergonomics. The reason for having both an impl and a trait is that you often don't need the trait at all.
>
> The simplest case is:
>
>     use multihash::Sha2_256;
>     Sha2_256::digest(b"some data");
>
> If you always implement the trait, you also always need to import it. Hence it's
>
>     use multihash::{MultihashDigest, Sha2_256};
>     Sha2_256.digest(b"some data");
>
> Which is not as nice.
>
> Also, the trait uses instance methods (self as the first argument) so that it can be used for trait objects. For hashing, though, you don't actually need an instance; a clearer API, truer to the actual implementation, is a static method (Sha2_256::digest()).
>
> That said, I'm happy to hear cleaner/better ideas for the API. I find the current API better than the old one, though once it is used more widely I expect it will still change based on experience using it.

Okay, I got it.
Thanks for your reply :)

@koushiro changed the title from "Replace sha1 and tiny-keccak with sha-1 and sha3" to "feat: replace sha1 and tiny-keccak with sha-1 and sha3" on Feb 25, 2020
@vmx (Member) left a review:

I ran the benchmarks locally with bigger sizes. There wasn't really any difference: some were slower, some faster, but all within the range of noise (especially on my laptop, where many other processes run in the background).

Also, PR #30 included this change of the hashing functions.

Thanks for the PR and also for adding more comments!

The only thing left is waiting for a final approval from @dignifiedquire.

Signed-off-by: koushiro <koushiro.cqx@gmail.com>
@dignifiedquire (Member) left a review:

Looks good, thank you

@vmx merged commit a37ddc7 into multiformats:master on Feb 26, 2020
@koushiro deleted the update branch on February 26, 2020, 12:40