feat: replace sha1 and tiny-keccak with sha-1 and sha3 #52


Merged
merged 4 commits into multiformats:master from koushiro:update on Feb 26, 2020
Conversation

@koushiro (Contributor) commented Feb 25, 2020

  • Add benchmark for digest (see the sketch below)
  • Replace sha1 and tiny-keccak with sha-1 and sha3
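For illustration only, a minimal sketch of a Criterion benchmark over one digest function. The multihash::Sha2_256 type, its static digest function, the bench name, and the input size are assumptions for this sketch, not the exact benchmark added in the PR:

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use multihash::Sha2_256; // assumed hasher type exported by this crate

fn bench_digest(c: &mut Criterion) {
    // 256 bytes of arbitrary input; the real benchmark may use other sizes.
    let data = vec![0xffu8; 256];
    c.bench_function("encode/sha2_256", |b| {
        // black_box keeps the compiler from optimizing the input away.
        b.iter(|| Sha2_256::digest(black_box(&data)))
    });
}

criterion_group!(benches, bench_digest);
criterion_main!(benches);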

Signed-off-by: koushiro <koushiro.cqx@gmail.com>
@koushiro (Contributor, Author) commented:

Benchmark

The results are not significantly different on my machine.

before

encode/identity         time:   [132.38 ns 132.90 ns 133.41 ns]                            
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
encode/sha1             time:   [2.0473 us 2.0528 us 2.0603 us]                         
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
encode/sha2_256         time:   [4.7345 us 4.7444 us 4.7558 us]                             
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  7 (7.00%) high mild
  3 (3.00%) high severe
encode/sha2_512         time:   [3.1024 us 3.1159 us 3.1335 us]                             
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe
encode/sha3_224         time:   [3.9798 us 3.9823 us 3.9854 us]                             
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
encode/sha3_256         time:   [3.9744 us 3.9776 us 3.9812 us]                             
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
encode/sha3_384         time:   [5.0447 us 5.0615 us 5.0808 us]                             
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high severe
encode/keccak_224       time:   [3.9861 us 3.9904 us 3.9956 us]                               
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
encode/keccak_256       time:   [3.9549 us 3.9580 us 3.9618 us]                               
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
encode/keccak_384       time:   [4.9629 us 4.9729 us 4.9843 us]                                
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
encode/keccak_512       time:   [7.4919 us 7.4974 us 7.5035 us]                               
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
encode/blake2b_256      time:   [1.3418 us 1.3426 us 1.3434 us]                                
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe
encode/blake2b_512      time:   [1.2973 us 1.2994 us 1.3014 us]                                
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
encode/blake2s_128      time:   [1.9659 us 1.9675 us 1.9692 us]                                
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
encode/blake2s_256      time:   [2.0079 us 2.0097 us 2.0117 us]                                
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild

after

encode/identity         time:   [126.36 ns 126.48 ns 126.58 ns]                            
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) low mild
encode/sha1             time:   [2.0177 us 2.0274 us 2.0377 us]                         
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
encode/sha2_256         time:   [4.8577 us 4.8865 us 4.9173 us]                             
encode/sha2_512         time:   [3.1260 us 3.1373 us 3.1510 us]                             
encode/sha3_224         time:   [4.0874 us 4.1135 us 4.1395 us]                             
encode/sha3_256         time:   [4.0373 us 4.0502 us 4.0663 us]                             
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
encode/sha3_384         time:   [5.0627 us 5.0666 us 5.0712 us]                             
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
encode/keccak_224       time:   [4.0856 us 4.1092 us 4.1340 us]                               
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
encode/keccak_256       time:   [4.1095 us 4.1300 us 4.1487 us]                               
encode/keccak_384       time:   [5.0831 us 5.0890 us 5.0961 us]                               
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
encode/keccak_512       time:   [7.5094 us 7.5458 us 7.5827 us]                               
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
encode/blake2b_256      time:   [1.3323 us 1.3336 us 1.3351 us]                                
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe
encode/blake2b_512      time:   [1.2899 us 1.2919 us 1.2941 us]                                
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
encode/blake2s_128      time:   [2.0425 us 2.0441 us 2.0460 us]                                
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low severe
  1 (1.00%) high mild
  1 (1.00%) high severe
encode/blake2s_256      time:   [2.0341 us 2.0633 us 2.0953 us]                                
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Signed-off-by: koushiro <koushiro.cqx@gmail.com>
@vmx (Member) commented Feb 25, 2020

Thanks for adding benchmarks. With the merge of #45, the codebase changed quite a lot. Could you please rebase the benchmarks?

Why switch crates if there isn't really a perf difference?

Signed-off-by: koushiro <koushiro.cqx@gmail.com>
@koushiro (Contributor, Author) commented:

> Thanks for adding benchmarks. With the merge of #45, the codebase changed quite a lot. Could you please rebase the benchmarks?
>
> Why switch crates if there isn't really a perf difference?

Because sha-1, sha2, and sha3 are a family of crates that share a consistent API.
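
As a small illustration (not code from this PR) of that consistent API: all three crates implement the same digest::Digest trait, so switching between hash functions only changes the type name:

use digest::Digest;
use sha1::Sha1;
use sha2::Sha256;
use sha3::Sha3_256;

fn main() {
    let data = b"hello world";
    // The same Digest::digest associated function works for every hasher.
    println!("{:x}", Sha1::digest(data));
    println!("{:x}", Sha256::digest(data));
    println!("{:x}", Sha3_256::digest(data));
}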

@koushiro (Contributor, Author) commented Feb 25, 2020

Question:

/// The Identity hasher
#[derive(Clone, Debug)]
pub struct Identity;
impl MultihashDigest for Identity {
    fn code(&self) -> Code {
        Self::CODE
    }
    fn digest(&self, data: &[u8]) -> Multihash {
        Self::digest(data)
    }
}
/* Why is this impl block needed?
impl Identity {
    /// The code of Identity hasher, 0x00.
    pub const CODE: Code = Code::Identity;
    /// Hash some input and return the raw binary digest.
    pub fn digest(data: &[u8]) -> Multihash {
        if (data.len() as u64) >= u64::from(std::u32::MAX) {
            panic!("Input data for identity hash is too large, it needs to be less than 2^32.")
        }
        wrap(Self::CODE, &data)
    }
}
*/

Why not do it like this instead?

pub trait MultihashDigest {
    const CODE: Code;
    fn digest(&self, data: &[u8]) -> Multihash;
}
#[derive(Clone, Debug)]
pub struct Identity;
impl MultihashDigest for Identity {
    const CODE: Code = Code::Identity;
    fn digest(&self, data: &[u8]) -> Multihash {
        if (data.len() as u64) >= u64::from(std::u32::MAX) {
            panic!("Input data for identity hash is too large, it needs to be less than 2^32.")
        }
        wrap(Self::CODE, &data)
    }
}

@vmx (Member) left a review:

I want to run the benchmarks myself locally (also with bigger data sizes), but until I've done that, I have one more comment (the rest looks good). I thought I'd publish it now, before running the benchmarks, so you can comment/react on it (or not :)

@@ -1,12 +1,10 @@
use blake2b_simd::Params as Blake2b;
use blake2s_simd::Params as Blake2s;
use digest::Digest;
use sha1::Sha1 as Sha1Hasher;
@vmx (Member) commented on the diff:

This is a nitpick and not a reason to block this PR. I just want to mention that I put those imports at the top so that things are easy to grep for. Absolute paths inline in the code make it harder to scan a file when you want to manually refactor something or just get an overview.

@vmx (Member) commented Feb 25, 2020

So it's about API ergonomics. The reason for having both an impl and a trait is that you often don't need the trait at all.

The simplest case is:

use multihash::Sha2_256;
Sha2_256::digest(b"some data");

If you always implement the trait, you also always need to import it. Hence it's

use multihash::{MultihashDigest, Sha2_256};
Sha2_256.digest(b"some data");

Which is not as nice.

Also, the trait uses instance methods (self as the first argument) so that it can be used for trait objects. For hashing, though, you don't actually need an instance; a clearer API, truer to the actual implementation, is a static method (Sha2_256::digest()).
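
A minimal sketch of what the instance methods buy, assuming the MultihashDigest trait shown earlier in this thread and unit-struct hashers named Sha2_256 and Sha3_256 (the exact names here are assumptions):

use multihash::{Multihash, MultihashDigest, Sha2_256, Sha3_256};

// Dispatch goes through the vtable; a static method could not be
// called through a &dyn reference like this.
fn hash_with(hasher: &dyn MultihashDigest, data: &[u8]) -> Multihash {
    hasher.digest(data)
}

fn main() {
    // Choose hashers at runtime and treat them uniformly.
    let hashers: Vec<Box<dyn MultihashDigest>> =
        vec![Box::new(Sha2_256), Box::new(Sha3_256)];
    for h in &hashers {
        let _hash = hash_with(h.as_ref(), b"some data");
    }
}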

That said, I'm happy to hear cleaner/better ideas for the API. I find the current API better than the old one, though once it is used more widely I expect it will still change based on experience using it.

@koushiro (Contributor, Author) commented:

> So it's about API ergonomics. The reason for having both an impl and a trait is that you often don't need the trait at all.
>
> The simplest case is:
>
>     use multihash::Sha2_256;
>     Sha2_256::digest(b"some data");
>
> If you always implement the trait, you also always need to import it. Hence it's
>
>     use multihash::{MultihashDigest, Sha2_256};
>     Sha2_256.digest(b"some data");
>
> Which is not as nice.
>
> Also, the trait uses instance methods (self as the first argument) so that it can be used for trait objects. For hashing, though, you don't actually need an instance; a clearer API, truer to the actual implementation, is a static method (Sha2_256::digest()).
>
> That said, I'm happy to hear cleaner/better ideas for the API. I find the current API better than the old one, though once it is used more widely I expect it will still change based on experience using it.

Okay, I got it.
Thanks for your reply :)

@koushiro changed the title from "Replace sha1 and tiny-keccak with sha-1 and sha3" to "feat: replace sha1 and tiny-keccak with sha-1 and sha3" on Feb 25, 2020
@vmx (Member) left a review:

I ran the benchmarks locally with bigger sizes. There wasn't really any difference: some were slower, some faster, but all within the range of noise (especially on my laptop, where many other processes run in the background).

Also, PR #30 included this change of the hashing functions.

Thanks for the PR and also for adding more comments!

The only thing left is waiting for a final approval from @dignifiedquire.

Signed-off-by: koushiro <koushiro.cqx@gmail.com>
@dignifiedquire (Member) left a review:

Looks good, thank you

@vmx merged commit a37ddc7 into multiformats:master on Feb 26, 2020
@koushiro deleted the update branch on February 26, 2020, 12:40