
Scalar 4x64 performance improvements #453


Open
peterdettman wants to merge 1 commit into master

Conversation

peterdettman
Contributor

(Not for immediate merge)

In #452 I noted that sqr and mul take about the same time in my config (OSX, 64-bit, no-asm, -O3 -march=native), so this is a quick attempt to speed up _scalar_sqr. This initial commit rewrites _scalar_sqr_512 for an ~ 8% improvement in _scalar_sqr. Second opinions/measurements would be appreciated.

It seems from the measurements that _scalar_reduce_512 is the real heavyweight here, so I'll be trying to re-implement that next.

I can rewrite in terms of macros (the current local code style) prior to any merge.

@dcousens
Contributor

@peterdettman other than inlining, why is this faster?

@peterdettman
Contributor Author

@dcousens The existing code is using macros, not functions, so I doubt there is any benefit from writing inline code per se. I don't think pre-loading a->d[0] etc. makes much difference (presumably the compiler does it anyway). Did you mean something else?

My assumption would be that the improvement comes from using several uint128_t accumulators instead of the existing "160-bit accumulator"; muladd2 is quite awkward as a result, and presumably that's where the extra time is going.
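
For illustration, here is a scaled-down sketch of the accumulator style in question: 2x64 rather than the PR's 4x64, with made-up names, and not the PR's actual code. Each output column is summed in its own uint128_t, and the doubled cross term is added as two 64-bit halves instead of being threaded through a muladd2-style 160-bit accumulator.

```c
#include <stdint.h>

typedef unsigned __int128 uint128_t;

/* Square a 2x64 value into 4 limbs, accumulating each output column in a
 * plain 128-bit accumulator; the doubled cross term is added limb-by-limb
 * so no column sum can exceed 128 bits. */
static void sqr_2x64_sketch(uint64_t r[4], const uint64_t a[2]) {
    uint128_t m = (uint128_t)a[0] * a[1];        /* cross product, used twice */
    uint64_t m0 = (uint64_t)m, m1 = (uint64_t)(m >> 64);
    uint128_t t;

    /* column 0: a0^2 */
    t = (uint128_t)a[0] * a[0];
    r[0] = (uint64_t)t; t >>= 64;

    /* column 1: carry + 2*m0 */
    t += (uint128_t)m0 + m0;
    r[1] = (uint64_t)t; t >>= 64;

    /* column 2: carry + 2*m1 + a1^2 (fits: carry <= 2, m1 <= 2^64 - 2) */
    t += (uint128_t)m1 + m1;
    t += (uint128_t)a[1] * a[1];
    r[2] = (uint64_t)t; t >>= 64;

    /* column 3: remaining carry */
    r[3] = (uint64_t)t;
}
```

Scaling this up to 4x64 just adds more columns; the delicate part is verifying that no column's sum can overflow 128 bits, which is exactly the carry analysis that needs review.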

@peterdettman
Contributor Author

The new commit rewrites secp256k1_scalar_reduce_512. The cumulative speed improvement for _scalar_inverse is now measured at >30%, and bench_sign measurements improved by ~7-8%.

The "trick" used with p4 (see lines 588, 611 and comments) warrants careful review.

@sipa
Contributor

sipa commented Apr 25, 2017

Benchmarked bench_verify with GMP and asm disabled on an i7-6820HQ CPU, pegged to 2.6GHz:

  • Master: 101μs
  • This PR: 98μs

@ofek

ofek commented May 22, 2017

Is this ready to be merged?

@peterdettman
Contributor Author

No, it needs a thorough review of the carry handling and modular reduction, which can have very subtle bugs that random tests won't catch. I'd also like to get around to rewriting _scalar_mul in the same spirit (although I expect less improvement there), and to putting the code back into a cleaner macro style like the original code.

@sipa
Contributor

sipa commented May 8, 2023

@peterdettman Are you still interested in this? It seems like a reasonable improvement, and I'm willing to look into verifying that the double-correction trick always yields the right answer. I do think we'll want it in macro-style though.

@real-or-random
Contributor

> I do think we'll want it in macro-style though.

Or instead of macros, actual (inlinable) C functions with type checking and all that modern stuff. ;)
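
As a sketch of what that could look like (hypothetical name and shape, not proposed code), a typed, inlinable replacement for one muladd-style step:

```c
#include <stdint.h>

typedef unsigned __int128 uint128_t;

/* Hypothetical typed replacement for a muladd-style macro: add a*b into a
 * two-word accumulator (128-bit low part plus a 64-bit overflow word). */
static inline void muladd_acc(uint128_t *acc, uint64_t *over, uint64_t a, uint64_t b) {
    uint128_t m = (uint128_t)a * b;
    uint128_t t = *acc + m;
    *over += (t < m);   /* carry out of the 128-bit accumulator */
    *acc = t;
}
```

At -O2/-O3 this should compile to the same code as the equivalent macro, while the compiler actually type-checks the arguments.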
