Chained hash pipelining in array hashing #58252


Open · wants to merge 11 commits into master

Conversation

@adienes (Member) commented Apr 28, 2025

the proposed switch in #57509 from `3h - hash_finalizer(x)` to `hash_finalizer(3h - x)` should increase the hash quality of chained hashes, since the expanded expression goes from something like `sum((-3)^k * hash(x) for k in ...)` to a non-simplifiable composition of finalizers
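to see why the old form is weak, here's a sketch in Python (toy 64-bit wraparound arithmetic, not Base's actual implementation): iterating `h = 3h - hash(x)` is affine in the element hashes, so the whole chain collapses to a closed-form weighted sum, which an adversary can invert.

```python
MASK = (1 << 64) - 1  # emulate 64-bit wraparound arithmetic

def old_combine(h, hx):
    # old scheme: the finalizer sits outside the recurrence,
    # so each combine step is affine in the element hash hx
    return (3 * h - hx) & MASK

def old_chain(seed, hashes):
    h = seed
    for hx in hashes:
        h = old_combine(h, hx)
    return h

def closed_form(seed, hashes):
    # the same chain, collapsed: 3^n * seed - sum(3^(n-1-k) * hashes[k]),
    # all mod 2^64 -- no per-element mixing survives
    n = len(hashes)
    acc = pow(3, n, 1 << 64) * seed
    for k, hx in enumerate(hashes):
        acc -= pow(3, n - 1 - k, 1 << 64) * hx
    return acc & MASK
```

because the chain has this closed form, distinct inputs with the same weighted sum collide; wrapping the finalizer around each step makes every element pass through a nonlinear mix before the next combine.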

this does have the unfortunate impact of making long chains of hashes a bit slower: there is more data dependency, so the CPU cannot start on the next element's hash before the previous one has been combined (I think; I'm not particularly an expert on this low-level stuff). as far as I know this only really impacts AbstractArray

so, I've implemented a proposal that does some manual unrolling / pipelining to recover AbstractArray hashing performance. in fact, it's quite a lot faster now for most lengths. I tuned the thresholds (8 accumulators, certain length breakpoints) by hand on my own machine.
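the idea can be sketched in Python (a toy model using a splitmix64-style finalizer; names and constants here are illustrative assumptions, not the actual Julia code): keep several independent accumulators so the CPU has multiple dependency chains to advance in parallel, then fold the lanes together serially at the end.

```python
MASK = (1 << 64) - 1

def finalize(h):
    # toy splitmix64-style finalizer (an assumption, not Base.hash's mixer)
    h = (h ^ (h >> 30)) * 0xbf58476d1ce4e5b9 & MASK
    h = (h ^ (h >> 27)) * 0x94d049bb133111eb & MASK
    return h ^ (h >> 31)

def serial_hash(xs, seed=0):
    # one long dependency chain: each step waits on the previous finalize
    h = seed
    for x in xs:
        h = finalize((3 * h - x) & MASK)
    return h

def pipelined_hash(xs, seed=0, nlanes=8):
    # nlanes independent chains: the CPU can overlap their finalize steps
    lanes = [finalize(seed + i) for i in range(nlanes)]
    for i, x in enumerate(xs):
        j = i % nlanes
        lanes[j] = finalize((3 * lanes[j] - x) & MASK)
    # fold the lane results back into a single hash, serially
    h = seed
    for lane in lanes:
        h = finalize((3 * h - lane) & MASK)
    return h
```

the pipelined version produces a different (but equally deterministic) hash than the serial one; the win is that the hot loop has ~8 independent data dependencies instead of 1.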

@adienes added the performance (Must go faster) and hashing labels Apr 28, 2025
@adienes mentioned this pull request Apr 28, 2025
@oscardssmith (Member) commented:

show performance benchmarks and then lgtm.

@adienes (Member, Author) commented Apr 28, 2025

#57509 (comment): the vec column contains timings; the data for this PR is under `:commit == "pipeline"`. sorry, I should have been clearer in that comment

graphically:

[benchmark plots]

note that this cannot merge before #57509, which is also waiting on #58053

@adienes (Member, Author) commented May 11, 2025

CI failure unrelated

@adienes (Member, Author) commented May 15, 2025

should this method be used for big Tuples as well?

@oscardssmith (Member) commented:

how does the Tuple perf look? if it only helps for 100 or more elements, I wouldn't bother; if it's useful in the 10-100 range, I think we should

@adienes (Member, Author) commented May 16, 2025

eh, I don't think it helps. I guess tuples should mostly be hashed at compile time anyway

@adienes (Member, Author) commented May 21, 2025

another question popped up: should this be extracted to a different function (like hash_array or something) with a wider signature, with hash(::AbstractArray) forwarding to it? I ask because there are a few types around the ecosystem that could in theory be duck-typed into this method just fine, but they do not subtype AbstractArray. see BioSequences.jl for one such example, where hashing would be 5x faster than what's implemented in that package (with foldl) if it could use this method.
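the shape of that refactor, sketched in Python with hypothetical names (`hash_array` stands in for the proposed wide-signature kernel; nothing here is the actual Base code): the kernel takes any iterable of elements, the array method forwards to it, and a non-array container can call the same kernel instead of reimplementing a foldl.

```python
MASK = (1 << 64) - 1

def _mix(h):
    # toy finalizer standing in for Base's hash mixing (an assumption)
    h = (h ^ (h >> 33)) * 0xff51afd7ed558ccd & MASK
    return h ^ (h >> 33)

def hash_array(elements, seed=0):
    # hypothetical wide-signature kernel: anything iterable qualifies,
    # no AbstractArray subtyping required
    h = seed
    for x in elements:
        h = _mix((3 * h - hash(x)) & MASK)
    return h

class BioSeq:
    # stand-in for a duck-typed container like a BioSequences.jl sequence
    def __init__(self, data):
        self.data = tuple(data)
    def __hash__(self):
        # forwards to the shared kernel rather than rolling its own fold
        return hash_array(self.data, seed=0x1234)
```

the design point is just that the fast path lives in one place with a loose signature, so opting in is a one-line forward.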

@oscardssmith (Member) commented:

seems reasonable

@adienes (Member, Author) commented May 22, 2025

I'll spend some time fiddling with it, but before that, just for visibility, here's the current status of the perf on sparse things:

[benchmark plots: perf on sparse arrays]

so it looks like for < 100 elements, the PR is always faster on sparse arrays;
between ~10k and ~100k elements, the PR is slower when sparsity is < ~90%

@oscardssmith (Member) commented:

sparsity < 99% isn't really worth caring about (and honestly, I'm not sure anyone is hashing sparse matrices at all anyway)

@adienes (Member, Author) commented May 23, 2025

ok, I halved the threshold for switching to `hash_fib`. this will be slightly more conservative / balanced for cases like sparse arrays or arrays of elements that are themselves a bit slow to hash, at the (pretty minor) expense of hashing floats & ints and such at those lengths. this threshold also has the property that it's the first power of 2 after which `hash_fib` hashes roughly less than 1/2 of the elements.
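a toy model of that last property in Python (assuming `hash_fib` samples elements at Fibonacci-growing strides; the real scheme in Base differs in its details, so the crossover length here is only illustrative): count how many elements get touched for a given length, then find the first power of 2 where that fraction drops below 1/2.

```python
def fib_elements_hashed(n):
    # toy model: walk an array of length n in Fibonacci-sized strides
    # 1, 1, 2, 3, 5, ...; each stride lands on one hashed element
    a, b = 1, 1
    covered, hashed = 0, 0
    while covered < n:
        covered += a
        a, b = b, a + b
        hashed += 1
    return hashed

def first_pow2_below_half():
    # first power of 2 where fewer than half the elements are hashed
    p = 1
    while True:
        p *= 2
        if fib_elements_hashed(p) / p < 0.5:
            return p
```

under these toy assumptions the crossover lands at length 16; the real threshold in the PR depends on the actual stride pattern and the linear prefix `hash_fib` uses.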

[benchmark plot]

I think this is ready to go. the only open question is whether `hash_shaped` should be made public, given a docstring, etc., or whether it's ok for the two or three packages that might want it to use it as internals. IMO the latter is fine, but let me know
