Skip to content

use iteration in set ops, wrap compressed sketch and unpack in iterator #644

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 28, 2025

Conversation

AlexanderSaydakov
Copy link
Contributor

No description provided.

@AlexanderSaydakov
Copy link
Contributor Author

This should speed up union of wrapped compressed Theta sketches

Copy link
Contributor

@jmalkin jmalkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not quite get the logic in DirectCompactCompressedSketch.getRetainedEntries() (I'm not saying it's wrong, just that I haven't yet stared at it enough to realize what exactly it's doing) but there were a few other things that seemed worth noting.

if (hash < unionThetaLong_ && hash < gadget_.getThetaLong()) {
gadget_.hashUpdate(hash); // backdoor update, hash function is bypassed
} else {
if (sketchIn.isOrdered()) { break; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider a flag

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on intersection appeared higher in my review. So the longer version of the relevant comment is that one.

matchSet[matchSetCount++] = hashIn;
}
} else {
if (sketchIn.isOrdered()) { break; } // early stop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't change over the course of the intersection. Should it be a flag pre-computed in advance to avoid a function call every time? Not sure if the JVM can easily determine there's no chance of a side-effect from other code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I prefer not to introduce aliases, but if you think this can help, I can do that. Perhaps we could try measuring the effect one day.

class DirectCompactCompressedSketch extends DirectCompactSketch {
/**
* Construct this sketch with the given memory.
* @param mem Read-only Memory object with the order bit properly set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than "properly set" maybe just say it must be ordered and with the bit set to true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this refers to memory byte order. Perhaps can be rephrased for clarity.


/**
* Wraps the given Memory, which must be a SerVer 4 compressed CompactSketch image.
* Must check the validity of the Memory before calling. The order bit must be set properly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again

@jmalkin
Copy link
Contributor

jmalkin commented Jan 27, 2025

Ok, for some reason I was thinking it was going over the entries themselves, not the value representing the number of entries. Got it now and it's fine.

@AlexanderSaydakov
Copy link
Contributor Author

If it was a bit difficult and confusing, perhaps some comments are needed. Let me see what I can do.

Copy link
Contributor

@jmalkin jmalkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@AlexanderSaydakov AlexanderSaydakov merged commit 186fb22 into main Jan 28, 2025
7 checks passed
@AlexanderSaydakov AlexanderSaydakov deleted the compressed_iterator branch January 28, 2025 00:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants