-
Notifications
You must be signed in to change notification settings - Fork 214
use iteration in set ops, wrap compressed sketch and unpack in iterator #644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This should speed up union of wrapped compressed Theta sketches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not quite get the logic in DirectCompactCompressedSketch.getRetainedEntries() (I'm not saying it's wrong, just that I haven't yet stared at it enough to realize what exactly it's doing) but there were a few other things that seemed worth noting.
if (hash < unionThetaLong_ && hash < gadget_.getThetaLong()) { | ||
gadget_.hashUpdate(hash); // backdoor update, hash function is bypassed | ||
} else { | ||
if (sketchIn.isOrdered()) { break; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider a flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on intersection appeared higher in my review. So the longer version of the relevant comment is that one.
matchSet[matchSetCount++] = hashIn; | ||
} | ||
} else { | ||
if (sketchIn.isOrdered()) { break; } // early stop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't change over the course of the intersection. Should it be a flag pre-computed in advance to avoid a function call every time? Not sure if the JVM can easily determine there's no chance of a side-effect from other code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I prefer not to introduce aliases, but if you think this can help, I can do that. Perhaps we could try measuring the effect one day.
class DirectCompactCompressedSketch extends DirectCompactSketch { | ||
/** | ||
* Construct this sketch with the given memory. | ||
* @param mem Read-only Memory object with the order bit properly set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than "properly set" maybe just say it must be ordered and with the bit set to true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this refers to memory byte order. Perhaps can be rephrased for clarity.
|
||
/** | ||
* Wraps the given Memory, which must be a SerVer 4 compressed CompactSketch image. | ||
* Must check the validity of the Memory before calling. The order bit must be set properly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again
Ok, for some reason I was thinking it was going over the entries themselves, not the value representing the number of entries. Got it now and it's fine. |
If it was a bit difficult and confusing, perhaps some comments are needed. Let me see what I can do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
No description provided.