-
-
Notifications
You must be signed in to change notification settings - Fork 30
Add benchmarks for VOTable binary parsing #141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a nice idea. Thanks.
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This looks good to my untrained eye. One small comment about column types, which you can ignore. (But if you do think it is better to follow, then perhaps run flake8
and address the small PEP8 violations.)
benchmarks/votable.py
Outdated
SMALL_SIZE = 1000 | ||
LARGE_SIZE = 200000 | ||
|
||
ra_data = np.random.uniform(0, 360, LARGE_SIZE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't probably really matter for the tests, but woudln't you want ra
, dec
be float
? In a way, parsing that fast seems more useful, though I guess you include some in flux_data
already.
Thanks for the feedback! I've made some fixes for PEP8 compliance and changed ra/dec to be |
@mhvk , is your comment addressed? Is this good to merge? I took the liberty to push a minor commit to sort the imports. Hope you don't mind, @stvoutsin . Thanks, all! |
Oh wait, |
Oh no problem thanks for the fix, I should have checked the import sorting as well. |
Yes, having ra and dec as float32 is great, so all good from my side (once tests pass of course!). |
1681ce2
to
bbf2812
Compare
I've made one small change to remove an extra serialization/deserialization step that wasn't needed, perhaps that was causing some kind of memory/timeout issue with 200k rows. Perhaps we can trigger one more run with this change and see if that resolved it? |
Now it passes on astropy-dev but fails a lot more on stable astropy... 🧐
|
77e0e0e
to
8a8999e
Compare
I think the issue with the stable version is that we cannot set the format this way: Because we run into a KeyError here: Since when initializing a Table using from_table() I don't think a config is created. My understanding is that this was fixed in astropy/astropy#18366 which is why it worked for the dev version. I've updated the votable creation step to pass in the table format to the to_xml() call, so can we run the action again to see if it is now fixed? |
It is green. Let's get it in then. Thank you! |
Following a suggestion from @mhvk in the issue raised here: astropy/astropy#18442
this PR adds some benchmarks for VOTable binary/binary2 parsing to establish a baseline for a follow-up performance-improvement suggestion.