Skip to content

Add benchmarks for VOTable binary parsing #141

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 24, 2025

Conversation

stvoutsin
Copy link
Contributor

Following a suggestion from @mhvk in the issue raised here: astropy/astropy#18442
this PR adds some benchmarks for VOTable binary/binary2 parsing to establish a baseline for a follow-up performance-improvement suggestion.

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a nice idea. Thanks.

@pllim

This comment was marked as resolved.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks good to my untrained eye. One small comment about column types, which you can ignore. (But if you do think it is better to follow, then perhaps run flake8 and address the small PEP8 violations.)

SMALL_SIZE = 1000
LARGE_SIZE = 200000

ra_data = np.random.uniform(0, 360, LARGE_SIZE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't probably really matter for the tests, but woudln't you want ra, dec be float? In a way, parsing that fast seems more useful, though I guess you include some in flux_data already.

@stvoutsin
Copy link
Contributor Author

Thanks for the feedback! I've made some fixes for PEP8 compliance and changed ra/dec to be float32

@pllim
Copy link
Member

pllim commented Jul 23, 2025

@mhvk , is your comment addressed? Is this good to merge?

I took the liberty to push a minor commit to sort the imports. Hope you don't mind, @stvoutsin .

Thanks, all!

@pllim
Copy link
Member

pllim commented Jul 23, 2025

Oh wait, votable.TimeVOTableMixed.time_mixed_binary and votable.TimeVOTableMixed.time_mixed_binary2 failed.

@stvoutsin
Copy link
Contributor Author

@mhvk , is your comment addressed? Is this good to merge?

I took the liberty to push a minor commit to sort the imports. Hope you don't mind, @stvoutsin .

Thanks, all!

Oh no problem thanks for the fix, I should have checked the import sorting as well.

@mhvk
Copy link
Contributor

mhvk commented Jul 23, 2025

Yes, having ra and dec as float32 is great, so all good from my side (once tests pass of course!).

@stvoutsin stvoutsin force-pushed the votable-benchmarks branch from 1681ce2 to bbf2812 Compare July 23, 2025 23:02
@stvoutsin
Copy link
Contributor Author

Oh wait, votable.TimeVOTableMixed.time_mixed_binary and votable.TimeVOTableMixed.time_mixed_binary2 failed.

I've made one small change to remove an extra serialization/deserialization step that wasn't needed, perhaps that was causing some kind of memory/timeout issue with 200k rows. Perhaps we can trigger one more run with this change and see if that resolved it?

@pllim
Copy link
Member

pllim commented Jul 24, 2025

Now it passes on astropy-dev but fails a lot more on stable astropy... 🧐

[47.39%] ··· ...imeVOTableBooleanFields.time_booleans_binary             failed
[47.52%] ··· ...meVOTableBooleanFields.time_booleans_binary2             failed
[47.64%] ··· ...eVOTableLongStrings.time_long_strings_binary             failed
[47.77%] ··· ...VOTableLongStrings.time_long_strings_binary2             failed
[47.89%] ··· votable.TimeVOTableMixed.time_mixed_binary                  failed
[48.01%] ··· votable.TimeVOTableMixed.time_mixed_binary2                 failed
[48.14%] ··· votable.TimeVOTableNumeric.time_numeric_binary              failed
[48.26%] ··· votable.TimeVOTableNumeric.time_numeric_binary2             failed
[48.39%] ··· ...OTableShortStrings.time_short_strings_binary             failed
[48.51%] ··· ...TableShortStrings.time_short_strings_binary2             failed
[48.64%] ··· ...e.TimeVOTableSmallOverhead.time_small_binary             failed
[48.76%] ··· ....TimeVOTableSmallOverhead.time_small_binary2             failed
[48.88%] ··· ...tringIntensive.time_string_intensive_binary2             failed

@stvoutsin stvoutsin force-pushed the votable-benchmarks branch from 77e0e0e to 8a8999e Compare July 24, 2025 18:01
@stvoutsin
Copy link
Contributor Author

Now it passes on astropy-dev but fails a lot more on stable astropy... 🧐

[47.39%] ··· ...imeVOTableBooleanFields.time_booleans_binary             failed
[47.52%] ··· ...meVOTableBooleanFields.time_booleans_binary2             failed
[47.64%] ··· ...eVOTableLongStrings.time_long_strings_binary             failed
[47.77%] ··· ...VOTableLongStrings.time_long_strings_binary2             failed
[47.89%] ··· votable.TimeVOTableMixed.time_mixed_binary                  failed
[48.01%] ··· votable.TimeVOTableMixed.time_mixed_binary2                 failed
[48.14%] ··· votable.TimeVOTableNumeric.time_numeric_binary              failed
[48.26%] ··· votable.TimeVOTableNumeric.time_numeric_binary2             failed
[48.39%] ··· ...OTableShortStrings.time_short_strings_binary             failed
[48.51%] ··· ...TableShortStrings.time_short_strings_binary2             failed
[48.64%] ··· ...e.TimeVOTableSmallOverhead.time_small_binary             failed
[48.76%] ··· ....TimeVOTableSmallOverhead.time_small_binary2             failed
[48.88%] ··· ...tringIntensive.time_string_intensive_binary2             failed

I think the issue with the stable version is that we cannot set the format this way:
votable.get_first_table().format = format_type

Because we run into a KeyError here:
KeyError: 'version_1_3_or_later'

Since when initializing a Table using from_table() I don't think a config is created. My understanding is that this was fixed in astropy/astropy#18366 which is why it worked for the dev version.

I've updated the votable creation step to pass in the table format to the to_xml() call, so can we run the action again to see if it is now fixed?

@pllim
Copy link
Member

pllim commented Jul 24, 2025

It is green. Let's get it in then. Thank you!

@pllim pllim merged commit 6a4e0f3 into astropy:main Jul 24, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants