Add tests for `std::[unordered_][multi]set` #39

enirolf · 2025-02-28T14:21:53Z

Closes #14.

hahnjo

Thanks for the PR! I left some comments, mostly we should start by figuring out what we want to test for the binary format in particular. Originally I considered being lazy for other containers and put all std::[unordered_][multi]set in a single test, without the full combination of index column types. But I think the multi containers make this awkward because there we actually want to test that the duplicates are preserved, and also we may want to be 100% certain that we get the index columns right. For the non-multi containers though, duplicates are essentially handled on the C++ side, before it comes to RNTuple I think, so not sure if we need those entries here...

types/set/fundamental/write.C

types/set/nested/LinkDef.h

types/set/nested/write.C

types/README.md

types/multiset/fundamental/write.C

types/unordered_set/fundamental/write.C

types/set/nested/README.md

types/set/fundamental/read.C

types/set/nested/write.C

enirolf · 2025-03-04T11:25:59Z

Thanks for the PR! I left some comments, mostly we should start by figuring out what we want to test for the binary format in particular. Originally I considered being lazy for other containers and put all std::[unordered_][multi]set in a single test, without the full combination of index column types. But I think the multi containers make this awkward because there we actually want to test that the duplicates are preserved, and also we may want to be 100% certain that we get the index columns right. For the non-multi containers though, duplicates are essentially handled on the C++ side, before it comes to RNTuple I think, so not sure if we need those entries here...

Yeah that's fair enough! I suppose we could squash [unordered_]set and [unordered_]multiset. One thing to consider is that it's not a given that every writer/reader will be written in C++ (we already know this isn't the case), but the spec explicitly refers to C++ types. But you're right that still the ordering and duplicate handling is not something that the format itself is responsible for.

Taking it even further, the specification explicitly states that the on-disk representation is identical to std::vector, so if we really want to be strict about the scope we could even argue that these tests are somewhat redundant (except for maybe type handling?)..

hahnjo · 2025-03-04T14:57:15Z

I suppose we could squash [unordered_]set and [unordered_]multiset.

That might be a good compromise because we probably want the same input entries / entry classes for multiset and unordered_multiset...

Taking it even further, the specification explicitly states that the on-disk representation is identical to std::vector, so if we really want to be strict about the scope we could even argue that these tests are somewhat redundant (except for maybe type handling?)..

Yes, in my opinion we want each supported C++ type to appear at least once in the validation suite. But indeed, the question is how much different is it from a binary format perspective. There's two axes to that: index column encoding and nesting (e.g. std::set<std::set<std::int32_t>>).

pcanal · 2025-03-04T17:43:15Z

But indeed, the question is how much different is it from a binary format perspective.

As a side note, in roottest we do have a test of reading each STL collections in file format into all other STL collections of the same content (for a large subset of cases)

hahnjo · 2025-03-05T07:54:44Z

After thinking this over, here are some more considerations:

I think we want the nested tests of containers, both that values can be non-fundamental types (non-simple fields in ROOT) and that they work when below another container. It's not really much different from a binary format point of view, but I think the suite should also (try to) validate the write and read implementation and there I see plenty of ways how the (de)serialization can get things wrong (for example, just taking the entry number to get the size of a collection). If we can do the two things with a single test (e.g. std::set<std::set<std::int32_t>>) the better.
We may want at least kIndex64 and kSplitIndex64 because those are the two default encodings (depending on compression on/off), so they are what will be used most often. kIndex32 and kSplitIndex32 are kind of niche. Again not really an argument of the binary format but more of the implementation.

The decision whether to merge [unordered_]set and [unordered_]multiset I leave to you. In the end, given that we already have the tests written out, we may as well just keep them and make it easier to find the tests for a particular container.

enirolf · 2025-03-17T10:30:52Z

Based on this statement:

Yes, in my opinion we want each supported C++ type to appear at least once in the validation suite.

and because I right now don't have a clear idea how to nicely merge the ordered and unordered variants (two field types instead of one, i.e., [Unordered][Split]Index{32,64}? that would mix the field type with the column representation in a way though, which I don't like), for now I would opt to keep the four variants separate. I can remove the "redundant" test cases (e.g., the duplications in std::set) but I also don't think a bit of redundancy necessarily hurts here. What do you think?

hahnjo · 2025-03-20T16:31:07Z

Thanks for the work of also updating the infrastructure and the CI, seems to pass 😃

for now I would opt to keep the four variants separate

Fine with me.

I can remove the "redundant" test cases (e.g., the duplications in std::set) but I also don't think a bit of redundancy necessarily hurts here. What do you think?

I would tend to remove the "duplicate" and "reverse" tests because at least I personally find it confusing that the output doesn't match the input, and it's actually not RNTuple that is responsible for that. Let's hear some opinions from others maybe?

types/unordered_set/fundamental/read.C

pcanal · 2025-05-08T17:53:18Z

I would tend to remove the "duplicate" and "reverse" ....

I agree that those tests would not be adding much value. On the other hand, see my question/pondering on unordered_ and platform independence.

types/README.md

types/multiset/fundamental/README.md

hahnjo · 2025-05-09T14:55:56Z

#39 (comment)

I agree that the semantic data on disk is identical to std::vector, so it's indeed not a matter of the binary format.

While typing this, I realized that the same reasoning could be applied to all the nested tests: In the end they are the same as std::vector<std::vector<std::int32_t>...

Edit: and the Index column encoding...

enirolf · 2025-05-12T09:46:31Z

#39 (comment)

I agree that the semantic data on disk is identical to std::vector, so it's indeed not a matter of the binary format.

While typing this, I realized that the same reasoning could be applied to all the nested tests: In the end they are the same as std::vector<std::vector<std::int32_t>...

Edit: and the Index column encoding...

So, what would your proposal be? I mean, fundamentally they are all identical to std::vector but we also agreed we want to have at least natively covered C++ type in once. Taking these two arguments, if you ask me it sort of makes sense to have a one-to-one mirror of the std::vector tests because that would also allow to cross-reference between the outputs of the std::vector tests and the set-related tests. In other words, I would propose to merge the tests as they are now. I still don't think there's a perfect test case that exactly matches the requirements for the tests we set ourselves (i.e., only test the binary format but also cover natively supported C++ types), simply because they end up contradicting each other.

pcanal · 2025-05-12T19:53:26Z

Taking these two arguments, if you ask me it sort of makes sense to have a one-to-one mirror of the std::vector tests

This indeed has the 'nice' side effect that (except for map/multimap which contains pairs) the json files would be almost identical (i.e. just the (type)names would be different, isn't it)

hahnjo · 2025-05-13T07:00:31Z

Taking these two arguments, if you ask me it sort of makes sense to have a one-to-one mirror of the std::vector tests

This indeed has the 'nice' side effect that (except for map/multimap which contains pairs) the json files would be almost identical (i.e. just the (type)names would be different, isn't it)

Not quite, since (a) sets don't preserve their order and (b) we don't store duplicate elements. We could make the std::vector tests match these restrictions, but given that maps will produce a different json file anyway I don't think we should. However, by not storing duplicate elements even for multiset, we have the same reference files for all set containers, which is probably a good idea and another reason to do it that way.

hahnjo

LGTM, apart from two comments that apply to all nested tests.

types/set/nested/write.C

hahnjo · 2025-05-13T08:13:32Z

I mean, fundamentally they are all identical to std::vector but we also agreed we want to have at least natively covered C++ type in once. Taking these two arguments, if you ask me it sort of makes sense to have a one-to-one mirror of the std::vector tests because that would also allow to cross-reference between the outputs of the std::vector tests and the set-related tests.

I would have previously argued that we don't need nesting to have the C++ type once, a fundamental item type could be sufficient. But the discussions convinced me that this is useful to have the nesting since it directly affects the *Index* column data.

So, what would your proposal be?

I don't have a really good proposal. The best I could come up with after some thinking is effectively documenting the reasoning behind the current tests, and just generalize based on that: #50 Let me know what you think.

...for the tests that need them (currently, all nested `std::set` and friends).

enirolf added the types label Feb 28, 2025

enirolf requested review from hahnjo, pcanal and silverweed February 28, 2025 14:21

enirolf self-assigned this Feb 28, 2025

enirolf mentioned this pull request Feb 28, 2025

Fix Experimental-related deprecation warnings #40

Closed

enirolf force-pushed the types-sets branch from 570132f to 8ebb4a1 Compare February 28, 2025 16:14

hahnjo requested changes Mar 4, 2025

View reviewed changes

hahnjo reviewed Mar 4, 2025

View reviewed changes

types/set/fundamental/read.C Show resolved Hide resolved

hahnjo reviewed Mar 4, 2025

View reviewed changes

types/set/nested/write.C Outdated Show resolved Hide resolved

enirolf force-pushed the types-sets branch from 8ebb4a1 to ae167fa Compare March 17, 2025 10:25

enirolf requested a review from hahnjo March 17, 2025 10:35

hahnjo added this to the 1.0 milestone May 1, 2025

enirolf force-pushed the types-sets branch from ae167fa to b6324df Compare May 8, 2025 06:53

pcanal reviewed May 8, 2025

View reviewed changes

types/unordered_set/fundamental/read.C Outdated Show resolved Hide resolved

hahnjo reviewed May 9, 2025

View reviewed changes

types/README.md Show resolved Hide resolved

types/multiset/fundamental/README.md Show resolved Hide resolved

enirolf force-pushed the types-sets branch from b6324df to 7d0e683 Compare May 9, 2025 08:07

enirolf mentioned this pull request May 9, 2025

Add tests for std::pair and std::tuple #46

Merged

enirolf force-pushed the types-sets branch from 7d0e683 to ea59b8a Compare May 9, 2025 13:34

enirolf requested a review from hahnjo May 12, 2025 09:46

hahnjo approved these changes May 13, 2025

View reviewed changes

types/set/nested/write.C Outdated Show resolved Hide resolved

types/set/nested/write.C Outdated Show resolved Hide resolved

enirolf added 6 commits May 13, 2025 12:00

Add test for std::set

e390940

Add test for std::unordered_set

8e2efd4

Add test for std::multiset

fcb348b

Add test for std::unordered_multiset

e9ee6aa

Automatically create and load dictionaries

5e573e3

...for the tests that need them (currently, all nested `std::set` and friends).

Slightly reword and fix grammar

016171c

enirolf force-pushed the types-sets branch from ea59b8a to 016171c Compare May 13, 2025 10:04

enirolf merged commit 9d22af6 into root-project:main May 14, 2025
1 check passed

enirolf deleted the types-sets branch May 14, 2025 15:08

Add tests for std::[unordered_][multi]set #39

Add tests for std::[unordered_][multi]set #39

Uh oh!

Conversation

enirolf commented Feb 28, 2025

Uh oh!

hahnjo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

enirolf commented Mar 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hahnjo commented Mar 4, 2025

Uh oh!

pcanal commented Mar 4, 2025

Uh oh!

hahnjo commented Mar 5, 2025

Uh oh!

enirolf commented Mar 17, 2025

Uh oh!

hahnjo commented Mar 20, 2025

Uh oh!

Uh oh!

pcanal commented May 8, 2025

Uh oh!

Uh oh!

Uh oh!

hahnjo commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

enirolf commented May 12, 2025

Uh oh!

pcanal commented May 12, 2025

Uh oh!

hahnjo commented May 13, 2025

Uh oh!

hahnjo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

hahnjo commented May 13, 2025

Uh oh!

Uh oh!

Uh oh!

Add tests for `std::[unordered_][multi]set` #39

Add tests for `std::[unordered_][multi]set` #39

enirolf commented Mar 4, 2025 •

edited

Loading

hahnjo commented May 9, 2025 •

edited

Loading