On the collapsing of duplicate types in unions #748
-
I am a little confused by your example, because I do not think that the type you use to represent the different iterators is actually a sum type. IIUC, you need one additional property that is only incidentally provided by the position of each alternative in the union. I would rather use a collection of iterators in your situation, or at least define a data structure that conceptually operates as a collection.
This is a very reasonable feature and it can definitely be discussed. As of now, however, I am not yet entirely convinced it has its place in Val. Given that "naming" individual elements of a sum type can be done by simply defining other types, I'm still on the lookout for use cases that would demonstrate how a fully-fledged feature would significantly improve usability and/or expressiveness and/or performance. For an example of what you can do now, look at the program below:
-
Yes, what I am saying is that this information is, indeed, important and should not be discarded. Your description of the alternatives as being unordered is a fine way of looking at a subset of use cases of discriminated unions, but not all of them. A discriminated union that allows duplicates can be used to implement your existing facility trivially: the user, or an alternative language facility, can remove duplicates, and that duplicate-free facility can trivially be implemented on top of the one that otherwise allows duplicates. The workarounds to effectively get duplicates when working with a facility that disallows them add complexity (for the user and for the implementation), with the most common solution being to have the user wrap the duplicate types in a dummy struct with a single member, so that the "duplicate" types become different. The user then further unwraps the type upon access (see the sketch below). For some perspective, in C++ the leading sum-type facility (boost::variant) did not allow duplicates. Over the years that it was used, the limitations of this became clear. It is not incidental that the standard facility and many other implementations that followed allow duplicates.
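As a rough C++ sketch of that workaround, assuming a facility with a no-duplicates rule (the wrapper name `alt` is purely illustrative, not from any library):

```cpp
#include <cstddef>
#include <variant>
#include <vector>

// A distinct wrapper type per index, all carrying the same payload.
template <std::size_t I, class T>
struct alt { T value; };

// Under a no-duplicates rule, the two vector iterators must be wrapped
// so that they no longer collide, and unwrapped again on every access:
using wrapped =
    std::variant<alt<0, std::vector<int>::iterator>,
                 alt<1, std::vector<int>::iterator>>;

// With duplicates allowed (std::variant permits this), no wrappers are
// needed; the discriminator alone distinguishes the two alternatives:
using direct =
    std::variant<std::vector<int>::iterator,
                 std::vector<int>::iterator>;
```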
By collection, do you mean a struct/product type of the iterators? If so, that does not accurately represent the situation here. My use of the term "iterator" here is in reference to a C++ iterator, in other words, a locator of a single element of a range. If you concatenate 10 ranges, for example, your iterator type ends up containing a sum of the 10 subrange iterator types, and the discriminator is what records which of the subranges the iterator currently points into (see the sketch below).
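To make that concrete, a minimal generic sketch in C++ (`concat_iter` is an illustrative name, not an existing library facility); because `std::variant` permits duplicate alternatives, the same alias works whether the subrange iterator types differ or coincide:

```cpp
#include <list>
#include <variant>
#include <vector>

// The alternatives of a concatenated view's iterator, one per subrange.
template <class... Ranges>
using concat_iter = std::variant<typename Ranges::iterator...>;

using mixed = concat_iter<std::vector<int>, std::list<int>>;   // distinct types
using same  = concat_iter<std::vector<int>, std::vector<int>>; // duplicates: fine
```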
Right, that was the workaround we used for years in C++ until accepting discriminated unions with duplicate types. At the moment you are requiring the user to create additional types solely to abide by the restriction that duplicates are disallowed. Those new types serve no other purpose and are effectively just acting as names or indices anyway. So in that sense, what exactly is being accomplished by the restriction at all? The user can still do what they need to do by artificially creating types, the implementation is still going through the process of checking for duplicates to collapse them out, the user needs to unwrap the types, and the implementation now has more datatypes to deal with overall as well. It ends up being more work for the users and for the implementation.

On the other hand, what is lost to the user/implementation when duplicates are simply allowed? In such a case, the user does not need to create intermediate wrapper types/tuples, and the implementation does less work (it does not need to trim duplicate types, and it is not dealing with these little dummy types). The only thing the user loses is the ability to index unambiguously by type, but if they want that, such a facility can be built on top of a discriminated union that allows duplicates (apply the restriction in a higher-level facility).

The situation for the user of a discriminated union that disallows duplicate types gets further complicated when a couple of other situations naturally happen: (1) the alternative types are parameters of generic code, so whether they coincide is out of the author's control, and (2) the set of alternatives is produced variadically, so names cannot be written out by hand.
Refer to the concatenation example for why both of these are problematic. In the case of (1), consider that you are writing that generic "concatenate" operation. It should ideally work perfectly fine whether it is passed a `vector<int>` and a `list<int>` or two `vector<int>`s. In the case of (2), there isn't anything particularly more complicated, except for the fact that names alone do not save you unless you have a way to programmatically create names. For instance, if your concatenate is variadic and you concatenate 10 ranges, the iterator type of the result contains a discriminated union of the iterator types of the subranges. To avoid the chance of "duplicates" being collapsed, the writer of the generic code would need to preemptively wrap each of those types in a unique type.

Having worked for a long time in the C++ world before duplicates were allowed, there is a more subtle issue that comes up when writing generic code. If you allow duplicates, everything tends to work by default. If you disallow duplicates, the problems can be latent until a user just so happens to be dealing with types that would lead the implementation to create a discriminated union with duplicates. In other words, even when workarounds are known, stuff "half works" when the workarounds are not used.

I suspect we are talking past each other because you are starting from the view that a discriminated union "should" not have duplicates. This property is not a given to argue away from; rather, it is a restriction that needs a practical reason for being. The uses for discriminated unions with duplicates do, indeed, exist and arise naturally. It is possible to make workarounds for such a restriction, but the simpler solution is not to have the restriction to begin with. A discriminated union without duplicates can be easily and efficiently built from one that does allow duplicates.
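As a rough illustration of that last point, a duplicate-free union can be layered over one that allows duplicates with a small type-level function. A minimal sketch on top of `std::variant` (the name `unique_variant` is purely illustrative):

```cpp
#include <type_traits>
#include <variant>

// Fold each type into the variant only if it is not already present.
template <class V, class... Ts>
struct unique_variant_impl { using type = V; };

template <class... Us, class T, class... Ts>
struct unique_variant_impl<std::variant<Us...>, T, Ts...>
    : unique_variant_impl<
          std::conditional_t<(std::is_same_v<T, Us> || ...),
                             std::variant<Us...>,       // duplicate: skip it
                             std::variant<Us..., T>>,   // new type: append it
          Ts...> {};

template <class... Ts>
using unique_variant =
    typename unique_variant_impl<std::variant<>, Ts...>::type;

static_assert(std::is_same_v<unique_variant<int, int, char>,
                             std::variant<int, char>>);
```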
-
This part, I find quite compelling. I think we may be discovering that union types with subtype relationships, which are non-nominal and collapse duplicates, are simply a different beast from tagged union types, which are usually nominal and allow duplicate payload types but don't support subtype relationships. I don't believe the latter are simply a notional tuple of optionals; there's an additional invariant that exactly one of the tuple elements is non-nil, and that leads to fundamental efficiency differences. That's no less fundamental than the difference between a linked list and an array. It may simply be a question of how often the need for each arises, and how painful it is to build one in terms of the other. I note that since we probably need enums for C++ interop anyway, it might not be stupid to think about extending them to support payloads, if the answers to the above questions don't look favorable.
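The efficiency difference is easy to observe concretely in C++; a minimal sketch (the exact sizes are implementation-dependent; the values in the comments are typical of a 64-bit ABI):

```cpp
#include <cstdio>
#include <optional>
#include <tuple>
#include <variant>

int main() {
    // A tuple of optionals carries one engaged/empty flag per element...
    std::printf("%zu\n", sizeof(std::tuple<std::optional<double>,
                                           std::optional<double>>)); // typically 32
    // ...while a tagged union carries a single discriminator, because
    // exactly one alternative is ever engaged.
    std::printf("%zu\n", sizeof(std::variant<double, double>));      // typically 16
}
```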
-
I’m pretty sure neither of these is more fundamental, at least if you have generics. Either one can be implemented in terms of the other.
On Sep 29, 2022, at 12:58 PM, Matt Calabrese ***@***.***> wrote:
I would ask to separate out the functionality and still have a way to make the more fundamental, simple discriminated union.
-
On Sep 29, 2022, at 6:44 PM, Dimitri Racordon ***@***.***> wrote:
Well, experience has shown me that Swift-like enums lack subtyping, which I believe to be a fundamental property.
Yes, I thought of that right after I posted. And IIRC we worked through all this and that’s why we ended up with sum types instead of tagged variants, as long as we were going to have just one feature. But the universe is now telling us that premise may have been wrong.
-
I'm just now starting to follow Val and am very pleased with the overall direction of the language. I have not used it yet in any capacity and am still at the level of just reading through the docs. One thing that jumps out at me that I disagree with, and would like to start a discussion on, is this line from the docs, which says that duplicate types in a union are collapsed:
I do not know how open for discussion this design decision is. What I take issue with is the first part of that quote, and I would like to push against the de-duplication of types. Instead, I suggest that duplicate types should never implicitly collapse. This suggestion primarily comes from experience as a long-standing user of the (surprisingly many) algebraic datatype libraries in C++. The basis of this view is that different alternatives that happen to have the same type can still have different meanings, semantically, so implicitly collapsing duplicate types can make working with these kinds of situations difficult (a concrete example appears later in this post). I suggest instead that type lists never be collapsed to remove duplicates, and I also encourage the ability to give alternatives names, similar to how Val allows tuple elements to have names. If a user wishes to collapse duplicates, they certainly still can, which is most commonly desired when the particular union is being used for something like type erasure over a fixed set of possible types. There can certainly be language facilities to make such de-duplication easy; however, the creation of sum types that have duplicate alternative types should be directly supported, and I'd argue should be the default.
A quick practical example of where you can naturally end up with duplicate types in a sum type, and where de-duplication causes problems, is something akin to the concatenation of ranges in C++ (a view of N ranges that each have the same value type, presented as if they were a single range, one after the other in order). Forgive me for the C++ example; it is just what I am most used to (a talk describing the exact case I refer to can be seen here, though I'll describe the relevant points in this post). In this case, the iterator type of the "concatenated" range ends up containing a sum type of the iterator types of the source ranges. For instance, if you concatenate a `std::vector<int>` and a `std::list<int>`, then the iterator type of the result contains a sum type of `std::vector<int>::iterator` and `std::list<int>::iterator`. This is because any given iterator may be referring either to an element of the `vector` or to an element of the `list`. Similarly, when you increment an iterator of the concatenated range, if you happen to be referring to a `vector` element, you need to check against the `end` iterator of that `vector` after incrementing the underlying iterator, and if you reach the `end` element, you update the sum type to now contain the `begin` element of the `list`.

In this particular situation, there happen to be no duplicate types and there are no problems. However, consider instead that you are concatenating a `vector<int>` and another `vector<int>`. There is nothing logically wrong with this situation, and in some sense it even appears simpler. Similar to the earlier example, what you naturally end up with inside the `iterator` type of the concatenated view is a sum type of `vector<int>::iterator` and `vector<int>::iterator`. Even though these are the same type, their "position" (or index) in the sum type imbues additional semantics, beyond the datatype, that are necessary in order to implement the new iterator. They cannot be collapsed down to a sum type with a single alternative without pushing additional complexity onto the user, since the differing discriminator value is precisely how you know whether the iterator refers to an element of the first underlying range or an element of the second underlying range.

If the exact problem that arises is not clear, consider again what an increment of the `iterator` into the concatenated view looks like. When that `iterator` is incremented, we first increment the underlying `iterator`, and if that iterator then happens to be referring to the `end` of the first range, we need to set the sum type to now contain the `begin` of the second range. Again, we know whether the current underlying `iterator` refers to an element of the first range or the second range based on the discriminator of the sum type (whether it is the 0th or 1st alternative). If the sum type instead implicitly collapses away duplicates, this discrimination is lost to the user! The easiest user-side solution ends up being to add additional wrapping to each duplicate alternative type before forming the sum type, and then "unwrapping" upon access. This kind of unfortunate workaround becomes necessary more generally any time you need to put types that are parameters into a sum type, if the facility being used to form the sum type implicitly "collapses" duplicates.

Situations like this are not specific to concatenation of ranges and naturally come up when you build more complicated datastructures around algebraic datatypes. Supporting duplicates is a default that makes sense, and users who want de-duplication for a specific case can still do that de-duplication with some kind of type-function if they wish.
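For completeness, here is a minimal sketch of the increment logic described above, assuming both subranges are `std::vector<int>` (the name `concat_iterator` is illustrative, and a real view would carry more state):

```cpp
#include <variant>
#include <vector>

// The discriminator (index 0 vs. 1) records which subrange the iterator
// currently points into; std::variant permits the duplicate alternatives.
struct concat_iterator {
    using underlying = std::vector<int>::iterator;

    std::variant<underlying, underlying> pos; // note: std::get<underlying>(pos)
                                              // would be ambiguous here, so all
                                              // access is by index
    underlying first_end;    // end of the first subrange
    underlying second_begin; // begin of the second subrange

    concat_iterator& operator++() {
        if (pos.index() == 0) {
            // In the first subrange: increment, and hop to the second
            // subrange upon reaching the first one's end.
            if (++std::get<0>(pos) == first_end)
                pos.emplace<1>(second_begin);
        } else {
            ++std::get<1>(pos);
        }
        return *this;
    }

    int& operator*() {
        return pos.index() == 0 ? *std::get<0>(pos) : *std::get<1>(pos);
    }
};
```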