Skip to content

Use typing features for supported languages #14784

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

dcbaker
Copy link
Member

@dcbaker dcbaker commented Jul 10, 2025

This uses to a mixture of Literal and TypeAlias to provide better checking for compilers/programming language access. It's a bit of extra code, and does require a few casts to help out cases that it absolutely shouldn't need casts for, but it does catch a real bug in the CMake dependency code.

@dcbaker dcbaker added the typing label Jul 10, 2025
@dcbaker dcbaker force-pushed the submit/supported-compiler-languages-typing branch 2 times, most recently from 3a52655 to 112e34b Compare July 10, 2025 20:58
Comment on lines 303 to 305
# It really should be enough to do `lang: AllLanguages`, but neither
# mypy nor pyright seem to be able to understand the list here matches that
for lang in T.cast('T.List[AllLanguages]', ['objcpp', 'cpp', 'objc', 'fortran', 'c']):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to python/mypy#18826, you could perhaps use a tuple here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:( I completely agree with the poster on that issue, this is not what tuples are for, this is a place for a list. I guess I can use a tuple but they're right that it's not the correct place for one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The poster is totally wrong.

As a general rule, I try to maintain a semantic distinction between tuples and lists: tuples are used to collect unlike elements of related data together so they can be manipulated as a single object, whereas lists are used to collect a sequence of common items for iteration. I'll use the 'wrong' type if I have to, e.g. converting a list to a tuple if I need to use it as a dictionary key, or using a list to assemble the pieces of related data one-by-one before I then convert them to a tuple once complete.

So, if you "have to", you'll use a tuple for the actual semantic purpose of a tuple, and if you don't have to, you maintain a meaningless nonsensical made-up distinction? No thanks -- use immutable fixed-length types when mutability / hashability doesn't matter, not for "unlike elements of related data together" (unlike elements? what?)

@dcbaker dcbaker force-pushed the submit/supported-compiler-languages-typing branch from 112e34b to 356ab05 Compare July 14, 2025 18:16
dcbaker added 2 commits July 14, 2025 12:15
Which would prevent ever finding an ObjC++ dependency with CMake...
This data isn't needed until after the super() call anyway, and by doing
that we have the `self.for_machine` variable, which can be used for
getting the correct compiler dictionary.
@dcbaker dcbaker force-pushed the submit/supported-compiler-languages-typing branch from 356ab05 to 30e5455 Compare July 14, 2025 19:16
Comment on lines 303 to 305
# It really should be enough to do `lang: AllLanguages`, but neither
# mypy nor pyright seem to be able to understand the list here matches that
for lang in T.cast('T.List[AllLanguages]', ['objcpp', 'cpp', 'objc', 'fortran', 'c']):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The poster is totally wrong.

As a general rule, I try to maintain a semantic distinction between tuples and lists: tuples are used to collect unlike elements of related data together so they can be manipulated as a single object, whereas lists are used to collect a sequence of common items for iteration. I'll use the 'wrong' type if I have to, e.g. converting a list to a tuple if I need to use it as a dictionary key, or using a list to assemble the pieces of related data one-by-one before I then convert them to a tuple once complete.

So, if you "have to", you'll use a tuple for the actual semantic purpose of a tuple, and if you don't have to, you maintain a meaningless nonsensical made-up distinction? No thanks -- use immutable fixed-length types when mutability / hashability doesn't matter, not for "unlike elements of related data together" (unlike elements? what?)

@dcbaker
Copy link
Member Author

dcbaker commented Jul 14, 2025

So, if you "have to", you'll use a tuple for the actual semantic purpose of a tuple, and if you don't have to, you maintain a meaningless nonsensical made-up distinction? No thanks -- use immutable fixed-length types when mutability / hashability doesn't matter, not for "unlike elements of related data together" (unlike elements? what?)

The semantic purpose of a tuple is to hold data where the index of the data has semantic meaning, and where it can be of different types. The entire point of tuples as opposed to lists is that you can write tuple[str, int, Foo]. and any index of [2] out of such a tuple will always be a Foo. that's the point, that's what they're for. That's not what I'm writing here, I'm writing a sequence of like types where the index is immaterial, you will get an AllLanguage no matter which index you take.

@eli-schwartz
Copy link
Member

The entire point of tuples as opposed to lists is that you can write tuple[str, int, Foo]. and any index of [2] out of such a tuple will always be a Foo. that's the point, that's what they're for.

Tuples predate the typing system and exist to disallow modifications and allow hashing / use as dict keys. This remains the case regardless of people's fictional dreams about the type system. When people begrudge python datatypes because they are "forced" to "violate the semantic meaning of tuples" in order to use them for their original runtime purpose -- I call foul.

@dcbaker dcbaker force-pushed the submit/supported-compiler-languages-typing branch from 30e5455 to 11680f8 Compare July 14, 2025 20:27
@dcbaker
Copy link
Member Author

dcbaker commented Jul 14, 2025

That's simply not true. Tuples are basically structs with index lookup instead of attribute lookup. The fact that they are immutable is a design decision that python made, as is allowing for tuple[str, ...] in the type system. They are about grouping related data, not about sequencing. The Python docs say as much: https://docs.python.org/3/faq/design.html#why-are-there-separate-tuple-and-list-data-types

@eli-schwartz
Copy link
Member

The data in this PR is small collections of related data. You even operate on them as a group. (The sub-algorithm operates element by element but applies the same algorithm to each one).

Objecting that it's "semantically inappropriate and shouldn't be allowed" to use tuples if all elements have the same type, is bizarre.

So is arguing that because it's "similar to a C struct" it's "semantically inappropriate and shouldn't be allowed" to use tuples if you don't happen to take advantage of "the index of the data has semantic meaning".

Frankly, conversations like this cause me to want to turn away from type annotations entirely.

@QuLogic
Copy link
Member

QuLogic commented Jul 14, 2025

I don't really care about this discussion, but the docs you linked don't really ascribe any such meaning that you are claiming. And sure, if you want to type hint a Cartesian coordinate made up of two numbers across an API boundary, then a fixed-size tuple makes sense.

But we're talking about literals that are passed directly to a for loop; the distinction between tuples and lists is meaningless.

@dcbaker dcbaker force-pushed the submit/supported-compiler-languages-typing branch from 11680f8 to b0ef487 Compare July 14, 2025 21:31
@dcbaker
Copy link
Member Author

dcbaker commented Jul 14, 2025

I'm done arguing this because all we're doing is making each other mad and not making any progress. I've changed to a tuple, but linked the issue from above, as this is a bug in mypy, since it correctly identifies set(), frozenset() and tuple() as Literal, but not list().

@eli-schwartz
Copy link
Member

eli-schwartz commented Jul 14, 2025

My usual hangout's type system fans agree with me that "this [for loops] is actually the perfect place to use a tuple".

Also, "mypy is correct":

it's very goofy to make type deduction positional in that manner

But we're talking about literals that are passed directly to a for loop; the distinction between tuples and lists is meaningless.

The optimizer compiles it to a tuple if you choose to write a list, incidentally. See via dis.dis().

@dcbaker
Copy link
Member Author

dcbaker commented Jul 14, 2025

Having worked on optimizing compilers, I'd expect that the optimizer would attempt to turn lists into tuples and sets into frozensets if it can, or even some lower level things that are not exposed publicly like a really primitive singly-linked list. Optimization is basically the black art of taking safe, readable, understandable, maintainable code and turning it into very fast code that gets the same result, all of those previous things are optional :)

@dcbaker
Copy link
Member Author

dcbaker commented Jul 14, 2025

Sigh, and we still get to cast because old versions of mypy.

@dcbaker dcbaker force-pushed the submit/supported-compiler-languages-typing branch from b0ef487 to 05b0dc0 Compare July 14, 2025 21:51
@dcbaker
Copy link
Member Author

dcbaker commented Jul 14, 2025

Okay, so mypy for Python 3.7 doesn't do the deduction correctly with tuples. I have put them back to a list with a different comment, explaining this. I think the cast(list[AllLanguages], [...]) is clearer than cast(tuple[AllLanguages, ...]), (...)). We can still use tuples if everyone thinks that's better.

As I realized there was already a place where we were casting a tuple[Language,...] to T.Iterable[Language] to work around the above old mypy issue, I've applied that pattern consistantly, so its now cast(T.Iterable[Language], tuple(...))

@dcbaker dcbaker force-pushed the submit/supported-compiler-languages-typing branch from 05b0dc0 to c2084ed Compare July 14, 2025 22:14
This found a real bug in the cmake code! (fixed in a previous patch).
This also changes `all_languages` to an OrderedSet, removing the need to
sort it in many places, but preserving the ability to use set operations
on it.

There is some casting to `list[AllLanguages]` that's going on here. Per:
python/mypy#18826, this casting can be avoided
by using a `tuple[AllLanguages, ...]` instead. However, this doesn't
work in older versions of mypy which we run in CI. I have chosen to use
lists here because I think the annotations in the case are more
straightforward to understand than the tuple ones are. I believe this
could be revisted when we drop Python 3.7 support.
@dcbaker dcbaker force-pushed the submit/supported-compiler-languages-typing branch from c2084ed to 1f709bb Compare July 14, 2025 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants