Skip to content

opaque type translation #208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Conversation

ssyram
Copy link
Contributor

@ssyram ssyram commented Jun 5, 2025

Originally, Eurydice simply drops the external types with private fields (opaque types), which causes many converted codes referring to undefined types. Thanks to the update in Charon in Charon/PR#704. Now we can fetch the layout info and translate the opaque types to:

typedef struct Type_s { uint8_t data[SIZE]; } Type;

Here SIZE is the actual size coming from the layout info.

This helps maintain the correct semantics of the original program and inter-crate modular translation.

Currently, the monomorphised types are still not supported to fetch layout info. Once Charon is ready, the code should work smoothly for types like Vec<i32>.

@Nadrieril
Copy link
Member

Nadrieril commented Jun 5, 2025

Do note that translating the layout of monomorphized generic types is not on the short-term queue, it will be a very complex change.

@ssyram
Copy link
Contributor Author

ssyram commented Jun 5, 2025

Well, a more temporary way would be to parse the result from "-Zprint-type-sizes", I guess everything is already included.

@Nadrieril
Copy link
Member

Nadrieril commented Jun 5, 2025

Let's keep that discussion to Zulip

@msprotz
Copy link
Contributor

msprotz commented Jun 9, 2025

Your proposal description is very high-level, so I'm going to try to fill in the blanks so that we can describe precisely what it is that you propose. If I had to guess, what you're proposing is the following. Please fill in the blanks and/or rephrase, I left question marks where I'm trying to guess.

Background:

  • to support separate compilation, Rust provides size information for types defined in other crates (?) so as to allow stack-allocating them (??) -- this is e.g. for the case where a crate provides pub fn f() -> T but T is not pub (???)
  • this allows reconciling at link-time code that was previously compiled separately
  • this is known as "opaque types" and is different from https://rustc-dev-guide.rust-lang.org/opaque-types-type-alias-impl-trait.html which is another concept entirely (but also called opaque types)

(Question: is this only for a multi-crate scenario or can this happen within a single crate?)

Proposal:

  • such opaque types compile to typedef struct Type_s { uint8_t data[SIZE]; } Type;
  • this allows allocating proper size information
  • C files can be compiled separately and reconciled at link-time, thus mimicking what is happening in Rust

(Question: what if only some of the fields of a type are private? what happens then?)

Open questions:

  • the C compiler might introduce some extra padding but it's ok to have more size than needed
  • as long as the two definitions (the complete definition and the opaque definition) do not coexist within the same C file we should be ok

Please comment -- thanks!

@ssyram
Copy link
Contributor Author

ssyram commented Jun 10, 2025

Thanks for this! Let me fill the blanks:

Background

  • to support separate compilation, Rust provides size information for types defined in other crates (YES) so as to allow stack-allocating them (YES) -- this is e.g. for the case where a crate provides pub fn f() -> T but T is not pub (YES and also fn f(t : T) or so).
  • this allows reconciling at link-time code that was previously compiled separately
  • this is known as "opaque types" and is different from https://rustc-dev-guide.rust-lang.org/opaque-types-type-alias-impl-trait.html which is another concept entirely (but also called opaque types), and the name opaque type is taken from the Charon production. Previously, a type decl that is Opaque simply returns None (i.e., [] now).

Question: is this only for a multi-crate scenario or can this happen within a single crate?

A: In summary, a multi-crate scenario is so common as real-world codes almost always use std, not to mention industrial-strength projects usually rely heavily on other crates. As a more general comment, I'm considering to make the Eurydice conversion more modular instead of a pure out-of-box usage, which alleviate our efforts in translating all the std.

As an example, my plan of systematic support of the std would be to parse std separately (which is still quite far away as Charon is still limited in ability, but may be not that far if we just need a part of them like Vec, Box), which provides the implementation of the types and functions like Vec. While in our projects that use the std library. We simply let Vec<T> for some specific T be some types with the same layout as the Vec (info is assumed to be exact as it is provided by Rustc).

Proposal

  • such opaque types compile to typedef struct Type_s { uint8_t data[SIZE]; } Type;
  • this allows allocating proper size information
  • C files can be compiled separately and reconciled at link-time, thus mimicking what is happening in Rust

Question: what if only some of the fields of a type are private? what happens then?

A: I'm considering to either make the fields all public by --include in Charon or simply padding the unknown fields. The latter can be supported now as the layout info from Charon is quite sophisticated -- it contains the exact offset info for each field with names. But yes, Charon should provide the information about which field is public and it is indeed an issue.

Open questions

the C compiler might introduce some extra padding but it's ok to have more size than needed

Yes.

as long as the two definitions (the complete definition and the opaque definition) do not coexist within the same C file we should be ok

Yes, and this should be guaranteed by Charon.

@ssyram
Copy link
Contributor Author

ssyram commented Jun 10, 2025

Seems that the current Mlkem error is from that you have provided a true definition for the type, while it is deemed opaque and the layout is generated in the new implementation.

@Nadrieril
Copy link
Member

To be clear, this notion of opaqueness is not a rust concept. Rustc has access to the entirety of all type definitions in crate dependencies. These opaque types are a notion invented by charon to avoid translating everything at all times.

Running charon with --include='*' (also known as --extract-opaque-bodies) in theory produces a llbc file without any of these opaque types. In practices this triggers errors because charon is not complete yet.

@ssyram
Copy link
Contributor Author

ssyram commented Jun 10, 2025

Thanks for the note. Translating everything very easily leads to errors when the codes use std.

@Nadrieril
Copy link
Member

In this case the easiest solution is to use --include and --exclude to select which definitions to extract. Hopefully like that you can avoid the definitions that charon can't extract for now.

@msprotz
Copy link
Contributor

msprotz commented Jun 10, 2025

@Nadrieril and I are discussing this right now -- we have further thoughts about your proposal:

  • we are concerned it will lead to irreconcilable issues owing to the C ABI calling convention, as dictated by either Windows or the System-V ABI
  • lacking alignment information, the sizes will not match
  • this might lead to issues with strict aliasing violations.

Questions for you:

  • how do you handle the case of alignment?
  • how can you argue that the C compiler will generate the same code when there are precise rules that dictate whether passing struct by value happens in registers or by spilling on the stack? I thought these rules were type-directed, especially for arm64 where register vs. stack-passing depends on the presence of floats or not in the struct.

@msprotz
Copy link
Contributor

msprotz commented Jun 10, 2025

@Nadrieril's proposal seems like it might help you achieve what you want -- perhaps let us know a little bit more about your use-case? we might be able to help you better this way (and if this is something you'd rather discuss in private, that's fine, and we can have a private chat on zulip!) cheers

@ssyram
Copy link
Contributor Author

ssyram commented Jun 10, 2025

@Nadrieril Thanks for your proposal, but this requires too much human-work, and I would prefer a more auto way. On the other hand, to include everything, even if possible, produces too much extra-stuff that requires more cleanup?

@msprotz Thanks for your considerations and kindness! For the alignment problem, as the layout info from Charon includes this, I think it can be easily added. For the second, let us study more about this in a broader range. But we are sure that we are not working on compilers that might get this wrong. But yes, this might not be correct for every potential compiler. So shall we make this translation optional?

@msprotz
Copy link
Contributor

msprotz commented Jun 10, 2025

@Nadrieril Thanks for your proposal, but this requires too much human-work, and I would prefer a more auto way. On the other hand, to include everything, even if possible, produces too much extra-stuff that requires more cleanup?

There is a mechanism in Eurydice to eliminate all unused definitions -- see the .yaml file I added in test/, it tells Eurydice to only keep the definitions that are reachable from that particular test file. Would this help you?

For the second, let us study more about this in a broader range. But we are sure that we are not working on compilers that might get this wrong. But yes, this might not be correct for every potential compiler. So shall we make this translation optional?

So let's think a little bit here, and let's look at, say, arm64: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#composite-types. It's clear that you must preserve the alignment between the two "visions" of the type (raw bytes, and actual type definition). But then there are other subtleties: if you look at 5.10, a C struct becomes what's known as a composite type in that document... unless it's made up entirely of floats! See 5.10.5.1 -- in that case it's called a homogeneous floating-point aggregate (HFA)

Then, let's look at 6 https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#the-base-procedure-call-standard. The rules are different for structs and for HFA... see 6.8.2. Structs might go into general-purpose registers if there are some left (Rule C.12), or to the stack if there aren't enough registers left (Rule C.15), unless the struct has only float fields, and no more than 4, in which case it's an HFA and the struct goes into SIMD/Floating-point registers (Rule C.2).

I'm not trying to be difficult here: I'm simply pointing out that the scheme you describe will be guaranteed not to work in some cases, because compilers have to obey the ABI. So it's not a matter of "we know the compiler won't get it wrong", it's a matter of "the compiler is required to get it wrong".

Given that information, I want to make sure we exhaust every other possibility before contemplating what you describe, because your solution comes with deep limitations (see above) which in turn will require more design work and will make it harder for me to say "great idea! let's merge immediately".

Questions for you.

  • Rust promotes whole-dependency compilation schemes where all the dependencies are available in cargo... what prevents you from doing a single run of Charon?
  • If the issue is that stuff in std is not extractable by Charon, would a --extract_opaque_defs combined with --exclude 'the::problematic::namespace::*' work? Along with a custom Eurydice config file to only retain those definitions that actually matter to you.

If the code that you're trying to run through Eurydice is public, it might help @Nadrieril and I provide good guidance, but understood if it's not.

@msprotz
Copy link
Contributor

msprotz commented Jun 10, 2025

Rust promotes whole-dependency compilation schemes where all the dependencies are available in cargo... what prevents you from doing a single run of Charon?

I also forgot to mention that --extract-opaque-bodies should pick all the definitions that are reachable

@ssyram
Copy link
Contributor Author

ssyram commented Jun 11, 2025

Thanks for the useful info! Let us study more on this case!

@ssyram
Copy link
Contributor Author

ssyram commented Jun 13, 2025

Sure, your points make sense, and this implementation is potentially buggy. Thanks for the very insightful discussion!

@ssyram ssyram closed this Jun 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants