Background
Today, `std.Target.Cpu.Arch` consists of 45 tags. Many of these are variations and not really full architectures in their own right, e.g. `aarch64_be` is big-endian `aarch64`, `mips64` is 64-bit `mips`, etc. This is a problem for three main reasons that I see:
1. People tend to write code like `if (target.cpu.arch == .aarch64)`, not realizing that they've just created a portability bug. We have helpers like `Arch.isAARCH64()`, `Arch.isMIPS()`, etc to try to alleviate this, but they don't seem to really stop people from writing the incorrect check.
2. The list of `Arch` tags is going to get out of hand in the near future. I have a local branch that adds target info for a bunch of old and new architectures; in the current design, that means adding 102 new tags. Now, I might choose to exclude some of these if they're "dead enough", but even so, that is clearly not a reasonable number, and is only going to exacerbate the first problem.
3. The `Arch` tags are very inconsistent with regards to the spelling of endianness. We have `aarch64`/`aarch64_be`, `arm`/`armeb`, `bpfel`/`bpfeb`, `mips`/`mipsel`, `powerpc64`/`powerpc64le`, etc. This is all inherited from the GNU ecosystem. My branch additionally adds `riscv32be`/`riscv64be`, among others. This is fine if you have all these tags memorized, but I think it's safe to say that most people don't, especially new users. Even I sometimes mix up `powerpc64le` and `powerpc64el`.
Proposal
    pub const Cpu = struct {
        arch: Arch,
        bits: Bits,
        endian: std.builtin.Endian,
        // ...remaining fields same as status quo...

        pub const Arch = enum {
            amdgcn,
            arc,
            arm,
            thumb,
            aarch64,
            avr,
            bpf,
            csky,
            hexagon,
            kalimba,
            lanai,
            loongarch,
            m68k,
            mips,
            msp430,
            nvptx,
            powerpc,
            propeller,
            riscv,
            s390x,
            sparc,
            spirv,
            ve,
            wasm,
            x86,
            xcore,
            xtensa,
        };

        // Explicit tag values need an integer tag type; u16 is an assumption.
        pub const Bits = enum(u16) {
            @"16" = 16,
            @"32" = 32,
            @"64" = 64,
            // ...potentially other esoteric sizes...
        };
    };
This takes us from 45 to 27 tags. It would take my branch from 102 new tags to around 70 new tags (not counting potential exclusions). It's still a lot of tags, but it's about as compressed as it's ever going to get, so I think for all practical purposes it solves point 2. It also solves point 1 because there's now just one tag you have to check for when you want to ask "am I targeting architecture family `xyz`?". Finally, it solves point 3 by moving endianness into a separate field.
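As a rough sketch of how checks might read under this layout, assuming cut-down local stand-ins rather than the real `std.Target.Cpu` (the `Endian` enum and the helpers `isAarch64` and `isMips64LittleEndian` are hypothetical and exist only for illustration):

```zig
const std = @import("std");

// Cut-down stand-ins for the proposed types; not the real std.Target.Cpu.
const Endian = enum { little, big };
const Bits = enum(u16) { @"16" = 16, @"32" = 32, @"64" = 64 };
const Arch = enum { aarch64, mips, x86 };

const Cpu = struct {
    arch: Arch,
    bits: Bits,
    endian: Endian,
};

// One tag comparison covers every AArch64 variant, whatever its endianness.
fn isAarch64(cpu: Cpu) bool {
    return cpu.arch == .aarch64;
}

// Narrower questions combine the family tag with the other two fields.
fn isMips64LittleEndian(cpu: Cpu) bool {
    return cpu.arch == .mips and cpu.bits == .@"64" and cpu.endian == .little;
}
```

The point is that the "obvious" comparison against a single tag becomes the correct one, and anything more specific has to mention `bits` or `endian` explicitly.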
The way this works in the `std.Target` API is straightforward enough, but what about UX? The way I envision this working is that, for the most part, we stick to a standard format for the `arch` component of the `arch-os-abi` triple: `<name><bits><endian>`, with the meaning of each part probably being self-explanatory. An underscore can optionally be written between each part (allowing e.g. the familiar `x86_64` which is much more readable than `x8664`), and `endian` can be spelled as any of `le`, `el`, `be`, or `eb`. `name` is mandatory, but whether `bits` and/or `endian` are mandatory (or are allowed at all) would be specific to the architecture. Some examples:
- `aarch64` permits `endian` but defaults to `el`. `bits` is not permitted because it is always `64`.
  - Yes, the tag name is `aarch64`. It's a bit of an odd duck, but it actually makes sense why this doesn't use the `arm` tag when you realize that AArch64 has little in common with AArch32, and there exist AArch64 cores that can't even run A32/T32 userland code.
    - This is similarly why `thumb` still exists: There are cores that can only run Thumb code.
- `bpf` requires `endian`. `bits` is not permitted.
- `csky` permits `endian` but defaults to `el`. `bits` is not permitted because it is always `32`.
- `loongarch` does not permit `endian` because it is always `el`. `bits` is required.
- `mips` permits `endian` and `bits` with both being optional, and defaulting to `eb` and `32` respectively.
- `x86` does not permit `endian` because it is always `el`. `bits` is optional and defaults to `32`.
There would be more architecture-specific validation on top of this; for example, a `bits` value of `16` makes no sense for most architectures.
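To make the per-architecture rules concrete, here is a minimal sketch of a parser for the `arch` component. Everything in it is hypothetical (`Rule`, `ArchRule`, `rules`, `parseArch`, and the error names are not existing `std` API), it only covers the six examples listed above, the 64-bit width assigned to `bpf` is my assumption, and the extra per-architecture validation of `bits` values is left out.

```zig
const std = @import("std");

const Endian = enum { little, big };

// How one component of the spelling is treated for a given architecture.
fn Rule(comptime T: type) type {
    return union(enum) {
        fixed: T, // not permitted in the spelling; always this value
        optional: T, // permitted; this value is used when omitted
        required, // must be spelled out
    };
}

const ArchRule = struct { name: []const u8, bits: Rule(u16), endian: Rule(Endian) };

// Only the six examples from the list; the bpf width is an assumption.
const rules = [_]ArchRule{
    .{ .name = "aarch64", .bits = .{ .fixed = 64 }, .endian = .{ .optional = .little } },
    .{ .name = "bpf", .bits = .{ .fixed = 64 }, .endian = .required },
    .{ .name = "csky", .bits = .{ .fixed = 32 }, .endian = .{ .optional = .little } },
    .{ .name = "loongarch", .bits = .required, .endian = .{ .fixed = .little } },
    .{ .name = "mips", .bits = .{ .optional = 32 }, .endian = .{ .optional = .big } },
    .{ .name = "x86", .bits = .{ .optional = 32 }, .endian = .{ .fixed = .little } },
};

const Parsed = struct { name: []const u8, bits: u16, endian: Endian };

const ParseError = error{
    UnknownArch,
    InvalidSuffix,
    BitsNotPermitted,
    BitsRequired,
    EndianNotPermitted,
    EndianRequired,
};

fn parseEndian(s: []const u8) ?Endian {
    if (std.mem.eql(u8, s, "le") or std.mem.eql(u8, s, "el")) return .little;
    if (std.mem.eql(u8, s, "be") or std.mem.eql(u8, s, "eb")) return .big;
    return null;
}

fn parseArch(text: []const u8) ParseError!Parsed {
    // No name in this table is a prefix of another, so the first match wins.
    for (rules) |rule| {
        if (!std.mem.startsWith(u8, text, rule.name)) continue;
        var rest = text[rule.name.len..];

        // Optional underscore followed by a run of decimal digits.
        var spelled_bits: ?u16 = null;
        {
            const body = if (std.mem.startsWith(u8, rest, "_")) rest[1..] else rest;
            var n: usize = 0;
            while (n < body.len and std.ascii.isDigit(body[n])) n += 1;
            if (n > 0) {
                spelled_bits = std.fmt.parseInt(u16, body[0..n], 10) catch return error.InvalidSuffix;
                rest = body[n..];
            }
        }

        // Whatever is left must be an optional underscore plus an endianness.
        var spelled_endian: ?Endian = null;
        if (rest.len > 0) {
            const suffix = if (std.mem.startsWith(u8, rest, "_")) rest[1..] else rest;
            spelled_endian = parseEndian(suffix) orelse return error.InvalidSuffix;
        }

        const bits: u16 = switch (rule.bits) {
            .fixed => |v| if (spelled_bits == null) v else return error.BitsNotPermitted,
            .optional => |v| spelled_bits orelse v,
            .required => spelled_bits orelse return error.BitsRequired,
        };
        const endian: Endian = switch (rule.endian) {
            .fixed => |v| if (spelled_endian == null) v else return error.EndianNotPermitted,
            .optional => |v| spelled_endian orelse v,
            .required => spelled_endian orelse return error.EndianRequired,
        };
        return .{ .name = rule.name, .bits = bits, .endian = endian };
    }
    return error.UnknownArch;
}
```

With this table, `x86_64` and `x8664` both resolve to 64-bit little-endian `x86`, `mips64el` and `mips_64_el` resolve to the same thing as each other, and a bare `bpf` or `loongarch` is rejected because a required part is missing.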
When printing target triples, there will be a canonical form of the `arch` component. For example, `x86_64` will be preferred over `x8664` but `mips64` will be preferred over `mips_64`, `el`/`eb` will be preferred over `le`/`be` across the board, default `bits` and `endian` values will be omitted, etc.
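A minimal sketch of how the canonical form could be produced, under one assumption the text above does not spell out: that the underscore is emitted only when `name` ends in a digit and `bits` follows it (which is what distinguishes `x86_64` from `mips64`). `ArchInfo` and `canonical` are hypothetical names, and a `null` default stands for "this part must always be spelled out".

```zig
const std = @import("std");

const Endian = enum { little, big };

// Hypothetical input: resolved values plus the defaults used when a part is omitted.
const ArchInfo = struct {
    name: []const u8, // must be non-empty
    bits: u16,
    endian: Endian,
    default_bits: ?u16, // null: bits is always printed
    default_endian: ?Endian, // null: endian is always printed
};

fn canonical(buf: []u8, info: ArchInfo) ![]u8 {
    // Parts that equal their default are omitted.
    const show_bits = if (info.default_bits) |d| d != info.bits else true;
    const show_endian = if (info.default_endian) |d| d != info.endian else true;

    // Assumption: an underscore only separates a trailing digit in the name from bits.
    const last = info.name[info.name.len - 1];
    const sep: []const u8 = if (show_bits and std.ascii.isDigit(last)) "_" else "";
    // el/eb are preferred over le/be.
    const suffix: []const u8 = if (info.endian == .little) "el" else "eb";

    var len: usize = 0;
    len += (try std.fmt.bufPrint(buf[len..], "{s}", .{info.name})).len;
    if (show_bits) len += (try std.fmt.bufPrint(buf[len..], "{s}{d}", .{ sep, info.bits })).len;
    if (show_endian) len += (try std.fmt.bufPrint(buf[len..], "{s}", .{suffix})).len;
    return buf[0..len];
}
```

Architectures whose `bits` or `endian` is fixed can be modelled here by setting the default to the fixed value, so the part is never printed.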
This scheme means that any `arch` component spelling that is valid today will continue to be valid. But it also lets us get rid of the eyesore that is `aarch64_be`, for example, where the canonical form would become `aarch64eb`.
Alternatives
This is a bit more "out there": We could allow colon as a separator in the `arch` component. Maybe even make canonical form always use it, i.e. `x86:64`, `arm:eb`, `mips:64:el`, etc. 🤷 This won't work.