Proposal: Compress std.Target.Cpu.Arch and rework the arch component syntax #23530

@alexrp

Background

Today, std.Target.Cpu.Arch consists of 45 tags. Many of these are variations and not really full architectures in their own right, e.g. aarch64_be is big-endian aarch64, mips64 is 64-bit mips, etc. This is a problem for three main reasons that I see:

  • People tend to write code like if (target.cpu.arch == .aarch64), not realizing that they've just created a portability bug: the check is false for aarch64_be. We have helpers like Arch.isAARCH64(), Arch.isMIPS(), etc. to try to alleviate this, but they don't seem to really stop people from writing the incorrect check.
  • The list of Arch tags is going to get out of hand in the near future. I have a local branch that adds target info for a bunch of old and new architectures; in the current design, that means adding 102 new tags. Now, I might choose to exclude some of these if they're "dead enough", but even so, that is clearly not a reasonable number, and is only going to exacerbate the first problem.
  • The Arch tags are very inconsistent with regard to the spelling of endianness. We have aarch64/aarch64_be, arm/armeb, bpfel/bpfeb, mips/mipsel, powerpc64/powerpc64le, etc. This is all inherited from the GNU ecosystem. My branch additionally adds riscv32be/riscv64be, among others. This is fine if you have all these tags memorized, but I think it's safe to say that most people don't, especially new users. Even I sometimes mix up powerpc64le and powerpc64el.
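
To make the first point concrete, here is a minimal sketch of the pitfall. It uses a cut-down stand-in for the status-quo enum rather than the real std.Target.Cpu.Arch; the local Arch type and naiveIsAArch64 are invented for illustration, and only the helper's name is taken from the text above.

```zig
const std = @import("std");

// Cut-down stand-in for the status-quo std.Target.Cpu.Arch; only a few of the
// 45 tags are reproduced here.
const Arch = enum {
    aarch64,
    aarch64_be,
    mips,
    mipsel,
    x86_64,

    // Mirrors the intent of the Arch.isAARCH64() helper mentioned above.
    fn isAARCH64(arch: Arch) bool {
        return switch (arch) {
            .aarch64, .aarch64_be => true,
            else => false,
        };
    }
};

// The tempting but non-portable check: it is false on big-endian AArch64.
fn naiveIsAArch64(arch: Arch) bool {
    return arch == .aarch64;
}

pub fn main() void {
    const arch: Arch = .aarch64_be;
    std.debug.print("naive: {}, helper: {}\n", .{ naiveIsAArch64(arch), arch.isAARCH64() });
}
```

Both checks compile and look equally plausible, which is why the helper alone does not prevent the mistake.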

Proposal

pub const Cpu = struct {
    arch: Arch,
    bits: Bits,
    endian: std.builtin.Endian,
    // ...remaining fields same as status quo...

    pub const Arch = enum {
        amdgcn,
        arc,
        arm,
        thumb,
        aarch64,
        avr,
        bpf,
        csky,
        hexagon,
        kalimba,
        lanai,
        loongarch,
        m68k,
        mips,
        msp430,
        nvptx,
        powerpc,
        propeller,
        riscv,
        s390x,
        sparc,
        spirv,
        ve,
        wasm,
        x86,
        xcore,
        xtensa,
    };

    pub const Bits = enum(u8) { // explicit tag values require an explicit integer tag type
        @"16" = 16,
        @"32" = 32,
        @"64" = 64,
        // ...potentially other esoteric sizes...
    };
};

This takes us from 45 to 27 tags. It would take my branch from 102 new tags to around 70 new tags (not counting potential exclusions). It's still a lot of tags, but it's about as compressed as it's ever going to get, so I think for all practical purposes it solves point 2. It also solves point 1 because there's now just one tag you have to check for when you want to ask "am I targeting architecture family xyz?". Finally, it solves point 3 by moving endianness into a separate field.
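
As a rough sketch of what checks could look like under this layout (the types below are local stand-ins, not the real std.Target API, and isMips / isMips64LittleEndian are hypothetical helper names):

```zig
const std = @import("std");

// Stand-ins for the proposed types; Arch is only an excerpt of the 27 tags.
const Endian = enum { little, big }; // plays the role of std.builtin.Endian
const Arch = enum { aarch64, arm, mips, x86 };
const Bits = enum(u8) { @"16" = 16, @"32" = 32, @"64" = 64 };

const Cpu = struct {
    arch: Arch,
    bits: Bits,
    endian: Endian,
};

// One tag covers the whole family, whatever the width or byte order.
fn isMips(cpu: Cpu) bool {
    return cpu.arch == .mips;
}

// Width and endianness become ordinary field comparisons.
fn isMips64LittleEndian(cpu: Cpu) bool {
    return cpu.arch == .mips and cpu.bits == .@"64" and cpu.endian == .little;
}

pub fn main() void {
    const mips64el: Cpu = .{ .arch = .mips, .bits = .@"64", .endian = .little };
    std.debug.print("{} {}\n", .{ isMips(mips64el), isMips64LittleEndian(mips64el) });
}
```

The obvious comparison (cpu.arch == .mips) is now also the portable one, so no family helper is needed.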

The way this works in the std.Target API is straightforward enough, but what about UX? The way I envision this working is that, for the most part, we stick to a standard format for the arch component of the arch-os-abi triple: <name><bits><endian>, with the meaning of each part probably being self-explanatory. An underscore can optionally be written between each part (allowing e.g. the familiar x86_64 which is much more readable than x8664), and endian can be spelled as any of le, el, be, or eb. name is mandatory, but whether bits and/or endian are mandatory (or are allowed at all) would be specific to the architecture. Some examples:

  • aarch64 permits endian but defaults to el. bits is not permitted because it is always 64.
    • Yes, the tag name is aarch64. It's a bit of an odd duck, but it actually makes sense why this doesn't use the arm tag when you realize that AArch64 has little in common with AArch32, and there exist AArch64 cores that can't even run A32/T32 userland code.
      • This is similarly why thumb still exists: There are cores that can only run Thumb code.
  • bpf requires endian. bits is not permitted.
  • csky permits endian but defaults to el. bits is not permitted because it is always 32.
  • loongarch does not permit endian because it is always el. bits is required.
  • mips permits endian and bits with both being optional, and defaulting to eb and 32 respectively.
  • x86 does not permit endian because it is always el. bits is optional and defaults to 32.

There would be more architecture-specific validation on top of this; for example, a bits value of 16 makes no sense for most architectures.
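
The parsing rules above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: Rule, Spec, specs, resolve, and parseArch are hypothetical names, the table covers only the six example architectures, and the fixed bits value for bpf is assumed to be 64 since the text does not state it.

```zig
const std = @import("std");

// Stand-ins for the proposed types; Arch covers only the six examples above.
const Endian = enum { little, big };
const Bits = enum(u8) { @"16" = 16, @"32" = 32, @"64" = 64 };
const Arch = enum { aarch64, bpf, csky, loongarch, mips, x86 };

// Hypothetical per-architecture rule for one optional part of the component.
fn Rule(comptime T: type) type {
    return union(enum) {
        fixed: T, // may not be written; always this value
        optional: T, // may be written; defaults to this value
        required, // must be written
    };
}

const Spec = struct { arch: Arch, bits: Rule(Bits), endian: Rule(Endian) };

const specs = [_]Spec{
    .{ .arch = .aarch64, .bits = .{ .fixed = .@"64" }, .endian = .{ .optional = .little } },
    // The text only says bits is not permitted for bpf; 64 is an assumption.
    .{ .arch = .bpf, .bits = .{ .fixed = .@"64" }, .endian = .required },
    .{ .arch = .csky, .bits = .{ .fixed = .@"32" }, .endian = .{ .optional = .little } },
    .{ .arch = .loongarch, .bits = .required, .endian = .{ .fixed = .little } },
    .{ .arch = .mips, .bits = .{ .optional = .@"32" }, .endian = .{ .optional = .big } },
    .{ .arch = .x86, .bits = .{ .optional = .@"32" }, .endian = .{ .fixed = .little } },
};

const Parsed = struct { arch: Arch, bits: Bits, endian: Endian };
const ParseError = error{ UnknownArch, InvalidSyntax, NotPermitted, Missing };

fn stripPrefix(s: []const u8, prefix: []const u8) ?[]const u8 {
    return if (std.mem.startsWith(u8, s, prefix)) s[prefix.len..] else null;
}

// Applies one rule to what the user wrote (null means the part was omitted).
fn resolve(comptime T: type, rule: Rule(T), written: ?T) ParseError!T {
    switch (rule) {
        .fixed => |v| return if (written == null) v else error.NotPermitted,
        .optional => |v| return written orelse v,
        .required => return written orelse error.Missing,
    }
}

fn parseArch(text: []const u8) ParseError!Parsed {
    // An underscore is only a separator, so it may not end the component.
    if (std.mem.endsWith(u8, text, "_")) return error.InvalidSyntax;
    for (specs) |spec| {
        var rest = stripPrefix(text, @tagName(spec.arch)) orelse continue;
        rest = stripPrefix(rest, "_") orelse rest; // optional separator

        var bits: ?Bits = null;
        for ([_]Bits{ .@"16", .@"32", .@"64" }) |b| {
            if (stripPrefix(rest, @tagName(b))) |r| {
                bits = b;
                rest = stripPrefix(r, "_") orelse r; // optional separator
                break;
            }
        }

        const eql = std.mem.eql;
        const endian: ?Endian = if (rest.len == 0)
            null
        else if (eql(u8, rest, "le") or eql(u8, rest, "el"))
            .little
        else if (eql(u8, rest, "be") or eql(u8, rest, "eb"))
            .big
        else
            return error.InvalidSyntax;

        return .{
            .arch = spec.arch,
            .bits = try resolve(Bits, spec.bits, bits),
            .endian = try resolve(Endian, spec.endian, endian),
        };
    }
    return error.UnknownArch;
}

pub fn main() !void {
    const p = try parseArch("mips64el");
    std.debug.print("{s} {s} {s}\n", .{ @tagName(p.arch), @tagName(p.bits), @tagName(p.endian) });
}
```

The sketch relies on no name in the table being a prefix of another, and it leaves out the extra per-architecture validation mentioned above, so a spelling like mips16 is accepted here even though it should presumably be rejected.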

When printing target triples, there will be a canonical form of the arch component. For example, x86_64 will be preferred over x8664 but mips64 will be preferred over mips_64, el/eb will be preferred over le/be across the board, default bits and endian values will be omitted, etc.
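
A possible shape for the canonical printer is sketched below. The Component struct and canonical function are hypothetical, and the underscore rule (write an underscore before bits only when the name ends in a digit) is an assumption inferred from the x86_64 and mips64 examples; the text does not spell out the actual rule.

```zig
const std = @import("std");

const Endian = enum { little, big };
const Bits = enum(u8) { @"16" = 16, @"32" = 32, @"64" = 64 };

// Hypothetical input: a parsed arch component plus that architecture's
// defaults (a fixed value can be treated as a default that is never printed).
const Component = struct {
    name: []const u8, // must be non-empty
    bits: Bits,
    endian: Endian,
    default_bits: Bits,
    default_endian: Endian,
};

fn canonical(buf: []u8, c: Component) ![]u8 {
    // Default bits and endian values are omitted.
    const show_bits = c.bits != c.default_bits;
    const show_endian = c.endian != c.default_endian;

    // Assumed rule: separate name and bits only when the name ends in a
    // digit, giving x86_64 but mips64.
    const last = c.name[c.name.len - 1];
    const sep: []const u8 = if (show_bits and std.ascii.isDigit(last)) "_" else "";
    const bits: []const u8 = if (show_bits) @tagName(c.bits) else "";

    // el/eb are preferred over le/be.
    const endian: []const u8 = if (!show_endian) "" else if (c.endian == .little) "el" else "eb";

    return std.fmt.bufPrint(buf, "{s}{s}{s}{s}", .{ c.name, sep, bits, endian });
}

pub fn main() !void {
    var buf: [32]u8 = undefined;
    const text = try canonical(&buf, .{
        .name = "mips",
        .bits = .@"64",
        .endian = .little,
        .default_bits = .@"32",
        .default_endian = .big,
    });
    std.debug.print("{s}\n", .{text});
}
```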

This scheme means that any arch component spelling that is valid today will continue to be valid. But it also lets us get rid of the eyesore that is aarch64_be, for example, where the canonical form would become aarch64eb.

Alternatives

This is a bit more "out there": We could allow colon as a separator in the arch component. Maybe even make canonical form always use it, i.e. x86:64, arm:eb, mips:64:el, etc. 🤷

Labels

  • breaking: Implementing this issue could cause existing code to no longer compile or have different behavior.
  • proposal: This issue suggests modifications. If it also has the "accepted" label then it is planned.
  • standard library: This issue involves writing Zig code for the standard library.
