Proposal: Compress `std.Target.Cpu.Arch` and rework the arch component syntax

## Background

Today, `std.Target.Cpu.Arch` consists of 45 tags. Many of these are variations and not really full architectures in their own right, e.g. `aarch64_be` is big-endian `aarch64`, `mips64` is 64-bit `mips`, etc. This is a problem for three main reasons that I see:

* People tend to write code like `if (target.cpu.arch == .aarch64)`, not realizing that they've just created a portability bug. We have helpers like `Arch.isAARCH64()`, `Arch.isMIPS()`, etc. to try to alleviate this, but they don't seem to really stop people from writing the incorrect check.
* The list of `Arch` tags is going to get out of hand in the near future. I have a local branch that adds target info for a bunch of old and new architectures; in the current design, that means adding **102 new tags**. Now, I might choose to exclude some of these if they're "dead enough", but even so, that is clearly not a reasonable number, and is only going to exacerbate the first problem.
* The `Arch` tags are very inconsistent with regard to the spelling of endianness. We have `aarch64`/`aarch64_be`, `arm`/`armeb`, `bpfel`/`bpfeb`, `mips`/`mipsel`, `powerpc64`/`powerpc64le`, etc. This is all inherited from the GNU ecosystem. My branch additionally adds `riscv32be`/`riscv64be`, among others. This is fine if you have all these tags memorized, but I think it's safe to say that most people don't, especially new users. Even I sometimes mix up `powerpc64le` and `powerpc64el`.
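
To make the first problem concrete, here is a minimal sketch of the failure mode. The `Arch` enum below is a trimmed, hypothetical stand-in for the status quo enum rather than the real `std.Target.Cpu.Arch`, and `isAarch64Naive`/`isAarch64Portable` are illustrative names that do not exist in the standard library.

```zig
const std = @import("std");

// Trimmed, hypothetical stand-in for the status quo enum (the real one has 45 tags).
const Arch = enum {
    aarch64,
    aarch64_be,
    x86_64,

    // Mirrors the kind of family helper the standard library offers today.
    fn isAARCH64(arch: Arch) bool {
        return switch (arch) {
            .aarch64, .aarch64_be => true,
            else => false,
        };
    }
};

// The tempting check: it quietly excludes big-endian AArch64.
fn isAarch64Naive(arch: Arch) bool {
    return arch == .aarch64;
}

// The check that was actually intended.
fn isAarch64Portable(arch: Arch) bool {
    return arch.isAARCH64();
}

test "the naive check misses aarch64_be" {
    try std.testing.expect(isAarch64Naive(.aarch64));
    try std.testing.expect(!isAarch64Naive(.aarch64_be));
    try std.testing.expect(isAarch64Portable(.aarch64_be));
}
```

Both functions compile and look equally plausible in review, which is presumably why the helpers alone don't prevent the bug.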

## Proposal

```zig
pub const Cpu = struct {
    arch: Arch,
    bits: Bits,
    endian: std.builtin.Endian,
    // ...remaining fields same as status quo...

    pub const Arch = enum {
        amdgcn,
        arc,
        arm,
        thumb,
        aarch64,
        avr,
        bpf,
        csky,
        hexagon,
        kalimba,
        lanai,
        loongarch,
        m68k,
        mips,
        msp430,
        nvptx,
        powerpc,
        propeller,
        riscv,
        s390x,
        sparc,
        spirv,
        ve,
        wasm,
        x86,
        xcore,
        xtensa,
    };

    pub const Bits = enum(u16) {
        @"16" = 16,
        @"32" = 32,
        @"64" = 64,
        // ...potentially other esoteric sizes...
    };
};
```

This takes us from 45 to 27 tags. It would take my branch from 102 new tags to around 70 new tags (not counting potential exclusions). It's still a lot of tags, but it's about as compressed as it's ever going to get, so I think for all practical purposes it solves the second problem. It also solves the first problem because there's now just one tag you have to check for when you want to ask "am I targeting architecture family `xyz`?". Finally, it solves the third problem by moving endianness into a separate field.
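
As a rough illustration of what user code might look like under this design, the sketch below uses a cut-down local `Cpu` struct instead of the real `std.Target.Cpu`. The helper names and the `u16` tag type for `Bits` are assumptions made for the example.

```zig
const std = @import("std");

// Hypothetical, trimmed-down stand-in for the proposed `std.Target.Cpu`.
const Cpu = struct {
    arch: Arch,
    bits: Bits,
    endian: std.builtin.Endian,

    const Arch = enum { aarch64, arm, mips, x86 };
    // The `u16` tag type is an assumption made for this sketch.
    const Bits = enum(u16) { @"16" = 16, @"32" = 32, @"64" = 64 };
};

// One comparison covers every MIPS variant, whatever its width or byte order.
fn isMips(cpu: Cpu) bool {
    return cpu.arch == .mips;
}

// Width and endianness become separate, orthogonal questions.
fn is64BitBigEndian(cpu: Cpu) bool {
    return cpu.bits == .@"64" and cpu.endian == .big;
}

test "one tag per architecture family" {
    const mips64el: Cpu = .{ .arch = .mips, .bits = .@"64", .endian = .little };
    const mips32eb: Cpu = .{ .arch = .mips, .bits = .@"32", .endian = .big };
    try std.testing.expect(isMips(mips64el));
    try std.testing.expect(isMips(mips32eb));
    try std.testing.expect(!is64BitBigEndian(mips64el));
}
```

The obvious check (`cpu.arch == .mips`) is now also the correct one, so no family helper is needed.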

The way this works in the `std.Target` API is straightforward enough, but what about UX? The way I envision this working is that, for the most part, we stick to a standard format for the `arch` component of the `arch-os-abi` triple: `<name><bits><endian>`, with the meaning of each part probably being self-explanatory. An underscore can optionally be written between each part (allowing e.g. the familiar `x86_64` which is much more readable than `x8664`), and `endian` can be spelled as any of `le`, `el`, `be`, or `eb`. `name` is mandatory, but whether `bits` and/or `endian` are mandatory (or are allowed at all) would be specific to the architecture. Some examples:

* `aarch64` permits `endian` but defaults to `el`. `bits` is not permitted because it is always `64`.
    * Yes, the tag name is `aarch64`. It's a bit of an odd duck, but it actually makes sense why this doesn't use the `arm` tag when you realize that AArch64 has little in common with AArch32, and there exist AArch64 cores that can't even run A32/T32 userland code.
        * This is similarly why `thumb` still exists: There are cores that can only run Thumb code.
* `bpf` requires `endian`. `bits` is not permitted.
* `csky` permits `endian` but defaults to `el`. `bits` is not permitted because it is always `32`.
* `loongarch` does not permit `endian` because it is always `el`. `bits` is required.
* `mips` permits `endian` and `bits` with both being optional, and defaulting to `eb` and `32` respectively.
* `x86` does not permit `endian` because it is always `el`. `bits` is optional and defaults to `32`.

There would be more architecture-specific validation on top of this; for example, a `bits` value of `16` makes no sense for most architectures.
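
The per-architecture rules above could be encoded as a small table that drives a single parser. The following is only a sketch under stated assumptions: `Rule`, `rulesFor`, `parseArch`, and the error names are all hypothetical, only the six example architectures are covered, and the implied `bits` value for `bpf` (64) is my assumption since the list above only says `bits` is not permitted.

```zig
const std = @import("std");
const Endian = std.builtin.Endian;

// Hypothetical subset of the proposed enum, covering the examples above.
const Arch = enum { aarch64, bpf, csky, loongarch, mips, x86 };

// How one part of the `arch` component behaves for a given architecture.
fn Rule(comptime T: type) type {
    return union(enum) {
        fixed: T, // part may not be written; the value is implied
        optional: T, // part may be written; the payload is the default
        required, // part must be written
    };
}

const Rules = struct { bits: Rule(u16), endian: Rule(Endian) };

fn rulesFor(arch: Arch) Rules {
    return switch (arch) {
        .aarch64 => .{ .bits = .{ .fixed = 64 }, .endian = .{ .optional = .little } },
        // Assumption: the implied width for bpf is 64.
        .bpf => .{ .bits = .{ .fixed = 64 }, .endian = .required },
        .csky => .{ .bits = .{ .fixed = 32 }, .endian = .{ .optional = .little } },
        .loongarch => .{ .bits = .required, .endian = .{ .fixed = .little } },
        .mips => .{ .bits = .{ .optional = 32 }, .endian = .{ .optional = .big } },
        .x86 => .{ .bits = .{ .optional = 32 }, .endian = .{ .fixed = .little } },
    };
}

const Parsed = struct { arch: Arch, bits: u16, endian: Endian };
const ParseError = error{ UnknownArch, BadSyntax, NotPermitted, Required };

// Drops one separator underscore, but only if something follows it.
fn skipUnderscore(s: []const u8) []const u8 {
    return if (s.len > 1 and s[0] == '_') s[1..] else s;
}

// Combines what the user wrote (if anything) with the architecture's rule.
fn apply(comptime T: type, rule: Rule(T), given: ?T) ParseError!T {
    switch (rule) {
        .fixed => |value| {
            if (given != null) return error.NotPermitted;
            return value;
        },
        .optional => |default| return given orelse default,
        .required => return given orelse return error.Required,
    }
}

fn parseArch(text: []const u8) ParseError!Parsed {
    // <name>: the longest known tag that prefixes the input.
    var arch: ?Arch = null;
    var name_len: usize = 0;
    for (std.enums.values(Arch)) |candidate| {
        const name = @tagName(candidate);
        if (name.len > name_len and std.mem.startsWith(u8, text, name)) {
            arch = candidate;
            name_len = name.len;
        }
    }
    const found = arch orelse return error.UnknownArch;
    var rest = skipUnderscore(text[name_len..]);

    // <bits>: an optional run of decimal digits.
    var n: usize = 0;
    while (n < rest.len and std.ascii.isDigit(rest[n])) n += 1;
    var bits: ?u16 = null;
    if (n != 0) {
        bits = std.fmt.parseInt(u16, rest[0..n], 10) catch return error.BadSyntax;
        rest = skipUnderscore(rest[n..]);
    }

    // <endian>: optional, spelled le/el or be/eb.
    var endian: ?Endian = null;
    if (rest.len != 0) {
        const eql = std.mem.eql;
        if (eql(u8, rest, "le") or eql(u8, rest, "el")) {
            endian = .little;
        } else if (eql(u8, rest, "be") or eql(u8, rest, "eb")) {
            endian = .big;
        } else {
            return error.BadSyntax;
        }
    }

    const rules = rulesFor(found);
    return .{
        .arch = found,
        .bits = try apply(u16, rules.bits, bits),
        .endian = try apply(Endian, rules.endian, endian),
    };
}
```

With this sketch, `parseArch("x86_64")` yields `x86`/64/little, `parseArch("aarch64_be")` yields `aarch64`/64/big, and plain `mips` falls back to 32-bit big-endian, while `bpf` and `loongarch` on their own are rejected because a required part is missing. Checking which `bits` values are sensible per architecture would be an additional step that is not shown here.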

When printing target triples, there will be a canonical form of the `arch` component. For example, `x86_64` will be preferred over `x8664` but `mips64` will be preferred over `mips_64`, `el`/`eb` will be preferred over `le`/`be` across the board, default `bits` and `endian` values will be omitted, etc.
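
A minimal sketch of how the canonical form might be produced, assuming a small per-architecture table of defaults and a flag for whether an underscore goes before `bits`. The `Canon` struct, `canonFor`, and `writeCanonical` are hypothetical names, and only three architectures are covered.

```zig
const std = @import("std");
const Endian = std.builtin.Endian;

// Hypothetical subset of the proposed enum.
const Arch = enum { aarch64, mips, x86 };

// Per-architecture printing choices; this struct is illustrative only.
const Canon = struct {
    default_bits: u16,
    default_endian: Endian,
    underscore_before_bits: bool,
};

fn canonFor(arch: Arch) Canon {
    return switch (arch) {
        .aarch64 => .{ .default_bits = 64, .default_endian = .little, .underscore_before_bits = false },
        .mips => .{ .default_bits = 32, .default_endian = .big, .underscore_before_bits = false },
        .x86 => .{ .default_bits = 32, .default_endian = .little, .underscore_before_bits = true },
    };
}

// Writes the canonical spelling into `buf` and returns the written slice.
fn writeCanonical(buf: []u8, arch: Arch, bits: u16, endian: Endian) ![]const u8 {
    const canon = canonFor(arch);
    var len: usize = (try std.fmt.bufPrint(buf, "{s}", .{@tagName(arch)})).len;
    // Default values are omitted entirely.
    if (bits != canon.default_bits) {
        const sep: []const u8 = if (canon.underscore_before_bits) "_" else "";
        len += (try std.fmt.bufPrint(buf[len..], "{s}{d}", .{ sep, bits })).len;
    }
    if (endian != canon.default_endian) {
        // `el`/`eb` are preferred over `le`/`be`.
        const suffix: []const u8 = if (endian == .little) "el" else "eb";
        len += (try std.fmt.bufPrint(buf[len..], "{s}", .{suffix})).len;
    }
    return buf[0..len];
}

test "canonical spellings" {
    var buf: [32]u8 = undefined;
    try std.testing.expectEqualStrings("x86_64", try writeCanonical(&buf, .x86, 64, .little));
    try std.testing.expectEqualStrings("mips64el", try writeCanonical(&buf, .mips, 64, .little));
    try std.testing.expectEqualStrings("aarch64eb", try writeCanonical(&buf, .aarch64, 64, .big));
}
```

Because `aarch64` carries its width in the name, treating 64 as its default keeps the printer from emitting a redundant `bits` part.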

This scheme means that any `arch` component spelling that is valid today will continue to be valid. But it also lets us get rid of the eyesore that is `aarch64_be`, for example, where the canonical form would become `aarch64eb`.

## Alternatives

~~This is a bit more "out there": We could allow colon as a separator in the `arch` component. Maybe even make canonical form always use it, i.e. `x86:64`, `arm:eb`, `mips:64:el`, etc. :shrug:~~ [This won't work.](https://github.com/ziglang/zig/issues/23530#issuecomment-2795529814)
