my-LinkedIn/bluepill-zig

 
 


Bare Metal Zig on STM32

This is about using Zig alone to directly program boards based on the STM32F103xx MCU, such as "Blue Pill" clones of the Maple Mini.

Requirements

  • A Blue Pill board or similar, see here for general and physical info.

  • Zig to build our examples. (I used version 0.13.0 here.)

  • ST-Link Tools to flash them.

Docs

Target

One of the cool things about Zig is that (thanks to LLVM) it can compile for many architectures out of the box, yet it keeps its macOS package size under 50MB without any external dependencies!

Zig expects its -target command line switch to be a dash-separated triple identifying architecture, operating system and ABI (application binary interface) of the target system. See the whole list by running:

zig targets | less

Our architecture is ARM, we don't have any OS and we don't care about any particular ABI, so we are going to use arm-freestanding-none as our -target.

Another command line switch, -mcpu, specifies the processor to generate code for. STM32F103xx is based on Cortex-M3, so after consulting the list of supported targets the most logical choice seemed to be cortex_m3. However, I am getting build errors with it (something about an instruction in an IT block), so I decided to step back to cortex_m23, which seems to be a subset of Cortex-M3 instruction-wise, missing exactly the IT instruction. (I don't expect the compiler to use "TrustZone" security extensions. We could also use cortex_m0 to be 100% sure it won't generate something unsupported.)

Before we go to the remaining compiler settings let's type in some code first. Let's make the classic "Blink" example toggling the on-board LED (see v0/main.zig). It's not going to be nice, but we'll improve it later.

Version 0

Looking at the Blue Pill Schematic we see that the LED is attached to PC13 of the MCU which can be controlled via port 13 of GPIO bank C. GPIO and the corresponding registers are described in Chapter 9 of the Reference Manual, while the base address of bank C, 0x4001_1000, can be found in the STM32F103x8 Datasheet.

Before we can use the GPIO bank C however we need to enable it via one of the Reset and Clock Control (RCC) registers (base 0x4002_1000), see chapter 7.3. GPIO bank C is controlled by bit 4 of RCC_APB2ENR, offset 0x18:

reg(0x4002_1000 + 0x18).* |= 1 << 4;

Here reg() is a simple wrapper that gets us a volatile pointer. We need one when working with memory-mapped registers so that Zig's optimizer does not remove or reorder our reads and writes.

fn reg(comptime address: u32) *volatile u32 {
    return @ptrFromInt(address);
}
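To see the read-modify-write from the RCC line in isolation, here is a small sketch that runs on the host. The names setBits and fake_apb2enr are made up for this illustration, and an ordinary variable stands in for RCC_APB2ENR, so nothing here touches real hardware.

```zig
const std = @import("std");

/// Sets the bits of `mask` in `register` with a read-modify-write.
/// Because the pointer is volatile, both the load and the store are kept.
fn setBits(register: *volatile u32, mask: u32) void {
    register.* |= mask;
}

test "bit 4 is the one that enables GPIO bank C" {
    // Assumption: a plain variable plays the role of RCC_APB2ENR here.
    var fake_apb2enr: u32 = 0;
    setBits(&fake_apb2enr, 1 << 4);
    try std.testing.expectEqual(@as(u32, 0x10), fake_apb2enr);
}
```

On the real board the pointer would come from reg(0x4002_1000 + 0x18) instead of the address of a local variable.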

Before we can start toggling port 13, however, we need to configure it as an output via the GPIOx_CRH register (offset 0x04, see chapter 9.2.2). Each nibble in this register configures one of the ports 8-15:

reg(0x4001_1000 + 0x04).* |= 0b01_10 << (4 * (13 - 8)); // Relying on the reset value being 0b01_00.
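The comment about the reset value deserves a closer look, so here is a host-side sketch of the nibble arithmetic. The helper names (crhShift, orNibble, replaceNibble) are mine, not something from the repository. The sketch assumes the reset value of GPIOx_CRH is 0x4444_4444, that is 0b01_00 in every nibble.

```zig
const std = @import("std");

/// Assumed reset value of GPIOx_CRH: every nibble is 0b0100 (floating input).
const crh_reset: u32 = 0x4444_4444;
/// CNF = 0b01 (open-drain output), MODE = 0b10 (output, max 2 MHz).
const open_drain_2mhz: u4 = 0b01_10;

/// Bit offset of the nibble configuring `pin` (8-15) inside GPIOx_CRH.
fn crhShift(pin: u4) u5 {
    std.debug.assert(pin >= 8);
    return 4 * @as(u5, pin - 8);
}

/// What the line above does: OR the new bits into whatever is already there.
fn orNibble(old: u32, pin: u4, bits: u4) u32 {
    return old | (@as(u32, bits) << crhShift(pin));
}

/// A variant that does not depend on the old contents of the nibble.
fn replaceNibble(old: u32, pin: u4, bits: u4) u32 {
    const mask = @as(u32, 0xF) << crhShift(pin);
    return (old & ~mask) | (@as(u32, bits) << crhShift(pin));
}

test "OR is enough only when the nibble still has its reset value" {
    try std.testing.expectEqual(@as(u32, 0x4464_4444), orNibble(crh_reset, 13, open_drain_2mhz));
    // If the nibble were 0b1000, OR would leave 0b1110 behind instead of 0b0110.
    try std.testing.expectEqual(@as(u32, 0x44E4_4444), orNibble(0x4484_4444, 13, open_drain_2mhz));
    try std.testing.expectEqual(@as(u32, 0x4464_4444), replaceNibble(0x4484_4444, 13, open_drain_2mhz));
}
```

So the shortcut is fine right after reset, but any code that reconfigures a pin later has to clear the nibble first.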

We'll be using the GPIOx_BSRR register (offset 0x10 from the base, see chapter 9.2.5) to control the output state of our port. Setting bits 0-15 here sets the output on ports 0-15 to 1, while setting bits 16-31 resets the output on the same ports. (This register is more convenient than the more traditional GPIOx_ODR, because there is no need to read its current state to modify a single bit.)

const GPIOC_BSRR = reg(0x4001_1000 + 0x10);
while (true) {
    GPIOC_BSRR.* = 1 << (13 + 16);
    delay(50);
    GPIOC_BSRR.* = 1 << 13;
    delay(950);
}
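The two values written in the loop can be checked on the host. This is only a sketch of the bit arithmetic; bsrrSet and bsrrReset are hypothetical names that do not appear in the example itself.

```zig
const std = @import("std");

/// Value to write to GPIOx_BSRR to drive `pin` high (bits 0-15 set outputs).
fn bsrrSet(pin: u4) u32 {
    return @as(u32, 1) << pin;
}

/// Value to write to GPIOx_BSRR to drive `pin` low (bits 16-31 reset outputs).
fn bsrrReset(pin: u4) u32 {
    return @as(u32, 1) << (@as(u5, pin) + 16);
}

test "masks for PC13" {
    try std.testing.expectEqual(@as(u32, 0x0000_2000), bsrrSet(13));
    try std.testing.expectEqual(@as(u32, 0x2000_0000), bsrrReset(13));
}
```

Since the LED on PC13 is driven in open-drain mode, the "reset" write is the one that lights it up for 50 ms in the loop above.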

To implement delay() we'll just do something in a long loop. (We'll return to a better implementation in the next version of the example.)

fn delay_ticks(ticks: u32) void {
    var i = ticks;
    while (i > 0) {
        // Reading any location to prevent the loop from being optimized out.
        _ = reg(0x2000_0000).*;
        i -= 1;
    }
}

To calculate the number of ticks needed for a one-millisecond delay, we need to know that after reset our CPU runs at approximately 8MHz and that every iteration of the above loop takes 6 CPU cycles (more on this below):

fn delay(comptime ms: u32) void {
    delay_ticks(ms * 8_000 / 6);
}
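The same arithmetic as a host-side sketch, to see the concrete numbers for our two delays. The name ticksForMs and the two constants are mine; the 8 MHz and the 6 cycles per iteration are the figures from the text.

```zig
const std = @import("std");

/// Approximate CPU clock after reset, in Hz.
const cpu_hz = 8_000_000;
/// Cycles spent per loop iteration in delay_ticks().
const cycles_per_iteration = 6;

/// Number of loop iterations that take roughly `ms` milliseconds.
fn ticksForMs(comptime ms: u32) u32 {
    return ms * (cpu_hz / 1000) / cycles_per_iteration;
}

test "iterations for the two delays used in the blink loop" {
    try std.testing.expectEqual(@as(u32, 66_666), ticksForMs(50));
    try std.testing.expectEqual(@as(u32, 1_266_666), ticksForMs(950));
}
```

Note that the intermediate product ms * 8_000 has to fit into a u32, so this approach only works for delays up to 536_870 ms. Because ms is comptime, a larger value is rejected at compile time rather than wrapping silently.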

Linker Script

OK, now that we have a basic program (see v0/main.zig) we can try to compile it:

zig build-exe -target arm-freestanding-none -mcpu cortex_m23 -O ReleaseSmall -femit-asm main.zig

The extra -femit-asm flag makes Zig produce assembly output which is handy when examining our code. For example, this is how we can calculate how many CPU clock cycles delay_ticks() spends per tick (comments added by me using info on this page):

main.delay_ticks:           ; r0 contains the number of ticks already.
    movs    r1, #1
    lsls    r1, r1, #29     ; (1 << 29) is this 0x20000000 address we are reading from below.
.LBB1_1:
    cbz r0, .LBB1_3         ; 1 cycle, branch not taken. (Jump out of the loop if tick counter in r0 is zero.)
    ldr r2, [r1]            ; 2 cycles. (Our fake read.)
    subs    r0, r0, #1      ; 1 cycle. (Decrement the tick counter in r0.)
    b   .LBB1_1             ; 2 cycles, branch is taken. (Repeat the loop.)
.LBB1_3:
    bx  lr                  ; Return from the function.

Speaking of assembly, we could also disassemble the output file directly with objdump -d main, but it might be harder to see what's going on. Here is the same delay_ticks() function:

20130: 2101             movs    r1, #1
20132: 0749             lsls    r1, r1, #29
20134: b110             cbz r0, 0x2013c <.text+0x50> @ imm = #4
20136: 680a             ldr r2, [r1]
20138: 1e40             subs    r0, r0, #1
2013a: e7fb             b   0x20134 <.text+0x48>    @ imm = #-10
2013c: 4770             bx  lr

Another thing that we can see with objdump is that our code starts at address 000200ec which is quite wrong for our MCU where flash memory begins at 0x08000000:

main:   file format elf32-littlearm

Disassembly of section .text:

000200ec <.text>:
   200ec: 480c          ldr r0, [pc, #48]           @ 0x20120 <.text+0x34>
   200ee: 6801          ldr r1, [r0]
   200f0: 2210          movs    r2, #16
...

Well, this is logical because Zig does not really know much about our MCU. We need to help it by writing a "linker script". The official documentation on the topic mentioned above is easy to read and actual scripts are fairly self-explanatory.

The first thing we do in our script (v0/bluepill.ld) is describing relevant memory regions, which is quite simple in our case as we have 128K of flash memory starting at 0x08000000 and 20K of RAM starting at 0x20000000 (see chapter 4 in the Datasheet):

MEMORY {
    flash (rx)  : o = 0x08000000, l = 128K
    sram (rw)   : o = 0x20000000, l = 20K
}

(The names of the regions can be arbitrary here, the linker does not know what the "flash" is.)

The next part of the script tells what should be placed into the flash memory:

SECTIONS {
    .text : {
        ...
    } >flash

We cannot simply start filling the flash with code, because the first word has to be the initial value of the main stack pointer (MSP), as per chapter 2.1.2 of the Programming Manual:

On reset, the processor loads the MSP with the value from address 0x00000000.

(The address is from the start of the flash, 0x08000000 in our case.)

Next come the interrupt vectors (see table 63 in the Reference Manual), of which we are only interested in the first one, Reset, as we don't use interrupts just yet:

.text : {
    /* The initial value of SP, past the end of RAM. */
    LONG(ORIGIN(sram) + LENGTH(sram))
    /* Reset vector. */
    LONG(_start)
    /* We should put a bunch of other vectors here, but since none are used yet we can use the space. */
    /* So now goes our code. */
    *(.text)
    /* Then read-only data. */
    *(.rodata.*) 
    *(.rodata) 
} >flash

Next we tell the linker that our writable data (static variables) is expected in RAM. We don't have such variables in our simple example yet, but that'll be handy later.

.bss : { 
    *(.bss) 
} >sram

And finally we exclude a few extra sections that would otherwise increase the size of our binary:

/DISCARD/ : { 
    /* I don't want to keep sections needed only when printing stack traces. */
    *(.ARM.*)
}

Let's compile using our linker script now:

zig build-exe -target arm-freestanding-none -mcpu cortex_m23 -femit-asm -O ReleaseSmall --script bluepill.ld main.zig

Disassembling with objdump shows that the addresses are correct now:

main:   file format elf32-littlearm

Disassembly of section .text:

08000000 <.text>:
 8000000: 20005000      andhs   r5, r0, r0
 8000004: 08000009      stmdaeq r0, {r0, r3}
 8000008: 6801480c      stmdavs r1, {r2, r3, r11, lr}
...

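The first word appears to be the desired stack pointer just beyond the RAM, followed by the Reset vector pointing to the next word. The vector is odd (0x08000009 rather than 0x08000008) because bit 0 is set to indicate Thumb mode. The rest is still decoded as 32-bit ARM instructions though, so let's add --mcpu=cortex-m23 to the objdump command line to force Thumb mode: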

08000000 <.text>:
 8000000: 5000          str r0, [r0, r0]
 8000002: 2000          movs    r0, #0
 8000004: 0009          movs    r1, r1
 8000006: 0800          lsrs    r0, r0, #32
 8000008: 480c          ldr r0, [pc, #48]           @ 0x800003c <.text+0x3c>
 800000a: 6801          ldr r1, [r0]
 800000c: 2210          movs    r2, #16
 ...

OK, now the part starting at 0x8000008 looks like the code in our .s file.
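The first two words of the image can be double-checked with a few lines of host-side Zig. This is just a sketch of the arithmetic the linker script performs; the constant and function names are mine.

```zig
const std = @import("std");

/// Same numbers as in the MEMORY block of the linker script.
const sram_origin: u32 = 0x2000_0000;
const sram_length: u32 = 20 * 1024;

/// Initial MSP, as computed by `ORIGIN(sram) + LENGTH(sram)`.
const initial_sp: u32 = sram_origin + sram_length;

/// Bit 0 of a vector selects Thumb state and is not part of the address.
fn handlerAddress(vector: u32) u32 {
    return vector & ~@as(u32, 1);
}

/// True when the vector has the Thumb bit set.
fn isThumb(vector: u32) bool {
    return (vector & 1) == 1;
}

test "the first two words seen in the disassembly" {
    try std.testing.expectEqual(@as(u32, 0x2000_5000), initial_sp);
    try std.testing.expect(isThumb(0x0800_0009));
    try std.testing.expectEqual(@as(u32, 0x0800_0008), handlerAddress(0x0800_0009));
}
```

Both values match the listing: 20005000 at offset 0 and 08000009 at offset 4, with the code itself starting at 0x8000008.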

Flashing

We'll be using the st-flash utility, which expects either a raw binary with the starting address passed separately or an Intel hex file that already contains addresses. Let's use the latter by converting our build to .hex with Zig:

zig objcopy -O hex main main.hex

Flashing is then as simple as:

st-flash --reset --format ihex write main.hex 

You'll see something like this and your LED will hopefully start blinking every second:

st-flash 1.8.0
2024-08-18T22:01:38 INFO common.c: STM32F1xx_MD: 20 KiB SRAM, 128 KiB flash in at least 1 KiB pages.
2024-08-18T22:01:38 INFO common_flash.c: Attempting to write 90 (0x5a) bytes to stm32 address: 134217728 (0x8000000)
-> Flash page at 0x8000000 erased (size: 0x400)
2024-08-18T22:01:38 INFO flash_loader.c: Starting Flash write for VL/F0/F3/F1_XL
2024-08-18T22:01:38 INFO flash_loader.c: Successfully loaded flash loader in sram
2024-08-18T22:01:38 INFO flash_loader.c: Clear DFSR
  1/1   pages written
2024-08-18T22:01:38 INFO common_flash.c: Starting verification of write complete
2024-08-18T22:01:38 INFO common_flash.c: Flash written and verified! jolly good!

Also, as you can see our code is just 90 bytes, which is quite nice given all the required setup instructions.

V1

Now let's improve the example showing some power of Zig:

export fn _start() noreturn {
    const bankC = GPIOBank(.C);
    bankC.init();

    const led = bankC.port(13);
    led.setOutput(.openDrain, .max2MHz);

    while (true) {
        led.reset();
        delay(50);
        led.set();
        delay(950);
    }
}

GPIOBank is an abstraction that is more readable and more reusable (we can use all banks/ports), yet it does not add any overhead because the selection of the bank and port happens at compile time. Our program is 94 bytes now. It is 4 bytes larger only because we no longer rely on the reset values when writing to GPIOC_CRH, as we want to be able to change the pin configuration at runtime:

pub fn GPIOBank(comptime bank: GPIOBankIndex) type {
    return struct {
        /* ... */
        pub fn port(comptime pin: u4) type {
            return struct {
                fn reg(comptime offset: u32) *volatile u32 {
                    return @ptrFromInt(switch (bank) {
                        .A => 0x4001_0800,
                        .B => 0x4001_0C00,
                        .C => 0x4001_1000,
                        .D => 0x4001_1400,
                        .E => 0x4001_1800,
                    } + offset);
                }

                fn setModeBits(comptime bits: u32) void {
                    const CRx = reg(if (pin >= 8) 0x04 else 0x00);
                    const shift = 4 * @as(u8, if (pin >= 8) pin - 8 else pin);
                    CRx.* = CRx.* & ~(@as(u32, 0xF) << shift) | (bits << shift);
                }
                
                const BSRR = reg(0x10);

                pub fn set() void {
                    BSRR.* = 1 << pin;
                }
                /* ... */
            };
        }
    };
}

As you can see, reg() depends on the bank, but since both bank and pin are comptime, both GPIOx_CRx (L or H) and GPIOx_BSRR are picked at compile time as well, and we always get highly optimized code.
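To convince ourselves that everything really is resolved at compile time, here is a host-side sketch that exposes the derived values as constants. PortInfo and bankBase are hypothetical stand-ins I made up for this check; they mirror the address and shift logic shown above but are not the repository's GPIOBank.

```zig
const std = @import("std");

const Bank = enum { A, B, C, D, E };

/// Base addresses of the GPIO banks, same values as in reg() above.
fn bankBase(comptime bank: Bank) u32 {
    return switch (bank) {
        .A => 0x4001_0800,
        .B => 0x4001_0C00,
        .C => 0x4001_1000,
        .D => 0x4001_1400,
        .E => 0x4001_1800,
    };
}

/// Hypothetical stand-in for `GPIOBank(bank).port(pin)` that only holds constants.
fn PortInfo(comptime bank: Bank, comptime pin: u4) type {
    return struct {
        /// GPIOx_CRH for pins 8-15, GPIOx_CRL otherwise.
        pub const cr_address: u32 = bankBase(bank) + (if (pin >= 8) 0x04 else 0x00);
        /// Bit offset of the pin's configuration nibble.
        pub const cr_shift: u5 = 4 * @as(u5, if (pin >= 8) pin - 8 else pin);
        pub const bsrr_address: u32 = bankBase(bank) + 0x10;
        pub const set_mask: u32 = @as(u32, 1) << pin;
    };
}

comptime {
    // These checks run during compilation; a wrong value would fail the build.
    const led = PortInfo(.C, 13);
    std.debug.assert(led.cr_address == 0x4001_1004);
    std.debug.assert(led.cr_shift == 20);
    std.debug.assert(led.bsrr_address == 0x4001_1010);
    std.debug.assert(led.set_mask == 0x2000);
}
```

Because the values are plain constants of a type, the generated code for set() can boil down to a single store of an immediate to a fixed address.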

V2

Here we are adding a SysTick timer for a better delay() along with USART to say the actual Hello. (I've moved all helpers into a module called z41 here.)

Our binary is 792 bytes now, or 314 if we completely remove the line with usart.writer.print, and 432 if we keep it but don't output the number. In other words, the use of std.fmt adds overhead only when specifiers are actually used, something that would be hard to achieve with a printf()-style C/C++ function.

const std = @import("std");
const z41 = @import("z41");

export fn _start() noreturn {
    const rcc = z41.RCC(.internalRC);
    rcc.init();

    const SysTick = z41.SysTick(rcc.SYSCLK, 50);
    SysTick.init();

    const led = z41.GPIO(rcc, .C).port(13);
    led.Bank.init();
    led.setOutput(.openDrain, .max2MHz);

    const usart = z41.USART(rcc, .usart1);
    usart.init(115200);

    usart.writeBytes("\nHello! It's V2\n\n");

    while (true) {
        led.reset();
        SysTick.delay(50);

        led.set();
        SysTick.delay(950);

        try usart.writer.print("\rUptime: {}s", .{SysTick.milliseconds() / 1000});
    }
}

RegisterSet

I've added RegisterSet under the hood to help with the definition of hardware registers. It's similar to the reg() helper from v0, but allows using structs as well. For example, this is how STK_CTRL is described in SysTick (you should appreciate Zig allowing anonymous enums like the one in CLKSOURCE):

const STK_CTRL = regs.at(0, packed struct(u32) {
    /// Counter enable.
    ENABLE: bool,
    TICKINT: bool,
    CLKSOURCE: enum(u1) {
        /// AHB/8.
        AHB_8 = 0,
        /// Processor clock (AHB).
        AHB = 1,
    },
    _r1: u13 = 0,
    COUNTFLAG: bool = false,
    _r2: u15 = 0,
});
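A quick way to check that such a description matches the bit positions in the manual is to bit-cast it to u32 on the host. The sketch below repeats the same field layout under a made-up name, StkCtrl, with a small raw() helper that is also mine.

```zig
const std = @import("std");

/// Same field layout as the STK_CTRL description above.
const StkCtrl = packed struct(u32) {
    ENABLE: bool,
    TICKINT: bool,
    CLKSOURCE: enum(u1) { AHB_8 = 0, AHB = 1 },
    _r1: u13 = 0,
    COUNTFLAG: bool = false,
    _r2: u15 = 0,
};

/// The 32-bit value that would actually be written to the register.
fn raw(value: StkCtrl) u32 {
    return @bitCast(value);
}

test "fields land on the expected bits" {
    // The first field of a packed struct is the least significant bit.
    try std.testing.expectEqual(@as(u32, 0b111), raw(.{ .ENABLE = true, .TICKINT = true, .CLKSOURCE = .AHB }));
    try std.testing.expectEqual(@as(u32, 1 << 16), raw(.{ .ENABLE = false, .TICKINT = false, .CLKSOURCE = .AHB_8, .COUNTFLAG = true }));
}
```

The padding fields _r1 and _r2 are what push COUNTFLAG to bit 16 and bring the total to exactly 32 bits, which packed struct(u32) enforces.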

You can still use raw u32 registers where needed:

const STK_LOAD = regs.at(4, u32);

The helper checks the type you pass to make sure it's u32 or u32-backed packed struct:

pub fn at(comptime offset: u32, comptime reg_type: type) *volatile reg_type {
    const valid = switch (@typeInfo(reg_type)) {
        .Struct => |s| switch (s.layout) {
            .@"packed" => s.backing_integer == u32,
            else => false,
        },
        .Int => |i| i.bits == 32,
        else => false,
    };
    if (!valid) {
        @compileError("Expected `reg_type` to be u32 or a packed struct backed by u32");
    }
    return @ptrFromInt(base + offset);
}

SysTick

The SysTick timer is described in chapter 4.5 of the Programming Manual and is common to all processors based on Cortex®-M3. It's a simple counter that is decremented on every (or every 8th) CPU clock cycle and generates an interrupt when it reaches zero. It can be used to implement a notion of system time (e.g. milliseconds since system start) along with better delays, where we don't have to rely on how exactly our code is compiled.
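As a rough illustration of the numbers involved, here is how a reload value for the counter could be derived. This is a sketch and not necessarily how z41 does it; reloadValue and max_reload are names I made up. It relies on two facts about SysTick: the reload register is 24 bits wide, and a period of N clocks needs a reload value of N - 1.

```zig
const std = @import("std");

/// Largest value that fits into the 24-bit reload register.
const max_reload = (1 << 24) - 1;

/// Reload value for one interrupt every `ms_per_tick` milliseconds,
/// given the frequency of the clock that feeds the counter.
fn reloadValue(comptime counter_hz: u32, comptime ms_per_tick: u32) u24 {
    const counts = counter_hz / 1000 * ms_per_tick;
    if (counts == 0 or counts - 1 > max_reload) {
        @compileError("tick period does not fit into the 24-bit reload register");
    }
    // The counter runs from the reload value down to zero, hence the -1.
    return counts - 1;
}

test "50 ms ticks with an 8 MHz CPU clock" {
    // Counter fed with AHB/8, i.e. 1 MHz.
    try std.testing.expectEqual(@as(u24, 49_999), reloadValue(1_000_000, 50));
    // Counter fed with the processor clock directly.
    try std.testing.expectEqual(@as(u24, 399_999), reloadValue(8_000_000, 50));
}
```

Since both parameters are comptime, a tick period that is too long for the counter turns into a build error instead of a silently wrong timer.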

We need to be able to handle interrupts for this helper and this is where our linker script needs to be changed. The handler itself is simple:

pub fn SysTick(comptime cpuFreq: u32, comptime msPerTick: u32) type {
    return struct {
        export fn SysTick_Vector() void {
            total_ms +%= msPerTick;
        }

        var total_ms: u32 = undefined;
        ...

It needs to be exported for our linker script to place a pointer to it into an appropriate location. (Note that the export only happens when SysTick is used, something that would be hard to achieve in C/C++ without macros.)

Other than export no other attributes are needed here thanks to the clever way interrupts ("exceptions") are handled (see chapter 2.3.7 in the Programming Manual):

  • registers r0-r3 (along with r12, LR, the return address and the flags in xPSR) are automatically pushed to the stack when an interrupt occurs, while the remaining registers are already expected to be preserved by the compiler even in regular functions;

  • unlike other architectures, no special "return from interrupt" instruction is needed: the LR register is set to a special value on entry, so that any regular return from a function (bx lr) is recognized as a return from an interrupt, restoring r0-r3, etc.
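The two points above can be written down as a small host-side sketch. ExceptionFrame and isExcReturn are illustrative names of mine. The sketch assumes a Cortex-M3, where the hardware stacks eight words and the special LR values (EXC_RETURN) have bits 31:4 all set.

```zig
const std = @import("std");

/// The eight words pushed by the hardware on exception entry, lowest address first.
const ExceptionFrame = extern struct {
    r0: u32,
    r1: u32,
    r2: u32,
    r3: u32,
    r12: u32,
    lr: u32,
    return_address: u32,
    xpsr: u32,
};

/// True when `lr` holds one of the special "return from exception" values.
fn isExcReturn(lr: u32) bool {
    return (lr & 0xFFFF_FFF0) == 0xFFFF_FFF0;
}

test "frame size and the special LR value" {
    try std.testing.expectEqual(@as(usize, 32), @sizeOf(ExceptionFrame));
    // Return to thread mode using the main stack.
    try std.testing.expect(isExcReturn(0xFFFF_FFF9));
    // An ordinary Thumb return address in flash is not special.
    try std.testing.expect(!isExcReturn(0x0800_0009));
}
```

This is why a plain exported Zig function works as a handler: from the compiler's point of view it is an ordinary function that returns with bx lr.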

So we need to add a pointer to our handler into the interrupt vector table at the start of our code (see table 63 in the Reference Manual again):

.text : {
    ...
    LONG(_start);

    /* We don't use any interrupt vectors before SysTick, so let's just fill. */
    FILL(0); . = ADDR(.text) + 0x003C;

    LONG(DEFINED(SysTick_Vector) ? SysTick_Vector : 0xDEAD);

    /* Other vectors follow, but since we are not using them we can just start our code earlier. */
    ...

Note the use of DEFINED: it allows correct linking even when the target program does not need the SysTick timer. (By the way, the use of 0xDEAD for undefined handlers is temporary here; a central "panic" handler halting the MCU would be a better option eventually.)
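Where does 0x003C come from? Each vector is one 32-bit word and SysTick is entry number 15 in the table. A small sketch of that calculation follows; VectorSlot and vectorOffset are names I made up, and only the system exceptions of Cortex-M3 are listed.

```zig
const std = @import("std");

/// Positions of the system entries in the vector table (reserved slots omitted).
const VectorSlot = enum(u32) {
    initial_sp = 0,
    reset = 1,
    nmi = 2,
    hard_fault = 3,
    mem_manage = 4,
    bus_fault = 5,
    usage_fault = 6,
    sv_call = 11,
    debug_monitor = 12,
    pend_sv = 14,
    sys_tick = 15,
};

/// Byte offset of a slot from the start of the table (4 bytes per entry).
fn vectorOffset(slot: VectorSlot) u32 {
    return @intFromEnum(slot) * 4;
}

test "offsets used in the linker script" {
    try std.testing.expectEqual(@as(u32, 0x0004), vectorOffset(.reset));
    try std.testing.expectEqual(@as(u32, 0x003C), vectorOffset(.sys_tick));
}
```

Everything between the Reset vector and offset 0x003C is what FILL(0) pads with zeros in the script above.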

build.zig

I've been using a simple shell script to build and flash the first 2 examples:

#!/bin/sh -e
zig build-exe \
	-target arm-freestanding-none \
	-mcpu cortex_m23 \
	-femit-asm \
	-O ReleaseSmall \
	--script bluepill.ld \
	main.zig
zig objcopy -O hex main main.hex
rm main main.o
st-flash --reset --format ihex write main.hex 

However in this one we want to be able to pull our helpers from a "module" in ./lib. This still could be described in a shell script of course, but I also was curious about Zig's build system, so I've added build.zig.

Now the example can be compiled with zig build or flashed with zig build flash.
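For reference, a build.zig along these lines could look roughly like the sketch below. It is written against the std.Build API of Zig 0.13 as I understand it and is not a copy of the file in the repository. The file names main.zig and bluepill.ld follow the shell script above, while the module path lib/z41.zig and the output name main.hex are assumptions.

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    // Same target as `-target arm-freestanding-none -mcpu cortex_m23`.
    const target = b.resolveTargetQuery(.{
        .cpu_arch = .arm,
        .os_tag = .freestanding,
        .abi = .none,
        .cpu_model = .{ .explicit = &std.Target.arm.cpu.cortex_m23 },
    });

    const exe = b.addExecutable(.{
        .name = "main",
        .root_source_file = b.path("main.zig"),
        .target = target,
        .optimize = .ReleaseSmall,
    });
    exe.setLinkerScript(b.path("bluepill.ld"));

    // Assumption: the helpers live in lib/z41.zig.
    exe.root_module.addImport("z41", b.createModule(.{
        .root_source_file = b.path("lib/z41.zig"),
    }));

    // Equivalent of `zig objcopy -O hex main main.hex`.
    const hex = exe.addObjCopy(.{ .format = .hex });
    const install_hex = b.addInstallBinFile(hex.getOutput(), "main.hex");
    b.getInstallStep().dependOn(&install_hex.step);

    // `zig build flash` runs st-flash on the generated .hex file.
    const flash = b.addSystemCommand(&.{ "st-flash", "--reset", "--format", "ihex", "write" });
    flash.addFileArg(hex.getOutput());
    b.step("flash", "Flash the example with st-flash").dependOn(&flash.step);
}
```

The flash step depends on the objcopy output through addFileArg, so zig build flash rebuilds the program first when the sources have changed.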

