Minimal binary size #4290

@diorcety

I'm currently investigating the potential of Rust for MCUs. The features provided by Embassy and Rust are a real step toward better codebases, but there is a huge drawback: binary size.

A simple blink program in C for an STM32F4 target is close to 1 KB, half of which is the vector table (512 bytes). Adding async behavior, for example with Protothreads, increases this size only slightly.

With Embassy, the default release build of the stm32f4 blinky example is 21260 bytes (the text size reported by arm-none-eabi-size), which is huge:

File  .text    Size            Crate Name
0.1%  18.5%  3.4KiB    embassy_stm32 embassy_stm32::_generated::<impl core::ops::arith::Mul<stm32_metapac::rcc::vals::Plln> for embassy_stm32::time::Hertz>::mul
0.0%   8.0%  1.5KiB    embassy_stm32 embassy_stm32::rcc::_version::init
0.0%   6.5%  1.2KiB              std core::str::count::do_count_chars
0.0%   6.0%  1.1KiB    embassy_stm32 embassy_stm32::rcc::_version::init_pll
0.0%   5.6%  1.0KiB    embassy_stm32 embassy_stm32::_generated::<impl core::ops::arith::Div<stm32_metapac::rcc::vals::Pllm> for embassy_stm32::time::Hertz>::div
0.0%   3.9%    744B              std compiler_builtins::mem::memcpy
0.0%   3.2%    602B              std core::fmt::Formatter::pad_integral
0.0%   2.5%    476B    embassy_stm32 embassy_stm32::rcc::bd::LsConfig::init
0.0%   2.5%    472B              std core::fmt::Formatter::pad
0.0%   2.3%    438B    embassy_stm32 embassy_stm32::init
0.0%   2.2%    408B        defmt_rtt <defmt_rtt::Logger as defmt::traits::Logger>::write
0.0%   2.0%    384B    embassy_stm32 TIM4
0.0%   1.9%    366B              std core::fmt::write
0.0%   1.9%    364B    embassy_stm32 embassy_stm32::dma::dma_bdma::<impl embassy_stm32::dma::AnyChannel>::on_irq
0.0%   1.9%    356B    embassy_stm32 <embassy_stm32::_generated::Clocks as defmt::traits::Format>::_format_data
0.0%   1.6%    310B embassy_executor embassy_executor::raw::TaskStorage<F>::poll
0.0%   1.6%    300B embassy_executor embassy_executor::raw::wake_task
0.0%   1.6%    300B embassy_executor embassy_executor::raw::waker::wake
0.0%   1.4%    274B        defmt_rtt <defmt_rtt::Logger as defmt::traits::Logger>::release
0.0%   1.4%    266B embassy_executor embassy_executor::raw::Executor::spawn
0.1%  23.3%  4.3KiB                  And 87 smaller methods. Use -n N to show more.

This may not matter on large MCUs (256 KB of flash or more), but it makes Embassy unusable on some low-flash MCUs (64 KB).

As a first step, we can remove all defmt logging and the associated panic handler from the release binary (diorcety@dcb1590). This more than halves the binary size, down to 9112 bytes:

File  .text   Size                    Crate Name
1.0%  50.3% 4.2KiB            embassy_stm32 embassy_stm32::rcc::_version::init_pll
0.4%  22.5% 1.9KiB         embassy_executor embassy_executor::raw::TaskStorage<F>::poll
0.1%   3.8%   328B            embassy_stm32 embassy_stm32::dma::dma_bdma::<impl embassy_stm32::dma::AnyChannel>::on_irq
0.1%   3.0%   258B embassy_time_queue_utils embassy_time_queue_utils::queue_integrated::Queue::next_expiration
0.0%   2.5%   212B             embassy_time <embassy_time::timer::Timer as core::future::future::Future>::poll
0.0%   2.4%   202B         embassy_executor embassy_executor::arch::thread::Executor::run
0.0%   1.9%   160B            embassy_stm32 embassy_stm32::time_driver::RtcDriver::set_alarm
0.0%   1.6%   138B            embassy_stm32 TIM4
0.0%   1.5%   128B         embassy_executor embassy_executor::raw::waker::wake
0.0%   1.4%   116B            embassy_stm32 embassy_stm32::rcc::RccInfo::enable_and_reset_with_cs
0.0%   1.2%   106B            embassy_stm32 embassy_stm32::exti::on_irq
0.0%   1.0%    84B            embassy_stm32 embassy_stm32::time_driver::RtcDriver::next_period
0.0%   0.7%    64B            embassy_stm32 embassy_stm32::gpio::Flex::set_as_output
0.0%   0.7%    60B             embassy_sync embassy_sync::waitqueue::atomic_waker::AtomicWaker::wake
0.0%   0.5%    44B            embassy_stm32 embassy_stm32::_generated::<impl core::ops::arith::Div<stm32_metapac::rcc::vals::Ppre> for embassy_stm32::time::Hertz>::div
0.0%   0.5%    40B              cortex_m_rt Reset
0.0%   0.3%    22B                   blinky blinky::__cortex_m_rt_main
0.0%   0.2%    16B         embassy_executor embassy_executor::raw::waker::clone
0.0%   0.2%    14B            embassy_stm32 DMA1_STREAM0
0.0%   0.2%    14B            embassy_stm32 DMA1_STREAM1
0.1%   3.5%   300B                          And 27 smaller methods. Use -n N to show more.
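One way to make logging removable without editing every call site is to route it through a crate-local macro gated on a Cargo feature. The sketch below assumes a hypothetical `logging` feature and a hypothetical `log_info!` macro; it is not necessarily how diorcety@dcb1590 does it. When the feature is off, the macro expands to an empty block, so nothing references the logger (defmt_rtt here) and the linker has nothing to keep.

```rust
// Hypothetical `logging` Cargo feature; `log_info!` is an illustrative name,
// not an existing Embassy macro.
#[cfg(feature = "logging")]
macro_rules! log_info {
    ($($arg:tt)*) => {
        defmt::info!($($arg)*)
    };
}

// With the feature disabled, the arguments are dropped at expansion time,
// so neither the log call nor the transport behind it ends up in the binary.
#[cfg(not(feature = "logging"))]
macro_rules! log_info {
    ($($arg:tt)*) => {{}};
}

/// Toggles an LED state and logs the transition (only when `logging` is on).
pub fn toggle(led_on: bool) -> bool {
    log_info!("LED was {}", led_on);
    !led_on
}
```

Note that any side effect written inside the macro arguments is also skipped when the feature is disabled, so log arguments should stay free of side effects.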

A large part of this code is the RCC initialization. Compared to a C blinky (for example https://github.com/platformio/platform-ststm32/tree/develop/examples/stm32cube-ll-blink), where it takes less than 100 bytes, the RCC initialization in Embassy takes about 4 KiB! Replacing it with a dummy RCC initialization (diorcety@a2bc3c7) reduces the binary to 2728 bytes:

File  .text   Size                    Crate Name
0.1%  16.4%   358B         embassy_executor embassy_executor::raw::TaskStorage<F>::poll
0.1%  15.0%   328B            embassy_stm32 embassy_stm32::dma::dma_bdma::<impl embassy_stm32::dma::AnyChannel>::on_irq
0.1%  11.8%   258B embassy_time_queue_utils embassy_time_queue_utils::queue_integrated::Queue::next_expiration
0.1%   9.3%   202B         embassy_executor embassy_executor::arch::thread::Executor::run
0.1%   7.1%   154B             embassy_time <embassy_time::timer::Timer as core::future::future::Future>::poll
0.0%   5.9%   128B         embassy_executor embassy_executor::raw::waker::wake
0.0%   4.9%   106B            embassy_stm32 embassy_stm32::exti::on_irq
0.0%   4.4%    96B                [Unknown] SysTick
0.0%   2.9%    64B            embassy_stm32 embassy_stm32::gpio::Flex::set_as_output
0.0%   2.8%    60B             embassy_sync embassy_sync::waitqueue::atomic_waker::AtomicWaker::wake
0.0%   1.8%    40B              cortex_m_rt Reset
0.0%   1.0%    22B                   blinky blinky::__cortex_m_rt_main
0.0%   0.7%    16B         embassy_executor embassy_executor::raw::waker::clone
0.0%   0.6%    14B            embassy_stm32 DMA1_STREAM0
0.0%   0.6%    14B            embassy_stm32 DMA1_STREAM1
0.0%   0.6%    14B            embassy_stm32 DMA1_STREAM2
0.0%   0.6%    14B            embassy_stm32 DMA1_STREAM3
0.0%   0.6%    14B            embassy_stm32 DMA1_STREAM4
0.0%   0.6%    14B            embassy_stm32 DMA1_STREAM5
0.0%   0.6%    14B            embassy_stm32 DMA1_STREAM6
0.1%  10.6%   230B                          And 22 smaller methods. Use -n N to show more.
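In the first table, the `Mul<Plln>` and `Div<Pllm>` operators on `Hertz` alone account for about 4.4 KiB of clock arithmetic compiled into the binary. One possible direction, sketched here and not the embassy-stm32 API, is to compute and validate the PLL settings in a `const`, so the compiler folds them to a single number. `PllConfig` is hypothetical, and the limits used (PLLM 2..=63, PLLN 50..=432, PLLP in {2, 4, 6, 8}, VCO input 1 to 2 MHz, VCO output 100 to 432 MHz) are assumed typical STM32F4 values that vary between parts; check the reference manual for the exact device.

```rust
/// Hypothetical compile-time PLL settings (not the embassy-stm32 API).
pub struct PllConfig {
    pub hse_hz: u32,
    pub m: u32,
    pub n: u32,
    pub p: u32,
}

impl PllConfig {
    /// Checks the (assumed) STM32F4 ranges and returns SYSCLK in Hz.
    /// A failing `assert!` inside a `const` evaluation is a build error.
    pub const fn sysclk_hz(&self) -> u32 {
        assert!(self.m >= 2 && self.m <= 63, "PLLM out of range");
        assert!(self.n >= 50 && self.n <= 432, "PLLN out of range");
        assert!(matches!(self.p, 2 | 4 | 6 | 8), "PLLP must be 2, 4, 6 or 8");
        let vco_in = self.hse_hz / self.m;
        assert!(vco_in >= 1_000_000 && vco_in <= 2_000_000, "VCO input out of range");
        let vco_out = vco_in * self.n;
        assert!(vco_out >= 100_000_000 && vco_out <= 432_000_000, "VCO output out of range");
        vco_out / self.p
    }
}

/// Example: 8 MHz HSE, /8, x336, /4.
pub const PLL: PllConfig = PllConfig { hse_hz: 8_000_000, m: 8, n: 336, p: 4 };

/// Evaluated by the compiler: only this constant ends up in flash.
pub const SYSCLK_HZ: u32 = PLL.sysclk_hz();
```

The register writes themselves would still run at startup, but the division, multiplication and range checks disappear from the binary, and an invalid configuration fails the build instead of being detected at runtime.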

Duration, Instant and the associated functions use the u64 type. In some cases u64 is overkill and inefficient on 32-bit targets, so we can replace u64 with u32 (diorcety@002fefd): 2644 bytes:

File  .text   Size                    Crate Name
0.1%  16.6%   348B         embassy_executor embassy_executor::raw::TaskStorage<F>::poll
0.1%  15.6%   328B            embassy_stm32 embassy_stm32::dma::dma_bdma::<impl embassy_stm32::dma::AnyChannel>::on_irq
0.1%   9.6%   202B         embassy_executor embassy_executor::arch::thread::Executor::run
0.1%   9.5%   200B embassy_time_queue_utils embassy_time_queue_utils::queue_integrated::Queue::next_expiration
0.1%   6.6%   138B             embassy_time <embassy_time::timer::Timer as core::future::future::Future>::poll
0.0%   6.1%   128B         embassy_executor embassy_executor::raw::waker::wake
0.0%   5.1%   106B            embassy_stm32 embassy_stm32::exti::on_irq
0.0%   4.5%    94B                [Unknown] SysTick
0.0%   3.1%    64B            embassy_stm32 embassy_stm32::gpio::Flex::set_as_output
0.0%   2.9%    60B             embassy_sync embassy_sync::waitqueue::atomic_waker::AtomicWaker::wake
0.0%   1.9%    40B              cortex_m_rt Reset
0.0%   1.0%    22B                   blinky blinky::__cortex_m_rt_main
0.0%   0.8%    16B         embassy_executor embassy_executor::raw::waker::clone
0.0%   0.7%    14B            embassy_stm32 DMA1_STREAM0
0.0%   0.7%    14B            embassy_stm32 DMA1_STREAM1
0.0%   0.7%    14B            embassy_stm32 DMA1_STREAM2
0.0%   0.7%    14B            embassy_stm32 DMA1_STREAM3
0.0%   0.7%    14B            embassy_stm32 DMA1_STREAM4
0.0%   0.7%    14B            embassy_stm32 DMA1_STREAM5
0.0%   0.7%    14B            embassy_stm32 DMA1_STREAM6
0.1%  11.0%   230B                          And 22 smaller methods. Use -n N to show more.
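Switching to u32 ticks needs care, because a 32-bit counter wraps: after about 49.7 days at a 1 kHz tick rate, and after about 71.6 minutes at 1 MHz. A common technique (the same idea as the Linux kernel's `time_after`) is to compare timestamps through wrapping subtraction, which stays correct as long as the two instants are less than 2^31 ticks apart. The `Ticks32` type below is a hypothetical sketch, not the change made in diorcety@002fefd.

```rust
/// Hypothetical 32-bit timestamp in timer ticks; the raw counter may wrap.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Ticks32(pub u32);

impl Ticks32 {
    /// Timestamp `delta` ticks later, wrapping around on overflow.
    pub fn offset(self, delta: u32) -> Ticks32 {
        Ticks32(self.0.wrapping_add(delta))
    }

    /// True if `self` is at or after `other`.
    /// Valid only while the two timestamps are less than 2^31 ticks apart.
    pub fn is_at_or_after(self, other: Ticks32) -> bool {
        (self.0.wrapping_sub(other.0) as i32) >= 0
    }

    /// Ticks elapsed from `earlier` to `self`, correct across one wrap.
    pub fn ticks_since(self, earlier: Ticks32) -> u32 {
        self.0.wrapping_sub(earlier.0)
    }
}
```

For example, with `now = Ticks32(u32::MAX - 5)` and `deadline = now.offset(10)`, the deadline is `Ticks32(4)`. A plain `now.0 >= deadline.0` would be true and the timer would fire immediately; the wrapping comparison correctly reports that the deadline is still 10 ticks away.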
0.8% 100.0% 2.0KiB                          .text section size, the file size is 258.6KiB

EXTI and DMA IRQ handlers are linked in even when they are not used at all. Removing them (diorcety@52d0353) gives 1768 bytes:

File  .text   Size                    Crate Name
0.1%  25.8%   348B         embassy_executor embassy_executor::raw::TaskStorage<F>::poll
0.1%  15.0%   202B         embassy_executor embassy_executor::arch::thread::Executor::run
0.1%  14.8%   200B embassy_time_queue_utils embassy_time_queue_utils::queue_integrated::Queue::next_expiration
0.1%  10.2%   138B             embassy_time <embassy_time::timer::Timer as core::future::future::Future>::poll
0.1%   9.5%   128B         embassy_executor embassy_executor::raw::waker::wake
0.0%   7.0%    94B                [Unknown] SysTick
0.0%   4.7%    64B            embassy_stm32 embassy_stm32::gpio::Flex::set_as_output
0.0%   3.0%    40B              cortex_m_rt Reset
0.0%   1.6%    22B                   blinky blinky::__cortex_m_rt_main
0.0%   1.2%    16B         embassy_executor embassy_executor::raw::waker::clone
0.0%   0.9%    12B         embassy_executor embassy_executor::raw::util::UninitCell<T>::write_in_place
0.0%   0.6%     8B                      std core::cell::panic_already_borrowed
0.0%   0.6%     8B                      std core::option::unwrap_failed
0.0%   0.6%     8B                      std core::panicking::panic_fmt
0.0%   0.6%     8B                [Unknown] main
0.0%   0.4%     6B              cortex_m_rt HardFault_
0.0%   0.4%     6B               panic_halt __rustc::rust_begin_unwind
0.0%   0.4%     6B         embassy_executor embassy_executor::raw::waker::drop
0.0%   0.4%     6B              cortex_m_rt DefaultPreInit
0.0%   0.4%     6B              cortex_m_rt DefaultHandler_
0.6% 100.0% 1.3KiB                          .text section size, the file size is 227.2KiB

What remains includes the formatting code pulled in by the panic! and assert! macros. Most MCU programs in release mode have no debug interface attached, so all the formatting done in these macros is wasted.
Setting the following options in .cargo/config.toml (nightly only) removes this formatting:

[unstable]
build-std = ["core", "panic_abort"]
build-std-features = ["panic_immediate_abort"]

Final size (diorcety@971b8a8): 1720 bytes:

File  .text   Size                    Crate Name
0.1%  26.2%   340B         embassy_executor embassy_executor::raw::TaskStorage<F>::poll
0.1%  15.4%   200B embassy_time_queue_utils embassy_time_queue_utils::queue_integrated::Queue::next_expiration
0.1%  15.4%   200B         embassy_executor embassy_executor::arch::thread::Executor::run
0.1%  10.2%   132B             embassy_time <embassy_time::timer::Timer as core::future::future::Future>::poll
0.1%   9.8%   128B         embassy_executor embassy_executor::raw::waker::wake
0.0%   7.1%    92B                [Unknown] SysTick
0.0%   4.9%    64B            embassy_stm32 embassy_stm32::gpio::Flex::set_as_output
0.0%   3.1%    40B              cortex_m_rt Reset
0.0%   1.7%    22B                   blinky blinky::__cortex_m_rt_main
0.0%   1.2%    16B         embassy_executor embassy_executor::raw::waker::clone
0.0%   0.9%    12B         embassy_executor embassy_executor::raw::util::UninitCell<T>::write_in_place
0.0%   0.6%     8B                [Unknown] main
0.0%   0.5%     6B              cortex_m_rt HardFault_
0.0%   0.5%     6B         embassy_executor embassy_executor::raw::waker::drop
0.0%   0.5%     6B              cortex_m_rt DefaultPreInit
0.0%   0.5%     6B              cortex_m_rt DefaultHandler_
0.5% 100.0% 1.3KiB                          .text section size, the file size is 237.3KiB

The remaining overhead compared to a C blinky, about 700 bytes, comes from the async/await machinery, which seems fair.
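To give an idea of what that machinery is, here is a minimal single-future executor: a poll loop plus a waker vtable, the same kinds of pieces that appear in the tables as `TaskStorage::poll` and `waker::wake`/`clone`/`drop`. This is a hypothetical sketch, not Embassy's executor, which additionally manages task storage and a run queue and sleeps between interrupts instead of busy-polling.

```rust
use core::future::Future;
use core::pin::{pin, Pin};
use core::ptr;
use core::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Waker whose operations do nothing: this executor busy-polls, so it never
// needs to be woken. An MCU executor would instead mark the task ready and sleep (WFE).
fn noop_clone(_: *const ()) -> RawWaker {
    RawWaker::new(ptr::null(), &NOOP_VTABLE)
}
fn noop(_: *const ()) {}
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

/// Drives a single future to completion by polling it in a loop.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: the vtable functions never dereference the data pointer, so null is fine.
    let waker = unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &NOOP_VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Test future: returns `Pending` `remaining` times, then `Ready(number of polls)`.
/// It never registers the waker, which is only acceptable with a busy-polling executor.
pub struct YieldTimes {
    remaining: u32,
    polls: u32,
}

impl YieldTimes {
    pub fn new(remaining: u32) -> Self {
        YieldTimes { remaining, polls: 0 }
    }
}

impl Future for YieldTimes {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        self.polls += 1;
        if self.remaining == 0 {
            Poll::Ready(self.polls)
        } else {
            self.remaining -= 1;
            Poll::Pending
        }
    }
}
```

Even this toy needs a state machine per future, a poll loop and a set of waker functions, none of which a hand-written C blinky contains.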

I'm not claiming these modifications are correct (notably the u32 Duration/Ticks change and the dummy RCC, which were done without much thought); the point is to pinpoint design choices that make Embassy almost impossible to use on low-end devices.

To summarize, here are the changes that could make Embassy usable on such devices:

  • Provide a way to reduce the size of init/rcc::init, perhaps by generating the RCC initialization code at compile time, with support for multiple profiles that can be switched at runtime (as some MCU toolchains offer).
  • Create another time_driver that uses only native-width unsigned integers, ideally based on SysTick, which is available on almost all Cortex-M cores; this would also reduce resource usage.
  • Remove unused IRQ handlers: some real-world applications don't use DMA or EXTI at all.
  • Apply more aggressive size optimizations in release builds, either behind a feature or by default.
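For the SysTick-based time driver suggested above, the tick period can be derived from the core clock at compile time. The sketch below is hypothetical (`systick_reload` is an illustrative name); it relies on two properties of the Cortex-M SysTick: the reload register is 24 bits wide, and one period lasts reload + 1 cycles. A low tick rate keeps interrupts rare and, with u32 ticks, gives a wrap-around time of about 49.7 days at 1 kHz.

```rust
/// Maximum value of the 24-bit SysTick reload register (SYST_RVR).
pub const SYSTICK_MAX_RELOAD: u32 = 0x00FF_FFFF;

/// Hypothetical helper: SysTick reload value for a given core clock and tick rate.
/// When evaluated in a `const`, an invalid combination is a compile-time error.
pub const fn systick_reload(core_hz: u32, tick_hz: u32) -> u32 {
    assert!(tick_hz > 0, "tick rate must be non-zero");
    assert!(core_hz % tick_hz == 0, "tick rate must divide the core clock exactly");
    // The counter counts down from `reload` to 0, so one period is reload + 1 cycles.
    let cycles = core_hz / tick_hz;
    // Reload must be at least 1 and fit in 24 bits.
    assert!(cycles >= 2 && cycles - 1 <= SYSTICK_MAX_RELOAD, "period out of SysTick range");
    cycles - 1
}

/// Example: 84 MHz core clock, 1 kHz tick (one SysTick interrupt per millisecond).
pub const RELOAD_84MHZ_1KHZ: u32 = systick_reload(84_000_000, 1_000);
```

Computing the value in a `const` keeps the division out of flash, in the same spirit as the compile-time RCC suggestion.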
