How to optimize a Firmware for minimal Size

From SEGGER Wiki

Toolchains offer various options to reduce firmware size so that it requires less Flash memory. In addition, code can be tweaked to remove unused functionality or to pull in less overhead from libraries. This article describes the most effective steps.

Compiler

The compiler is responsible for translating high-level C and C++ code into machine instructions. It plays the major role in determining firmware size.

The SEGGER Compiler has been developed to produce optimized code for “bare-metal” firmware on microcontrollers. Its code generation can be configured to fit different use cases and hardware specifications.

Build for Release

The size requirements of a firmware should always be measured with a release build. When debugging is enabled, most compilers restrict optimizations and generate extra instructions that help debuggers track variables and step through the code. A release build does not need them.

Embedded Studio

Project Option Code -> Code Generation -> Debugging Level = “None”

Command Line

Remove -g and -Og from the command line.

Optimize for Size

The compiler can apply various optimizations that favor either faster or smaller code. If size matters most, choose the highest optimization level for size.

Embedded Studio

Project Option Code -> Code Generation -> Optimization Level = “Level 2 for size”

Command Line

Add -Oz to the command line and remove any other -O* option.

Use minimum Enum Size

Enumerations do not need to be stored in an int. If the range of values fits into a smaller data type, the compiler can use that type instead.

If the numerical values of an enum's constants do not matter to the application, do not assign them explicitly. They are then numbered consecutively from 0, which keeps the value range, and thus the container, as small as possible.

Embedded Studio

Project Option Code -> Code Generation -> Enumeration Size = “Minimal Container Size”

Command Line

Add -fshort-enums to the command line.
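As an illustration, with hypothetical type names: when compiled with -fshort-enums, the first enum below can be stored in one byte because its enumerators are numbered 0 to 2, while the second still needs 32 bits because one explicitly assigned value exceeds the 16-bit range. The static assertion holds with and without the option.

```c
#include <assert.h>

/* Enumerators left unassigned are numbered 0, 1, 2, so with
   -fshort-enums the compiler may store this type in a single byte. */
typedef enum {
  LED_STATE_OFF,
  LED_STATE_ON,
  LED_STATE_BLINK
} LED_STATE;

/* One large explicit value widens the range beyond 16 bits, so this
   type needs a 32-bit container even with -fshort-enums. */
typedef enum {
  COMMAND_READ  = 0x01,
  COMMAND_WRITE = 0x02,
  COMMAND_RESET = 0x10000
} COMMAND;

/* True with and without -fshort-enums. */
static_assert(sizeof(LED_STATE) <= sizeof(COMMAND), "LED_STATE must not be wider than COMMAND");
```

Because -fshort-enums changes the size of enum types, all object files and libraries that exchange enum values should be built with the same setting.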

Disable Exceptions

To handle exceptions, the compiler attaches additional information and code to each function so that its call stack can be unwound. This can end up in the firmware image even when the application contains no try/catch block.

If exceptions are not used, and are not going to be used, the feature can be disabled in code generation.

Embedded Studio

Project Option Code -> Code Generation -> Enable Exception Support = “No”

Command Line

Remove -fcxx-exceptions -fexceptions from the command line.

Standard Libraries

The C and C++ Standard Libraries provide a set of the most commonly used functions and features needed by almost every application. These include string and memory functions, arithmetic functions, standard container implementations, and low-level routines.

The SEGGER Runtime Library has been developed with the features and limitations of small microcontrollers in mind. It is available in configurations that implement functions with the least code, as fast as possible, or as a “middle ground” between both.

Use the small Library

The small library configuration of the SEGGER Runtime Library implements most functions with the smallest algorithm. For example, memcpy() is implemented with just X instructions. While this may affect application performance if such functions are called frequently, it can drastically reduce firmware size.

Embedded Studio

Project Option Code -> Library -> Library Optimization = “Small”

Command Line

Link against libc_*_small.a, SEGGER_crtinit_*_small.a, prinops_*_small.a, heapops_*_small.a, strops_*_small.a.
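To give an idea of the size trade-off, a byte-by-byte copy loop is about the smallest possible memcpy(). The sketch below is illustrative only, not the SEGGER Runtime Library source, and Small_memcpy is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size-optimized memcpy(): one byte per iteration,
   no alignment checks, no word-wise or unrolled copy paths. */
void *Small_memcpy(void *pDest, const void *pSrc, size_t NumBytes) {
  unsigned char       *pD = (unsigned char *)pDest;
  const unsigned char *pS = (const unsigned char *)pSrc;

  while (NumBytes--) {
    *pD++ = *pS++;
  }
  return pDest;
}
```

A speed-optimized variant would add alignment handling and word-wise or unrolled copying, which is exactly the code a size-optimized configuration leaves out.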

Select a matching Heap Implementation

If an application uses dynamic memory allocation, it usually uses standard heap functions such as malloc() and free(). In C++, objects constructed with new are allocated on the heap, too.

Firmware can place different requirements on the heap. When many objects are allocated and freed at runtime, a constant-time allocator can be useful. Otherwise, a simple block allocator can provide sufficient performance with less code. When the heap is used only to allocate global objects at startup, which are never freed, an alloc-only heap implementation saves even more space.

Embedded Studio

Project Option Code -> Library -> Library Heap = “Minimal”

Command Line

Link against heapops_basic_*.a or heapops_minimal_*.a.
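The alloc-only case is the easiest to picture. The following is a minimal sketch of such an allocator, assuming a statically reserved heap array; AllocOnly_Alloc, HEAP_SIZE, and HEAP_ALIGN are hypothetical names, and the code is not taken from the SEGGER heap libraries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE   1024u  /* Hypothetical heap size in bytes */
#define HEAP_ALIGN  8u     /* Alignment of every returned block */

/* The round-up below only works for power-of-two alignments. */
static_assert((HEAP_ALIGN & (HEAP_ALIGN - 1u)) == 0u, "HEAP_ALIGN must be a power of two");

static _Alignas(HEAP_ALIGN) uint8_t HeapMem[HEAP_SIZE];
static size_t HeapNextFree;  /* Offset of the first unused byte */

/* Alloc-only allocator: hands out consecutive blocks; there is no free(). */
void *AllocOnly_Alloc(size_t NumBytes) {
  size_t Size;
  void  *p;

  if (NumBytes == 0u || NumBytes > HEAP_SIZE) {
    return NULL;                                                    /* Invalid or too large */
  }
  Size = (NumBytes + HEAP_ALIGN - 1u) & ~(size_t)(HEAP_ALIGN - 1u);  /* Round up to alignment */
  if (Size > HEAP_SIZE - HeapNextFree) {
    return NULL;                                                    /* Heap exhausted */
  }
  p = &HeapMem[HeapNextFree];
  HeapNextFree += Size;
  return p;
}
```

Since nothing is ever freed, the only bookkeeping is a single offset, which is where the savings in code and RAM come from.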

Remove unused printf Formatters

When printf(), sprintf(), or other functions of the printf group are used, code to parse format strings and print arguments is included. A fully C-standard-compliant implementation is quite large, even when only a few formatter features are used.

The SEGGER Runtime Library provides options to limit the formatter support to what is actually required by an application. It can enable or disable printing floating point numbers and 64-bit numbers, as well as turn support for width and precision specifiers on and off. If the requirements for format strings in an application are known, the formatter can be configured accordingly to reduce unused code.

Embedded Studio

Project Option Code -> Printf/Scanf -> Printf Floating Point Supported = “No”, Code -> Printf/Scanf -> Printf Integer Support = “Int”, Code -> Printf/Scanf -> Printf Width/Precision Supported = “No”

Command Line

Add --defsym=__SEGGER_RTL_vfprintf=__SEGGER_RTL_vfprintf_int_nwp to the linker command line.
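The following sketch illustrates what a formatter restricted to int-sized integers, without width and precision, might look like. Mini_snprintf and its helpers are hypothetical names, not the SEGGER Runtime Library implementation; the point is that every feature left out (floating point, 64-bit values, padding) is a code path that never gets linked.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Stores one character if room remains (keeping space for the
   terminating null) and always counts it, like snprintf(). */
static void PutChar(char *pBuf, size_t BufSize, size_t *pLen, char c) {
  if (*pLen + 1u < BufSize) {
    pBuf[*pLen] = c;
  }
  (*pLen)++;
}

/* Prints an unsigned value in base 10 or 16, most significant digit first. */
static void PutUnsigned(char *pBuf, size_t BufSize, size_t *pLen, unsigned v, unsigned Base) {
  char   acDigits[sizeof(unsigned) * 3u];  /* Enough digits for base 10 */
  size_t n = 0u;

  do {
    acDigits[n++] = "0123456789abcdef"[v % Base];
    v /= Base;
  } while (v != 0u);
  while (n > 0u) {
    PutChar(pBuf, BufSize, pLen, acDigits[--n]);
  }
}

/* Hypothetical reduced formatter: supports only %d %u %x %c %s %%.
   No width, precision, long long, or floating point. Returns the
   length the full output would have, like snprintf(). */
size_t Mini_vsnprintf(char *pBuf, size_t BufSize, const char *sFormat, va_list Args) {
  size_t Len = 0u;
  char   c;

  while ((c = *sFormat++) != '\0') {
    if (c != '%') {
      PutChar(pBuf, BufSize, &Len, c);
      continue;
    }
    c = *sFormat++;
    switch (c) {
    case 'd': {
      int v = va_arg(Args, int);
      if (v < 0) {
        PutChar(pBuf, BufSize, &Len, '-');
        PutUnsigned(pBuf, BufSize, &Len, 0u - (unsigned)v, 10u);  /* Safe for INT_MIN */
      } else {
        PutUnsigned(pBuf, BufSize, &Len, (unsigned)v, 10u);
      }
      break;
    }
    case 'u': PutUnsigned(pBuf, BufSize, &Len, va_arg(Args, unsigned), 10u); break;
    case 'x': PutUnsigned(pBuf, BufSize, &Len, va_arg(Args, unsigned), 16u); break;
    case 'c': PutChar(pBuf, BufSize, &Len, (char)va_arg(Args, int));         break;
    case 's': {
      const char *s = va_arg(Args, const char *);
      while (*s != '\0') {
        PutChar(pBuf, BufSize, &Len, *s++);
      }
      break;
    }
    case '%':  PutChar(pBuf, BufSize, &Len, '%'); break;
    case '\0': sFormat--;                         break;  /* Lone '%' at end of string */
    default:                                      break;  /* Unsupported specifier: ignored */
    }
  }
  if (BufSize > 0u) {
    pBuf[Len < BufSize ? Len : BufSize - 1u] = '\0';
  }
  return Len;
}

size_t Mini_snprintf(char *pBuf, size_t BufSize, const char *sFormat, ...) {
  va_list Args;
  size_t  Len;

  va_start(Args, sFormat);
  Len = Mini_vsnprintf(pBuf, BufSize, sFormat, Args);
  va_end(Args);
  return Len;
}
```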

Remove printf Formatting altogether

If printf group functions are used for debugging output only, formatting can be removed from the application and delegated to the debugger instead.

With Embedded Studio this is available with Semihosting and Host Formatting.

Embedded Studio

Project Option Code -> Library -> Library I/O = “SEMIHOST (host-formatted)”

Disable locales

The C standard defines locales to support internationalization of text and specifies which functions depend on the current locale. Even seemingly simple functions such as toupper() then require extra code.

Locales are usually not required in embedded systems. In that case, locale support can be removed from the library, and such functions can be implemented with simple code that handles only ASCII.
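A minimal sketch of such an ASCII-only conversion, assuming an ASCII-compatible execution character set; Ascii_toupper is a hypothetical name, not a SEGGER Runtime Library function.

```c
#include <assert.h>

/* The conversion relies on contiguous letters with a fixed distance
   between lower and upper case, as in ASCII. */
static_assert('a' - 'A' == 32 && 'z' - 'a' == 25, "ASCII-compatible character set assumed");

/* Hypothetical locale-free toupper(): converts only 'a'..'z';
   every other value, including non-ASCII bytes, is returned unchanged. */
int Ascii_toupper(int c) {
  if (c >= 'a' && c <= 'z') {
    return c - ('a' - 'A');
  }
  return c;
}
```

There is no locale state to consult: the whole function is a range check and a subtraction.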

Do not use Exceptions

The C++ Standard Library implements some functions to throw an exception in case of an error. If exceptions are not used at all, a C++ library built without exception support can be used instead.

Note: Without exceptions, a runtime error in a function that has no error return value cannot be caught.

Embedded Studio

Project Option Code -> Libraries -> C++ Library = “Yes (No Exceptions)”

Command Line

Link against lib*_noexcept*.a

Linker

The linker takes all code and data sections from the input object files and libraries and links them together to produce the firmware image. It “decides” which sections are used in an application and how and where to place them in memory. These decisions affect firmware size.

The SEGGER Linker is designed from the ground up for embedded systems and small microcontrollers. It places sections efficiently, with minimal space requirements, and can transform and merge sections to reduce size even further.

Enable Compression

Initialized static and global variables need their initial values stored in non-volatile memory, from which the runtime initialization code copies them to RAM.

The SEGGER Linker can store initialization values in compressed form and decompress them during runtime initialization. It supports different compression algorithms and can automatically choose the best one for each kind of data.

Linker Script

Enable packing for copy sections:

initialize by copy with packing=auto { section .data, section .data.*, section .fast, section .fast.* };
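The compression formats used by the SEGGER Linker are not described here. To illustrate the principle only, the sketch below unpacks a simple, hypothetical run-length format into RAM, the way runtime initialization would restore a packed .data section; RLE_Unpack and the encoding are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical run-length format (NOT the SEGGER Linker's format):
     control byte n <  0x80: copy the next n + 1 literal bytes
     control byte n >= 0x80: repeat the next byte (n - 0x80) + 3 times
   The packed data is generated at build time and trusted to be
   well-formed: no run may extend past DestSize. */
void RLE_Unpack(uint8_t *pDest, const uint8_t *pSrc, size_t DestSize) {
  uint8_t *pEnd = pDest + DestSize;

  while (pDest < pEnd) {
    unsigned Ctrl = *pSrc++;
    if (Ctrl < 0x80u) {
      for (unsigned i = 0u; i <= Ctrl; i++) {   /* Literal run */
        *pDest++ = *pSrc++;
      }
    } else {
      uint8_t  Fill  = *pSrc++;                  /* Repeated byte */
      unsigned Count = Ctrl - 0x80u + 3u;
      while (Count--) {
        *pDest++ = Fill;
      }
    }
  }
}
```

For example, the 6 packed bytes { 0x02, 'a', 'b', 'c', 0x85, 0x00 } unpack to "abc" followed by eight zero bytes, so Flash holds 6 bytes instead of 11.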

Do not initialize unused Blocks

The SEGGER Linker generates runtime initialization code based on the actual data of a firmware. It eliminates generic initialization of blocks, which is usually present even when a block's size is 0.

Additional initialization can also be included conditionally, based on information from the link step. For example, a heap block only needs to be initialized if the heap is actually used.

Linker Script

Initialize heap block only if any heap function is used:

#define USES_ALLOC_FUNC                                   \
  linked symbol malloc || linked symbol aligned_alloc ||  \
  linked symbol calloc || linked symbol realloc

initialize by calling __SEGGER_init_heap if USES_ALLOC_FUNC { block heap };  // Init the heap if one is required

Deduplicate Code and Data

Sometimes the same functionality is implemented multiple times in an application.

The SEGGER Linker can find read-only code sections and read-only data sections that are identical and discard duplicates.

Embedded Studio

Project Option Code -> Linker -> Deduplicate Code Sections = “Yes” and Code -> Linker -> Deduplicate Data Sections = “Yes”

Command Line

Add --dedupe-code and --dedupe-data to linker command line.
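A typical candidate is code or data that was copied into several modules. In the hypothetical example below (two driver modules, shown in one file for brevity), both carry an identical table and a helper with an identical body. Assuming each function and each table is placed in its own section, for example with -ffunction-sections and -fdata-sections, these read-only sections are expected to be byte-for-byte identical, so a deduplicating linker can keep a single copy of each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical UART driver module */
const uint16_t UART_aPrescaler[4] = { 1u, 8u, 64u, 256u };

uint16_t UART_Checksum(const uint8_t *pData, size_t NumBytes) {
  uint16_t Sum = 0u;

  while (NumBytes--) {
    Sum = (uint16_t)(Sum + *pData++);
  }
  return Sum;
}

/* Hypothetical SPI driver module: same table contents, same function body */
const uint16_t SPI_aPrescaler[4] = { 1u, 8u, 64u, 256u };

uint16_t SPI_Checksum(const uint8_t *pData, size_t NumBytes) {
  uint16_t Sum = 0u;

  while (NumBytes--) {
    Sum = (uint16_t)(Sum + *pData++);
  }
  return Sum;
}
```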

Merge String Constants

Every constant string is stored in the firmware and takes up space.

The SEGGER Linker can deduplicate string constants and merge string constants where one string is a suffix of another string.

Embedded Studio

Project Option Code -> Linker -> Merge String Constants = “Yes”
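For example, with hypothetical names: "done" is the tail of "Transfer done", so a linker that merges string constants can store only "Transfer done" and let the second pointer refer to its last five bytes (four characters plus the terminating null).

```c
#include <assert.h>

/* Hypothetical log messages. "done" is a suffix of "Transfer done",
   so a string-merging linker may store only "Transfer done\0"
   and let sDone point into its tail. */
const char *const sTransferDone = "Transfer done";
const char *const sDone         = "done";
```

Whether the two strings actually end up sharing storage depends on the toolchain and its options, so code must never rely on it, for example by comparing string addresses.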

Code

While a toolchain can do its best to optimize the input it is given, there is usually potential for optimization in the source code itself, too.

This applies to how user code is written, but also to third-party code, which is often kept generic to fit all use cases and therefore produces more code. In the context of the toolchain, examples are the startup code and system initialization, which can be tweaked and trimmed to what a firmware actually needs.

Use minimal Startup Code

A Reset_Handler needs to call the system initialization (SystemInit()) and the runtime initialization (usually _start).

It may contain additional code to:

  • Add debug support
  • Initialize external memory
  • Copy vectors and configure vector table register
  • Enable the floating-point unit

If the firmware does not need these parts, they can be removed from Reset_Handler.

Example Code
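// Minimal Reset Handler
        .section .init.Reset_Handler, "ax"
        .thumb_func
        .global Reset_Handler
        .balign 2
Reset_Handler:
        bl      SystemInit   // System initialization (e.g. clock setup)
        bl      _start       // Runtime initialization, calls main(); does not return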

Use minimal Runtime Initialization

The runtime initialization usually initializes data sections in RAM, initializes a heap, calls static constructors, and finally calls main(). How the initialization code is written or constructed depends on the toolchain.

Only sections that the firmware actually uses should be initialized, and the initialization code itself can favor code size over performance.

With the SEGGER Toolchain, the runtime initialization is kept as small as possible, and the SEGGER Linker adds only the initialization code that the linked firmware needs.
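For other toolchains, or to see what such code does, here is a minimal sketch of table-driven runtime initialization that copies initial values and clears zero-initialized sections with byte-wise loops, favoring code size over speed. The INIT_ENTRY layout and RuntimeInit are hypothetical; on a real target the entries would be filled in from linker-defined symbols, and the startup code would afterwards call static constructors (C++ only) and main().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per section that needs initialization (hypothetical layout). */
typedef struct {
  uint8_t       *pDest;  /* Start of the section in RAM */
  const uint8_t *pSrc;   /* Initial values in Flash, or NULL to zero-fill */
  size_t         Size;   /* Section size in bytes */
} INIT_ENTRY;

/* Copies .data-like sections and clears .bss-like sections.
   Byte-wise loops keep the code small at the cost of speed. */
void RuntimeInit(const INIT_ENTRY *pTable, size_t NumEntries) {
  for (size_t i = 0u; i < NumEntries; i++) {
    uint8_t       *pD = pTable[i].pDest;
    const uint8_t *pS = pTable[i].pSrc;
    size_t         n  = pTable[i].Size;

    while (n--) {
      *pD++ = (pS != NULL) ? *pS++ : 0u;
    }
  }
}
```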

Merge unused Interrupt Handlers

Arm Cortex-M targets and other microcontrollers with a vectored interrupt controller require a table with function pointers for each exception or interrupt handler.

Usually the vector table for a device includes entries for all supported interrupts, and a dummy handler is implemented for each interrupt, to be overridden by the user if the interrupt is used.

A separate handler per interrupt helps to identify unexpected interrupts while debugging. For a size-optimized build, one handler can be shared by all unused interrupts.

Example Code
        .section .init.Dummy_Handler, "ax"
        .thumb_func
        .weak Dummy_Handler
        .balign 2
Dummy_Handler:
        1: b 1b   // Endless loop

        .section .vectors, "ax"
        .code 16
        .balign 512
        .global _vectors
_vectors:
        //
        // Internal exceptions and interrupts
        //
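        .word   __stack_end__       // Initial stack pointer
        .word   Reset_Handler
        // ...
        .word   SysTick_Handler
        //
        // External interrupts
        //
#ifndef __NO_EXTERNAL_INTERRUPTS
        .word   Dummy_Handler       // All unused interrupts share one handler
        .word   Dummy_Handler
        .word   Dummy_Handler
        .word   Dummy_Handler
        .word   Dummy_Handler
        // ...
#endif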

Trim interrupt table

The vector table of a device usually includes all possible interrupt handlers, even if some peripherals are not used by the firmware.

Unused handlers after the last used one can be removed to minimize the size of the vector table. In the minimal case, only the initial stack pointer and the reset handler remain.

Example Code
        .section .vectors, "ax"
        .code 16
        .balign 512
        .global _vectors
_vectors:
        //
        // Internal exceptions and interrupts
        //
        .word   __stack_end__       // Initial stack pointer
        .word   Reset_Handler       // Reset handler
_vectors_end:

Minimize HardFault Handler

The HardFault Handler is especially useful while debugging, to catch any crash or fault in the firmware. The default implementation usually contains some code to analyze the reason for the fault and might return automatically.

In release firmware a fault may still happen. The HardFault Handler should then recover from the fault, e.g. by resetting the device, or hang endlessly.

Example Code
        .section .init.HardFault_Handler, "ax"
        .balign 2
        .thumb_func
HardFault_Handler:
        b .   // Endlessly hang (until reset by watchdog)

Further Options

Enable Link-Time Optimization

A compiler normally optimizes only within one compilation unit, usually a single source file.

Link-Time Optimization (LTO) can apply further optimization on the pre-compiled code across all compilation units before the application is linked. LTO can improve performance or reduce size according to the optimization level.

Embedded Studio

Project Option Code -> Code Generation -> Link Time Optimization = “Yes”

Use the Floating-Point Unit

Floating-point operations can be done in software by using only integer registers, or they can be done in hardware using a floating-point unit (FPU). When an FPU is available, such as on Cortex-M4F and Cortex-M7, its use should be preferred.

Additionally, with the hard-float ABI, floating-point function arguments are passed in FPU registers instead of general-purpose registers, which saves the instructions needed to copy values between these registers.

Embedded Studio

Project Options Code -> Code Generation -> ARM FPU Type to match the hardware and Code -> Code Generation -> ARM FP ABI Type = “Hard” or “SoftFP” (“SoftFP” also uses the FPU, but passes floating-point arguments in general-purpose registers)

Use Hardware Features

Additional hardware features and peripherals, such as cryptography coprocessors, not only increase performance but can also reduce code size, because the hardware-supported algorithm no longer needs to be implemented in the firmware.