Stack Overflow Prevention

From SEGGER Wiki

Introduction

To detect stack overflows in a running embedded system, the SEGGER Compiler can generate code that checks for stack overflows in every function. This is activated with the command line switch -mstack-overflow-check (cc1 interface). In a secure system, a stack overflow must be detected before the overflowing stack destroys any memory. A check is therefore required on every change of the stack pointer and before any large stack growth.

Stack Overflow Prevention (STOP) with Embedded Studio

Embedded Studio makes the use of Stack Overflow Prevention as easy as possible.

To enable the feature, set the project option Code -> Code Generation -> Enable Stack Overflow Prevention to Yes.

If the error callback __SEGGER_STOP_X_OnError is not implemented in the project, it defaults to staying in an endless loop. An example implementation for error handling is included in SEGGER_STOP.c in $(StudioDir)/samples.

Compiler-generated Code

If stack overflow checking is activated in the compiler, the generated code differs as follows:

  1. Functions that don't touch the stack are not changed.
  2. In functions that use a local stack frame but don't use R3 as a function parameter: The setup of the stack frame (usually a sub sp, #size) is replaced by loading the required size into register R3, then calling the function __SEGGER_STOP_GROW_R3().
  3. In functions that use a local stack frame and use R3 as a function parameter: Same as 2., but using register R4 instead of R3 and calling the function __SEGGER_STOP_GROW_R4(). This means that R4 must be pushed at function entry.
  4. In functions that don't use a local stack frame but need to save registers on the stack, the function __SEGGER_STOP_GROW_0() (with no parameter) is called after pushing the registers.
  5. Functions that need dynamic stack allocation (for example if alloca() or variable-sized arrays are used) also call __SEGGER_STOP_GROW_R3(). Because this may happen in the middle of a function, the register allocator is instructed to make sure that R3 can be used as the argument.
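
The rules above for the function entry can be summarized as a small decision function. This is only a host-side sketch for illustration: the type and function names (STOP_CHECK_KIND, FUNC_INFO, SelectEntryCheck) are hypothetical and not part of the compiler or its library, and rule 5 (dynamic allocation inside a function) is not modeled beyond the note in the comment.

```c
#include <stdbool.h>

// Hypothetical model of the rules above. None of these names exist in the toolchain.
typedef enum {
  STOP_CHECK_NONE,     // Rule 1: function does not touch the stack
  STOP_CHECK_GROW_R3,  // Rule 2 (and rule 5 for dynamic allocation): size passed in R3
  STOP_CHECK_GROW_R4,  // Rule 3: R3 carries a parameter, size passed in R4
  STOP_CHECK_GROW_0    // Rule 4: only pushed registers, no size argument
} STOP_CHECK_KIND;

typedef struct {
  bool PushesRegisters;  // Function saves registers on the stack at entry
  bool HasStackFrame;    // Function reserves a local stack frame
  bool R3IsParameter;    // R3 holds a function parameter at entry
} FUNC_INFO;

// Select the check that would be emitted at function entry.
static STOP_CHECK_KIND SelectEntryCheck(const FUNC_INFO* pInfo) {
  if (pInfo->HasStackFrame) {
    return pInfo->R3IsParameter ? STOP_CHECK_GROW_R4 : STOP_CHECK_GROW_R3;
  }
  if (pInfo->PushesRegisters) {
    return STOP_CHECK_GROW_0;
  }
  return STOP_CHECK_NONE;
}
```

A function that pushes LR and reserves 12 bytes without using R3 as a parameter maps to STOP_CHECK_GROW_R3, while a function that only pushes registers maps to STOP_CHECK_GROW_0.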

The called functions then can check for a stack overflow using a stack limit that is stored in a global variable. These functions are called after registers to be saved are pushed on the stack. Therefore the stack limit must be calculated such that there is always space for:

  • All general purpose registers that may be pushed at function entry (R0 - R11, LR). Some optimizations lead to pushing R0 - R3 as well.
  • All callee saved floating point / vector registers (D8 - D15)
  • Registers saved on interrupt entry (8 words)
  • 3 (spare) words for alignment and emergency spill slots
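
As a worked example, the space listed above can be added up in C. This is a sketch under stated assumptions: 4-byte words, 8-byte D registers, and hypothetical names (STOP_CalcReserve, STOP_CalcLimit). Whether the default implementation reserves exactly this amount is not stated here.

```c
#include <stdint.h>

// Counts taken from the list above. All names are illustrative only.
#define STOP_WORD_SIZE        4u
#define STOP_NUM_GP_REGS      13u   // R0 - R11 and LR
#define STOP_NUM_FP_REGS      8u    // D8 - D15, 8 bytes each
#define STOP_NUM_IRQ_WORDS    8u    // Registers saved on interrupt entry
#define STOP_NUM_SPARE_WORDS  3u    // Alignment and emergency spill slots

// Number of bytes that must remain free below the stack limit.
static uint32_t STOP_CalcReserve(void) {
  return (STOP_NUM_GP_REGS     * STOP_WORD_SIZE)
       + (STOP_NUM_FP_REGS     * 8u)
       + (STOP_NUM_IRQ_WORDS   * STOP_WORD_SIZE)
       + (STOP_NUM_SPARE_WORDS * STOP_WORD_SIZE);
}

// Limit for a stack whose lowest address is StackStart.
static uint32_t STOP_CalcLimit(uint32_t StackStart) {
  return StackStart + STOP_CalcReserve();
}
```

With these counts the reserve is 52 + 64 + 32 + 12 = 160 bytes, so a stack starting at 0x20000000 would get a limit of 0x200000A0.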

Stack checking code generation can be disabled for single functions with __attribute__((no_stack_overflow_check)), which may be required for internal RTOS functions.
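
A minimal sketch of how the attribute might be wrapped so that the same source also builds with compilers that do not know it. The macro name NO_STOP_CHECK and the helper OS_SelectNextTask are hypothetical. The sketch assumes that the compiler reports the attribute through __has_attribute, which is not confirmed here.

```c
// Expand to the attribute only if the compiler knows it, otherwise to nothing.
#if defined(__has_attribute)
  #if __has_attribute(no_stack_overflow_check)
    #define NO_STOP_CHECK  __attribute__((no_stack_overflow_check))
  #endif
#endif
#ifndef NO_STOP_CHECK
  #define NO_STOP_CHECK
#endif

// Hypothetical RTOS-internal helper that should run without a stack check,
// for example because it executes while the stack limit is being switched.
NO_STOP_CHECK static unsigned int OS_SelectNextTask(unsigned int Current, unsigned int NumTasks) {
  return (Current + 1u) % NumTasks;
}
```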

Generated code examples

Original code:

ReadReqEP:
    push    {lr}
    sub     sp, #12
    and     r12, r0, #7
    ...

is changed to

ReadReqEP:
    push    {lr}
    movs    r3, #12
    bl      __SEGGER_STOP_GROW_R3
    and     r12, r0, #7
    ...

Original code:

 
SEGGER_CRC_Calc:                     
    push    {r4, r5, r6, r7, lr}       
    sub     sp, #64                    
    mov     lr, r2                    
    ...

is changed to

 
SEGGER_CRC_Calc:
    push    {r4, r5, r6, r7, lr}
    movs    r4, #64
    bl      __SEGGER_STOP_GROW_R4
    mov     lr, r2
    ...

Original code:

SEGGER_CRC_CalcBitByBit:
    push    {r4, r5, lr}
    cbz     r1, .LBB0_4
    ...

is changed to

SEGGER_CRC_CalcBitByBit:
    push    {r4, r5, lr}
    bl      __SEGGER_STOP_GROW_0
    cbz     r1, .LBB0_4
    ...

Stack Check and Error Handling

The following stack check functions must be implemented when Stack Overflow Prevention is enabled. In Embedded Studio, they are automatically added with the standard library.

  • __SEGGER_STOP_GROW_R3
  • __SEGGER_STOP_GROW_R4
  • __SEGGER_STOP_GROW_0

On stack overflow the stack check functions should jump to a user-provided error handling callback __SEGGER_STOP_X_OnError().

The stack check functions

The functions called by the compiler-generated code must check the remaining stack size and must not return in case of a stack overflow. For efficiency, the functions do not follow the standard calling convention. Therefore the functions must not modify any registers except R12 and the register containing the size argument (if any). The functions must also adjust the stack pointer before they return.

A sample implementation for a Cortex-M CPU with two stack pointers looks like this:

START_FUNC __SEGGER_STOP_GROW_R3
        //
        // Check which stack pointer is currently used:
        // bit 1 of CONTROL register 0 -> MSP, 1 -> PSP
        //
        mrs     r12, CONTROL
        lsls    r12, #30
        //
        // Calculate the new stack pointer position
        //
        sub     r12, sp, r3
        //
        // Load corresponding stack limit
        //
        ite     mi
        ldrmi   r3, =__SEGGER_STOP_Limit_PSP
        ldrpl   r3, =__SEGGER_STOP_Limit_MSP
        ldr     r3, [r3]
        //
        // Compare with new stack value
        //
        cmp     r12, r3
        blt     L(MightError)
L(Done):
        //
        // Check passed.
        // Adjust stack and return
        //
        mov     sp, r12
        bx      lr
L(MightError):
        //
        // A stack limit value of 0 means: Check is disabled
        //
        cmp     r3, #0
        beq     L(Done)
        b       __SEGGER_STOP_X_OnError
END_FUNC __SEGGER_STOP_GROW_R3

START_FUNC __SEGGER_STOP_GROW_R4
        //
        // Check which stack pointer is currently used:
        // bit 1 of CONTROL register 0 -> MSP, 1 -> PSP
        //
        mrs     r12, CONTROL
        lsls    r12, #30
        //
        // Calculate the new stack pointer position
        //
        sub     r12, sp, r4
        //
        // Load corresponding stack limit
        //
        ite     mi
        ldrmi   r4, =__SEGGER_STOP_Limit_PSP
        ldrpl   r4, =__SEGGER_STOP_Limit_MSP
        ldr     r4, [r4]
        //
        // Compare with new stack value
        //
        cmp     r12, r4
        blt     L(MightError)
L(Done):
        //
        // Check passed.
        // Adjust stack and return
        //
        mov     sp, r12
        bx      lr
L(MightError):
        //
        // A stack limit value of 0 means: Check is disabled
        //
        cmp     r4, #0
        beq     L(Done)
        //
        // Store limit value in R3
        //
        mov     r3, r4
        b       __SEGGER_STOP_X_OnError
END_FUNC __SEGGER_STOP_GROW_R4

START_FUNC __SEGGER_STOP_GROW_0
        //
        // Check which stack pointer is currently used:
        // bit 1 of CONTROL register 0 -> MSP, 1 -> PSP
        //
        mrs     r12, CONTROL
        lsls    r12, #30
        //
        // Load corresponding stack limit
        //
        ite     mi
        ldrmi   r12, =__SEGGER_STOP_Limit_PSP
        ldrpl   r12, =__SEGGER_STOP_Limit_MSP
        ldr     r12, [r12]
        //
        // Compare with current stack value
        //
        cmp     sp, r12
        blt     L(MightError)
L(Done):
        //
        // Check passed. Return.
        //
        bx      lr
L(MightError):
        //
        // A stack limit value of 0 means: Check is disabled
        //
        cmp     r12, #0
        beq     L(Done)
        //
        // Store limit value in R3
        // Store overflowed SP in R12
        //
        mov     r3, r12
        mov     r12, sp
        b       __SEGGER_STOP_X_OnError
END_FUNC __SEGGER_STOP_GROW_0
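
The decision logic of the three routines above can be modeled in portable C for reading and host-side testing. This is a sketch, not the library code: the names STOP_UsesPSP and STOP_CheckGrow are hypothetical, addresses are compared as unsigned values, and the "limit is 0" test is done first, whereas the assembly only tests it on the failure path to keep the fast path short.

```c
#include <stdbool.h>
#include <stdint.h>

// Bit 1 of CONTROL selects the active stack: 0 -> MSP, 1 -> PSP.
// Shifting left by 30 moves that bit into the sign position, as the lsls does.
static bool STOP_UsesPSP(uint32_t Control) {
  return ((Control << 30) & 0x80000000u) != 0u;
}

// Model of the check: returns true if the stack may grow by Size bytes.
// *pNewSP receives the stack pointer value after the growth (SP - Size).
// Size is 0 for the variant without a size argument.
static bool STOP_CheckGrow(uint32_t SP, uint32_t Size, uint32_t Limit, uint32_t* pNewSP) {
  uint32_t NewSP;

  NewSP   = SP - Size;
  *pNewSP = NewSP;
  if (Limit == 0u) {
    return true;           // A limit of 0 means: check is disabled
  }
  return NewSP >= Limit;   // Below the limit: the error callback would be entered
}
```

For example, growing a stack at 0x2001F860 by 0x20 bytes with a limit of 0x2001F858 fails, because the new stack pointer 0x2001F840 is below the limit.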

The error handling callback

To make sure the error handling callback does not use the overflowing stack, it should be implemented in pure assembly and must not be compiled with stack overflow checks.

In the default implementation, __SEGGER_STOP_X_OnError is defined as

 __attribute__((naked, no_stack_overflow_check)) void __SEGGER_STOP_X_OnError(void);

It is tail-called by the stack check functions and does not follow the regular calling conventions. The stack limit value, new stack pointer value, and caller are passed in R3, R12, and LR.

The error handling callback might reset the overflowing stack to a safe value. If it does, it may call additional functions, for example to log the error and reset the system.

#include <stdio.h>                                // fprintf() used by the example error handler.

extern unsigned char __STACKSIZE__[];             // Linker-generated symbol for system stack size.
extern unsigned char __stack_start__[];           // Linker-generated symbol for lower bound of system stack.
extern unsigned char __stack_end__[];             // Linker-generated symbol for upper bound of system stack.

extern unsigned char __STACKSIZE_PROCESS__[];     // Linker-generated symbol for process stack size.
extern unsigned char __stack_process_start__[];   // Linker-generated symbol for lower bound of process stack.
extern unsigned char __stack_process_end__[];     // Linker-generated symbol for upper bound of process stack.

/*********************************************************************
*
*       _HandleStackError()
*
*  Function description
*   Example error handler for stack overflow prevention.
*   Log error and halt system.
*
*  Parameters
*   SP:        New (overflown) stack pointer value.
*   Limit:     Stack pointer limit value.
*   Caller:    Caller of the stack checking function.
*   UsedStack: < 0: Process Stack. >= 0: Main Stack.
*
*  Additional information
*   When __SEGGER_STOP_RESET_STACK is set, the error handler runs
*   on a freshly initialized stack and may call other functions.
*   When __SEGGER_STOP_RESET_STACK is not set,
*   no other function must be called.
*
*   The error handler must not return.
*   It should trigger a system reset or stay in an endless loop.
*
*   This function should not be compiled with stack overflow checks.
*/
__attribute__((used, no_stack_overflow_check)) static void _HandleStackError(unsigned int SP, unsigned int Limit, unsigned int Caller, int UsedStack) {
  //
  // Only call any function if the stack has been reset
  //
  fprintf(stderr, "SYSTEM ABORT: Stack Overflow Prevented at 0x%.8X.\n"
                  "              SP:      0x%.8X\n"
                  "              Limit:   0x%.8X\n",
                  Caller, SP, Limit);
  //
  // Enable interrupt if main stack did not overflow.
  //
  if (UsedStack < 0) {
    asm("cpsie i");
  }
  //
  // Stay in endless loop.
  // TODO: Reset the system instead.
  //
  for (;;) { ; }
}

/*********************************************************************
*
*       __SEGGER_STOP_X_OnError()
*
*  Function description
*   Callback called by stack check on stack overflow.
*   Reset stack pointer to safe value and call error handler.
*
*  Additional information
*   This callback is tail-called by the stack checking functions.
*   It assumes that:
*     - R3 contains the stack limit value.
*     - R12 contains the new stack pointer value.
*     - LR contains the caller of the stack checking function.
*     - The stack pointer has not been adjusted.
*     - __stack_end__ is the start of the main stack.
*     - __stack_process_end__ is the start of the process stack.
*
*   This function does not follow the regular calling convention.
*   This callback is implemented as naked to make sure the compiler
*   does not add a prologue which might use the stack.
*
*   This function must not be compiled with stack overflow checks.
*/
void __SEGGER_STOP_X_OnError(void) {
  asm(
      "cpsid i\n"                       // Disable interrupts
      "mov     r0, r12\n"               // Save overflowed SP
      "mov     r1, r3\n"                // Save SP limit
      "sub     r2, lr, #5\n"            // Save caller
      "mrs     r3, CONTROL\n"           // Get currently used stack
      "lsls    r3, #30\n"
      "ittee   pl\n"                    // Reset this stack
      "ldrpl   r12, =__stack_end__\n"
      "msrpl   msp, r12\n"
      "ldrmi   r12, =__stack_process_end__\n"
      "msrmi   psp, r12\n"
      "bl _HandleStackError\n"            // Call error handler
      "b .\n"                             // Stay here
      );
}
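
The constant 5 in "sub r2, lr, #5" can be explained with a short calculation. The sketch assumes that the check function was called with a 32-bit Thumb BL instruction. The helper name STOP_CallerFromLR is hypothetical.

```c
#include <stdint.h>

// LR holds the return address with bit 0 set (Thumb state).
// The 32-bit BL instruction is 4 bytes long, so the address of the
// call is LR - 1 (Thumb bit) - 4 (instruction size) = LR - 5.
static uint32_t STOP_CallerFromLR(uint32_t LR) {
  return LR - 5u;
}
```

An LR value of 0x080008B3, for instance, yields the call address 0x080008AE.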

Stack Limits

The startup code must initialize at least the main stack limit variable __SEGGER_STOP_Limit_MSP. In the default implementation, the symbol is automatically initialized to its default value by the runtime init.

To adjust the value, for example to change the space reserved for saved registers, or to initialize __SEGGER_STOP_Limit_PSP, __SEGGER_STOP_X_InitLimits should be implemented and called.

With the SEGGER Linker, __SEGGER_STOP_X_InitLimits can be called automatically by the runtime init:

 initialize by calling __SEGGER_STOP_X_InitLimits    { section .data.stop.* };

With the GNU Linker, __SEGGER_STOP_X_InitLimits should be called as the first function in main:

int main(void) {
  int NumItems;
  
#if !defined (__SEGGER_LINKER)
  //
  // Optionally initialize stack limits if not done by runtime init.
  //
  __SEGGER_STOP_X_InitLimits();
#endif
  ...
}
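
What a limit initialization might compute can be sketched on the host with stand-in variables. Everything here is an assumption for illustration: the Model_ names, the reserve of 160 bytes, and the use of parameters in place of the linker-generated symbols __stack_start__ and __stack_process_start__. It is not the implementation shipped with Embedded Studio.

```c
#include <stdint.h>

// Stand-ins for the two limit variables in this host-side model.
static uintptr_t Model_Limit_MSP;
static uintptr_t Model_Limit_PSP;

// Hypothetical reserve in bytes for pushed registers, interrupt frame, and spare words.
#define MODEL_STACK_RESERVE  160u

// Place each limit the reserve above the lowest address of its stack.
// On a target, the arguments would be the lower bounds of the main and process stack.
static void Model_InitLimits(const unsigned char* pMainStart, const unsigned char* pProcessStart) {
  Model_Limit_MSP = (uintptr_t)pMainStart    + MODEL_STACK_RESERVE;
  Model_Limit_PSP = (uintptr_t)pProcessStart + MODEL_STACK_RESERVE;
}
```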

Calling functions before runtime init

When the system calls functions before runtime init, __SEGGER_STOP_Limit_MSP should first be set to 0 to disable stack checks. This is the default case on Cortex-M, where Reset_Handler calls SystemInit.

Reset_Handler:
        .extern __SEGGER_STOP_Limit_MSP
        //
        // Initialize main stack limit to 0 to disable stack checks before runtime init
        //
        movs    R0, #0
        ldr     R1, =__SEGGER_STOP_Limit_MSP
        str     R0, [R1]
        //
        // Call SystemInit
        //
        bl      SystemInit
        ...

Using an RTOS

When using an RTOS or a different mechanism for multitasking, the task switching routines must update the stack limit variable (usually __SEGGER_STOP_Limit_PSP) when switching stacks.

ChangeTask:
  ...
  ldr      r0, [r1, #0]   // OS.pCurTask
  ldr      r3, [r0, #8]   // OS.pCurTask->pStackBottom
  add      r3, #100       // Buffer before stack overflow
  ldr      r2, =__SEGGER_STOP_Limit_PSP
  str      r3, [r2]       // Update stack limit
  ...

It is recommended to enable stack checks for all tasks. However, an RTOS might disable stack checks for some tasks by setting the limit variable to 0.
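
The limit update on a task switch, including the option to disable checks for a task, might be modeled in C as follows. The MODEL_TASK type, its fields, and Model_OnTaskSwitch are hypothetical and do not correspond to any particular RTOS. The buffer of 100 bytes is taken from the assembly listing above.

```c
#include <stdint.h>

#define MODEL_STACK_BUFFER  100u   // Bytes kept free above the stack bottom

// Hypothetical task descriptor with only the fields needed here.
typedef struct {
  uint32_t StackBottom;    // Lowest address of the task stack
  int      CheckEnabled;   // 0: task runs without stack checks
} MODEL_TASK;

static uint32_t Model_TaskLimit_PSP;   // Stand-in for the process stack limit variable

// Called by the scheduler model before the next task is resumed.
static void Model_OnTaskSwitch(const MODEL_TASK* pNext) {
  if (pNext->CheckEnabled) {
    Model_TaskLimit_PSP = pNext->StackBottom + MODEL_STACK_BUFFER;
  } else {
    Model_TaskLimit_PSP = 0u;   // A limit of 0 disables the check
  }
}
```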

Examples

Stack overflows may be a problem in almost every system. The following examples show the behavior of Stack Overflow Prevention in common use cases.

An Embedded Studio solution to test and run the examples is available: STOP Examples

Stack overflow on main stack

In a simple system there is usually only one stack, the main stack, used for the main loop and for interrupts. If this stack is about to overflow, the complete system should be halted and/or reset.

In this example the stack requirements have been measured with known input for an algorithm, but the worst case call graph has not been identified. This may lead to stack overflows depending on the input.

A system which takes input from an external source should prevent stack overflows.

Source listing (main_Recursion.c)

The full source code is available in the example project.

#include "SEGGER_STOP.h"

static int Arr[] = { 
  ...
};

/*********************************************************************
*
*       QuickSort()
*
*  Function description
*   Implementation of QuickSort, using the last item as pivot point.
*/
void QuickSort(int Level, int* Arr, int Low, int High);
void QuickSort(int Level, int* Arr, int Low, int High) {
  int PivotPoint;
  int Left;
  int Right;
  
  if ((Low >= 0) && (High >= 0) && (Low < High)) {
    ...
    QuickSort(Level + 1, Arr, Low, PivotPoint - 1);
    QuickSort(Level + 1, Arr, PivotPoint + 1, High);
  }
}

/*********************************************************************
*
*       main()
*
*  Function description
*   Application entry point.
*/
int main(void) {
  int NumItems;
  
#if !defined (__SEGGER_LINKER)
  //
  // Optionally re-initialize stack limits if not done by runtime init.
  //
  __SEGGER_STOP_X_InitLimits();
#endif
  
  NumItems = (sizeof(Arr) / sizeof(Arr[0]));
  QuickSort(0, Arr, 0, NumItems - 1);
  QuickSort(0, Arr, 0, NumItems - 1);
}

Example output (main_Recursion.c)

    #### Sorting:
    32, 84, 36, 88, 90, 25, 78, 34, 61, 87, 53, 49, 65, 80, 90, 68, 
    28, 67, 79, 24,  7, 90, 27, 58, 73, 55, 54, 48, 88, 26, 33,  5, 
    93,  2, 63, 93, 21, 78, 70, 90,  1, 30, 54, 97, 46, 52, 18, 21, 
    47,  8, 92, 93, 72, 36, 22, 30, 89, 98, 28, 42, 22, 53, 69, 30, 
    73, 53, 64, 85, 76, 73, 85, 86, 66, 38, 10, 50, 13, 55, 80, 64, 
    78, 35, 13, 12, 59, 13, 15, 31,  9, 91, 38, 93,  7, 52, 90, 32, 
    26, 86, 32, 95, 28, 16, 30,  8, 12, 44, 99, 32, 47, 65, 52, 56, 
    78,  2, 69, 26,  8, 22, 24, 65, 11,  6, 32, 15, 62, 17, 15, 35
    Lvl  0: Slice: [  0:127] Pivot: 35
    Lvl  1: Slice: [  0: 51] Pivot: 15
    Lvl  2: Slice: [  0: 19] Pivot: 15
    ...
    Lvl  9: Slice: [ 44: 47] Pivot: 32
    Lvl 10: Slice: [ 44: 46] Pivot: 32
    Lvl 11: Slice: [ 44: 45] Pivot: 32
    Lvl  1: Slice: [ 53:127] Pivot: 72
    Lvl  2: Slice: [ 53: 91] Pivot: 62
    ...
    Lvl  7: Slice: [120:121] Pivot: 93
    Lvl  4: Slice: [125:127] Pivot: 97
    Lvl  5: Slice: [126:127] Pivot: 98
    Sorted.

    #### Sorting:
     1,  2,  2,  5,  6,  7,  7,  8,  8,  8,  9, 10, 11, 12, 12, 13, 
    13, 13, 15, 15, 15, 16, 17, 18, 21, 21, 22, 22, 22, 24, 24, 25, 
    26, 26, 26, 27, 28, 28, 28, 30, 30, 30, 30, 31, 32, 32, 32, 32, 
    32, 33, 34, 35, 35, 36, 36, 38, 38, 42, 44, 46, 47, 47, 48, 49, 
    50, 52, 52, 52, 53, 53, 53, 54, 54, 55, 55, 56, 58, 59, 61, 62, 
    63, 64, 64, 65, 65, 65, 66, 67, 68, 69, 69, 70, 72, 73, 73, 73, 
    76, 78, 78, 78, 78, 79, 80, 80, 84, 85, 85, 86, 86, 87, 88, 88, 
    89, 90, 90, 90, 90, 90, 91, 92, 93, 93, 93, 93, 95, 97, 98, 99
    Lvl  0: Slice: [  0:127] Pivot: 99
    Lvl  1: Slice: [  0:126] Pivot: 98
    Lvl  2: Slice: [  0:125] Pivot: 97
    ...
    Lvl 42: Slice: [  0: 85] Pivot: 65
    Lvl 43: Slice: [  0: 84] Pivot: 65
    Lvl 44: Slice: [  0: 83] Pivot: 65
    Lvl 45: Slice: [  0: 82] Pivot: 64
    Lvl 46: Slice: [  0: 81] Pivot: 64
    SYSTEM ABORT: Stack Overflow Prevented at 0x080008AE.
                  SP:      0x2001F840
                  Limit:   0x2001F858
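
The difference between the two runs can be reproduced on a host with a small depth measurement. This is a sketch under assumptions: the partition step is elided in the listing, so a Lomuto partition with the last item as pivot is used here, and the names QuickSortDepth and MeasureDepth are hypothetical.

```c
static int MaxLevel;   // Deepest level that had to sort a slice of two or more items

static void Swap(int* a, int* b) {
  int t = *a;
  *a = *b;
  *b = t;
}

// Lomuto partition with the last item as pivot (assumed, not taken from the example project).
static int Partition(int* Arr, int Low, int High) {
  int Pivot = Arr[High];
  int i     = Low - 1;
  for (int j = Low; j < High; j++) {
    if (Arr[j] < Pivot) {
      i++;
      Swap(&Arr[i], &Arr[j]);
    }
  }
  Swap(&Arr[i + 1], &Arr[High]);
  return i + 1;
}

static void QuickSortDepth(int Level, int* Arr, int Low, int High) {
  if ((Low >= 0) && (High >= 0) && (Low < High)) {
    if (Level > MaxLevel) {
      MaxLevel = Level;
    }
    int PivotPoint = Partition(Arr, Low, High);
    QuickSortDepth(Level + 1, Arr, Low, PivotPoint - 1);
    QuickSortDepth(Level + 1, Arr, PivotPoint + 1, High);
  }
}

// Sort Arr and return the deepest recursion level that was reached.
static int MeasureDepth(int* Arr, int NumItems) {
  MaxLevel = 0;
  QuickSortDepth(0, Arr, 0, NumItems - 1);
  return MaxLevel;
}
```

With already sorted, distinct input, every partition splits off only the pivot, so the depth grows by one level per item and the stack requirement grows with it. This matches the second run above, where each level shortens the slice by a single item.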

Stack overflow on global process stack

A system might separate main application and interrupt service routines. The main application runs on the process stack. Interrupts run on the main stack. Stack overflows can happen on both stacks. Even though ISRs are usually kept simple, their estimated stack requirements might be exceeded.

When the process stack overflows, the system could continue running and serve interrupts only, or immediately reset. When the main stack overflows, the system should reset in all cases.
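
The policy described above might be expressed as a small selection function. The enum and function names are hypothetical. The meaning of UsedStack follows the example error handler: a negative value stands for the process stack, any other value for the main stack.

```c
// Possible reactions to a prevented stack overflow in this model.
typedef enum {
  STOP_ACTION_RESET,                 // Reset the system
  STOP_ACTION_SERVE_INTERRUPTS_ONLY  // Halt the application, keep serving interrupts
} STOP_ACTION;

// UsedStack: < 0: process stack overflowed. >= 0: main stack overflowed.
static STOP_ACTION STOP_SelectAction(int UsedStack) {
  if (UsedStack < 0) {
    return STOP_ACTION_SERVE_INTERRUPTS_ONLY;
  }
  return STOP_ACTION_RESET;   // Main stack overflow: always reset
}
```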

Source listing (main_PSP.c)

The full source code is available in the example project.

#include "SEGGER_STOP.h"

static unsigned int _Calc(unsigned char* p, int Pos) {
  int Result;

  ...
  return Result + _Calc(p, Pos -1);
}

static int _GetBits(const char* sContext, int NumBits) {
  unsigned char ac[] = {0xaa, 0x55, 0xaa, 0x55, 0xaa, 0x55, 0xaa, 0x55};
  
  return _Calc(ac, NumBits - 1);
}

/*********************************************************************
*
*       SysTick_Handler()
*
*  Function description
*   ISR for the SysTick.
*   
*/
void SysTick_Handler(void);  
void SysTick_Handler(void) {
  static int _Ticks;
  
  _GetBits("SYSTICK", _Ticks++ % 80);
}

/*********************************************************************
*
*       main_app()
*
*  Function description
*   Application entry point.
*   
*/
void main_app(void);
void main_app(void) {
  int i;
#if !defined (__SEGGER_LINKER)
  //
  // Initialize stack limits if not done by runtime init.
  // Required to set PSP limit.
  //
  __SEGGER_STOP_X_InitLimits();
#endif
  //
  // Enable SysTick timer interrupt
  //
  ...
  
  i = 0;
  for (;;) {
    _GetBits("MAIN", i++ % 80);
  }
}

/*********************************************************************
*
*       main()
*
*  Function description
*   Low-level application entry point.
*   Prepare the system to run on the PSP and start it.
*/
__attribute__((naked)) int main(void) {
  asm("push {r0, lr}\n"
      "ldr r0, =__stack_process_end__\n"  // Initialize PSP
      "msr psp, r0\n"
      "mrs r0, control\n"                 // Switch to PSP
      "orr r0, #0x2\n"
      "msr control, r0\n"
      "bl main_app\n"                     // Call application main
      "pop {r0, pc}\n"
      );
}

Example output (main_PSP.c)

    MAIN: ac has 0 of 0 bits set.
    MAIN: ac has 0 of 1 bits set.
    MAIN: ac has 1 of 2 bits set.
    MAIN: ac has 1 of 3 bits set.
    MAIN: ac has 2 of 4 bits set.
    SYSTICK: ac has 0 of 0 bits set.
    MAIN: ac has 2 of 5 bits set.
    MAIN: ac has 3 of 6 bits set.
    MAIN: ac has 3 of 7 bits set.
    MAIN: ac has 4 of 8 bits set.
    SYSTICK: ac has 0 of 1 bits set.
    MAIN: ac has 5 of 9 bits set.
    MAIN: ac has 5 of 10 bits set.
    MAIN: ac has 6 of 11 bits set.
    SYSTICK: ac has 1 of 2 bits se
    MAIN: ac has 7 of 13 bits set.
    MAIN: ac has 7 of 14 bits set.
    MAIN: ac has 8 of 15 bits set.
    SYSTICK: ac has 1 of 3 bits set.
    MAIN: ac has 8 of 16 bits set.
    MAIN: ac has 8 of 17 bits set.
    MAIN: ac has 9 of 18 bits set.
    SYSTICK: ac has 2 of 4 bits set.
    MAIN: ac has 9 of 19 bits set.
    MAIN: ac has 10 of 20 bits set.
    MAIN: ac has 10 of 21 bits set.
    SYSTICK: ac has 2 of 5 bits set.
    SYSTEM ABORT: Stack Overflow Prevented at 0x08000948.
                  SP:      0x2001F838
                  Limit:   0x2001F858
    SYSTICK: ac has 3 of 6 bits set.
    SYSTICK: ac has 3 of 7 bits set.
    SYSTICK: ac has 4 of 8 bits set.
    SYSTICK: ac has 5 of 9 bits set.
    SYSTICK: ac has 5 of 10 bits set.
    SYSTICK: ac has 6 of 11 bits set.
    SYSTICK: ac has 6 of 12 bits set.
    SYSTICK: ac has 7 of 13 bits set.
    SYSTICK: ac has 7 of 14 bits set.
    SYSTICK: ac has 8 of 15 bits set.
    SYSTICK: ac has 8 of 16 bits set.
    SYSTICK: ac has 8 of 17 bits set.
    SYSTICK: ac has 9 of 18 bits set.
    SYSTICK: ac has 9 of 19 bits set.
    SYSTICK: ac has 10 of 20 bits set.
    SYSTICK: ac has 10 of 21 bits set.
    SYSTEM ABORT: Stack Overflow Prevented at 0x08000948.
                  SP:      0x2001FC38
                  Limit:   0x2001FC58

Stack overflow in tasks

In a system using an RTOS or a similar form of multitasking, each task uses a separate stack. A stack overflow on one task stack can be handled by halting or deleting only this task, while the rest of the system keeps running.

The RTOS or task switching routine needs to make sure to switch the stack limit before running another task.

Source listing (main_SimpleOS.c)

The full source code is available in the example project.

static Task         _MainTask;
static unsigned int _MainStack[1024/4];

static Task         _BGTask;
static unsigned int _BGStack[384/4];

static unsigned int _Factorial(unsigned int F) {
  if (F == 1) {
    return 1;
  }
  return F * _Factorial(F -1);
}

static void _Main(void) {
  int i = 0;
  for (;;) {
    printf("Main Task running. (%d)\n", i++);
    _Delay(100);
  }
}

static void _BG(void) {
  int i = 0;
  for (;;) {
    printf("BG Task running. (%d)\n", i++);
    _Delay(20);
    _Factorial(i);
  }
}

/*********************************************************************
*
*       SysTick_Handler()
*
*  Function description
*   SysTick interrupt handler.
*   Increment system tick and perform task switch.
*/
__attribute__((naked)) void SysTick_Handler(void) {
  asm("nop\n"
      ...
      "ldr      r0, [r1, #0]\n"   // OS.pCurTask
      "ldr      r3, [r0, #8]\n"   // OS.pCurTask->pStackBottom
      "add      r3, #100\n"       // Buffer before stack overflow
      "ldr      r2, =__SEGGER_STOP_Limit_PSP\n"
      "str      r3, [r2]\n"       // Update stack limit
      ...
      "msr      psp, r3\n"        // Switch stack
      "bx       lr\n"             // Return to task
      );
}

/*********************************************************************
*
*       PendSV_Handler()
*
*  Function description
*   PendSV Interrupt handler used to initialize the tasking system.
*/
__attribute__((naked)) void PendSV_Handler(void) {
  asm("nop\n"
      ...
      "add      r1, r3, #4\n"     // OS.pFirstTask
      "ldr      r2, [r1]\n"
      "ldr      r0, [r2,#4]\n"    // OS.pFirstTask->pStack
      ...
      "ldr      r2, =__SEGGER_STOP_Limit_PSP\n"
      "str      r3, [r2]\n"       // Update stack limit
      ...
      "msr      psp, r0\n"        // Set task stack
      "bx       lr\n"             // Return from interrupt into task
  );
}

/*********************************************************************
*
*       OS_InitTask()
*
*  Function description
*   Initialize a task context and add it to the list.
*/
void OS_InitTask(Task* pTask, unsigned char* pStack, unsigned int StackSize, TaskFunc pFunc);
void OS_InitTask(Task* pTask, unsigned char* pStack, unsigned int StackSize, TaskFunc pFunc) {
  unsigned int* pFrame;
  //
  // Initialize context
  //
  pTask->pFunc        = pFunc;
  pTask->pStackBottom = pStack;
  pTask->pStackTop    = pStack + StackSize - 8;
  pTask->pStack       = pTask->pStackTop - (_INT_FRAME_SIZE + _PRESERVED_REGS_SIZE);
  //
  // Add task to list
  //
  ...
}

/*********************************************************************
*
*       main()
*
*  Function description
*   Application entry point.
*/
int main(void) {
  //
  // Create tasks
  //
  OS_InitTask(&_MainTask, (unsigned char*)_MainStack, sizeof(_MainStack), _Main);
  OS_InitTask(&_BGTask,   (unsigned char*)_BGStack,   sizeof(_BGStack),   _BG);
  //
  // Enable SysTick timer interrupt
  //
  ...
  //
  // Start the OS via PendSV
  //
  ...
  for (;;) { ; }
}

Example output (main_SimpleOS.c)

    Main Task running. (0)
    BG Task running. (0)
    BG Task running. (1)
    BG Task running. (2)
    BG Task running. (3)
    BG Task running. (4)
    Main Task running. (1)
    BG Task running. (5)
    BG Task running. (6)
    BG Task running. (7)
    BG Task running. (8)
    BG Task running. (9)
    Main Task running. (2)
    BG Task running. (10)
    Main Task running. (3)
    Main Task running. (4)
    Main Task running. (5)
    Main Task running. (6)
    ...