Synchronizing multiple instances of embOS on a multi-core processor

From SEGGER Wiki

embOS can be utilized on multi-core processors by running separate embOS instances on individual cores. For synchronization purposes and in order to exchange data between the cores, embOS includes a comprehensive spinlock API which can be used to control access to shared memory and peripherals.

This article demonstrates the use of multiple embOS instances on individual cores and the synchronization of these cores by using the embOS spinlock API. The provided sample projects were made for an STM32MP157 with two Cortex-A7 cores using hardware spinlocks and for an LPC55S69 with two Cortex-M33 cores using software spinlocks.

What are Spinlocks?

Spinlocks are a synchronization mechanism that uses shared memory and architecture-specific instructions to acquire exclusive access to a resource. A core acquires the lock if it is able to change the lock variable from zero to one without interference from another core. The architecture-specific instructions ensure that this read-modify-write is performed atomically by one core, even if another core in the processor tries to acquire the lock at the same time. As long as a core fails to acquire the lock, e.g. because the lock is already set or because the read-modify-write was not atomic and the write access was therefore aborted, it spins actively in a loop, without yielding execution to another task on the same core, until it acquires the lock successfully.
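
The acquire-and-spin behavior described above can be sketched with C11 atomics. This is only an illustration under assumptions, not the embOS implementation: the type demo_spinlock_t and the Demo_Spinlock*() functions are hypothetical names, and the compiler is assumed to map the compare-and-exchange to the architecture-specific atomic instructions.

```c
#include <stdatomic.h>
#include <stdbool.h>

// Hypothetical lock type for illustration only; this is NOT the embOS OS_SPINLOCK.
typedef struct {
  atomic_int Locked;  // 0: lock is free, 1: lock is taken
} demo_spinlock_t;

static void Demo_SpinlockInit(demo_spinlock_t* pLock) {
  atomic_init(&pLock->Locked, 0);
}

// Tries exactly once to change the lock variable from 0 to 1 atomically.
// Returns true if this caller now owns the lock.
static bool Demo_SpinlockTryLock(demo_spinlock_t* pLock) {
  int Expected = 0;
  return atomic_compare_exchange_strong_explicit(&pLock->Locked, &Expected, 1,
                                                 memory_order_acquire,
                                                 memory_order_relaxed);
}

// Spins actively until the lock could be acquired. The caller does not yield.
static void Demo_SpinlockLock(demo_spinlock_t* pLock) {
  while (!Demo_SpinlockTryLock(pLock)) {
    // Busy-wait: the lock is held by someone else.
  }
}

// Releases the lock so that a spinning core can acquire it.
static void Demo_SpinlockUnlock(demo_spinlock_t* pLock) {
  atomic_store_explicit(&pLock->Locked, 0, memory_order_release);
}
```

The acquire/release memory orders are chosen so that data written while holding the lock is visible to the next owner. In an embOS application, the OS_SPINLOCK functions take the place of this sketch.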

embOS provides an architecture-independent spinlock API that can be used to synchronize multiple cores. The architecture-specific synchronization mechanism for locking and unlocking the resource is already implemented by embOS, as long as the architecture supports spinlocks. However, the user might have to meet architecture-specific conditions for the spinlocks to work correctly. Arm devices, for example, require a global monitor which has to be implemented for the shared memory. Furthermore, the shared memory requires the shareable memory attribute. Otherwise, the global monitor might not function properly and can no longer guarantee that the read-modify-write is atomic. embOS also provides a software spinlock API which can be used for multi-core processors that don't meet all requirements for hardware spinlocks, but have shared memory that can be accessed and used by both cores to exchange information. The sample project for the LPC55S69 makes use of this software spinlock API.
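
To illustrate how a lock can work without atomic read-modify-write instructions, here is a minimal sketch of Peterson's algorithm for two cores. It is chosen purely as a well-known example; it is not claimed to be the algorithm embOS uses for its software spinlocks. All names (demo_sw_lock_t, Demo_SwLock*()) are hypothetical, and the sketch assumes that both cores see the shared memory coherently and that sequentially consistent C11 atomic loads and stores are available.

```c
#include <stdatomic.h>
#include <stdbool.h>

// Hypothetical two-core software lock (Peterson's algorithm), illustration only.
typedef struct {
  atomic_bool Wants[2];  // Wants[i] is true while core i wants or holds the lock
  atomic_int  Turn;      // Index of the core that has priority if both cores compete
} demo_sw_lock_t;

static void Demo_SwLockInit(demo_sw_lock_t* pLock) {
  atomic_init(&pLock->Wants[0], false);
  atomic_init(&pLock->Wants[1], false);
  atomic_init(&pLock->Turn, 0);
}

// Id is the index of the calling core (0 or 1).
static void Demo_SwLockAcquire(demo_sw_lock_t* pLock, int Id) {
  int Other = 1 - Id;

  atomic_store(&pLock->Wants[Id], true);  // Announce interest
  atomic_store(&pLock->Turn, Other);      // Give priority to the other core
  while (atomic_load(&pLock->Wants[Other]) && (atomic_load(&pLock->Turn) == Other)) {
    // Busy-wait while the other core is interested and has priority.
  }
}

static void Demo_SwLockRelease(demo_sw_lock_t* pLock, int Id) {
  atomic_store(&pLock->Wants[Id], false);
}
```

Only plain loads and stores are used here, which is why an approach of this kind does not depend on a global monitor. The price is that each core has to pass its own index and that the lock supports a fixed number of participants.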

The Sample Projects

The sample projects contain two Embedded Studio projects, one for each core. Both cores can be debugged simultaneously.

Project: STM32MP157

Embedded Studio debug sessions for core 0 (left) and core 1 (right)

This project uses the hardware spinlock embOS API.

Processor: ST STM32MP157
Board: ST STM32MP15X EV1
IDE: SEGGER Embedded Studio
embOS version: V5.16.1.1
Download: embOS_ARM_ES_Obj_SFL_V5.16.1.1_STM32MP1_Spinlocks.zip

Memory Layout

  • 0x2FFC0000 - 0x2FFDFFFF is used for the Cortex-A7 core 0
  • 0x2FFE0000 - 0x2FFFEFFF is used for the Cortex-A7 core 1
  • 0x2FFFF000 - 0x2FFFFFFF is used as shared memory

How to Debug

This project uses a simple RAM configuration. That means the applications are downloaded by the J-Link into the device's internal SRAM. In order to debug the projects, open both projects with Embedded Studio and build them. Reset the STM32MP15X EV1 board by pressing the reset button on the board. Start the debug session of core 0 and then the debug session of core 1. It should look similar to the screenshot with the two Embedded Studio instances on the right. Core 0 should continue execution in main() before core 1, because core 0 has to perform some initialization before starting the embOS scheduler. When core 0 is running and already toggling the LED, the execution of core 1 can be continued. Now, both debug sessions can be halted and continued independently of each other.

Project: LPC55S69

This project uses the software spinlock embOS API, since the device does not meet all conditions required for hardware spinlocks.

Processor: NXP LPC55S69JBD100
Board: NXP LPCxpresso55S69
IDE: SEGGER Embedded Studio
embOS version: V5.16.0.0
Download: embOS_CortexM_ES_Obj_SFL_V5.16.0.0_LPC55S69_Spinlocks.zip

Memory Layout

  • Flash: 0x00000000 - 0x00050000 is used for the application and code by the Cortex-M33 core 0
  • Flash: 0x00050000 - 0x0009D800 is used for the application and code by the Cortex-M33 core 1
  • SRAM: 0x20000000 - 0x20020000 is used for data by the Cortex-M33 core 0
  • SRAM: 0x20020000 - 0x20040000 is used for data by the Cortex-M33 core 1
  • SRAM: 0x20040000 - 0x20050000 is used as shared memory

How to Debug

In order to debug the projects, open both projects with Embedded Studio and build them. Start the debug session of core 0. It will download the ELF files of both projects into the flash of the device. After the debug session of core 0 halts at main(), start the debug session in the project for core 1 by attaching the debugger (Target -> Attach Debugger) and set a breakpoint in main(). Now, if you let core 0 run, or if you step over the MCMGR_StartCore() function in main(), core 1 will be started and the debug session for core 1 halts at the breakpoint.

The Sample Application

The sample application consists of two C-files, OS_SendString_Core0.c for core 0 and OS_SendString_Core1.c for core 1. Furthermore, a header file called BSP_SharedMemory.h, which is included by both C-files, provides the declaration of the data structure used to access the shared data.

BSP_SharedMemory.h

The shared data structure contains an OS_SPINLOCK object and a 256-byte buffer. The spinlock object is required to acquire exclusive access to the shared data, while the buffer is used to hold data in the form of a string.

typedef struct {
  OS_SPINLOCK Spinlock;     // Used to acquire exclusive access to the shared data
  char        Buffer[256];  // Used by core 1 to store strings which are periodically read and printed by core 0
} shared_t;

The data structure is accessed by using a macro called SHARED_DATA. The macro simply uses the beginning of the shared memory for the data structure.

#define SHARED_DATA       (*(volatile shared_t*)SHARED_RAM_START)  // Used to access the shared data.
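
Since the structure is simply placed at the start of the shared memory, it can be useful to check at compile time that it fits into that region. The following sketch assumes the STM32MP157 layout listed earlier (0x2FFFF000 - 0x2FFFFFFF). The values given to SHARED_RAM_START and SHARED_RAM_SIZE are derived from that layout and are not taken from the project sources, and demo_spinlock_t is a hypothetical stand-in for OS_SPINLOCK so that the example compiles without embOS.

```c
#include <stdint.h>

// Assumed values, derived from the STM32MP157 memory layout in this article.
#define SHARED_RAM_START  0x2FFFF000u
#define SHARED_RAM_SIZE   0x00001000u

// Hypothetical stand-in for OS_SPINLOCK; the real type is defined by embOS.
typedef struct {
  volatile uint32_t Dummy;
} demo_spinlock_t;

typedef struct {
  demo_spinlock_t Spinlock;     // Used to acquire exclusive access to the shared data
  char            Buffer[256];  // String buffer written by core 1 and read by core 0
} shared_t;

// Fails the build if the shared structure outgrows the shared memory region.
_Static_assert(sizeof(shared_t) <= SHARED_RAM_SIZE,
               "shared_t does not fit into the shared memory region");
```

A check like this costs nothing at run time and catches the case where the shared structure grows beyond the region reserved in the linker files of both projects.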

OS_SendString_Core0.c

Core 0 is responsible for initializing the shared memory and the spinlock. We call memset() on the shared memory to zero the memory and call OS_SPINLOCK_Create() to initialize the spinlock.

int main(void) {
  BSP_Init();   // Initialize LED ports
  OS_Init();    // Initialize embOS
  OS_InitHW();  // Initialize required hardware
  OS_TASK_CREATE(&TCBHP, "HP Task", 100, HPTask, StackHP);
  OS_TASK_CREATE(&TCBLP, "LP Task",  10, LPTask, StackLP);
  //
  // Initialize shared memory and create the spinlock which is used by both cores.
  // Core 0 must initialize the shared memory before core 1 continues execution in main().
  //
  memset((void*)SHARED_RAM_START, 0, SHARED_RAM_SIZE);
  OS_SPINLOCK_Create(&SHARED_DATA.Spinlock);
  OS_Start();   // Start embOS
  return 0;
}
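
The sample relies on the start order of the two cores to guarantee that the shared memory is initialized before core 1 uses it. One possible way to make this ordering explicit is a ready flag that core 0 publishes after the initialization and that core 1 polls before its first access. The sketch below is an assumption-laden illustration, not part of the sample projects: demo_handshake_t, DEMO_READY_MAGIC and the Demo_*() functions are hypothetical, and the flag would have to live in the shared memory.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_READY_MAGIC  0xC0DE0001u  // Hypothetical marker value

// Hypothetical handshake object, assumed to be located in the shared memory.
typedef struct {
  atomic_uint_least32_t Ready;
} demo_handshake_t;

// Core 0: call after the shared memory was cleared and the spinlock was created.
// The release store makes the preceding initialization visible before the flag.
static void Demo_PublishReady(demo_handshake_t* pHandshake) {
  atomic_store_explicit(&pHandshake->Ready, DEMO_READY_MAGIC, memory_order_release);
}

// Returns true once core 0 has published the flag.
static bool Demo_IsReady(demo_handshake_t* pHandshake) {
  return atomic_load_explicit(&pHandshake->Ready, memory_order_acquire) == DEMO_READY_MAGIC;
}

// Core 1: call before the first access to the shared data.
static void Demo_WaitUntilReady(demo_handshake_t* pHandshake) {
  while (!Demo_IsReady(pHandshake)) {
    // Busy-wait until core 0 has finished the initialization.
  }
}
```

One caveat of this sketch: SRAM contents can survive a warm reset, so a stale marker from a previous run could be mistaken for a fresh one. A real design would have to rule that out, for example by keeping core 1 halted until core 0 has cleared the memory, which is what the debug procedures of the sample projects effectively do.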

The application for core 0 runs an instance of embOS with two tasks: a high priority and a low priority task. The high priority task just toggles an LED every 50 ticks. The low priority task is used to read the shared buffer and print the string to stdout. To ensure that the buffer isn't read by core 0 while being written by core 1, both cores need to call OS_SPINLOCK_Lock() and OS_SPINLOCK_Unlock() to gain exclusive access to the shared buffer.

Keep in mind that tasks which are trying to acquire a spinlock but fail to do so won't yield the CPU, but are instead waiting actively for the lock to become free, blocking the execution of any other task that has a lower priority and is ready to run.

static void HPTask(void) {
  while (1) {
    BSP_ToggleLED(0);
    OS_TASK_Delay(50);
  }
}

static void LPTask(void) {
  OS_TIME Timestamp = 0;

  while (1) {
    OS_SPINLOCK_Lock(&SHARED_DATA.Spinlock);
    //
    // Print the string that is currently contained in the shared memory.
    //
    fputs((const char*)SHARED_DATA.Buffer, stdout);
    OS_SPINLOCK_Unlock(&SHARED_DATA.Spinlock);
    OS_TASK_DelayUntil(Timestamp += 500);
  }
}
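
Because a core that waits for the lock spins actively, it generally pays off to hold the lock as briefly as possible. A possible variation of the low priority task, sketched here for a host build, copies the string into a local buffer while holding the lock and performs the comparatively slow output afterwards. Demo_Lock(), Demo_Unlock() and Demo_SharedBuffer are hypothetical stand-ins for the OS_SPINLOCK calls and SHARED_DATA.Buffer; the sample project itself prints while holding the lock.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_BUFFER_SIZE  256

// Hypothetical stand-ins for OS_SPINLOCK_Lock()/OS_SPINLOCK_Unlock().
// The counter only tracks nesting so that the sketch can be checked on a host.
static int Demo_LockDepth;
static void Demo_Lock(void)   { Demo_LockDepth++; }
static void Demo_Unlock(void) { Demo_LockDepth--; }

// Hypothetical stand-in for SHARED_DATA.Buffer.
static char Demo_SharedBuffer[DEMO_BUFFER_SIZE];

// Copies the shared string into a caller-provided buffer while holding the lock.
// The result is always zero-terminated. DestSize must be at least 1.
static void Demo_CopySharedString(char* pDest, size_t DestSize) {
  Demo_Lock();
  strncpy(pDest, Demo_SharedBuffer, DestSize - 1u);
  pDest[DestSize - 1u] = '\0';
  Demo_Unlock();
}

// Prints the shared string. The output happens after the lock was released.
static void Demo_PrintSharedString(void) {
  char Local[DEMO_BUFFER_SIZE];

  Demo_CopySharedString(Local, sizeof(Local));
  fputs(Local, stdout);
}
```

The trade-off is an additional 256 bytes on the stack of the printing task in exchange for a shorter time during which the other core may have to spin.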

OS_SendString_Core1.c

The application for core 1 runs embOS with just a single task. In an endless loop, the task acquires the lock, reads the current execution time, and stores it as a string in the shared buffer.

static void Task(void) {
  OS_I32 CurrentTime;
  OS_I32 Minutes;
  OS_I32 Seconds;
  OS_I32 Milliseconds;

  while (1) {
    OS_SPINLOCK_Lock(&SHARED_DATA.Spinlock);
    CurrentTime  = (OS_I32)OS_Global.Time;
    Minutes      = CurrentTime / 1000 / 60;
    Seconds      = CurrentTime / 1000 % 60;
    Milliseconds = CurrentTime % 1000;
    memset((void*)SHARED_DATA.Buffer, 0, sizeof(SHARED_DATA.Buffer));
    sprintf((char*)SHARED_DATA.Buffer, "Time: %d:%02d.%03d\n", Minutes, Seconds, Milliseconds);
    OS_SPINLOCK_Unlock(&SHARED_DATA.Spinlock);
  }
}
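
The time conversion used by the task can be tried out separately on a host. The sketch below assumes, as the task does, that one system tick corresponds to one millisecond. Demo_FormatTime() is a hypothetical helper that is not part of the sample; it uses long with matching %ld format specifiers and snprintf() so that the write is bounded by the buffer size.

```c
#include <stdio.h>

// Hypothetical helper: formats a time in milliseconds as "Time: M:SS.mmm\n".
// Returns the number of characters the complete string needs (excluding the
// terminating zero), as reported by snprintf(). Output is truncated if the
// buffer is too small.
static int Demo_FormatTime(long TimeMs, char* pBuffer, size_t BufferSize) {
  long Minutes      = TimeMs / 1000 / 60;
  long Seconds      = TimeMs / 1000 % 60;
  long Milliseconds = TimeMs % 1000;

  return snprintf(pBuffer, BufferSize, "Time: %ld:%02ld.%03ld\n", Minutes, Seconds, Milliseconds);
}
```

For example, a time of 72 ms yields "Time: 0:00.072" followed by a newline, and 61573 ms yields "Time: 1:01.573" followed by a newline.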

Sample Output

Now, if both cores are running, this is what the output will look like:

Time: 0:00.072
Time: 0:00.572
Time: 0:01.072
Time: 0:01.573
Time: 0:02.072
Time: 0:02.572
Time: 0:03.072
Time: 0:03.572
Time: 0:04.074
Time: 0:04.574
Time: 0:05.072
Time: 0:05.572
Time: 0:06.072
Time: 0:06.572
Time: 0:07.072
Time: 0:07.572
Time: 0:08.072
Time: 0:08.572

The LPTask of core 0 prints the execution time twice a second.

Concurrent Access

But what would happen if we didn't use a spinlock to synchronize the shared memory access of both cores?

The task on core 1 clears the shared buffer by calling memset(), which theoretically is redundant, but demonstrates nicely what can happen if both cores access the same data at the same time. Simply remove the OS_SPINLOCK_Lock() and OS_SPINLOCK_Unlock() calls in one or both applications and let the applications run again to test it yourself.

Sample Output

The output should look similar to this:

TimTime: 0:0Time: TimeTimeTimTime: TiTimeTime: 0:Time: 0Time: 0:

Here, core 0 reads and prints the string before core 1 has fully written it to the buffer. As core 0 probably never gets to print the whole string, it never prints the newline located at the end of the string and, thus, the output is a single line.
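
The effect can be reproduced deterministically on a host with a single-threaded simulation of the interleaving. In the following sketch, the writer is "interrupted" after a given number of bytes and the reader then takes a snapshot of the buffer. All names are hypothetical; on the real hardware, the point at which core 0 reads depends on timing.

```c
#include <string.h>

#define DEMO_BUFFER_SIZE  256

// Hypothetical stand-in for the shared buffer.
static char Demo_Buffer[DEMO_BUFFER_SIZE];

// Simulates the writer: clears the buffer and then writes only the first
// NumBytes bytes of the string, as if the reader ran in the middle of the write.
static void Demo_WritePartially(const char* sText, size_t NumBytes) {
  size_t Len = strlen(sText);

  if (Len > DEMO_BUFFER_SIZE - 1u) {
    Len = DEMO_BUFFER_SIZE - 1u;  // Keep room for the terminating zero
  }
  if (NumBytes > Len) {
    NumBytes = Len;
  }
  memset(Demo_Buffer, 0, sizeof(Demo_Buffer));
  memcpy(Demo_Buffer, sText, NumBytes);
}

// Simulates the reader: copies whatever the buffer contains at this moment.
// The result is always zero-terminated. DestSize must be at least 1.
static void Demo_ReadSnapshot(char* pDest, size_t DestSize) {
  strncpy(pDest, Demo_Buffer, DestSize - 1u);
  pDest[DestSize - 1u] = '\0';
}
```

If the writer is stopped after three bytes, the reader sees only "Tim", which matches the kind of fragments in the output above. Because the buffer is cleared first, a read at an unlucky moment can even return an empty string.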