Thread-Local Storage

From SEGGER Wiki

Latest revision as of 15:21, 23 January 2023

Thread-Local Storage (TLS) gives each thread its own instance of a global or static local variable. The best-known thread-local variable is `errno`.
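As a minimal sketch of what this looks like in C, assuming a C11 compiler and a library with TLS support, a variable declared `_Thread_local` exists once per thread. The names `request_count`, `count_request` and `clear_error` are hypothetical and chosen only for illustration.

```c
#include <errno.h>

/* One instance of this counter exists per thread (C11 storage class). */
static _Thread_local int request_count = 0;

/* Increments the calling thread's own counter and returns the new value. */
int count_request(void)
{
    return ++request_count;
}

/* errno is provided by the library as a thread-local object as well:
 * clearing it here affects only the calling thread. */
void clear_error(void)
{
    errno = 0;
}
```

Each thread that calls `count_request()` starts counting from its own zero-initialized copy, independent of all other threads.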

Thread-local variables in the standard library

C standard libraries can support usage of thread-local storage.

Library objects that need thread-local storage when used from multiple tasks include, for example:

  • error functions - errno, strerror.
  • locale functions - localeconv, setlocale.
  • time functions - asctime, localtime, gmtime, mktime.
  • multibyte functions - mbrlen, mbrtowc, mbsrtowc, mbtowc, wcrtomb, wcsrtomb, wctomb.
  • rand functions - rand, srand.
  • other functions - atexit, strtok.
  • C++ exception engine.

Handling thread-local storage

When thread-local storage is used, it needs to be handled by the system or OS.

With multiple tasks or threads, the TLS blocks need to be initialized per thread, and the OS is responsible for providing information about the currently active thread.
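The per-thread initialization can be sketched roughly as follows. This is not the code of any particular OS: the function `tls_block_init` and its parameters are hypothetical, and the layout (thread-local bss first, thread-local data directly after it, no alignment padding in between) is an assumption.

```c
#include <stddef.h>
#include <string.h>

/*
 * Initializes one thread's private TLS block.
 * Assumed layout:
 *   [ tbss_size bytes, zero-initialized ][ tdata_size bytes, copied from image ]
 */
void tls_block_init(void *block,
                    const void *tdata_image, size_t tdata_size,
                    size_t tbss_size)
{
    unsigned char *p = block;

    memset(p, 0, tbss_size);                         /* thread-local bss  */
    memcpy(p + tbss_size, tdata_image, tdata_size);  /* thread-local data */
}
```

An OS would typically call such a routine once when a thread that uses TLS is created, with `tdata_image` pointing to the initial values stored by the linker.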

If there is only one thread, the system can treat thread-local variables like regular variables and share them across the whole system. The system only needs to provide information about where the "global thread-local block" is located, but does not need to explicitly initialize it.

embOS

embOS is prepared to support TLS, but does not enable it by default. This has the advantage that there is no additional overhead as long as the application does not need TLS. The embOS implementation of thread-local storage allows TLS to be activated separately for every task. Only tasks that call functions using TLS need to activate it by calling an initialization function when the task is started.

No OS / Single Thread System

Even when there are no threads, the compiler generates code to access thread-local variables in the same way as in a multi-tasking system. The system therefore needs to provide information that matches the architecture-specific implementation and the memory layout.

Some architectures have or define a register to be the "thread pointer". The thread pointer needs to change on every task or thread switch so that it always points to the data relevant to the current thread.

In the case of Arm, there is no dedicated thread pointer. Instead, the compiler uses the function __aeabi_read_tp, which is required to be implemented.

When the memory layout has the sections .tbss and .tdata (in this order), __aeabi_read_tp can simply return the start address of .tbss - 8.

 .section .text.__aeabi_read_tp, "ax", %progbits
 .global __aeabi_read_tp
 .type __aeabi_read_tp, function
 __aeabi_read_tp:
         ldr     R0, =__tbss_start__-8   /* thread pointer = start of .tbss - 8 */
         bx      LR

Why .tbss - 8?

__aeabi_read_tp is not directly intended to return a pointer to the thread-local data. Instead it shall return the "thread pointer", which points to a structure describing the data of the thread.

With dynamically loaded libraries, the structure content needs to be evaluated to get to the actual data of a variable. With statically linked and loaded applications, which is usually the case for embedded firmware, the compiler can take a shortcut.

In the TLS structure, the thread-local data is stored after the "task control block" at a known, fixed offset from the thread pointer. The compiler knows this fixed offset, and the offset of each variable within the thread-local storage section is known once the application is linked. A thread-local variable can therefore be addressed directly as thread pointer + fixed offset + variable offset, without evaluating the structure.

Since firmware does not use the "task control block", it is a purely virtual construct. The bytes preceding the thread-local storage section do not have to exist anywhere in memory; only their size needs to be known, so that it can be subtracted from the start of the section to get the thread pointer.

For Arm the known offset from thread pointer to start of data is: 8.
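The address arithmetic described in this section can be written out as a small sketch. The helper names `tls_thread_pointer` and `tls_var_address` and the macro `TLS_TP_OFFSET` are hypothetical; the code only models the calculation on plain integers and never dereferences the 8 bytes in front of the block.

```c
#include <stdint.h>

#define TLS_TP_OFFSET 8u   /* Arm: distance from thread pointer to start of TLS data */

/* Thread pointer for a TLS block starting at tls_start
 * (the value __aeabi_read_tp is expected to return). */
uintptr_t tls_thread_pointer(uintptr_t tls_start)
{
    return tls_start - TLS_TP_OFFSET;
}

/* Address that results for a variable located var_offset bytes into the block. */
uintptr_t tls_var_address(uintptr_t tp, uintptr_t var_offset)
{
    return tp + TLS_TP_OFFSET + var_offset;
}
```

For example, with a TLS block at 0x20001000 the thread pointer is 0x20000FF8, and a variable at offset 4 in the block is accessed at 0x20000FF8 + 8 + 4 = 0x20001004.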

Linker configuration

Thread-local data and thread-local bss need to be placed in memory in a block and in an order which is known to the OS. The OS creates a copy of the block for each thread or task and, when a thread-local variable is accessed, points to the copy belonging to the active thread.
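How an OS might select the block copy of the active thread can be sketched as follows. This is an illustration under assumptions, not the embOS implementation: `task_t`, `current_task`, `scheduler_switch_to` and `read_thread_pointer` are hypothetical names, and the offset of 8 is the Arm value.

```c
#include <stddef.h>
#include <stdint.h>

#define TLS_TP_OFFSET 8u   /* Arm: distance from thread pointer to start of TLS data */

typedef struct {
    void *tls_block;       /* this task's private copy of the TLS block */
} task_t;

static task_t *current_task;   /* updated by the scheduler on every task switch */

/* Called by the (hypothetical) scheduler when it activates another task. */
void scheduler_switch_to(task_t *next)
{
    current_task = next;
}

/* Returns the thread pointer of the active task, i.e. the value
 * an implementation of __aeabi_read_tp would have to deliver. */
uintptr_t read_thread_pointer(void)
{
    return (uintptr_t)current_task->tls_block - TLS_TP_OFFSET;
}
```

Because the thread pointer is derived from the active task on every access, the same compiled code reaches a different copy of the variable depending on which task is running.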

SEGGER Linker

With the SEGGER Linker thread-local data and thread-local bss can be put into a block, which can then be placed regularly in RAM.

  define block tls with fixed order { block tbss, block tdata };
  
  place in RAM with auto order      { block tls, readwrite, zeroinit };

Linker warning "thread-local and non-thread-local sections cannot be mixed"

When the tls block is declared without a specific ordering, the SEGGER Linker uses auto order to reduce the memory lost to alignment. This may interfere with the ordering expected by the OS.

To resolve this warning, define the block which contains tbss and tdata with fixed order.

GNU Linker

With the GNU Linker thread-local data and thread-local bss need to be placed with fixed order and layout, too.

In Embedded Studio the section placement file can take care of this:

 <MemorySegment name="$(FLASH_NAME:FLASH);FLASH1">
   ...
   <ProgramSection alignment="4" load="Yes" runin=".data_run" name=".data" />
   <ProgramSection alignment="4" load="Yes" runin=".tdata_run" name=".tdata" />
   ...
 </MemorySegment>
 <MemorySegment name="$(RAM_NAME:RAM);SRAM;RAM1">
   ...
   <ProgramSection alignment="4" load="No" name=".data_run" />
   <ProgramSection alignment="4" load="No" name=".bss" />
   <ProgramSection alignment="4" load="No" name=".tbss" />
   <ProgramSection alignment="4" load="No" name=".tdata_run" />
   ...
 </MemorySegment>

Note: The default *placement.xml in Embedded Studio prior to version 6.30 created mixed TLS and non-TLS sections and might need to be updated in existing projects.

In a manually created linker script the ordering can look like this:

 __tbss_load_start__ = ALIGN(__bss_end__ , 4);
 .tbss ALIGN(__bss_end__ , 4) (NOLOAD) : AT(ALIGN(__bss_end__ , 4))
 {
   __tbss_start__ = .;
   *(.tbss .tbss.*)
 }
 __tbss_end__ = __tbss_start__ + SIZEOF(.tbss);
 __tbss_size__ = SIZEOF(.tbss);
 __tbss_load_end__ = __tbss_end__;

 __tdata_load_start__ = ALIGN(__data_load_start__ + SIZEOF(.data) , 4);
 .tdata ALIGN(__tbss_end__ , 4) : AT(ALIGN(__data_load_start__ + SIZEOF(.data) , 4))
 {
   __tdata_start__ = .;
   *(.tdata .tdata.*)
 }
 __tdata_end__ = __tdata_start__ + SIZEOF(.tdata);
 __tdata_size__ = SIZEOF(.tdata);
 __tdata_load_end__ = __tdata_load_start__ + SIZEOF(.tdata);

 .tdata_run ALIGN(__tbss_end__ , 4) (NOLOAD) :
 {
   __tdata_run_start__ = .;
 }
 __tdata_run_end__ = __tdata_run_start__ + SIZEOF(.tdata);
 __tdata_run_size__ = __tdata_run_end__ - __tdata_run_start__;
 __tdata_run_load_end__ = __tdata_run_end__;

A different section ordering might lead to the error message "TLS sections are not adjacent:"