Measuring the embOS Context Switch Time with Cortex-M and the DWT Cycle Counter

From SEGGER Wiki

A common way to measure the execution time of code on microcontrollers is to toggle a GPIO and read the pin's output with an oscilloscope or logic analyzer, as described in the embOS manual. However, when measuring short periods such as the context switch time, which can be shorter than a microsecond, the output signal can be hard to read on an oscilloscope, and the hardware itself can add inaccuracy, e.g. when the GPIOs are driven at only a fraction of the processor's actual clock frequency.

Some Cortex-M devices implement the optional Cycle Counter of the Data Watchpoint and Trace unit (DWT). When implemented and enabled, this counter increments on each cycle of the processor clock. It can be used to obtain an accurate measurement of the embOS context switch time, because it avoids the imprecision introduced by the GPIO hardware and by reading the signal on an oscilloscope.
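
Because the cycle counter is a free-running 32-bit register, it wraps around to zero after 2^32 cycles. The following minimal sketch shows why a plain unsigned subtraction of two readings still yields the correct elapsed cycle count across a single wrap. The helper name ElapsedCycles is hypothetical and not part of embOS.

```c
#include <stdint.h>

// Hypothetical helper: elapsed cycles between two readings of a
// free-running 32-bit counter such as DWT_CYCCNT.
// Unsigned subtraction is performed modulo 2^32, so the result is
// correct even if the counter wrapped once between Start and End.
static inline uint32_t ElapsedCycles(uint32_t Start, uint32_t End) {
  return End - Start;
}
```

The result is only valid if the measured interval is shorter than 2^32 cycles. At an assumed processor clock of 200 MHz, that is roughly 21 seconds, which is far longer than a context switch.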

Requirements

The following application for measuring the embOS context switch time using the DWT Cycle Counter requires an Armv7[E]-M or Armv8-M Mainline device that implements the DWT Cycle Counter, and a debug probe to read the results from the device's memory, e.g. via the watch view of an IDE's debug session. Furthermore, it is assumed that the Cortex-M SysTick is used as the hardware timer for the embOS system tick. If another hardware timer is used, the code should be modified to disable that timer instead. Otherwise, the timer interrupt will affect the maximum and average execution time of the context switch.
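
The requirement on the DWT Cycle Counter can be checked in software. The sketch below is one possible way to do so and is not taken from the embOS sources. It assumes the Armv7-M register layout: DEMCR at 0xE000EDFC with the TRCENA bit (bit 24), and DWT_CTRL at 0xE0001000 with NOCYCCNT (bit 25) and CYCCNTENA (bit 0). A debugger usually sets TRCENA when a debug session is active, so setting it here mainly matters when running without one. The function name EnableCycleCounter is hypothetical, and the registers are passed as pointers so the logic can also be exercised off-target.

```c
#include <stdint.h>

#define DEMCR_TRCENA        (1u << 24)  // Global enable for DWT and ITM
#define DWT_CTRL_NOCYCCNT   (1u << 25)  // Set if no cycle counter is implemented
#define DWT_CTRL_CYCCNTENA  (1u <<  0)  // Cycle counter enable

// Hypothetical helper. On target, the assumed addresses are:
//   pDEMCR    = (volatile uint32_t*)0xE000EDFCu
//   pDWT_CTRL = (volatile uint32_t*)0xE0001000u
// Returns 1 if the cycle counter is implemented and now enabled, else 0.
static int EnableCycleCounter(volatile uint32_t* pDEMCR,
                              volatile uint32_t* pDWT_CTRL) {
  *pDEMCR |= DEMCR_TRCENA;               // Make the DWT registers functional
  if ((*pDWT_CTRL & DWT_CTRL_NOCYCCNT) != 0u) {
    return 0;                            // Cycle counter not implemented
  }
  *pDWT_CTRL |= DWT_CTRL_CYCCNTENA;      // Start counting processor cycles
  return 1;
}
```

Some devices may additionally require the DWT to be unlocked before it accepts writes. This is device-specific and not covered by this sketch.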

The Application

Simply let the application run with an active debug session on your device. If the DWT Cycle Counter is not implemented, the debug session will halt at line 74. The application repeats the measurement NUM_SAMPLES times and records the minimum, maximum and average execution time of the context switch. Although the code executed for the context switch is always the same, the minimum and maximum values can differ. The more complex the processor is, the greater the margin. A Cortex-M7 with caches, branch prediction, a long pipeline and, typically, a processor clock that is faster than the maximum frequency at which memory can be accessed will show a greater margin between those two values than a Cortex-M4. Thus, the average execution time is also recorded to show whether the minimum or the maximum value is the more likely one to occur.
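
The bookkeeping described above can be sketched independently of the target. The following example is an illustration only: the type CYCLE_STATS and the Stats_ functions are hypothetical names and not part of embOS. Unlike the listing further down, it accumulates the samples in a 64-bit sum, which is a design choice made here to rule out overflow for any number of 32-bit samples.

```c
#include <stdint.h>

// Hypothetical helper type for collecting cycle count samples.
typedef struct {
  uint32_t Min;    // Smallest sample seen so far
  uint32_t Max;    // Largest sample seen so far
  uint64_t Sum;    // Sum of all samples (64-bit to avoid overflow)
  uint32_t Count;  // Number of samples added
} CYCLE_STATS;

static void Stats_Init(CYCLE_STATS* p) {
  p->Min   = UINT32_MAX;  // Any real sample is smaller or equal
  p->Max   = 0u;
  p->Sum   = 0u;
  p->Count = 0u;
}

static void Stats_Add(CYCLE_STATS* p, uint32_t Cycles) {
  if (Cycles < p->Min) p->Min = Cycles;
  if (Cycles > p->Max) p->Max = Cycles;
  p->Sum   += Cycles;
  p->Count += 1u;
}

// Returns the average of all samples, or 0 if no sample was added.
static uint32_t Stats_Average(const CYCLE_STATS* p) {
  return (p->Count != 0u) ? (uint32_t)(p->Sum / p->Count) : 0u;
}
```

An average close to the minimum indicates that the minimum is the typical case and the maximum is an outlier, e.g. a first run with cold caches.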

After measuring the context switch time, the debug session will halt at line 143. Now, the results can be read from the device's memory by inspecting the variables Min, Max, Average and Nanoseconds.
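
Min, Max and Average are given in processor cycles, whereas Nanoseconds is the minimum converted to time. To cross-check the values by hand, the conversion can be sketched as follows. This is not the embOS implementation of OS_TIME_ConvertCycles2ns. The function name CyclesToNs and the parameter CpuHz are hypothetical, and the sketch assumes that the cycle counter runs at the processor clock frequency CpuHz.

```c
#include <stdint.h>

// Hypothetical helper: converts a cycle count to nanoseconds for a
// processor clock of CpuHz (in Hz, must be non-zero).
// The multiplication is done in 64 bits so it cannot overflow for
// 32-bit cycle counts. The result is rounded down.
static uint64_t CyclesToNs(uint32_t Cycles, uint32_t CpuHz) {
  return ((uint64_t)Cycles * 1000000000u) / CpuHz;
}
```

For example, 200 cycles at an assumed clock of 200 MHz correspond to 200 * 10^9 / (200 * 10^6) = 1000 ns, i.e. one microsecond.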


  1 /*********************************************************************
  2 *                   (c) SEGGER Microcontroller GmbH                  *
  3 *                        The Embedded Experts                        *
  4 *                           www.segger.com                           *
  5 **********************************************************************
  6 
  7 -------------------------- END-OF-HEADER -----------------------------
  8 Purpose : embOS sample program that measures the embOS context
  9           switch time and stores the maximal, minimal, and average
 10           context switch time (in cycles) in memory. It also saves
 11           the minimal context switch time (in nanoseconds) in memory.
 12 */
 13 
 14 #include "RTOS.h"
 15 
 16 /*********************************************************************
 17 *
 18 *       Defines
 19 *
 20 **********************************************************************
 21 */
 22 
 23 #define NUM_SAMPLES         (1024 * 16)
 24 
 25 #define DWT_CTRL            (*(volatile OS_U32*)(0xE0001000u))
 26 #define DWT_CTRL_CYCCNTENA  (1u)
 27 #define DWT_CTRL_NOYCYCCNT  (1u << 25)
 28 #define DWT_CYCCNT          (*(volatile OS_U32*)(0xE0001004u))
 29 
 30 #define SYST_CSR            (*(volatile OS_U32*)(0xE000E010u))
 31 
 32 #define BREAK()             __asm volatile ("bkpt #0")
 33 
 34 /*********************************************************************
 35 *
 36 *       Static data
 37 *
 38 **********************************************************************
 39 */
 40 
 41 static OS_STACKPTR int StackHP[128];
 42 static OS_STACKPTR int StackLP[128];
 43 static OS_TASK         TCBHP;
 44 static OS_TASK         TCBLP;
 45 static OS_U32          Time;
 46 
 47 //
 48 // Data to inspect in a watch view of an IDE
 49 //
 50 static volatile OS_U64 Nanoseconds;
 51 static volatile OS_U32 Average = (OS_U32) 0;
 52 static volatile OS_U32 Max     = (OS_U32) 0;
 53 static volatile OS_U32 Min     = (OS_U32)-1;
 54 
 55 /*********************************************************************
 56 *
 57 *       Local functions
 58 *
 59 **********************************************************************
 60 */
 61 
 62 /*********************************************************************
 63 *
 64 *       _Initialize()
 65 */
 66 inline static void _Initialize(void) {
 67   OS_U32 Ctrl;
 68 
 69   Ctrl = DWT_CTRL;
 70   //
 71   // Check if device has the DWT Cycle Counter implemented
 72   //
 73   if ((Ctrl & DWT_CTRL_NOYCYCCNT) != 0) {
 74     BREAK();  // Device has no DWT Cycle Counter implemented
 75   }
 76   //
 77   // Enable the DWT Cycle Counter if it is disabled
 78   //
 79   if ((Ctrl & DWT_CTRL_CYCCNTENA) == 0) {
 80     DWT_CTRL |= DWT_CTRL_CYCCNTENA;
 81   }
 82   //
 83   // Disable the SysTick, as it isn't required and could interfere
 84   // with the measurement of the context switch time
 85   //
 86   SYST_CSR = 0;
 87 }
 88 
 89 /*********************************************************************
 90 *
 91 *       _GetCycles()
 92 */
 93 inline static OS_U32 _GetCycles(void) {
 94   return DWT_CYCCNT;
 95 }
 96 
 97 /*********************************************************************
 98 *
 99 *       HPTask()
100 */
101 static void HPTask(void) {
102   while (1) {
103     OS_TASK_Suspend(NULL);       // Suspend high priority task
104     Time = _GetCycles() - Time;  // Stop measurement
105   }
106 }
107 
108 /*********************************************************************
109 *
110 *       LPTask()
111 */
112 static void LPTask(void) {
113   OS_U32 MeasureOverhead;
114   OS_U32 SampleCount;
115 
116   _Initialize();
117 
118   SampleCount = 0;
119   while (1) {
120     //
121     // Measure overhead for time measurement so we can take this into account by subtracting it
122     // This is done inside the while()-loop to mitigate possible effects of an instruction cache
123     //
124     MeasureOverhead = _GetCycles();
125     MeasureOverhead = _GetCycles() - MeasureOverhead;
126     //
127     // Perform actual measurements
128     //
129     Time = _GetCycles();     // Start measurement
130     OS_TASK_Resume(&TCBHP);  // Resume high priority task to force task switch
131     Time = Time - MeasureOverhead;
132     //
133     // Evaluate
134     //
135     if (Time < Min) Min = Time;
136     if (Time > Max) Max = Time;
137     SampleCount += 1;
138     Average     += Time;
139     if (SampleCount >= NUM_SAMPLES) {
140       Average     = Average / NUM_SAMPLES;
141       Nanoseconds = OS_TIME_ConvertCycles2ns(Min);
142       while (1) {
143         BREAK();  // Break automatically
144       }
145     }
146   }
147 }
148 
149 /*********************************************************************
150 *
151 *       Global functions
152 *
153 **********************************************************************
154 */
155 
156 /*********************************************************************
157 *
158 *       main()
159 */
160 int main(void) {
161   OS_Init();    // Initialize embOS
162   OS_InitHW();  // Initialize required hardware
163   OS_TASK_CREATE(&TCBHP, "HP Task", 100, HPTask, StackHP);
164   OS_TASK_CREATE(&TCBLP, "LP Task",  50, LPTask, StackLP);
165   OS_Start();   // Start embOS
166   return 0;
167 }
168 
169 /*************************** End of file ****************************/