Measuring the embOS Context Switch Time with Cortex-M and the DWT Cycle Counter
A common way to measure the execution time of code on microcontrollers is to toggle a GPIO and capture the pin's output with an oscilloscope or logic analyzer, as described in the embOS manual. However, when measuring short periods such as the context switch time, which can be shorter than a microsecond, the output signal may be hard to read on an oscilloscope. The hardware itself can also add inaccuracy, e.g. when the GPIOs are clocked at only a fraction of the processor's actual clock frequency.
Some Cortex-M devices implement the optional Cycle Counter of the Data Watchpoint and Trace unit (DWT). When implemented and enabled, this counter increments on each cycle of the processor clock. It can therefore be used to obtain an accurate measurement of the embOS context switch time, avoiding the imprecision introduced by the GPIO hardware or by reading the signal on an oscilloscope.
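Because the Cycle Counter is a free-running 32-bit register, it wraps around to zero after 2^32 cycles. A minimal sketch, assuming the measured interval is shorter than one full wrap: plain unsigned subtraction of two readings still yields the correct cycle count. The helper name ElapsedCycles is hypothetical and not part of embOS.

```c
#include <stdint.h>

// Hypothetical helper (not an embOS API): number of cycles elapsed between
// two readings of a free-running 32-bit counter. Unsigned arithmetic is
// performed modulo 2^32, so the result stays correct even if the counter
// wrapped around once between the two readings.
static uint32_t ElapsedCycles(uint32_t Start, uint32_t End) {
  return End - Start;
}
```

At a processor clock of 400 MHz, for example, the counter wraps roughly every 10.7 seconds, which is far longer than a single context switch.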
Requirements
The following application for measuring the embOS context switch time using the DWT Cycle Counter requires an Armv7[E]-M or Armv8-M Mainline device with an implemented DWT Cycle Counter, and a debug probe to read the results from the device's memory, e.g. via the watch view of any IDE's debug session. Furthermore, it is assumed that the Cortex-M SysTick is used as the hardware timer for the embOS system tick. If another hardware timer is used, the code must be modified to disable that timer instead. Otherwise, its interrupts will affect the maximum and average execution time of the context switch.
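The check for an implemented Cycle Counter can be sketched as follows. This is an illustration under stated assumptions, not the code of the application: the helper EnableCycleCounter is hypothetical, and it takes the register addresses as parameters so that the logic can also be exercised off-target. In addition to the bits used by the application, it sets the TRCENA bit (bit 24) of the Debug Exception and Monitor Control Register (DEMCR), which globally enables the DWT. A debugger typically sets this bit when it connects, so the explicit write is only a precaution.

```c
#include <stdint.h>

#define DEMCR_TRCENA        (1u << 24)  // Global enable for DWT and ITM
#define DWT_CTRL_CYCCNTENA  (1u << 0)   // Enables the Cycle Counter
#define DWT_CTRL_NOCYCCNT   (1u << 25)  // Set if no Cycle Counter is implemented

// Hypothetical helper: enables the DWT Cycle Counter.
// Returns 0 on success, -1 if the device reports that no counter exists.
static int EnableCycleCounter(volatile uint32_t* pDEMCR,
                              volatile uint32_t* pDWT_CTRL) {
  *pDEMCR |= DEMCR_TRCENA;               // Make the DWT registers usable
  if ((*pDWT_CTRL & DWT_CTRL_NOCYCCNT) != 0u) {
    return -1;                           // Cycle Counter not implemented
  }
  *pDWT_CTRL |= DWT_CTRL_CYCCNTENA;      // Start counting processor cycles
  return 0;
}
```

On a target, the call would pass the architectural addresses, e.g. EnableCycleCounter((volatile uint32_t*)0xE000EDFCu, (volatile uint32_t*)0xE0001000u).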
The Application
Simply let the application run with an active debug session on your device. If the DWT Cycle Counter is not implemented, the debug session will halt at line 74. The application repeats the measurement NUM_SAMPLES times and records the minimal, maximal and average execution time of the context switch. Although the executed code for the context switch is always the same, the minimal and maximal values can differ. The more complex the processor is, the greater the margin. A Cortex-M7 with caches, branch prediction, a longer pipeline, and typically a processor clock that is faster than the memory can be accessed will show a greater margin between those two values than a Cortex-M4. Thus, the average execution time is also recorded to see whether the minimal or the maximal value is more likely to occur.
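The bookkeeping described above can be sketched off-target as follows. The type STATS and the functions Stats_Init, Stats_Add and Stats_Average are hypothetical names chosen for this illustration. Unlike the application, the sketch accumulates the sum in 64 bits, which is an assumption made here to rule out overflow for large sample counts.

```c
#include <stdint.h>

// Hypothetical statistics record for cycle-count samples
typedef struct {
  uint32_t Min;    // Smallest sample seen so far
  uint32_t Max;    // Largest sample seen so far
  uint64_t Sum;    // Sum of all samples, 64-bit to avoid overflow
  uint32_t Count;  // Number of samples
} STATS;

static void Stats_Init(STATS* p) {
  p->Min   = UINT32_MAX;
  p->Max   = 0u;
  p->Sum   = 0u;
  p->Count = 0u;
}

static void Stats_Add(STATS* p, uint32_t Cycles) {
  if (Cycles < p->Min) p->Min = Cycles;
  if (Cycles > p->Max) p->Max = Cycles;
  p->Sum += Cycles;
  p->Count++;
}

// Returns the integer average, or 0 if no sample was recorded
static uint32_t Stats_Average(const STATS* p) {
  return (p->Count != 0u) ? (uint32_t)(p->Sum / p->Count) : 0u;
}
```

With the made-up samples 180, 182, 180 and 250 cycles, the sketch yields a minimum of 180, a maximum of 250 and an average of 198. An average close to the minimum suggests that the maximum is a rare outlier.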
After measuring the context switch time, the debug session will halt at line 143. Now, the results can be read from the device's memory by inspecting the variables Min, Max, Average and Nanoseconds.
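Min, Max and Average are cycle counts, while Nanoseconds is derived from Min via OS_TIME_ConvertCycles2ns(). The arithmetic behind such a conversion can be sketched as follows, assuming the counter runs at the processor clock. The function CyclesToNs and its parameter CpuHz are hypothetical; the sketch only illustrates the formula ns = cycles * 10^9 / f_CPU and is not the embOS implementation.

```c
#include <stdint.h>

// Hypothetical helper (not an embOS API): converts a cycle count into
// nanoseconds for a given processor clock frequency in Hz. The product is
// computed in 64 bits so that it cannot overflow for any 32-bit cycle count.
static uint64_t CyclesToNs(uint32_t Cycles, uint32_t CpuHz) {
  if (CpuHz == 0u) {
    return 0u;  // Avoid division by zero for an invalid frequency
  }
  return ((uint64_t)Cycles * 1000000000u) / CpuHz;
}
```

For example, 100 cycles at 400 MHz correspond to 250 ns, and 168 cycles at 168 MHz correspond to 1000 ns.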
1 /*********************************************************************
2 * (c) SEGGER Microcontroller GmbH *
3 * The Embedded Experts *
4 * www.segger.com *
5 **********************************************************************
6
7 -------------------------- END-OF-HEADER -----------------------------
8 Purpose : embOS sample program that measures the embOS context
9 switch time and stores the maximal, minimal, and average
10 context switch time (in cycles) in memory. It also saves
11 the minimal context switch time (in nanoseconds) in memory.
12 */
13
14 #include "RTOS.h"
15
16 /*********************************************************************
17 *
18 * Defines
19 *
20 **********************************************************************
21 */
22
23 #define NUM_SAMPLES (1024 * 16)
24
25 #define DWT_CTRL (*(volatile OS_U32*)(0xE0001000u))
26 #define DWT_CTRL_CYCCNTENA (1u)
27 #define DWT_CTRL_NOYCYCCNT (1u << 25)
28 #define DWT_CYCCNT (*(volatile OS_U32*)(0xE0001004u))
29
30 #define SYST_CSR (*(volatile OS_U32*)(0xE000E010u))
31
32 #define BREAK() __asm volatile ("bkpt #0")
33
34 /*********************************************************************
35 *
36 * Static data
37 *
38 **********************************************************************
39 */
40
41 static OS_STACKPTR int StackHP[128];
42 static OS_STACKPTR int StackLP[128];
43 static OS_TASK TCBHP;
44 static OS_TASK TCBLP;
45 static OS_U32 Time;
46
47 //
48 // Data to inspect in a watch view of an IDE
49 //
50 static volatile OS_U64 Nanoseconds;
51 static volatile OS_U32 Average = (OS_U32) 0;
52 static volatile OS_U32 Max = (OS_U32) 0;
53 static volatile OS_U32 Min = (OS_U32)-1;
54
55 /*********************************************************************
56 *
57 * Local functions
58 *
59 **********************************************************************
60 */
61
62 /*********************************************************************
63 *
64 * _Initialize()
65 */
66 inline static void _Initialize(void) {
67   OS_U32 Ctrl;
68
69   Ctrl = DWT_CTRL;
70   //
71   // Check if device has the DWT Cycle Counter implemented
72   //
73   if ((Ctrl & DWT_CTRL_NOYCYCCNT) != 0) {
74     BREAK();  // Device has no DWT Cycle Counter implemented
75   }
76   //
77   // Enable the DWT Cycle Counter if it is disabled
78   //
79   if ((Ctrl & DWT_CTRL_CYCCNTENA) == 0) {
80     DWT_CTRL |= DWT_CTRL_CYCCNTENA;
81   }
82   //
83   // Disable the SysTick, as it isn't required and could interfere
84   // with the measurement of the context switch time
85   //
86   SYST_CSR = 0;
87 }
88
89 /*********************************************************************
90 *
91 * _GetCycles()
92 */
93 inline static OS_U32 _GetCycles(void) {
94 return DWT_CYCCNT;
95 }
96
97 /*********************************************************************
98 *
99 * HPTask()
100 */
101 static void HPTask(void) {
102   while (1) {
103     OS_TASK_Suspend(NULL);       // Suspend high priority task
104     Time = _GetCycles() - Time;  // Stop measurement
105   }
106 }
107
108 /*********************************************************************
109 *
110 * LPTask()
111 */
112 static void LPTask(void) {
113   OS_U32 MeasureOverhead;
114   OS_U32 SampleCount;
115
116   _Initialize();
117
118   SampleCount = 0;
119   while (1) {
120     //
121     // Measure overhead for time measurement so we can take this into account by subtracting it
122     // This is done inside the while()-loop to mitigate possible effects of an instruction cache
123     //
124     MeasureOverhead = _GetCycles();
125     MeasureOverhead = _GetCycles() - MeasureOverhead;
126     //
127     // Perform actual measurements
128     //
129     Time = _GetCycles();            // Start measurement
130     OS_TASK_Resume(&TCBHP);         // Resume high priority task to force task switch
131     Time = Time - MeasureOverhead;
132     //
133     // Evaluate
134     //
135     if (Time < Min) Min = Time;
136     if (Time > Max) Max = Time;
137     SampleCount += 1;
138     Average += Time;
139     if (SampleCount >= NUM_SAMPLES) {
140       Average = Average / NUM_SAMPLES;
141       Nanoseconds = OS_TIME_ConvertCycles2ns(Min);
142       while (1) {
143         BREAK();  // Break automatically
144       }
145     }
146   }
147 }
148
149 /*********************************************************************
150 *
151 * Global functions
152 *
153 **********************************************************************
154 */
155
156 /*********************************************************************
157 *
158 * main()
159 */
160 int main(void) {
161   OS_Init();    // Initialize embOS
162   OS_InitHW();  // Initialize required hardware
163   OS_TASK_CREATE(&TCBHP, "HP Task", 100, HPTask, StackHP);
164   OS_TASK_CREATE(&TCBLP, "LP Task",  50, LPTask, StackLP);
165   OS_Start();   // Start embOS
166   return 0;
167 }
168
169 /*************************** End of file ****************************/