HowTo Measure CPU Performance
Introduction
This wiki article shows how to measure CPU performance by running an application that computes prime numbers. The test can run with or without an RTOS; both applications are listed below. In a loop, the application computes all prime numbers below 1000 using the Sieve of Eratosthenes. The output value is the number of times this computation completes within one second, so higher values indicate better performance. The last column in the table below shows the CPU performance per MHz (loops per second divided by the CPU frequency in MHz), which makes it easier to compare devices running at different CPU frequencies.
CPU Performance values
The following values were measured with the corresponding embOS port (for example, embOS V5.18.0.0 for Arm / Cortex-M and Embedded Studio) and the matching embOS board support package (BSP).
Core | Device | embOS BSP | Executed from | Frequency [MHz] | Compiler | Optimization setting | Loops/sec | Loops/sec/MHz |
---|---|---|---|---|---|---|---|---|
Cortex-M4 | Silicon Labs EFM32GG11 | EFM32GG11_EFM32_GiantGecko_SK | Flash | 72 | SEGGER cc 16.0.3 | Level 3 | 4448 | 61.8 |
Cortex-M7 | NXP iMXRT1176 | iMXRT1176_M7_MIMXRT1170_EVK | external QSPI Flash | 996 | SEGGER cc 16.0.3 | Level 3 | 128239 | 128.8 |
Cortex-M33 | NXP iMXRT595S | iMXRT595S_MIMXRT595_EVK | Flash | 198 | SEGGER cc 16.0.3 | Level 3 | 16093 | 81.3 |
Cortex-M4 | NXP K66FN2M0xxx18 | K66FN2M0_emPower | Flash | 168 | SEGGER cc 15.2.5 | Level 3 | 10837 | 64.5 |
Cortex-M33 | NXP iMXRT685S | iMXRT685S_MIMXRT685_EVK | Flash | 300 | SEGGER cc 16.0.3 | Level 3 | 24741 | 82.5 |
Cortex-M4 | NXP LPC4357 | LPC4357_MCB4300 | Flash | 204 | SEGGER cc 16.0.3 | Level 3 | 13977 | 68.5 |
Cortex-M33 | NXP LPC55S69 | LPC55S69_LPCXpresso55S69 | Flash | 100 | SEGGER cc 16.0.3 | Level 3 | 7320 | 73.2 |
Cortex-M33 | Nordic Semi nRF5340 | nRF5340_nRF5340_DK | Flash | 64 | SEGGER cc 16.0.3 | Level 3 | 2410 | 37.7 |
Cortex-A9 | Renesas R7S72100 | R7S72100_RSK_RZA1H | RAM | 399 | SEGGER cc 16.0.3 | Level 3 | 57438 | 144.0 |
Cortex-M23 | Microchip SAML11E16 | SAML11E16_SAML11_XPlainedPro | Flash | 32 | SEGGER cc 16.0.3 | Level 3 | 1506 | 47.1 |
Cortex-M3 | ST STM32F103 | STM32F103_STM3210E_Eval | Flash | 72 | SEGGER cc 16.0.3 | Level 3 | 4265 | 59.2 |
Cortex-M4 | ST STM32F407 | STM32F407_STM3240G_Eval | Flash | 168 | SEGGER cc 16.0.3 | Level 3 | 11585 | 69.0 |
Cortex-M7 | ST STM32F756 | STM32F756_STM32756G_Eval | Flash | 200 | SEGGER cc 16.0.3 | Level 3 | 25980 | 129.9 |
Cortex-M7 | ST STM32F769 | STM32F769_STM32F769I_Eval | Flash | 200 | SEGGER cc 16.0.3 | Level 3 | 26321 | 131.6 |
Cortex-M7 | ST STM32F779 | STM32F779_STM32F779I_Eval | Flash | 200 | SEGGER cc 16.0.3 | Level 3 | 25981 | 129.9 |
Cortex-M7 | ST STM32H745 M7 | STM32H745_M7_STM32H745XI_Discovery | Flash | 480 | SEGGER cc 16.0.3 | Level 3 | 61757 | 128.7 |
Cortex-M7 | ST STM32H753 | STM32H753_STM32H753I_Eval | Flash | 400 | SEGGER cc 16.0.3 | Level 3 | 51391 | 128.5 |
Cortex-M7 | ST STM32H7B3 | STM32H7B3_STM32H7B3I_Nucleo | Flash | 280 | SEGGER cc 16.0.3 | Level 3 | 35962 | 128.4 |
Cortex-M0+ | ST STM32L073 | STM32L073_STM32L073RZ_Nucleo | Flash | 32 | SEGGER cc 16.0.3 | Level 3 | 1342 | 41.9 |
Cortex-M4 | ST STM32L4R9 | STM32L4R9_STM32L4R9I_Discovery | Flash | 120 | SEGGER cc 16.0.3 | Level 3 | 8338 | 69.5 |
Cortex-M33 | ST STM32L552 | STM32L552_STM32L552ZE_Nucleo | Flash | 110 | SEGGER cc 16.0.3 | Level 3 | 3835 | 34.9 |
Cortex-M33 | ST STM32L562 | STM32L562_STM32L562E_DK | Flash | 110 | SEGGER cc 16.0.3 | Level 3 | 3921 | 35.6 |
Cortex-M33 | ST STM32U575 | STM32U575_STM32U575I_EV | Flash | 160 | SEGGER cc 16.0.3 | Level 3 | 8212 | 51.3 |
RX | Renesas RX610 | RX610_RSKRX610 | Flash | 100 | IAR EWRX V2.10A | Level 3 | 7520 | 75.2 |
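The Loops/sec/MHz column can be reproduced from the two measured columns. The following is a minimal sketch in C, assuming the table values are rounded to one decimal place; the helper names LoopsPerMHz_x10() and PrintLoopsPerMHz() are hypothetical and not part of embOS.

```c
#include <stdio.h>

// Hypothetical helper (not part of embOS): returns Loops/sec/MHz scaled by 10,
// rounded to the nearest tenth, using integer arithmetic only.
unsigned int LoopsPerMHz_x10(unsigned int LoopsPerSec, unsigned int FreqMHz) {
  return (LoopsPerSec * 10u + FreqMHz / 2u) / FreqMHz;
}

// Prints the normalized value with one decimal place.
void PrintLoopsPerMHz(unsigned int LoopsPerSec, unsigned int FreqMHz) {
  unsigned int v;

  v = LoopsPerMHz_x10(LoopsPerSec, FreqMHz);
  printf("%u.%u\n", v / 10u, v % 10u);
}
```

For the first table row, PrintLoopsPerMHz(4448u, 72u) prints 61.8. Integer arithmetic is used so that the helper also fits targets without a floating-point unit.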
RTOS
Every embOS board support package comes with the sample application OS_MeasureCPU_Performance.c, which measures the performance of the entire system and outputs one result value per second. Run the sample application with full compiler optimization and the OS_LIBMODE_XR embOS library.
#include "RTOS.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>  // memset()
/*********************************************************************
*
* Static data
*
**********************************************************************
*/
static OS_STACKPTR int StackHP[128]; // Task stack
static OS_TASK TCBHP; // Task control block
static char aIsPrime[1000];
static unsigned int NumPrimes;
/*********************************************************************
*
* Local functions
*
**********************************************************************
*/
/*********************************************************************
*
* _CalcPrimes()
*/
static void _CalcPrimes(unsigned int NumItems) {
  unsigned int i;
  unsigned int j;
  //
  // Mark all as potential prime numbers
  //
  memset(aIsPrime, 1, NumItems);
  //
  // 2 deserves a special treatment
  //
  for (i = 4; i < NumItems; i += 2) {
    aIsPrime[i] = 0;  // Cross it out: not a prime
  }
  //
  // Cross out multiples of every prime starting at 3. Crossing out starts at i^2.
  //
  for (i = 3; i * i < NumItems; i++) {
    if (aIsPrime[i]) {
      j = i * i;  // The square of this prime is the first we need to cross out
      do {
        aIsPrime[j] = 0;  // Cross it out: not a prime
        j += 2 * i;       // Skip even multiples (only 3*, 5*, 7* etc)
      } while (j < NumItems);
    }
  }
  //
  // Count prime numbers
  //
  NumPrimes = 0;
  for (i = 2; i < NumItems; i++) {
    if (aIsPrime[i]) {
      NumPrimes++;
    }
  }
}
/*********************************************************************
*
* _PrintDec()
*/
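static void _PrintDec(unsigned int v) {
  unsigned int Digit;
  unsigned int r;
  //
  // Find the first power of 10 that is greater than v.
  // "<=" is required so that exact powers of 10 (10, 100, ...) print correctly.
  //
  Digit = 10;
  while (Digit <= v) {
    Digit *= 10;
  }
  //
  // Output one decimal digit per iteration, most significant digit first
  //
  do {
    Digit /= 10;
    r = v / Digit;
    v -= r * Digit;
    putchar(r + '0');
  } while (v | (Digit > 1));
}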
/*********************************************************************
*
* _PrintResult()
*/
static void _PrintResult(unsigned int Cnt) {
  if (NumPrimes != 168) {
    puts("Error");
  } else {
    puts("Loops/sec:");
    _PrintDec(Cnt);
  }
  puts("\n");
}
/*********************************************************************
*
* HPTask()
*/
static void HPTask(void) {
  unsigned int Cnt;
  OS_U64       tEnd;
  OS_U64       CyclesPerSecond;
  CyclesPerSecond = OS_TIME_Convertms2Cycles(1000u);
  while(1) {
    Cnt = 0;
    OS_TASK_Delay(1);  // Sync to tick
    tEnd = OS_TIME_Get_Cycles() + CyclesPerSecond;
    while (tEnd >= OS_TIME_Get_Cycles()) {
      _CalcPrimes(sizeof(aIsPrime));
      Cnt++;
    }
    _PrintResult(Cnt);
  }
}
/*********************************************************************
*
* Global functions
*
**********************************************************************
*/
/*********************************************************************
*
* main()
*/
int main(void) {
  OS_Init();    // Initialize embOS
  OS_InitHW();  // Initialize required hardware
  OS_TASK_CREATE(&TCBHP, "HP Task", 100, HPTask, StackHP);
  OS_Start();   // Start embOS
  return 0;
}
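The constant 168 checked in _PrintResult() is the number of primes below 1000. As a cross-check that does not depend on the sieve, the count can be reproduced with plain trial division. This is a minimal sketch intended for a host PC; the names IsPrimeTrial() and CountPrimesBelow() are hypothetical and not part of the sample application.

```c
#include <stdbool.h>

// Returns true if n is prime, using plain trial division (not the sieve).
bool IsPrimeTrial(unsigned int n) {
  unsigned int d;

  if (n < 2u) {
    return false;
  }
  for (d = 2u; d * d <= n; d++) {
    if ((n % d) == 0u) {
      return false;  // Found a divisor: not a prime
    }
  }
  return true;
}

// Counts the primes in the range 0 .. Limit - 1.
unsigned int CountPrimesBelow(unsigned int Limit) {
  unsigned int n;
  unsigned int Cnt;

  Cnt = 0u;
  for (n = 0u; n < Limit; n++) {
    if (IsPrimeTrial(n)) {
      Cnt++;
    }
  }
  return Cnt;
}
```

CountPrimesBelow(1000u) returns 168, which matches the value the benchmark expects in NumPrimes after each call of _CalcPrimes(sizeof(aIsPrime)).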
Bare Metal
The routine _GetTimems() is only a placeholder and must be adapted to your hardware. It must return the number of milliseconds since system reset. On Cortex-M, for example, the Data Watchpoint and Trace (DWT) cycle counter could be used. Run the application with full compiler optimization.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/*********************************************************************
*
* Static data
*
**********************************************************************
*/
static char aIsPrime[1000];
static unsigned int NumPrimes;
/*********************************************************************
*
* Local functions
*
**********************************************************************
*/
/*********************************************************************
*
* _CalcPrimes()
*/
static void _CalcPrimes(unsigned int NumItems) {
  unsigned int i;
  unsigned int j;
  //
  // Mark all as potential prime numbers
  //
  memset(aIsPrime, 1, NumItems);
  //
  // 2 deserves a special treatment
  //
  for (i = 4; i < NumItems; i += 2) {
    aIsPrime[i] = 0;  // Cross it out: not a prime
  }
  //
  // Cross out multiples of every prime starting at 3. Crossing out starts at i^2.
  //
  for (i = 3; i * i < NumItems; i++) {
    if (aIsPrime[i]) {
      j = i * i;  // The square of this prime is the first we need to cross out
      do {
        aIsPrime[j] = 0;  // Cross it out: not a prime
        j += 2 * i;       // Skip even multiples (only 3*, 5*, 7* etc)
      } while (j < NumItems);
    }
  }
  //
  // Count prime numbers
  //
  NumPrimes = 0;
  for (i = 2; i < NumItems; i++) {
    if (aIsPrime[i]) {
      NumPrimes++;
    }
  }
}
/*********************************************************************
*
* _PrintDec()
*/
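static void _PrintDec(unsigned int v) {
  unsigned int Digit;
  unsigned int r;
  //
  // Find the first power of 10 that is greater than v.
  // "<=" is required so that exact powers of 10 (10, 100, ...) print correctly.
  //
  Digit = 10;
  while (Digit <= v) {
    Digit *= 10;
  }
  //
  // Output one decimal digit per iteration, most significant digit first
  //
  do {
    Digit /= 10;
    r = v / Digit;
    v -= r * Digit;
    putchar(r + '0');
  } while (v | (Digit > 1));
}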
/*********************************************************************
*
* _PrintResult()
*/
static void _PrintResult(unsigned int Cnt) {
  if (NumPrimes != 168) {
    puts("Error");
  } else {
    puts("Loops/sec:");
    _PrintDec(Cnt);
  }
  puts("\n");
}
/*********************************************************************
*
* _GetTimems()
*
* Function description
* Returns the number of milliseconds since reset
*/
static unsigned int _GetTimems(void) {
  return milliseconds;  // Needs to be adjusted according to your hardware
}
/*********************************************************************
*
* Global functions
*
**********************************************************************
*/
/*********************************************************************
*
* main()
*/
int main(void) {
  unsigned int Cnt;
  unsigned int tEnd;
  while(1) {
    Cnt = 0;
    tEnd = _GetTimems() + 1000;
    while (tEnd >= _GetTimems()) {
      _CalcPrimes(sizeof(aIsPrime));
      Cnt++;
    }
    _PrintResult(Cnt);
  }
  return 0;
}
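One possible way to fill in the _GetTimems() placeholder on a Cortex-M device is to derive milliseconds from the DWT cycle counter. The following is a sketch under stated assumptions, not a definitive implementation: CPU_FREQ_HZ, USE_DWT, the header name device.h, and the helpers CyclesToTimems() and InitCycleCounter() are hypothetical, and the register names are those of the CMSIS-Core headers for Cortex-M3/M4/M7. Not every core implements the cycle counter (Cortex-M0 and Cortex-M0+, for instance, do not), so check the reference manual of your device.

```c
#include <stdint.h>

#define CPU_FREQ_HZ    168000000u              // Assumption: core clock in Hz, adjust to your device
#define CYCLES_PER_MS  (CPU_FREQ_HZ / 1000u)

static uint32_t LastCycles;   // Cycle counter value seen at the previous call
static uint64_t TotalCycles;  // Cycles accumulated since the first call

// Converts a free-running 32-bit cycle counter value into milliseconds.
// The unsigned subtraction keeps the delta correct across one counter
// wrap-around, so this must be called at least once per 2^32 cycles.
unsigned int CyclesToTimems(uint32_t Cycles) {
  TotalCycles += (uint32_t)(Cycles - LastCycles);
  LastCycles   = Cycles;
  return (unsigned int)(TotalCycles / CYCLES_PER_MS);
}

#ifdef USE_DWT         // Hypothetical switch: define only for a Cortex-M target with CMSIS
#include "device.h"    // Hypothetical name: use the CMSIS device header of your MCU

// Call once at startup, before the measurement loop.
void InitCycleCounter(void) {
  CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  // Enable the DWT and ITM units
  DWT->CYCCNT       = 0u;                          // Reset the cycle counter
  DWT->CTRL        |= DWT_CTRL_CYCCNTENA_Msk;      // Start counting CPU cycles
}

static unsigned int _GetTimems(void) {
  return CyclesToTimems(DWT->CYCCNT);
}
#endif
```

The conversion is kept separate from the register access so that it can be checked on a host PC. The benchmark loop calls _GetTimems() after every sieve run, which is far more often than once per counter wrap-around. With this approach the returned time counts from the call of InitCycleCounter() rather than from the reset itself.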