HowTo Measure CPU Performance

From SEGGER Wiki

Introduction

This wiki article shows how to measure CPU performance easily by running an application that computes prime numbers. The test is available in two variants, one running under an RTOS and one running bare metal; both applications are listed below. In a loop, the application computes all prime numbers below 1000 (there are 168 of them) using the Sieve of Eratosthenes. The result is the number of times this computation completes within one second, so higher values indicate better performance. The last column of the table below shows the performance per MHz (Loops/sec divided by the CPU frequency in MHz), which makes it easier to compare devices running at different CPU frequencies.
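Both listings below report "Error" unless the sieve finds exactly 168 primes below 1000. As a hedged sketch that is not part of the SEGGER sample, a plain trial-division counter, which shares no code with the sieve, can confirm that reference value independently on a host PC. The function names are hypothetical.

```c
#include <stdbool.h>

// Hypothetical helper: returns true if n is prime, using plain trial division.
static bool _IsPrimeTrialDiv(unsigned int n) {
  unsigned int d;

  if (n < 2u) {
    return false;
  }
  for (d = 2u; d * d <= n; d++) {
    if ((n % d) == 0u) {
      return false;            // Found a divisor: not a prime
    }
  }
  return true;
}

// Hypothetical helper: counts the primes p with 2 <= p < Limit.
static unsigned int _CountPrimesBelow(unsigned int Limit) {
  unsigned int n;
  unsigned int Cnt;

  Cnt = 0u;
  for (n = 2u; n < Limit; n++) {
    if (_IsPrimeTrialDiv(n)) {
      Cnt++;
    }
  }
  return Cnt;
}
```

`_CountPrimesBelow(1000u)` returns 168, the value that `_PrintResult()` compares `NumPrimes` against. Trial division is far slower than the sieve, which is why it is only suitable as a one-off cross-check and not as the benchmark itself.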

CPU Performance values

The following values were measured with the corresponding embOS port (for example, embOS V5.18.0.0 for Arm / Cortex-M and Embedded Studio) and embOS board support package (BSP).

| Core       | Device                 | embOS BSP                          | Executed from       | Frequency [MHz] | Compiler         | Optimization setting | Loops/sec | Loops/sec/MHz |
|------------|------------------------|------------------------------------|---------------------|-----------------|------------------|----------------------|-----------|---------------|
| Cortex-M4  | Silicon Labs EFM32GG11 | EFM32GG11_EFM32_GiantGecko_SK      | Flash               | 72              | SEGGER cc 16.0.3 | Level 3              | 4448      | 61.8          |
| Cortex-M7  | NXP iMXRT1176          | iMXRT1176_M7_MIMXRT1170_EVK        | external QSPI Flash | 996             | SEGGER cc 16.0.3 | Level 3              | 128239    | 128.8         |
| Cortex-M33 | NXP iMXRT595S          | iMXRT595S_MIMXRT595_EVK            | Flash               | 198             | SEGGER cc 16.0.3 | Level 3              | 16093     | 81.3          |
| Cortex-M4  | NXP K66FN2M0xxx18      | K66FN2M0_emPower                   | Flash               | 168             | SEGGER cc 15.2.5 | Level 3              | 10837     | 64.5          |
| Cortex-M33 | NXP iMXRT685S          | iMXRT685S_MIMXRT685_EVK            | Flash               | 300             | SEGGER cc 16.0.3 | Level 3              | 24741     | 82.5          |
| Cortex-M4  | NXP LPC4357            | LPC4357_MCB4300                    | Flash               | 204             | SEGGER cc 16.0.3 | Level 3              | 13977     | 68.5          |
| Cortex-M33 | NXP LPC55S69           | LPC55S69_LPCXpresso55S69           | Flash               | 100             | SEGGER cc 16.0.3 | Level 3              | 7320      | 73.2          |
| Cortex-M33 | Nordic Semi nRF5340    | nRF5340_nRF5340_DK                 | Flash               | 64              | SEGGER cc 16.0.3 | Level 3              | 2410      | 37.7          |
| Cortex-A9  | Renesas R7S72100       | R7S72100_RSK_RZA1H                 | RAM                 | 399             | SEGGER cc 16.0.3 | Level 3              | 57438     | 144.0         |
| Cortex-M23 | Microchip SAML11E16    | SAML11E16_SAML11_XPlainedPro       | Flash               | 32              | SEGGER cc 16.0.3 | Level 3              | 1506      | 47.1          |
| Cortex-M3  | ST STM32F103           | STM32F103_STM3210E_Eval            | Flash               | 72              | SEGGER cc 16.0.3 | Level 3              | 4265      | 59.2          |
| Cortex-M4  | ST STM32F407           | STM32F407_STM3240G_Eval            | Flash               | 168             | SEGGER cc 16.0.3 | Level 3              | 11585     | 69.0          |
| Cortex-M7  | ST STM32F756           | STM32F756_STM32756G_Eval           | Flash               | 200             | SEGGER cc 16.0.3 | Level 3              | 25980     | 129.9         |
| Cortex-M7  | ST STM32F769           | STM32F769_STM32F769I_Eval          | Flash               | 200             | SEGGER cc 16.0.3 | Level 3              | 26321     | 131.6         |
| Cortex-M7  | ST STM32F779           | STM32F779_STM32F779I_Eval          | Flash               | 200             | SEGGER cc 16.0.3 | Level 3              | 25981     | 129.9         |
| Cortex-M7  | ST STM32H745 M7        | STM32H745_M7_STM32H745XI_Discovery | Flash               | 480             | SEGGER cc 16.0.3 | Level 3              | 61757     | 128.7         |
| Cortex-M7  | ST STM32H753           | STM32H753_STM32H753I_Eval          | Flash               | 400             | SEGGER cc 16.0.3 | Level 3              | 51391     | 128.5         |
| Cortex-M7  | ST STM32H7B3           | STM32H7B3_STM32H7B3I_Nucleo        | Flash               | 280             | SEGGER cc 16.0.3 | Level 3              | 35962     | 128.4         |
| Cortex-M0+ | ST STM32L073           | STM32L073_STM32L073RZ_Nucleo       | Flash               | 32              | SEGGER cc 16.0.3 | Level 3              | 1342      | 41.9          |
| Cortex-M4  | ST STM32L4R9           | STM32L4R9_STM32L4R9I_Discovery     | Flash               | 120             | SEGGER cc 16.0.3 | Level 3              | 8338      | 69.5          |
| Cortex-M33 | ST STM32L552           | STM32L552_STM32L552ZE_Nucleo       | Flash               | 110             | SEGGER cc 16.0.3 | Level 3              | 3835      | 34.9          |
| Cortex-M33 | ST STM32L562           | STM32L562_STM32L562E_DK            | Flash               | 110             | SEGGER cc 16.0.3 | Level 3              | 3921      | 35.6          |
| Cortex-M33 | ST STM32U575           | STM32U575_STM32U575I_EV            | Flash               | 160             | SEGGER cc 16.0.3 | Level 3              | 8212      | 51.3          |
| RX         | Renesas RX610          | RX610_RSKRX610                     | Flash               | 100             | IAR EWRX V2.10A  | Level 3              | 7520      | 75.2          |
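The Loops/sec/MHz column is the measured Loops/sec value divided by the CPU frequency in MHz, rounded to one decimal place. A minimal sketch of that arithmetic, using integer math only so it also works on a target without an FPU, is shown below; the function names are hypothetical. Rounding half up reproduces every value in the table above.

```c
#include <stdio.h>

// Hypothetical helper: returns Loops/sec/MHz in tenths (618 means 61.8),
// rounded half up. FreqMHz must be nonzero.
static unsigned int _LoopsPerMHz_x10(unsigned int LoopsPerSec, unsigned int FreqMHz) {
  return (LoopsPerSec * 10u + FreqMHz / 2u) / FreqMHz;
}

// Hypothetical helper: prints the normalized value with one decimal place.
static void _PrintLoopsPerMHz(unsigned int LoopsPerSec, unsigned int FreqMHz) {
  unsigned int v;

  v = _LoopsPerMHz_x10(LoopsPerSec, FreqMHz);
  printf("%u.%u\n", v / 10u, v % 10u);
}
```

For example, `_PrintLoopsPerMHz(4448u, 72u)` prints `61.8`, the value listed for the EFM32GG11. Normalizing by frequency this way assumes performance scales linearly with the clock, which only holds while memory wait states do not dominate.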

RTOS

Every embOS board support package comes with the sample application OS_MeasureCPU_Performance.c, which measures the performance of the entire system and outputs one result value per second. Please run the sample application with full compiler optimization and the OS_LIBMODE_XR embOS library.

#include "RTOS.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   // memset()

/*********************************************************************
*
*       Static data
*
**********************************************************************
*/
static OS_STACKPTR int StackHP[128];  // Task stack
static OS_TASK         TCBHP;         // Task control block
static char            aIsPrime[1000];
static unsigned int    NumPrimes;

/*********************************************************************
*
*       Local functions
*
**********************************************************************
*/

/*********************************************************************
*
*       _CalcPrimes()
*/
static void _CalcPrimes(unsigned int NumItems) {
  unsigned int i;
  unsigned int j;

  //
  // Mark all as potential prime numbers
  //
  memset(aIsPrime, 1, NumItems);
  //
  // 2 deserves special treatment: cross out all even numbers above 2
  //
  for (i = 4; i < NumItems; i += 2) {
    aIsPrime[i] = 0;     // Cross it out: not a prime
  }
  //
  // Cross out multiples of every prime starting at 3. Crossing out starts at i^2.
  //
  for (i = 3; i * i < NumItems; i++) {
    if (aIsPrime[i]) {
      j = i * i;    // The square of this prime is the first we need to cross out
      do {
        aIsPrime[j] = 0;     // Cross it out: not a prime
        j += 2 * i;          // Skip even multiples (only 3*, 5*, 7* etc)
      } while (j < NumItems);
    }
  }
  //
  // Count prime numbers
  //
  NumPrimes = 0;
  for (i = 2; i < NumItems; i++) {
    if (aIsPrime[i]) {
      NumPrimes++;
    }
  }
}

/*********************************************************************
*
*       _PrintDec()
*/
static void _PrintDec(unsigned int v) {
  unsigned int Digit;
  unsigned int r;

  Digit = 10;
  while (Digit <= v) {   // "<=" so exact powers of 10 (10, 100, ...) are printed correctly
    Digit *= 10;
  }
  do {
    Digit /= 10;
    r = v / Digit;
    v -= r * Digit;
    putchar(r + '0');
  } while (v | (Digit > 1));
}

/*********************************************************************
*
*       _PrintResult()
*/
static void _PrintResult(unsigned int Cnt) {
  if (NumPrimes != 168) {
    puts("Error");
  } else {
    puts("Loops/sec:");
    _PrintDec(Cnt);
  }
  puts("\n");
}

/*********************************************************************
*
*       HPTask()
*/
static void HPTask(void) {
  unsigned int Cnt;
  OS_U64       tEnd;
  OS_U64       CyclesPerSecond;

  CyclesPerSecond = OS_TIME_Convertms2Cycles(1000u);
  while(1) {
    Cnt = 0;
    OS_TASK_Delay(1);  // Sync to tick
    tEnd = OS_TIME_Get_Cycles() + CyclesPerSecond;
    while (tEnd >= OS_TIME_Get_Cycles()) {
      _CalcPrimes(sizeof(aIsPrime));
      Cnt++;
    }
    _PrintResult(Cnt);
  }
}

/*********************************************************************
*
*       Global functions
*
**********************************************************************
*/

/*********************************************************************
*
*       main()
*/
int main(void) {
  OS_Init();    // Initialize embOS
  OS_InitHW();  // Initialize required hardware
  OS_TASK_CREATE(&TCBHP, "HP Task", 100, HPTask, StackHP);
  OS_Start();   // Start embOS
  return 0;
}

Bare Metal

The routine _GetTimems() is only a placeholder and must be adapted to your hardware. It must return the number of milliseconds since system reset. On Cortex-M, for example, a SysTick interrupt that increments a millisecond counter can be used, or the Data Watchpoint and Trace (DWT) cycle counter divided by the number of CPU cycles per millisecond. Please run the application with full compiler optimization.
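As a hedged sketch of the DWT approach, the following `_GetTimems()` reads the DWT cycle counter (CYCCNT) through its architectural register addresses, extends the 32-bit count to 64 bits so that counter wraparound does not corrupt the time, and converts cycles to milliseconds. The helper names and the `CPU_CLOCK_HZ` value are assumptions to be adapted. CYCCNT is not available on ARMv6-M cores such as Cortex-M0/M0+, and some Cortex-M7 devices may additionally require the DWT to be unlocked via its Lock Access Register before the counter runs.

```c
#include <stdint.h>

// Cortex-M debug registers (ARMv7-M / ARMv8-M Mainline).
#define DEMCR              (*(volatile uint32_t*)(uintptr_t)0xE000EDFCu)  // Debug Exception and Monitor Control
#define DWT_CTRL           (*(volatile uint32_t*)(uintptr_t)0xE0001000u)  // DWT control register
#define DWT_CYCCNT         (*(volatile uint32_t*)(uintptr_t)0xE0001004u)  // DWT cycle counter
#define DEMCR_TRCENA       (1u << 24)                                     // Enables DWT and ITM
#define DWT_CTRL_CYCCNTENA (1u << 0)                                      // Enables CYCCNT

#define CPU_CLOCK_HZ       168000000u  // Assumption: set to the actual core clock

static uint32_t _LastCycles;   // Last raw CYCCNT value read
static uint64_t _TotalCycles;  // CYCCNT extended to 64 bits

// Hypothetical init: call once at startup, before the first _GetTimems() call.
static void _InitCycleCounter(void) {
  DEMCR      |= DEMCR_TRCENA;
  DWT_CYCCNT  = 0u;
  DWT_CTRL   |= DWT_CTRL_CYCCNTENA;
}

// Extends a 32-bit counter reading to 64 bits. Correct as long as it is
// called at least once per counter period (about 25 s at 168 MHz).
static uint64_t _ExtendCycles(uint32_t Now) {
  _TotalCycles += (uint32_t)(Now - _LastCycles);  // Unsigned difference handles one wrap
  _LastCycles   = Now;
  return _TotalCycles;
}

// Converts CPU cycles to whole milliseconds.
static unsigned int _Cycles2ms(uint64_t Cycles) {
  return (unsigned int)(Cycles / (CPU_CLOCK_HZ / 1000u));
}

static unsigned int _GetTimems(void) {
  return _Cycles2ms(_ExtendCycles(DWT_CYCCNT));
}
```

Call `_InitCycleCounter()` once at the start of `main()`. The benchmark loop calls `_GetTimems()` continuously, so the once-per-period requirement of `_ExtendCycles()` is easily met.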

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*********************************************************************
*
*       Static data
*
**********************************************************************
*/
static char         aIsPrime[1000];
static unsigned int NumPrimes;

/*********************************************************************
*
*       Local functions
*
**********************************************************************
*/

/*********************************************************************
*
*       _CalcPrimes()
*/
static void _CalcPrimes(unsigned int NumItems) {
  unsigned int i;
  unsigned int j;

  //
  // Mark all as potential prime numbers
  //
  memset(aIsPrime, 1, NumItems);
  //
  // 2 deserves special treatment: cross out all even numbers above 2
  //
  for (i = 4; i < NumItems; i += 2) {
    aIsPrime[i] = 0;     // Cross it out: not a prime
  }
  //
  // Cross out multiples of every prime starting at 3. Crossing out starts at i^2.
  //
  for (i = 3; i * i < NumItems; i++) {
    if (aIsPrime[i]) {
      j = i * i;    // The square of this prime is the first we need to cross out
      do {
        aIsPrime[j] = 0;     // Cross it out: not a prime
        j += 2 * i;          // Skip even multiples (only 3*, 5*, 7* etc)
      } while (j < NumItems);
    }
  }
  //
  // Count prime numbers
  //
  NumPrimes = 0;
  for (i = 2; i < NumItems; i++) {
    if (aIsPrime[i]) {
      NumPrimes++;
    }
  }
}

/*********************************************************************
*
*       _PrintDec()
*/
static void _PrintDec(unsigned int v) {
  unsigned int Digit;
  unsigned int r;

  Digit = 10;
  while (Digit <= v) {   // "<=" so exact powers of 10 (10, 100, ...) are printed correctly
    Digit *= 10;
  }
  do {
    Digit /= 10;
    r = v / Digit;
    v -= r * Digit;
    putchar(r + '0');
  } while (v | (Digit > 1));
}

/*********************************************************************
*
*       _PrintResult()
*/
static void _PrintResult(unsigned int Cnt) {
  if (NumPrimes != 168) {
    puts("Error");
  } else {
    puts("Loops/sec:");
    _PrintDec(Cnt);
  }
  puts("\n");
}

/*********************************************************************
*
*       _GetTimems()
*
*  Function description
*    Returns the number of milliseconds since reset
*/
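static volatile unsigned int milliseconds;  // Placeholder counter, e.g. incremented by a 1 ms timer interrupt

static unsigned int _GetTimems(void) {
  return milliseconds;  // Needs to be adjusted according to your hardware
}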

/*********************************************************************
*
*       Global functions
*
**********************************************************************
*/

/*********************************************************************
*
*       main()
*/
int main(void) {
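  unsigned int Cnt;
  unsigned int t;
  unsigned int tStart;

  while(1) {
    Cnt = 0;
    t   = _GetTimems();
    do {
      tStart = _GetTimems();
    } while (tStart == t);                      // Sync to the start of a new millisecond
    while ((_GetTimems() - tStart) < 1000u) {   // Unsigned difference stays correct when the counter wraps
      _CalcPrimes(sizeof(aIsPrime));
      Cnt++;
    }
    _PrintResult(Cnt);
  }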

  return 0;
}
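Cores without a DWT cycle counter, such as Cortex-M0 and Cortex-M0+, can instead derive `_GetTimems()` from a 1 ms SysTick interrupt, which most Cortex-M devices implement. The following is a minimal sketch under stated assumptions: the register addresses are the architectural SysTick addresses, `CPU_CLOCK_HZ` is an assumed core clock, the helper names are hypothetical, and the handler name `SysTick_Handler` assumes a CMSIS-style vector table in the startup code.

```c
#include <stdint.h>

#define SYST_CSR           (*(volatile uint32_t*)(uintptr_t)0xE000E010u)  // SysTick control and status
#define SYST_RVR           (*(volatile uint32_t*)(uintptr_t)0xE000E014u)  // SysTick reload value (24 bits)
#define SYST_CVR           (*(volatile uint32_t*)(uintptr_t)0xE000E018u)  // SysTick current value
#define SYST_CSR_ENABLE    (1u << 0)   // Counter enable
#define SYST_CSR_TICKINT   (1u << 1)   // Raise SysTick exception on reaching zero
#define SYST_CSR_CLKSOURCE (1u << 2)   // Count the processor clock

#define CPU_CLOCK_HZ       72000000u   // Assumption: set to the actual core clock

static volatile unsigned int _TimeMs;  // Milliseconds since _InitSysTick()

// Reload value for a 1 ms period; CpuHz / 1000 must fit into 24 bits.
static uint32_t _SysTickReload(uint32_t CpuHz) {
  return (CpuHz / 1000u) - 1u;
}

// Hypothetical init: call once at startup.
static void _InitSysTick(void) {
  SYST_RVR = _SysTickReload(CPU_CLOCK_HZ);
  SYST_CVR = 0u;                        // Any write clears the current value
  SYST_CSR = SYST_CSR_CLKSOURCE | SYST_CSR_TICKINT | SYST_CSR_ENABLE;
}

// Called by hardware once per millisecond (assumed vector table name).
void SysTick_Handler(void) {
  _TimeMs++;
}

static unsigned int _GetTimems(void) {
  return _TimeMs;
}
```

The interrupt adds a small, roughly constant overhead to every millisecond of the measurement, so results may be marginally lower than with the DWT cycle counter.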