General information about tracing
ARM tracing refers to an advanced set of debug features on ARM devices that stream out compressed information about the instructions executed by the core, so that the sequence of executed instructions can be reconstructed. This makes it possible to analyze the executed instructions precisely and, depending on the trace type, without any gaps. In some ways this debug technique is comparable to an "application time machine", as all instructions from application start until stop are part of the trace stream and can be analyzed.
A full trace stream can only be reconstructed if the application image is known to the debug probe, because the trace stream mostly consists of information about which memory area is being executed from and how many instructions were executed and how many were not. Everything in between is inferred from the program code itself. To allow the decoder to synchronize to the trace stream, indicators such as the full program counter value are also transmitted periodically.
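To illustrate why the application image is needed, here is a heavily simplified decoder sketch in C. It assumes a hypothetical trace format with only two packet kinds: a sync packet carrying a full address, and "atoms" that only say whether the next branch was taken (E) or not taken (N). Real ETM/PTM packet formats are considerably more complex; `Insn`, `Packet` and `decode_trace()` are illustrative names, not part of any real library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; real ETM/PTM packets look different. */
typedef struct {
    int is_branch;   /* 1 if this instruction is a branch */
    size_t target;   /* branch target (instruction index), valid if is_branch */
} Insn;

typedef enum { PKT_SYNC, PKT_ATOM_E, PKT_ATOM_N } PktType;

typedef struct {
    PktType type;
    size_t address;  /* PKT_SYNC only: full program counter (instruction index) */
} Packet;

/* Reconstructs executed instruction indices into out[] and returns their count.
 * Each atom resolves the next branch: E = taken, N = not taken. Instructions
 * between branches are not in the stream; they are inferred from the image. */
size_t decode_trace(const Insn *image, size_t image_len,
                    const Packet *pkts, size_t n_pkts,
                    size_t *out, size_t out_cap)
{
    assert(image != NULL && pkts != NULL && out != NULL);
    size_t pc = 0, n_out = 0;
    int synced = 0;

    for (size_t i = 0; i < n_pkts; i++) {
        if (pkts[i].type == PKT_SYNC) {      /* resynchronize to a known address */
            pc = pkts[i].address;
            synced = 1;
            continue;
        }
        if (!synced)
            continue;                        /* atoms before the first sync are unusable */

        /* Walk sequentially through the image up to the next branch. */
        while (pc < image_len && n_out < out_cap) {
            out[n_out++] = pc;
            if (image[pc].is_branch)
                break;
            pc++;
        }
        if (pc >= image_len || n_out >= out_cap)
            break;                           /* ran off the image or out of space */

        /* The atom decides where execution continues after the branch. */
        pc = (pkts[i].type == PKT_ATOM_E) ? image[pc].target : pc + 1;
    }
    return n_out;
}
```

With an image of `{nop, branch-to-0, nop}` and the stream SYNC(0), E, N, the decoder reconstructs the instruction sequence 0, 1, 0, 1. None of the sequential instructions appear in the stream; they are inferred from the image.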
The different hardware trace types
Hardware tracing is generally divided into the following main types:
PC sampling
PC sampling describes the process of periodically halting the target device and reading the current PC value. This yields statistical code coverage. The biggest advantage is that this technique requires no hardware pins beyond those already used for the debug interface. However, it also has major flaws, the two biggest being that the sample period can be irregular, depending on the overall data load between target device and debug probe, and that constantly halting the application interferes with any real-time behavior of your application. This is why it is the least recommended trace technique.
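Statistical coverage from PC sampling boils down to mapping each sampled PC to the function containing it and counting hits. The following C sketch assumes a hypothetical symbol table of `[start, end)` address ranges, which a real tool would take from the application's ELF file; `FuncRange` and `pc_sample_histogram()` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry: address range [start, end) of one function. */
typedef struct {
    const char *name;
    uint32_t start;
    uint32_t end;
} FuncRange;

/* Counts how many PC samples fall into each function. Samples outside every
 * range are counted in *unknown. hits[] must have n_funcs entries. */
void pc_sample_histogram(const uint32_t *samples, size_t n_samples,
                         const FuncRange *funcs, size_t n_funcs,
                         unsigned *hits, unsigned *unknown)
{
    assert(hits != NULL && unknown != NULL);
    for (size_t f = 0; f < n_funcs; f++)
        hits[f] = 0;
    *unknown = 0;

    for (size_t i = 0; i < n_samples; i++) {
        size_t f = 0;
        /* Linear search is enough for a sketch; a real tool would sort and bisect. */
        while (f < n_funcs &&
               !(samples[i] >= funcs[f].start && samples[i] < funcs[f].end))
            f++;
        if (f < n_funcs)
            hits[f]++;
        else
            (*unknown)++;
    }
}
```

The hit counts only approximate where the CPU spends its time; short functions can be missed entirely between two samples.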
SWO
SWO (Serial Wire Output) is mostly used for printf debugging, but it offers more than that. For example, periodic PC samples generated by the DWT (Data Watchpoint and Trace unit) can be output as well via the ITM (Instrumentation Trace Macrocell). That way execution can be followed from one "PC ping" to the next. Unfortunately, this results in an incomplete trace stream because the sampling frequency is relatively low. Other drawbacks are that SWO needs a physical hardware pin and is susceptible to overflows if the PC sampling frequency is set too high. The biggest advantages are that the SWO protocol is widely supported by debug probes and debug software, and that the periodic sampling is non-intrusive.
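Whether PC sampling overflows the SWO link can be estimated with some arithmetic. The sketch below is a rough model, not a driver. It assumes that the DWT generates one PC sample every (POSTPRESET + 1) * 64 CPU cycles (or * 1024 with CYCTAP set), that each PC sample packet is 5 bytes, that SWO runs in NRZ (UART) mode at 10 bits per byte, and that the SWO bit rate is the TPIU reference clock divided by (ACPR + 1). Other ITM traffic such as printf output or timestamps is ignored, so the real headroom is smaller. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed DWT behavior: one PC sample every (POSTPRESET + 1) * 64 cycles,
 * or * 1024 cycles if CYCTAP is set. */
uint64_t pc_samples_per_second(uint64_t cpu_hz, unsigned postpreset, int cyctap)
{
    assert(postpreset <= 15);   /* POSTPRESET is a 4-bit field */
    uint64_t cycles = (uint64_t)(postpreset + 1) * (cyctap ? 1024u : 64u);
    return cpu_hz / cycles;
}

/* Assumed SWO bit rate: reference clock / (ACPR + 1). */
uint64_t swo_bits_per_second(uint64_t traceclkin_hz, unsigned acpr)
{
    return traceclkin_hz / ((uint64_t)acpr + 1);
}

/* Returns 1 if PC sampling alone would exceed the SWO bandwidth,
 * assuming 5-byte packets and 10 bits per byte on the wire. */
int swo_would_overflow(uint64_t cpu_hz, unsigned postpreset, int cyctap,
                       uint64_t traceclkin_hz, unsigned acpr)
{
    uint64_t needed = pc_samples_per_second(cpu_hz, postpreset, cyctap) * 5u * 10u;
    return needed > swo_bits_per_second(traceclkin_hz, acpr);
}
```

For example, with a 64 MHz core, the fastest sampling setting (POSTPRESET = 0, CYCTAP = 0) produces 1,000,000 samples/s, i.e. about 50 Mbit/s, far more than a 1 Mbit/s SWO link can carry; the slowest setting (POSTPRESET = 15, CYCTAP = 1) needs only about 195 kbit/s.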
MTB/ETB/ETF
The MTB (Micro Trace Buffer) was introduced with Cortex-M0+ cores and brought complete instruction trace to these small Cortex-M devices. Comparable on-chip trace buffers are the ETB (Embedded Trace Buffer) and the newer ETF (Embedded Trace FIFO), which is part of the TMC (Trace Memory Controller).
All three are essentially designated areas of on-chip RAM (sometimes dedicated, sometimes shared) into which trace data is written: the ETB and ETF store the output of the ETM (Embedded Trace Macrocell), while the MTB records the program flow directly from the core. That way you get a complete trace of the most recent instructions that fit into that RAM area (usually only 4-8 kB in size). The memory is used as a circular buffer and overwritten continuously, so it always holds the latest instructions.
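The overwrite behavior described above is that of a circular (ring) buffer. The C sketch below models it with a tiny, hypothetical `TraceBuf` of 8 entries; real trace buffers are a few kB in size and their entry formats are defined by the respective Arm component, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_BUF_ENTRIES 8u   /* illustrative; real buffers hold a few kB */

/* Hypothetical model of an on-chip trace buffer in wrap-around mode:
 * new entries overwrite the oldest ones, so the buffer always holds the
 * most recent TRACE_BUF_ENTRIES records. */
typedef struct {
    uint32_t entry[TRACE_BUF_ENTRIES];
    size_t write_pos;   /* next slot to write */
    int wrapped;        /* set once the buffer has been filled at least once */
} TraceBuf;

void trace_buf_write(TraceBuf *tb, uint32_t value)
{
    tb->entry[tb->write_pos] = value;
    tb->write_pos = (tb->write_pos + 1) % TRACE_BUF_ENTRIES;
    if (tb->write_pos == 0)
        tb->wrapped = 1;
}

/* Copies the stored entries oldest-first into out[] and returns the count. */
size_t trace_buf_dump(const TraceBuf *tb, uint32_t *out)
{
    assert(tb != NULL && out != NULL);
    size_t count = tb->wrapped ? TRACE_BUF_ENTRIES : tb->write_pos;
    size_t start = tb->wrapped ? tb->write_pos : 0;   /* oldest entry */
    for (size_t i = 0; i < count; i++)
        out[i] = tb->entry[(start + i) % TRACE_BUF_ENTRIES];
    return count;
}
```

After writing the values 1 to 10 into the 8-entry buffer, a dump returns 3 to 10: the two oldest entries have been overwritten.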
A big plus of this technique is that it needs no hardware pins at all and provides full trace coverage of the latest instructions that fit into RAM. This technique is also completely non-intrusive. The drawback is that only the latest instructions that fit into RAM are available, which limits how far back in time execution can be analyzed.
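How far back such a buffer reaches can be estimated. The sketch below assumes an MTB-style format in which each non-sequential PC change (for example a taken branch) is stored as one 8-byte record holding source and destination address. The average number of instructions between branches and the instruction rate are inputs you have to estimate for your own application; `trace_history_us()` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Back-of-the-envelope estimate of the execution history an MTB-style buffer
 * can hold, in microseconds. Assumes one 8-byte record per non-sequential PC
 * change; all inputs are application-specific estimates. */
double trace_history_us(uint32_t buffer_bytes, double insns_per_branch,
                        double insns_per_second)
{
    assert(insns_per_second > 0.0);
    double branches = buffer_bytes / 8.0;          /* records that fit */
    double insns = branches * insns_per_branch;    /* instructions covered */
    return insns / insns_per_second * 1e6;         /* seconds -> microseconds */
}
```

With a 4 kB buffer, an average of 4 instructions per branch and 32 million instructions per second, the buffer covers only about 64 microseconds of execution.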
ETM/PTM + TPIU
ETM, PTM (Program Trace Macrocell) and TPIU (Trace Port Interface Unit) are the key components of pin-based instruction trace: the ETM or PTM generates the instruction trace data, and the TPIU outputs it over pins in a non-intrusive fashion. ETM is most prominently available on Cortex-M and Cortex-R cores, and PTM on Cortex-A cores (with some exceptions on the market).
Instead of writing the trace data into a RAM buffer, an ETM/PTM can also send it directly to the trace debug probe over GPIO pins. This technique is commonly referred to as pin trace. That way, high-speed transfers of more than 100 MByte/s can be achieved. The trace clock is generally half the CPU clock of the target device, and data is sampled on both clock edges (DDR, double data rate).
This way you not only get the full trace stream, but also at very high speed, so even high-end Cortex-A devices with CPU clock speeds above 1 GHz can be traced and analyzed at run time. The only drawback of this technique is that several high-speed GPIO pins must be available, and they are occupied while tracing (usually five pins: one trace clock and four data pins; at minimum two).
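The bandwidth and pin figures above follow from simple arithmetic. This C sketch assumes, as stated above, a trace clock of half the CPU clock with data captured on both edges, and counts one trace clock pin plus the data pins. Protocol overhead is ignored, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Raw trace-port throughput in bytes per second, assuming a trace clock of
 * half the CPU clock and DDR sampling (two samples per trace clock). */
uint64_t trace_port_bytes_per_second(uint64_t cpu_hz, unsigned data_pins)
{
    uint64_t trace_clk_hz = cpu_hz / 2;
    uint64_t bits_per_second = trace_clk_hz * 2u * data_pins;   /* DDR */
    return bits_per_second / 8u;
}

/* Pins occupied by the trace port: the data lines plus the trace clock. */
unsigned trace_port_pins(unsigned data_pins)
{
    assert(data_pins >= 1);
    return data_pins + 1;
}
```

A 200 MHz core with a 4-bit trace port gives a 100 MHz trace clock and 100 MByte/s of raw throughput on 5 pins; a 1-bit port still works on 2 pins, at a quarter of that rate.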
More information about tracing can be found in the J-Link User Manual (UM08001_JLink) or on our website www.segger.com.
The different software trace types
The following trace types are usually referred to as software tracing. What they have in common is that they require no extra hardware beyond the debug interface already in use, and that they usually work on any target device. Their most prominent drawback is that they are all intrusive, because the target CPU has to execute additional code to produce the trace data.
printf
One can argue whether printf should be considered an actual trace technique, but we list it here as it is one of the oldest and still most widely used debug techniques. The general principle is that printf statements are placed at certain points in the code, each usually outputting a string plus some optional data. Depending on the implementation, this output is forwarded to the host PC, e.g. via the debug probe, so it can be used to follow code execution. One popular way to implement printf output in software is RTT (Real-Time Transfer).
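RTT avoids halting the target by placing ring buffers in target RAM that the debug probe reads in the background through the debug interface. The following C sketch models one such "up" channel (target to host) on a single machine. `UpChannel`, `up_write()` and `up_read()` are hypothetical names, not SEGGER's API; on a real target you would use the RTT sources provided by SEGGER (for example `SEGGER_RTT_printf()`). The sketch drops a message that does not fit instead of blocking, and a real implementation sharing memory with a debug probe would additionally need memory barriers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UP_BUF_SIZE 16u

/* Hypothetical RTT-like up channel: the target only writes wr_off,
 * the host/probe only writes rd_off. */
typedef struct {
    char buf[UP_BUF_SIZE];
    volatile unsigned wr_off;
    volatile unsigned rd_off;
} UpChannel;

/* Target side: copies the whole message, or drops it if it does not fit,
 * so the application is never blocked. Returns 1 on success, 0 if dropped. */
int up_write(UpChannel *ch, const char *msg)
{
    size_t len = strlen(msg);
    unsigned used = (ch->wr_off + UP_BUF_SIZE - ch->rd_off) % UP_BUF_SIZE;
    size_t free_bytes = UP_BUF_SIZE - 1 - used;   /* one slot is kept empty */
    if (len > free_bytes)
        return 0;

    unsigned wr = ch->wr_off;
    for (size_t i = 0; i < len; i++) {
        ch->buf[wr] = msg[i];
        wr = (wr + 1) % UP_BUF_SIZE;
    }
    ch->wr_off = wr;   /* publish only after the data is in place */
    return 1;
}

/* Host side: drains all available bytes into out[] (NUL-terminated). */
size_t up_read(UpChannel *ch, char *out, size_t out_cap)
{
    assert(out != NULL && out_cap > 0);
    size_t n = 0;
    while (ch->rd_off != ch->wr_off && n + 1 < out_cap) {
        out[n++] = ch->buf[ch->rd_off];
        ch->rd_off = (ch->rd_off + 1) % UP_BUF_SIZE;
    }
    out[n] = '\0';
    return n;
}
```

Because the target only ever writes `wr_off` and the host only ever writes `rd_off`, neither side needs a lock; the write offset is updated only after the message data is in place.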
Software instrumentation
Software instrumentation goes a step further than plain printf. Here, a dedicated API is implemented that enables much higher data throughput and a richer feature set than just printing strings. The instrumented information is then sent to the host PC for further analysis, which is usually done in a trace analysis application. One popular implementation of software tracing is SystemView.
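The difference to plain printf is that instrumentation records compact, structured events instead of text, which a host tool can then evaluate. The C sketch below uses a hypothetical event record (timestamp, event ID, parameter) and computes the total run time of one interrupt from its enter/exit events. None of the names correspond to SystemView's actual API or data format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event IDs and record layout (not SystemView's format). */
enum { EVT_ISR_ENTER = 1, EVT_ISR_EXIT = 2, EVT_TASK_START = 3 };

typedef struct {
    uint32_t timestamp;   /* e.g. a cycle counter value */
    uint8_t id;           /* event type */
    uint32_t param;       /* e.g. interrupt number or task ID */
} Event;

#define MAX_EVENTS 64u

typedef struct {
    Event ev[MAX_EVENTS];
    size_t count;
} EventLog;

/* Target side: records one event; drops it if the log is full. */
void record_event(EventLog *evlog, uint32_t timestamp, uint8_t id, uint32_t param)
{
    if (evlog->count < MAX_EVENTS) {
        Event e = { timestamp, id, param };
        evlog->ev[evlog->count++] = e;
    }
}

/* Host side: total time spent in interrupt `isr` (sum of exit - enter). */
uint32_t isr_total_time(const EventLog *evlog, uint32_t isr)
{
    assert(evlog != NULL);
    uint32_t total = 0, enter = 0;
    int inside = 0;
    for (size_t i = 0; i < evlog->count; i++) {
        const Event *e = &evlog->ev[i];
        if (e->param != isr)
            continue;
        if (e->id == EVT_ISR_ENTER) {
            enter = e->timestamp;
            inside = 1;
        } else if (e->id == EVT_ISR_EXIT && inside) {
            total += e->timestamp - enter;   /* unsigned: survives one wrap */
            inside = 0;
        }
    }
    return total;
}
```

Storing a few bytes per event instead of a formatted string is what allows the higher throughput mentioned above. The unsigned subtraction `exit - enter` also stays correct when a 32-bit timestamp counter wraps around once between the two events.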