From SEGGER Wiki
Jump to: navigation, search

In Embedded Systems, CRC stands for cyclic redundancy check. A CRC is in many respects similar to a checksum, and in fact, this is what it is typically used for. CRCs come in all kinds of sizes, from 4 bits to 64 bits, but there is no upper limit.

Generating CRCs in hardware

CRCs can be generated very easily in hardware, by shifting the data through a shift register with feedback at certain bit positions. A 32-bit CRC requires not much more than a 32-bit shift register. This is ideal for applications in which the data is actually transferred bit-by-bit. A lot of MCUs have built-in CRC circuits, that typically allow computation of a specific CRC, such as 0x1021 (16-bit). There are two downsides to these hardware units:

  1. Fixed to a specific CRC
  2. Not thread safe: The CRC unit can only be used from one thread(task) at a time, so the software needs to be designed in a way that makes sure this is so, typically using the unit only in a single thread, or by securing it using a Mutex/Semaphore/Critical region or simply disabling interrupts

Generating CRCs in software

Generating a CRC is something that needs to be done a lot. Again, there are a lot of ways to do it. The easiest way is to do the same thing that is easy in hardware, and compute the CRC bit-by-bit. Unfortunately, this is also the slowest way. More advanced ways of doing it are table based. The tables can be any size, typically 16 entries for 4 bits at a time or 256 entries for 8 bits (one byte) at a time.

Portable Implementation in C


The below is a piece of C-Code that SEGGER actually uses in production code. It computes a CRC, LSB first, for any Polynomial up to 32 bits, on any amount of data. The starting value can be specified and the routine can be used to compute the CRC of multiple memory areas by simply using the return value of the first block as initial value for the second block. It is small and efficient and thus used where size is more important than speed. The downside is that it is a bit slow, as each byte is computed in the inner loop, on a bit-by-bit basis, thus executed 8 times for each byte. Typically CPUs will need about 150 cycles per byte, so a 200MHz microcontroller will deliver about 1.3 MiB/sec. Faster versions using a table (with typically 256 entries, so they handle 1 byte per loop) can be optimized to use about 15 cycles per loop, delivering about 13MiB performance. Note that the code does not need any globals / statics and is fully reentrant.

C - Source code

U32 CRC_Calc(const U8* pData, unsigned NumBytes, U32 crc, U32 Polynom) {
  int i;
  U8 Data;
  U8 Xor;

  do {
    Data = *pData++;
    i = 8;
    do {
      Xor = crc ^ Data;
      crc >>= 1;
      if (Xor & 1)
        crc ^= Polynom;
      Data >>= 1;
    } while (--i);
  } while (--NumBytes);
  return crc;