A watchdog is a hardware unit that automatically resets the CPU, or the entire embedded system, after a system failure such as a program crash or a hardware fault. Also known as watchdog timers, they are common in critical systems, especially where human access is difficult or would take too long. The time until a watchdog resets the system is usually programmable and depends heavily on the application. It is typically between 1 ms and a few seconds.
The watchdog starts with a time value higher than the acceptable worst-case time needed to complete one pass of the program's main loop. The timer counts down until it is either reset by the program or the allotted time runs out, at which point a timeout signal is generated and the CPU is reset.
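The countdown behaviour described above can be illustrated with a small software model. This is only a sketch for reasoning about the logic, not a driver for any real peripheral; the type `wdt_model_t` and the functions `wdt_model_init`, `wdt_model_feed` and `wdt_model_tick` are hypothetical names introduced for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a down-counting watchdog.
 * A real watchdog is a hardware unit; this only mimics its logic. */
typedef struct {
    uint32_t reload;   /* timeout in ticks, chosen above the worst-case loop time */
    uint32_t counter;  /* ticks remaining until the timeout fires */
    bool     expired;  /* set once the timeout signal has been generated */
} wdt_model_t;

void wdt_model_init(wdt_model_t *w, uint32_t reload_ticks)
{
    w->reload  = reload_ticks;
    w->counter = reload_ticks;
    w->expired = false;
}

/* Feeding restarts the countdown. After a timeout it has no effect,
 * because on real hardware the CPU would already be in reset. */
void wdt_model_feed(wdt_model_t *w)
{
    if (!w->expired) {
        w->counter = w->reload;
    }
}

/* Advance the model by one tick.
 * Returns true if the timeout has fired, on this tick or an earlier one. */
bool wdt_model_tick(wdt_model_t *w)
{
    if (w->expired) {
        return true;
    }
    if (w->counter > 0) {
        w->counter--;
    }
    if (w->counter == 0) {
        w->expired = true;
    }
    return w->expired;
}
```

With a reload value of 3 ticks, for instance, the model survives indefinitely as long as `wdt_model_feed` is called at least once every two ticks, and reports a timeout on the third consecutive tick without a feed.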
Feeding the watchdog
The application program is responsible for resetting the watchdog timer before it expires and resets the system. It usually does this by writing a particular value to a particular location, a "special function register" of the watchdog. This process is typically called "feeding" or "kicking" the watchdog.
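A minimal sketch of feeding by register write might look as follows. The key value `WDT_FEED_KEY` and the function `wdt_feed` are assumptions made for illustration; on a real device the register address and the required value (some devices expect a sequence of values) come from the datasheet.

```c
#include <stdint.h>

/* Hypothetical key value; real devices define their own. */
#define WDT_FEED_KEY 0xA5u

/* Feed the watchdog by writing the key to its feed register.
 * The pointer is volatile so the compiler never optimizes the write away.
 * On real hardware feed_reg would be a fixed address taken from the datasheet. */
static inline void wdt_feed(volatile uint32_t *feed_reg)
{
    *feed_reg = WDT_FEED_KEY;
}
```

Passing the register as a pointer, rather than hard-coding an address, lets the same function be exercised on a desktop machine against an ordinary variable.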
Single and Multi-Tasking environments
In a single-threaded program (one that continuously runs through a main loop), this process is usually easy; the watchdog is typically fed once on every pass through the loop. Feeding can also be conditional, in which case the watchdog is fed only if all critical components of the system are up and running. One example of such a check is a communication interface, such as Ethernet, where the program verifies that a packet queued for sending actually gets sent within the expected time frame.
In multi-tasking environments (typically using an RTOS), things can be trickier. Usually, a special watchdog task exists, which checks a number of indicators of a functioning system and feeds the watchdog only if everything is "up and running".
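Conditional feeding, whether done in a main loop or in a dedicated watchdog task, can be sketched with a set of check-in flags: each monitored component sets its flag when it has done its work, and the supervisor feeds the watchdog only if every flag is set. The component names and the functions `watchdog_checkin` and `watchdog_should_feed` below are hypothetical; under an RTOS, access to the shared flags would additionally need to be atomic or protected by a critical section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical monitored components, one bit each. */
enum {
    CHECKIN_SENSOR   = 1u << 0,
    CHECKIN_CONTROL  = 1u << 1,
    CHECKIN_ETHERNET = 1u << 2,  /* e.g. set once a queued packet was sent in time */
    CHECKIN_ALL      = CHECKIN_SENSOR | CHECKIN_CONTROL | CHECKIN_ETHERNET
};

/* Flags collected since the last supervision pass.
 * Assumption: accessed from one context only; an RTOS needs locking here. */
static uint32_t checkins;

/* Called by a component to report that it is alive. */
void watchdog_checkin(uint32_t who)
{
    checkins |= who;
}

/* Called once per supervision period.
 * Returns true only if every component checked in since the last call,
 * then clears the flags so each component must check in again. */
bool watchdog_should_feed(void)
{
    bool all_alive = (checkins & CHECKIN_ALL) == CHECKIN_ALL;
    checkins = 0;
    return all_alive;
}
```

The caller feeds the hardware watchdog only when `watchdog_should_feed()` returns true. If any single component stops checking in, feeding stops and the watchdog eventually resets the system.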
Alternatively, just as in a single-threaded program, one task may take over the role of the main loop and be responsible for feeding the watchdog.
Another option is to feed it in a periodically called interrupt service routine (ISR), again based on certain conditions.
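One possible condition for ISR-based feeding is a heartbeat counter: the main loop increments a counter, and the periodic ISR feeds the watchdog only while that counter keeps changing. The sketch below assumes a hypothetical limit `MAX_STALE_TICKS`, a hypothetical key value 0xA5, and the invented function names `main_loop_heartbeat` and `wdt_tick_isr`; how the routine is attached to a timer interrupt is device-specific and not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit: ISR ticks tolerated without main-loop progress. */
#define MAX_STALE_TICKS 3u

static volatile uint32_t heartbeat;  /* incremented by the main loop */
static uint32_t last_seen;           /* heartbeat value seen on the previous tick */
static uint32_t stale_ticks;         /* consecutive ticks without progress */

/* Called once per pass of the main loop. */
void main_loop_heartbeat(void)
{
    heartbeat++;
}

/* Body of a periodic timer ISR.
 * Feeds the watchdog (writes a hypothetical key to feed_reg) only while the
 * main loop is still making progress. Returns true if it fed. */
bool wdt_tick_isr(volatile uint32_t *feed_reg)
{
    uint32_t now = heartbeat;

    if (now != last_seen) {
        last_seen   = now;
        stale_ticks = 0;
    } else if (stale_ticks < MAX_STALE_TICKS) {
        stale_ticks++;
    }

    if (stale_ticks >= MAX_STALE_TICKS) {
        return false;  /* main loop appears stuck: stop feeding */
    }

    *feed_reg = 0xA5u; /* hypothetical feed key */
    return true;
}
```

Without such a condition, an ISR would keep feeding the watchdog even when the main program has hung, which would defeat its purpose. In this sketch feeding resumes if the heartbeat starts advancing again; a stricter design could latch the failure instead.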
Watchdogs are found in almost every kind of embedded system, from critical ones such as space probes, where access is impossible, and pacemakers, where time is of the essence, to simple consumer applications such as a cordless phone. Even for a cordless phone, the end user wants it to reset automatically and continue to function if the program crashes, rather than having to disconnect and reconnect the batteries.