RTOS debugging, part 3: Chasing the jitter bug

By Dr. Johan Kraft

CEO and Founder

Percepio

September 27, 2017

When a task in your system is supposed to execute at regular intervals (say, it needs to read a sensor value every five milliseconds), the system is sensitive to random delays, also known as “jitter”. When your task experiences jitter, it sometimes has to wait longer than its intended sleep time before its next activation. Some minor jitter is typically no problem and often hard to avoid, but excessive jitter is a different story.

The visible symptoms of (excessive) jitter can be very similar to what you see in a system that suffers from CPU starvation, from general sluggishness to intermittent data loss or even malfunction. The fix is also similar: make sure you get the task priorities right and avoid long-running, high-priority tasks.

The first step is to make sure the RTOS is configured for pre-emptive scheduling, so that the operating system can pre-empt the running task when higher-priority tasks need to execute. Also make sure that the RTOS tick rate – how often the RTOS timer interrupt occurs – is set high enough, as this dictates how precise the system scheduler can be. Ideally, the time between two consecutive RTOS ticks should be much shorter than the period of the most frequent tasks in the system, and preferably a common divisor of all the task periods. In practice, the RTOS tick rate is a trade-off between real-time accuracy and RTOS processing overhead, as every RTOS tick uses up some processor cycles. On 32-bit systems this overhead is usually insignificant, but on slow 8-bit or 16-bit MCUs the effect can be more noticeable.
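The tick-period rule of thumb can be checked with a few lines of host-side C. This is only a sketch: the function names and the `min_ratio` parameter (how many ticks should fit into the shortest task period) are assumptions made for illustration and are not part of any RTOS API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Greatest common divisor (Euclid's algorithm). gcd(0, x) yields x. */
static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (b != 0u) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Largest tick period (in microseconds) that divides every task period.
 * Returns 0 for an empty list. Any divisor of the result is also a
 * common divisor of all the periods. */
uint32_t largest_common_tick_us(const uint32_t *periods_us, size_t n)
{
    uint32_t g = 0u;
    for (size_t i = 0; i < n; i++) {
        g = gcd_u32(g, periods_us[i]);
    }
    return g;
}

/* True if tick_us divides every task period and every period spans at
 * least min_ratio ticks (hypothetical threshold for "much shorter"). */
bool tick_is_suitable(uint32_t tick_us, const uint32_t *periods_us,
                      size_t n, uint32_t min_ratio)
{
    if (tick_us == 0u || n == 0u) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        if (periods_us[i] % tick_us != 0u) {
            return false;   /* not a common divisor */
        }
        if (periods_us[i] / tick_us < min_ratio) {
            return false;   /* tick too coarse for this task */
        }
    }
    return true;
}
```

For tasks with periods of 5 ms, 10 ms and 25 ms, the largest common divisor is 5 ms, but a 1 ms tick is the better fit if you want at least five ticks per activation of the fastest task.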

It is important that you never disable interrupts to protect a critical section in your code, as this also blocks the RTOS tick interrupt and with it the scheduling we have just discussed. Instead, you can protect the critical section with a mutex or, even better, create a separate task to manage the resource you need to protect.

Accurate, pre-emptive RTOS scheduling is key to reducing jitter, but even with this in place, you may experience some jitter. If you find that the disturbance comes from high-priority tasks, you can consider changing the priorities. It may also be possible to nudge the starting times of the tasks involved in such a way that they no longer interfere with each other. A third option is to restructure the blocking task(s) to spend less time at high priority.
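The effect of nudging start times can be estimated on the host before touching the target. The sketch below uses a deliberately simplified model that is an assumption of this example, not something taken from a real scheduler: one high-priority task that runs for `exec_us` at the start of each of its periods, and one lower-priority task whose start is delayed whenever its release falls inside such a run. The type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t period_us;  /* time between releases */
    uint32_t offset_us;  /* time of the first release */
    uint32_t exec_us;    /* run time per release (high-priority task only) */
} periodic_task_t;

/* Worst start delay of `low` caused by `high` for releases before
 * horizon_us. Assumes high->exec_us <= high->period_us and that
 * horizon_us plus one period does not overflow uint32_t. */
uint32_t worst_start_delay_us(const periodic_task_t *high,
                              const periodic_task_t *low,
                              uint32_t horizon_us)
{
    uint32_t worst = 0u;

    if (high->period_us == 0u || low->period_us == 0u) {
        return 0u;  /* invalid input, nothing to analyze */
    }
    for (uint32_t r = low->offset_us; r < horizon_us; r += low->period_us) {
        if (r < high->offset_us) {
            continue;  /* high-priority task has not started yet */
        }
        /* Position of this release within the current period of `high`. */
        uint32_t phase = (r - high->offset_us) % high->period_us;
        if (phase < high->exec_us) {
            uint32_t delay = high->exec_us - phase;  /* wait for `high` */
            if (delay > worst) {
                worst = delay;
            }
        }
    }
    return worst;
}
```

With a 10 ms high-priority task that runs for 2 ms and a 5 ms lower-priority task, releasing both at time zero gives a 2 ms start delay on every other activation in this model, while offsetting the lower-priority task by 2 ms removes the delay entirely.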

If you find that jitter is caused by an interrupt service routine (ISR), your best recourse is to try to reduce its execution time at interrupt level. This can be achieved, for example, by refactoring the code and moving some of the processing to an ordinary task. One final option, if your task has very low jitter tolerance, is to refactor the task itself; its most time-critical parts can be implemented as an ISR that runs on a timer interrupt.
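One way to picture the first suggestion, moving processing out of the ISR, is a small single-producer, single-consumer queue: the ISR only stores the raw sample, and an ordinary task does the expensive work later. The sketch below is plain C written for illustration, with hypothetical names; in a real project you would normally use the ISR-safe queue or notification mechanism your RTOS provides, and depending on the architecture you may need memory barriers that this sketch omits.

```c
#include <stdbool.h>
#include <stdint.h>

#define SAMPLE_QUEUE_SIZE 8u  /* must be a power of two */

typedef struct {
    volatile uint32_t head;  /* free-running count, written only by the ISR */
    volatile uint32_t tail;  /* free-running count, written only by the task */
    uint16_t samples[SAMPLE_QUEUE_SIZE];
} sample_queue_t;

/* ISR side: store the raw sample and return immediately.
 * Returns false (sample dropped) if the queue is full. */
bool sample_queue_push_from_isr(sample_queue_t *q, uint16_t raw)
{
    uint32_t head = q->head;

    if (head - q->tail == SAMPLE_QUEUE_SIZE) {
        return false;
    }
    q->samples[head % SAMPLE_QUEUE_SIZE] = raw;
    q->head = head + 1u;  /* publish only after the sample is stored */
    return true;
}

/* Task side: fetch the oldest sample for filtering, scaling, logging
 * and so on. Returns false if the queue is empty. */
bool sample_queue_pop(sample_queue_t *q, uint16_t *raw)
{
    uint32_t tail = q->tail;

    if (tail == q->head) {
        return false;
    }
    *raw = q->samples[tail % SAMPLE_QUEUE_SIZE];
    q->tail = tail + 1u;
    return true;
}
```

The time spent at interrupt level is then limited to a few loads and stores, regardless of how heavy the later processing is.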

Tracealyzer can be quite helpful when it comes to locating jitter. In the diagram above, the X-axis is a timeline and each point represents one execution of a particular task. The Y-value is time in milliseconds since the last execution. Normally we have 5 ms between task activations, but it is obvious from the plot that we have almost 7 ms at one point and also 6 ms on two occasions.

Why are these delays there? This diagram cannot answer that, but it provides pointers to places where we should trace execution in more detail to see what is going on.
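The same kind of check can be applied to any list of activation timestamps, for example ones exported from a trace or logged by the task itself. The following is a minimal sketch under that assumption; the struct, the function name and the tolerance parameter are made up for this example and have nothing to do with how Tracealyzer works internally.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t max_interval_us;   /* longest time between two activations */
    size_t   late_count;        /* intervals longer than period + tolerance */
    size_t   first_late_index;  /* activation ending the first late interval,
                                   0 if there is none */
} jitter_report_t;

/* Scan activation timestamps (microseconds, ascending) and report the
 * intervals that exceed the intended period by more than tolerance_us. */
jitter_report_t analyze_activations(const uint32_t *t_us, size_t n,
                                    uint32_t period_us, uint32_t tolerance_us)
{
    jitter_report_t report = {0u, 0u, 0u};

    for (size_t i = 1; i < n; i++) {
        uint32_t interval = t_us[i] - t_us[i - 1];

        if (interval > report.max_interval_us) {
            report.max_interval_us = interval;
        }
        if (interval > period_us + tolerance_us) {
            if (report.late_count == 0u) {
                report.first_late_index = i;
            }
            report.late_count++;
        }
    }
    return report;
}
```

The index of a late activation tells you where in the trace to zoom in and look at what else was running at that moment.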

Dr. Johan Kraft is CEO and CTO of Percepio AB, a Swedish tool company developing visual analysis tools that accelerate embedded software development. Dr. Kraft holds a Ph.D. in Computer Science and his academic work up until 2010 focused on practical methods for timing analysis of embedded software, performed in close collaboration with regional software-oriented industry. Before his doctoral studies, Dr. Kraft started his career as an embedded software developer, working with control software for industrial robots.

Dr. Johan Kraft is CEO and founder of Percepio AB. Dr. Kraft is the original developer of Percepio Tracealyzer, a tool for visual trace diagnostics that provides insight into runtime systems to accelerate embedded software development. His applied academic research, in collaboration with industry, focused on embedded software timing analysis. Prior to founding Percepio in 2009, he worked in embedded software development at ABB Robotics. Dr. Kraft holds a PhD in computer science.
