RHEL-RT FAQ

Frequently Asked Questions

What is this RT thing anyway?

RT stands for Realtime. The term "realtime" has often been abused in marketing, where it can simply mean "on the fly updates", such as being able to track a package sent from a supplier. Here we are talking about an RTOS, a "Realtime Operating System". A better term is "Deterministic Operating System": an RTOS can guarantee the time it takes for a transaction to occur, or the worst-case latency between an interrupt happening and the system reacting to it.


OK, then what does MRG give me?

For the past several years, Red Hat has been supporting development work to convert Linux (a general purpose operating system, or GPOS) into a full RTOS. MRG combines the userland applications of RHEL5 with a kernel carrying the RT patch extensions. The RT patch development (led by Ingo Molnar, a full-time Red Hat employee) has been solving many of the issues that stand in the way of realtime determinism on Linux. Linux with the RT patch can now guarantee worst-case response times and calculation times.


Will MRG make my applications run faster?

The main objective of realtime is determinism. Most application workloads are measured in throughput, which is an average. For many classes of workloads, throughput on a realtime kernel will be slightly lower due to the increased overhead. Other, properly prioritized realtime workloads may see higher throughput, because lower priority tasks are prevented from impeding the progress of higher priority tasks.

The Linux kernel is designed to be as fast as possible, but it sacrifices determinism to get there. Guaranteeing a worst-case latency has a price: it takes some extra overhead to accomplish. Throughput is sometimes best when reactions are allowed to be slow.

One of the largest sources of non-deterministic reaction times is interrupt handlers. In mainline Linux, interrupt handlers run as soon as the interrupts arrive, preempting whatever task is running regardless of that task's priority. This helps throughput but increases latencies for user tasks. The RT kernel moves the work of device interrupt handlers into kernel threads, so that higher priority tasks are unaffected by interrupts. Interrupt threads give better determinism at the cost of more overhead for handling each interrupt. This is just one example of where an RTOS may sacrifice throughput for determinism.

Note that we try very hard to keep throughput as close as possible to that of mainstream Linux.


What is this Latency you keep talking about?

Latency is the difference between the time we want something to occur and when it actually does. There are four main sources of latency.

  • Interrupt latency - An interrupt is a means for hardware devices to notify the Linux operating system that they require attention, for example, because some data has finished being read from disk, or a packet of traffic was received over the network. Interrupt latency is the time between such an interrupt triggering (the hardware recording it in a way the Linux kernel can detect) and when the operating system is actually able to handle that interrupt.
  • Wakeup latency - The time between when a task is woken up and when it actually runs.
  • Priority Inversion - The time a higher priority process must wait for a resource held by a lower priority process.
  • Interrupt Inversion - A special kind of priority inversion, where a high priority task must wait while an interrupt is handled on behalf of a lower priority process.


I installed MRG and my latencies are still bad?

In MRG, interrupts are run as threads, but by default their priority is set to 95 (out of 99). Any task with a lower priority (which means most of them) still sees the same kinds of interrupt-related latencies as on mainline Linux. The interrupt threads can be viewed with a simple ps command:


     ps -eo pid,rtprio,command | grep IRQ
        87     95 [IRQ-9]
       365     95 [IRQ-8]
       429     95 [IRQ-12]
       430     95 [IRQ-1]
       447     95 [IRQ-16]
       448     95 [IRQ-21]
       449     95 [IRQ-23]
       450     95 [IRQ-19]
       451     95 [IRQ-18]
       479     95 [IRQ-17]
       485     95 [IRQ-8409]
      1620     95 [IRQ-7]
      2004     95 [IRQ-22]
      2125     95 [IRQ-6]
      2733     95 [IRQ-8408]
      3622     95 [IRQ-4]


This means that any process with a priority below 95 can be delayed by the interrupt threads. If you have a high priority process that needs to run above the interrupts, you can either set it to priority 96, or use the Tuna tool, available in the partners yum repository, to modify the default priorities of the interrupt threads.


When I run my application at priority 99, the system locks up?

Don't run applications at the highest priority!


Running applications at a priority above the interrupt threads can be dangerous. A high priority process runs until a process with a higher priority preempts it or it voluntarily goes to sleep. Even the keyboard interrupt must wait for it. If an application sits in a busy loop, doing non-blocking reads on the console while waiting for keyboard input, it may never receive that input if it is running at a higher priority than the keyboard interrupt thread.

Ideally, no application should run at priority 99 (only a very few kernel threads run at that priority). RT gives power to the user, and as with anything else, added power comes with added responsibility. You must understand your RT applications and how they interact with the rest of the system: the priorities of RT applications and IRQ threads must be coordinated properly for an optimal result.


Why isn't latency_tracer enabled in kernel-rt-debug?

The kernel-rt-debug package comes with extra code to help diagnose problems. This debugging code adds significant overhead, and that overhead would show up in latency measurements. That could lead to wrong results and misinterpretation, since the code paths being measured would not be the same as those of a production kernel.