RHEL-RT SwappingNonDeterminismHowto

From RHEL-RT

Swapping is Non-Deterministic & OOM Killer tunables - HOWTO

Swapping pages out to disk can introduce latency in any environment. To date, the realtime kernel work has focused on subsystems other than VM, filesystem, and disk I/O, so swapping can lead to non-deterministic behavior. If you have stringent low-latency requirements, make sure your systems have enough memory that they never swap.

  • Size your system's physical RAM for your application
  • Run vmstat and watch the si/so (swap-in/swap-out) columns; nonzero values mean the system is swapping
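The si/so check above can also be scripted. The following is a minimal sketch, assuming a Linux host where /proc/vmstat exposes the cumulative pswpin and pswpout counters (pages swapped in and out since boot). The function names are hypothetical and chosen only for illustration; vmstat itself reports rates rather than raw page counts, so the numbers will not match its output exactly, but a nonzero delta indicates swapping in either case.

```python
import time


def read_swap_counters(text):
    """Return (pswpin, pswpout) parsed from /proc/vmstat-style text.

    Counters that are missing from the text are reported as 0.
    """
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_delta(before, after):
    """Pages swapped in and out between two (pswpin, pswpout) samples."""
    return after[0] - before[0], after[1] - before[1]


def watch_swap(interval=1.0, samples=5, path="/proc/vmstat"):
    """Print per-interval swap activity; any nonzero count means swapping."""
    with open(path) as f:
        prev = read_swap_counters(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            cur = read_swap_counters(f.read())
        pages_in, pages_out = swap_delta(prev, cur)
        print(f"swapped in: {pages_in} pages, swapped out: {pages_out} pages")
        prev = cur
```

Calling watch_swap() while the realtime application runs under a representative load gives a quick indication of whether the system has enough RAM.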

For additional information on why swapping results in non-deterministic behavior, look at the Latencies caused by Page-faults section of this page: http://rt.wiki.kernel.org/index.php/HOWTO:_Build_an_RT-application

OOM Kill tunables

The sysctl file /proc/sys/vm/panic_on_oom enables or disables panic on OOM. Here is some information from Documentation/sysctl/vm.txt:

panic_on_oom
This enables or disables panic on out-of-memory feature. If this is set to 1, the kernel panics when out-of-memory happens. If this is set to 0 (which is the default behavior), the kernel will kill some rogue process, called oom_killer. Usually, oom_killer can kill rogue processes and system will survive. If you want to panic the system rather than killing rogue processes, set this to 1.
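As a rough illustration of the tunable described above, the sketch below reads and writes panic_on_oom from Python. The helper names are hypothetical, and the path parameter exists only so the functions can be tried against an ordinary file; writing to the real /proc file requires root. Only the values 0 and 1 documented above are interpreted; anything else is reported without interpretation.

```python
PANIC_ON_OOM = "/proc/sys/vm/panic_on_oom"


def read_panic_on_oom(path=PANIC_ON_OOM):
    """Return the current panic_on_oom setting as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def describe_panic_on_oom(value):
    """Translate a panic_on_oom value into the behavior described above."""
    if value == 0:
        return "oom_killer kills a process and the system keeps running (default)"
    if value == 1:
        return "kernel panics when out-of-memory happens"
    # Other values are not covered by the text quoted above.
    return f"value {value}: check Documentation/sysctl/vm.txt for this kernel"


def set_panic_on_oom(enabled, path=PANIC_ON_OOM):
    """Write 1 (panic) or 0 (use oom_killer); needs root for the real file."""
    with open(path, "w") as f:
        f.write("1\n" if enabled else "0\n")
```

For example, print(describe_panic_on_oom(read_panic_on_oom())) reports the active policy without changing it.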


Later 2.6 kernels (including the one on which RHEL-RT is based) also let you prioritize which processes are killed when the system decides it must kill one. Each /proc/<pid>/ directory now contains oom_adj and oom_score. You can set oom_adj from -17 to +15: higher values make a process more likely to be chosen, lower values less likely, and -17 exempts the process from OOM killing. The system kills the process with the highest oom_score. There isn't much documentation other than the code in the kernel:

/**
 * badness - calculate a numeric value for how bad this task has been
 * @p: task struct of which task we should calculate
 * @uptime: current uptime in seconds
 *
 * The formula used is relatively simple and documented inline in the
 * function. The main rationale is that we want to select a good task
 * to kill when we run out of memory.
 *
 * Good in this context means that:
 * 1) we lose the minimum amount of work done
 * 2) we recover a large amount of memory
 * 3) we don't kill anything innocent of eating tons of memory
 * 4) we want to kill the minimum amount of processes (one)
 * 5) we try to kill the process the user expects us to kill, this
 *    algorithm has been meticulously tuned to meet the principle
 *    of least surprise ... (be careful when you change it)
 */
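The oom_adj and oom_score files mentioned earlier can be inspected and adjusted from a script. This is a minimal sketch, assuming the /proc/<pid>/oom_adj and /proc/<pid>/oom_score files exist as described on this page. The function names and the proc_root parameter are hypothetical and serve only to make the helpers testable. Changing oom_adj for another user's process, or lowering it, generally requires root. Newer kernels replaced oom_adj with oom_score_adj, so check which file your kernel provides.

```python
import os

OOM_ADJ_MIN = -17  # lowest value; exempts the process from OOM killing
OOM_ADJ_MAX = 15   # highest value; makes the process the preferred victim


def read_oom_score(pid, proc_root="/proc"):
    """Return the kernel-computed oom_score for a process."""
    with open(os.path.join(proc_root, str(pid), "oom_score")) as f:
        return int(f.read().strip())


def set_oom_adj(pid, adj, proc_root="/proc"):
    """Set oom_adj for a process after validating the documented range."""
    if not OOM_ADJ_MIN <= adj <= OOM_ADJ_MAX:
        raise ValueError(
            f"oom_adj must be between {OOM_ADJ_MIN} and {OOM_ADJ_MAX}"
        )
    with open(os.path.join(proc_root, str(pid), "oom_adj"), "w") as f:
        f.write(f"{adj}\n")


def likeliest_victim(pids, proc_root="/proc"):
    """Return the pid with the highest oom_score among the given pids."""
    return max(pids, key=lambda pid: read_oom_score(pid, proc_root))
```

A realtime application could, for instance, call set_oom_adj(os.getpid(), -17) at startup so that it is not selected by the OOM killer, at the cost of some other process being killed instead.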