Kernel Internals

Kernel Preemption

When built with kernel preemption enabled (CONFIG_PREEMPT), Linux is a fully preemptive kernel. In nonpreemptive kernels, kernel code runs until completion; that is, the scheduler cannot reschedule a task while it is executing in the kernel. In Linux, a task can be preempted at any point, so long as the kernel is in a state in which it is safe to reschedule.

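When is it safe to reschedule? The kernel can preempt a task running in the kernel so long as the task does not hold a lock (in practice a spin lock; sleeping locks such as mutexes do not disable preemption). Locks therefore serve as markers of regions of nonpreemptibility. Because the kernel is SMP-safe, code that holds no lock is reentrant and can safely be preempted. To track this, the kernel keeps a preemption counter, preempt_count, which is incremented each time a lock is acquired and decremented each time one is released; the kernel is preemptible only while the counter is zero.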
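To make the counting concrete, here is a minimal user-space sketch in C. It assumes a single global counter and the hypothetical helpers model_lock(), model_unlock(), and safe_to_preempt(); none of these are kernel APIs, and the real kernel adjusts its counter through preempt_disable() and preempt_enable(), which spin-lock operations call internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the preemption counter.
 * In the kernel this state is not a plain global; it is one here
 * purely for illustration. */
static int preempt_count = 0;

/* Acquiring a lock disables preemption by bumping the counter. */
static void model_lock(void)
{
    preempt_count++;
}

/* Releasing a lock re-enables preemption by dropping the counter. */
static void model_unlock(void)
{
    assert(preempt_count > 0); /* an unbalanced unlock is a bug */
    preempt_count--;
}

/* Preemption is safe only when no locks are held. */
static bool safe_to_preempt(void)
{
    return preempt_count == 0;
}
```

Because the counter nests, taking two locks and releasing one still leaves the code nonpreemptible; only the final release makes it safe to reschedule again.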

Kernel preemption can also occur explicitly, when a task in the kernel blocks or explicitly calls schedule().

Kernel preemption can occur:

When an interrupt handler exits, before returning to kernel-space
When kernel code becomes preemptible again
If a task in the kernel explicitly calls schedule()
If a task in the kernel blocks (which results in a call to schedule())
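The first two cases can be modeled as checks against the counter plus a pending-reschedule flag. This is a hypothetical user-space sketch: struct model_task, model_schedule(), and the two check functions are illustrative names, and need_resched stands in for the kernel's TIF_NEED_RESCHED flag. The last two cases need no such check, because the task itself calls schedule().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two implicit kernel-preemption points. */
struct model_task {
    int preempt_count;  /* > 0 while locks are held */
    bool need_resched;  /* set when another task should run */
    int schedule_calls; /* how often we "called schedule()" */
};

/* Stand-in for schedule(): services the pending request. */
static void model_schedule(struct model_task *t)
{
    t->need_resched = false;
    t->schedule_calls++;
}

/* Case 1: an interrupt handler exits and is about to return to
 * kernel code. Preempt only if a reschedule is pending and the
 * interrupted code held no locks. */
static void model_irq_return_to_kernel(struct model_task *t)
{
    if (t->need_resched && t->preempt_count == 0)
        model_schedule(t);
}

/* Case 2: kernel code becomes preemptible again. Releasing the last
 * lock re-checks need_resched, so a request that arrived while the
 * lock was held is serviced immediately. */
static void model_preempt_enable(struct model_task *t)
{
    assert(t->preempt_count > 0); /* unbalanced enable is a bug */
    if (--t->preempt_count == 0 && t->need_resched)
        model_schedule(t);
}
```

The second check matters because a reschedule request that arrives while a lock is held would otherwise have to wait for the next interrupt or return to user space.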

User preemption can occur:

When returning to user-space from a system call
When returning to user-space from an interrupt handler
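User preemption is simpler, because a task about to resume in user space holds no kernel locks. The following hypothetical sketch (model_return_to_user() and its types are illustrative, not kernel APIs) shows the same need_resched check serving both return paths without consulting a preemption counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the return-to-user-space path. */
enum model_return_source { FROM_SYSCALL, FROM_INTERRUPT };

struct model_task {
    bool need_resched;
    int schedule_calls;
};

/* Both sources funnel into the same check. No lock test is needed:
 * a task resuming in user space holds no kernel locks, so
 * rescheduling is always safe here. Returns true if preempted. */
static bool model_return_to_user(struct model_task *t,
                                 enum model_return_source src)
{
    assert(src == FROM_SYSCALL || src == FROM_INTERRUPT);
    if (t->need_resched) {
        t->need_resched = false;
        t->schedule_calls++;
        return true;
    }
    return false;
}
```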

Context Switching

Context switching, the switching from one runnable task to another, is handled by the context_switch() function defined in kernel/sched/core.c.

It is called by schedule() when a new process has been selected to run. It does two basic jobs:

Calls switch_mm(), declared in <asm/mmu_context.h>, to switch the virtual memory mapping from the previous process's to that of the new process.
Calls switch_to(), declared in <asm/switch_to.h> (<asm/system.h> in kernels before 3.4), to switch the processor state from the previous process's to that of the new process. This involves saving and restoring stack information, the processor registers, and any other architecture-specific state that must be managed and restored on a per-process basis.
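The two steps can be sketched as a user-space model in C. Everything here is hypothetical: struct model_mm stands in for an address space (on x86, roughly the page-table root loaded into CR3), the globals stand in for one CPU's state, and the real switch_to() is architecture-specific assembly rather than a struct copy. As a simplification, the model skips the address-space switch when both tasks share an mm.

```c
#include <assert.h>

/* Hypothetical model of an address space. */
struct model_mm {
    int id; /* stands in for a page-table root */
};

/* A minimal slice of processor state. */
struct model_regs {
    unsigned long sp; /* stack pointer */
    unsigned long ip; /* instruction pointer */
};

struct model_task {
    struct model_mm *mm;
    struct model_regs saved; /* registers saved while switched out */
};

/* Per-"CPU" state: active address space, live registers, current task. */
static struct model_mm *cpu_mm;
static struct model_regs cpu_regs;
static struct model_task *cpu_current;

/* Step 1: switch the virtual memory mapping (skipped if shared). */
static void model_switch_mm(struct model_mm *prev, struct model_mm *next)
{
    if (prev != next)
        cpu_mm = next;
}

/* Step 2: switch processor state: save prev's registers, load next's. */
static void model_switch_to(struct model_task *prev, struct model_task *next)
{
    prev->saved = cpu_regs;
    cpu_regs = next->saved;
    cpu_current = next;
}

/* Mirrors the two jobs described above, in the same order. */
static void model_context_switch(struct model_task *prev, struct model_task *next)
{
    assert(prev == cpu_current);
    model_switch_mm(prev->mm, next->mm);
    model_switch_to(prev, next);
}
```

Switching from a to b and then back to a restores a's registers exactly; that round-trip property is what the real switch_to() must guarantee for every piece of per-process state.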