Demonstrations of criticalstat: Find long atomic critical sections in the kernel.
criticalstat traces and reports occurences of atomic critical sections in the
kernel with useful stacktraces showing the origin of them. Such critical
sections frequently occur due to use of spinlocks, or if interrupts or
preemption were explicity disabled by a driver. IRQ routines in Linux are also
executed with interrupts disabled. There are many reasons. Such critical
sections are a source of long latency/responsive issues for real-time systems.
This works by probing the preempt/irq and cpuidle tracepoints in the kernel.
Since this uses BPF, only the root user can use this tool. Further, the kernel
has to be built with certain CONFIG options enabled inorder for it to work:
CONFIG_PREEMPTIRQ_EVENTS
CONFIG_DEBUG_PREEMPT
Additionally, the following options should be turned off on older kernels:
CONFIG_PROVE_LOCKING
CONFIG_LOCKDEP
USAGE:
# ./criticalstat -h
usage: criticalstat [-h] [-p] [-i] [-d DURATION]
Trace long critical sections
optional arguments:
-h, --help Show this help message and exit
-p, --preemptoff Find long sections where preemption was off
-i, --irqoff Find long sections where IRQ was off
-d DURATION, --duration DURATION
Duration in uS (microseconds) below which we filter
examples:
./criticalstat # run with default options: irq off for more than 100 uS
./criticalstat -p # find sections with preemption disabled for more than 100 uS
./criticalstat -d 500 # find sections with IRQs disabled for more than 500 uS
./criticalstat -p -d 500 # find sections with preemption disabled for more than 500 uS
The tool runs continuously until interrupted by Ctrl-C
By default, criticalstat finds IRQ disable sections for > 100us.
# ./criticalstat
Finding critical section with IRQ disabled for > 100us
===================================
TASK: kworker/u16:5 (pid 5903 tid 5903) Total Time: 194.427 us
Section start: __schedule -> schedule
Section end: _raw_spin_unlock_irq -> finish_task_switch
trace_hardirqs_on+0xdc
trace_hardirqs_on+0xdc
_raw_spin_unlock_irq+0x18
finish_task_switch+0xf0
__schedule+0x8c8
preempt_schedule_irq+0x38
el1_preempt+0x8
===================================
If too many sections are showing up, the user can raise the threshold to only
show critical sections that are > 500us by passing "-d" option:
# ./criticalstat -d 500
Finding critical section with IRQ disabled for > 500us
===================================
TASK: crtc_commit:111 (pid 246 tid 246) Total Time: 580.730 us
Section start: clk_enable_lock -> clk_enable
Section end: _raw_spin_unlock_irqrestore -> clk_enable
trace_hardirqs_on+0xdc
trace_hardirqs_on+0xdc
_raw_spin_unlock_irqrestore+0x24
clk_enable+0x80
msm_dss_enable_clk+0x7c
sde_power_resource_enable+0x578
_sde_crtc_vblank_enable_no_lock+0x68
sde_crtc_vblank+0x8c
sde_kms_enable_vblank+0x18
vblank_ctrl_worker+0xd0
kthread_worker_fn+0xf8
kthread+0x114
ret_from_fork+0x10
===================================
If instead of irq disabled sections, we want to see preempt disabled sections,
then pass the "-p" option. Below we try to find preempt-disabled critical
sections that are > 500us.
# ./criticalstat -p -d 500
Finding critical section with preempt disabled for > 500us
===================================
TASK: swapper/1 (pid 0 tid 0) Total Time: 618.437 us
Section start: preempt_count_add -> preempt_count_add
Section end: preempt_count_sub -> preempt_count_sub
trace_preempt_on+0x98
trace_preempt_on+0x98
preempt_latency_stop+0x164
preempt_count_sub+0x50
schedule+0x74
schedule_preempt_disabled+0x14
cpu_startup_entry+0x84
secondary_start_kernel+0x1c8
[unknown]
===================================
criticalstat -p can also reflect kernel scheduler issues sometimes. These may
show up as long preempt-off sections if the functions in the scheduler take a
long time to run (such as pick_next_task_fair which selects the CPU for a task
Follow is a report showing a preempt-off latency of 700us during the schedule
loop's execution:
===================================
TASK: irq/296-cs35l36 (pid 666 tid 666) Total Time: 732.657 us
Section start: schedule -> schedule
Section end: schedule -> schedule
trace_preempt_on+0x98
trace_preempt_on+0x98
preempt_count_sub+0xa4
schedule+0x78
schedule_timeout+0x80
wait_for_common+0xb4
wait_for_completion_timeout+0x28
geni_i2c_xfer+0x298
__i2c_transfer+0x4e0
i2c_transfer+0x8
irq_thread_fn+0x2c
irq_thread+0x160
kthread+0x118
ret_from_fork+0x10
===================================
See Also: Linux kernel's preemptoff and irqoff tracers which provide similar
tracing but with some limitations.