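Today we look at a mechanism the kernel uses to monitor processes that stay asleep for a long time: hung task.
A hung task case
The crash tool's summary of the dump is as follows: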
KERNEL: examples/hungtask/vmlinux
DUMPFILE: examples/hungtask/dump-hungtask.bin
CPUS: 2 [OFFLINE: 1]
DATE: Thu Nov 23 10:31:23 CST 2023
UPTIME: 00:03:44
LOAD AVERAGE: 2.60, 1.08, 0.41
TASKS: 51
NODENAME: (none)
RELEASE: 4.19.298
VERSION: #24 SMP PREEMPT Thu Nov 23 10:27:05 CST 2023
MACHINE: aarch64 (unknown Mhz)
MEMORY: 128 MB
PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
PID: 476
COMMAND: "khungtaskd"
TASK: ffff800004a61b00 [THREAD_INFO: ffff800004a61b00]
CPU: 1
STATE: TASK_RUNNING (PANIC)
The PANIC line shows that the hung_task mechanism detected a task that had been blocked (for a long time).
PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
The COMMAND line shows that the panic occurred in the "khungtaskd" process.
COMMAND: "khungtaskd"
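Viewing the kernel log with the log command turns up two relevant sections. The first gives the reason for the hung task report: a process was blocked for more than 60 seconds.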
[ 11.028762] test_work_func enter
[ 121.875251] INFO: task kworker/1:1:44 blocked for more than 60 seconds.
[ 121.875749] Not tainted 4.19.298 #24
[ 121.875886] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 121.876116] kworker/1:1 D 0 44 2 0x00000028
[ 121.876995] Workqueue: events test_work_func
[ 121.877180] Call trace:
[ 121.877285] __switch_to+0xe8/0x148
[ 121.877413] __schedule+0x1e8/0x5c8
[ 121.877482] schedule+0x38/0xa0
[ 121.877526] schedule_timeout+0x24c/0x388
[ 121.877581] __down+0x74/0xc8
[ 121.877629] down+0x48/0x60
[ 121.877667] test_work_func+0x3c/0x58
[ 121.877748] process_one_work+0x1b4/0x2e8
[ 121.877869] worker_thread+0x48/0x400
[ 121.877979] kthread+0x128/0x160
[ 121.878080] ret_from_fork+0x10/0x1c
[ 121.878389] Kernel panic - not syncing: hung_task: blocked tasks
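We can use the dis command to look at the source of test_work_func. It does call down on a semaphore, and because that semaphore is initialized to 0, the call blocks immediately. Nothing else ever calls up on the semaphore.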
crash> dis -ls test_work_func
FILE: ../drivers/input/keyboard/gpio_keys.c
LINE: 404
399
400 static void test_work_func(struct work_struct *work)
401 {
402 int i;
403
* 404 pr_info("%s enter\n", __func__);
405 sema_init(&test_sem, 0);
406 down(&test_sem);
407 pr_info("%s exit\n", __func__);
408 }
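Why a semaphore initialized to 0 blocks on the first down can be sketched with a toy counting model in user space. This is only an illustration: `toy_sem`, `toy_sema_init`, `toy_down_trylock` and `toy_up` are hypothetical names, not the kernel's implementation. The return convention borrows from the kernel's `down_trylock()` (0 means acquired, 1 means the caller would have to sleep).

```c
/* Toy model of a counting semaphore; all names here are hypothetical. */
struct toy_sem {
	unsigned int count;	/* number of down operations that can still succeed */
};

static void toy_sema_init(struct toy_sem *s, unsigned int val)
{
	s->count = val;
}

/* Returns 0 if acquired, 1 if a real down would have to block. */
static int toy_down_trylock(struct toy_sem *s)
{
	if (s->count > 0) {
		s->count--;
		return 0;
	}
	return 1;
}

/* Releases the semaphore once, allowing one more down to succeed. */
static void toy_up(struct toy_sem *s)
{
	s->count++;
}
```

With `toy_sema_init(&s, 0)`, the first `toy_down_trylock(&s)` returns 1. In test_work_func the equivalent down call sleeps in an uninterruptible state, and since no up ever arrives, it never wakes.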
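The second log section shows the detection path: the khungtaskd kernel thread, whose thread function is watchdog, performs the detection and deliberately panics once the hung task condition is met.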
[ 121.878679] CPU: 1 PID: 476 Comm: khungtaskd Not tainted 4.19.298 #24
[ 121.878879] Hardware name: linux,dummy-virt (DT)
[ 121.879060] Call trace:
[ 121.879124] dump_backtrace+0x0/0x158
[ 121.879198] show_stack+0x14/0x20
[ 121.879260] dump_stack+0x94/0xc4
[ 121.879322] panic+0x13c/0x2a0
[ 121.879376] watchdog+0x290/0x3a0
[ 121.879431] kthread+0x128/0x160
[ 121.879487] ret_from_fork+0x10/0x1c
[ 121.879773] SMP: stopping secondary CPUs
[ 121.880707] Kernel Offset: disabled
[ 121.880863] CPU features: 0x00,24002004
[ 121.880935] Memory Limit: none
[ 121.881307] ---[ end Kernel panic - not syncing: hung_task: blocked tasks ]---
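How hung task detection works
The kernel creates a kernel thread (khungtaskd) that periodically checks processes in the D state (TASK_UNINTERRUPTIBLE). If a process has not been scheduled at all between two checks, the kernel considers it likely to be deadlocked and prints a warning log for developers to examine.
A process staying in the D state for a long time does not necessarily mean something is wrong in the kernel, but it deserves close attention, which is why a warning is printed. If the kernel is configured to panic on a hung task, it then triggers a panic.
- CONFIG_DEFAULT_HUNG_TASK_TIMEOUT sets the default check period (the hung task timeout, in seconds)
- CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE sets whether to panic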
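The "not scheduled between two checks" rule can be sketched in user space as a comparison of context-switch counts. This is a simplified model rather than the kernel's code: `struct toy_task` and `toy_check_hung` are hypothetical names, and the kernel's real per-task check has additional conditions that are omitted here.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the fields the check relies on. */
struct toy_task {
	bool uninterruptible;			/* true while the task is in the D state */
	unsigned long nvcsw;			/* voluntary context switches so far */
	unsigned long nivcsw;			/* involuntary context switches so far */
	unsigned long last_switch_count;	/* total recorded at the previous check */
};

/* One periodic check: returns true if the task should be reported as hung. */
static bool toy_check_hung(struct toy_task *t)
{
	unsigned long switch_count = t->nvcsw + t->nivcsw;

	if (!t->uninterruptible)
		return false;	/* only D-state tasks are examined */

	if (switch_count != t->last_switch_count) {
		/* The task was scheduled since the last check: record and move on. */
		t->last_switch_count = switch_count;
		return false;
	}

	return true;	/* no context switch across a whole check period */
}
```

Under this model a blocked task is only reported on the second check after it stops being scheduled, because the first check merely records its switch count. That would be consistent with the log above, where test_work_func entered at about 11 seconds and the report, with a 60-second period, appeared at about 121 seconds.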
/*
 * kthread which checks for tasks stuck in D state
 */
static int watchdog(void *dummy)
{
	unsigned long hung_last_checked = jiffies;
	set_user_nice(current, 0);
	for ( ; ; ) {
		unsigned long timeout = sysctl_hung_task_timeout_secs;
		unsigned long interval = sysctl_hung_task_check_interval_secs;
		long t;
		if (interval == 0)
			interval = timeout;
		interval = min_t(unsigned long, interval, timeout);
		t = hung_timeout_jiffies(hung_last_checked, interval);
		if (t <= 0) {
			if (!atomic_xchg(&reset_hung_task, 0) &&
			    !hung_detector_suspended)
				check_hung_uninterruptible_tasks(timeout);
			hung_last_checked = jiffies;
			continue;
		}
		schedule_timeout_interruptible(t);
	}
	return 0;
}
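The timing arithmetic in this loop can be tried out in user space with the sketch below. It rests on assumptions: `hung_timeout_jiffies` is not shown in the excerpt, so the model assumes it returns the number of ticks left until `last_checked + interval` and a very large value when the interval is 0 (detection disabled). `TOY_HZ`, `toy_effective_interval` and `toy_ticks_until_check` are hypothetical names.

```c
#include <limits.h>

#define TOY_HZ 250UL	/* hypothetical tick rate, ticks per second */

/* Mirrors the loop: a zero interval falls back to the timeout, then clamp. */
static unsigned long toy_effective_interval(unsigned long timeout,
					    unsigned long interval)
{
	if (interval == 0)
		interval = timeout;
	return interval < timeout ? interval : timeout;
}

/*
 * Assumed behaviour of the helper: ticks remaining until the next check.
 * A result <= 0 means the check is due now; an interval of 0 is modelled
 * as "sleep forever" (LONG_MAX).
 */
static long toy_ticks_until_check(unsigned long last_checked,
				  unsigned long now,
				  unsigned long interval_secs)
{
	if (interval_secs == 0)
		return LONG_MAX;
	/* Unsigned arithmetic wraps; the cast recovers a signed difference. */
	return (long)(last_checked - now + interval_secs * TOY_HZ);
}
```

For example, with a 60-second timeout and no separate check interval, `toy_effective_interval(60, 0)` yields 60, so under these assumptions the thread sleeps for 60 * TOY_HZ ticks between checks.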