I've spent this stretch handling all kinds of hang and STR (suspend-to-RAM) power on/off issues. It has been a bit of a grind, but the broadened technical scope and the new tricks I've picked up have made it quite enjoyable. The kernel world has always been a black box to me, quite mysterious. After switching over to bionic a while back, finishing 《程序员的自我修养》, and recently studying the Android 8.0 (AN8.0) DC standby flow for a voice wake-up issue, I discovered that standby is entered through the reboot function making a syscall into the kernel, and it felt like a new world opening up. I knew this path existed before, but when the problem and the whole code flow are laid out in front of you, it is still as exciting as a box of chocolates.
This article first uses a real case to introduce locating a stuck process with sysrq debugging. I had never touched sysrq before: alix did the original analysis and then coached me through how it flows, and I still have not read the sysrq code or its implementation. After that, it covers another problem where modifying a thread's in-kernel scheduling policy and priority caused abnormal CPU usage; there I formed a suspicion first, asked the customer to review their code to confirm it, and Futang later confirmed the thread parameters on a live unit via ICE. All of this is fairly unfamiliar ground for me, so if there are mistakes, please point them out in the comments.
1. Driver anomaly problem
The system is AN8.0 running on a quad-core platform, with a USB CAM card attached and a channel-switching stress test running. When the failure occurred, UI operations still responded, but one process was stuck, leaving the TV application frozen on a single frame. By reproducing the issue and capturing logs, the hang was traced to a CMA allocation inside one interface, as detailed below.
A failed CMA allocation usually means insufficient memory, but our platform is a bit special: several key IP blocks have dedicated memory regions, so this problem normally should not appear. When the failure occurred, my analysis showed overall system memory was normal and the pool being allocated from had plenty of room, so by rights it should not have been stuck. Having no obvious place to start, I asked alix from the kernel team for guidance; what follows is his analysis.
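Before the analysis, a quick note for readers who, like me, had not used sysrq before: the dumps quoted below come from the kernel's magic SysRq facility. Assuming the kernel is built with CONFIG_MAGIC_SYSRQ, each dump can be triggered from a root shell by writing its command letter to /proc/sysrq-trigger (the HELP line in the first log lists every available letter):

echo w > /proc/sysrq-trigger   # show-blocked-tasks
echo l > /proc/sysrq-trigger   # show-backtrace-all-active-cpus
echo t > /proc/sysrq-trigger   # show-task-states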
1. Checking the blocked state via sysrq shows that Binder:2058_7 (PID 3224) is currently blocked, stuck on the CMA allocation.
[ 735.108723] sysrq: SysRq : HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) show-blocked-tasks(w) dump-ftrace-buffer(z)
[ 738.933149] sysrq: SysRq : Show Blocked State
[ 738.937536]task PC stack pid father
[ 739.091866] Binder:2058_7 D0 3224 2012 0x00400008
[ 739.097368] Call trace:
[ 739.099820] [<ffffff800808bf8c>] __switch_to+0x8c/0xac
[ 739.104966] [<ffffff8008a3bec4>] __schedule+0x1b0/0xdac
[ 739.110196] [<ffffff8008a3caf8>] schedule+0x38/0x98
[ 739.115078] [<ffffff8008a3fab0>] schedule_timeout+0x1cc/0x374
[ 739.120829] [<ffffff8008a3d704>] wait_for_common+0xb4/0x158
[ 739.126406] [<ffffff8008a3d7bc>] wait_for_completion+0x14/0x1c
[ 739.132243] [<ffffff80080da87c>] flush_work+0xcc/0x140
[ 739.137389] [<ffffff80081e3a94>] lru_add_drain_all+0x13c/0x188
[ 739.143228] [<ffffff8008256e2c>] migrate_prep+0xc/0x18
[ 739.148373] [<ffffff80081dcc78>] alloc_contig_range+0xec/0x3f8
[ 739.154213] [<ffffff800826293c>] dma_alloc_at_from_contiguous+0x160/0x474
[ 739.161010] [<ffffff80087b9b94>] CMA_Pool_ioctl+0x5fc/0x2364
[ 739.166674] [<ffffff80087b6278>] Compat_CMA_Pool_ioctl+0xd10/0x1318
[ 739.172950] [<ffffff80082cc740>] compat_SyS_ioctl+0xbc/0x294
[ 739.178614] [<ffffff8008082e70>] el0_svc_naked+0x24/0x28
2. In the blocked task's backtrace, the flush_work inside lru_add_drain_all forces all the other CPUs to finish their pending workqueue items before it can return, so the suspicion shifted to analyzing a stuck workqueue. (Note: for a deeper dive into this pattern, see 读写信号量与实时进程阻塞挂死问题.)
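To make that concrete, here is the rough shape of the code, simplified from mm/swap.c of a 4.x kernel (the serializing mutex, CPU-hotplug locking, and the per-CPU "anything to drain?" checks are omitted). If even one CPU's workqueue cannot make progress, the final flush_work never returns and the caller sits in D state exactly as in the backtrace above:

/* Simplified sketch of lru_add_drain_all (mm/swap.c, 4.x kernels). */
void lru_add_drain_all(void)
{
	int cpu;

	/* Queue the per-CPU drain work on every online CPU's workqueue. */
	for_each_online_cpu(cpu) {
		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);

		INIT_WORK(work, lru_add_drain_per_cpu);
		queue_work_on(cpu, lru_add_drain_wq, work);
	}

	/* Wait for every CPU to finish its drain; if one CPU (CPU2 here)
	 * never runs its workqueue, this flush_work() never returns. */
	for_each_online_cpu(cpu)
		flush_work(&per_cpu(lru_add_drain_work, cpu));
}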
3. Dumping the workqueue status via sysrq shows that CPU2's workqueue is stuck (the BAR(3224) entry is the flush barrier inserted by PID 3224's flush_work, still waiting behind the pending drain item):
[ 922.802525] workqueue lru-add-drain: flags=0x8
[ 922.806970]pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 922.813027] pending: lru_add_drain_per_cpu BAR(3224)
4. Dumping all CPUs via sysrq (CPU2 in particular) shows that CPU2's current PC is el0_irq_naked+0x48/0x50.
[ 2167.926823] sysrq: CPU2:
[ 2167.929358] Call trace:
[ 2167.931810] [<ffffff800808f4fc>] dump_backtrace+0x0/0x1a4
[ 2167.937210] [<ffffff800808f6b4>] show_stack+0x14/0x1c
[ 2167.942263] [<ffffff800852f4a8>] showacpu+0x5c/0x6c
[ 2167.947145] [<ffffff8008147e4c>] flush_smp_call_function_queue+0x94/0x16c
[ 2167.953934] [<ffffff8008148c5c>] generic_smp_call_function_single_interrupt+0x10/0x18
[ 2167.961767] [<ffffff8008098f40>] handle_IPI+0x180/0x298
[ 2167.966992] [<ffffff8008081514>] gic_handle_irq+0xa4/0xbc
[ 2167.972389] Exception stack(0xffffffc06cf3bec0 to 0xffffffc06cf3bff0)
[ 2167.978830] bec0: 00000000e7d00000 0000000000017800 00000000e7c02928 00000000eb76fb5c
[ 2167.986660] bee0: 00000000e7d00000 00000000e7c02928 00000000eb79ef60 0000000000000000
[ 2167.994489] bf00: 0000000000000000 00000000eb79ef58 00000000e7c02928 00000000eb79ef44
[ 2168.002319] bf20: 0000000000017800 00000000e7c02898 00000000eb5d5400 0000000000000000
[ 2168.010149] bf40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2168.017979] bf60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2168.025808] bf80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2168.033637] bfa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2168.041468] bfc0: 00000000eb5d2b3c 00000000a00f0010 0000000000000024 ffffffffffffffff
[ 2168.049297] bfe0: 0000000000000000 0000000000000000
[ 2168.054174] [<ffffff8008082c88>] el0_irq_naked+0x48/0x50
5. Repeating the sysrq dump of all CPU states shows that the exception stack never changes, so we examine the task running on CPU2 and find it is PID 2548, which occupies the CPU without ever leaving. Its call stack is below: at failure time it is stuck in the function media_intf_read, which takes a spinlock, so kernel preemption is disabled and no other task can get onto this CPU. We therefore asked the owner of the corresponding kernel driver to find out why media_intf_read was stuck.
[ 2167.972389] Exception stack(0xffffffc06cf3bec0 to 0xffffffc06cf3bff0)
[10148.240821] hstvos R running task 0 2548 2122 0x00400000
[10148.247899] Call trace:
[10148.250353] [<ffffff800808bf8c>] __switch_to+0x8c/0xac
[10148.255497] [<ffffff80080eae84>] preempt_count_add+0x100/0x11c
[10148.261337] [<ffffff8008a406fc>] _raw_spin_lock_irqsave+0x20/0x60
[10148.267435] [<ffffff8008a409ac>] _raw_spin_unlock_irqrestore+0x20/0x44
[10148.273968] [<ffffff800810cc78>] finish_wait+0x5c/0x7c
[10148.279112] [<ffffff80086683c0>] media_intf_read+0x154/0x374
[10148.284775] [<ffffff8008266ea8>] __vfs_read+0x28/0x108
[10148.289917] [<0000000000000001>] 0x1
After reaching the analysis above, the customer contacted the designer of the USB CAM card driver and got the reply below. The reply also matches the symptoms: at failure time, pulling the card or sending a signal both restored normal operation. The question thus reduces to why media_intf_write never executed; the customer reported that if hs_UsbCam_Create does not run to completion, media_intf_write will not execute.
The usbcam read/write design is blocking: if there is no media write, the media read interface blocks forever (unless the usbcam is unplugged, the fd is closed, the process is killed by a signal, or some other special case occurs).
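This matches the finish_wait frames in the backtrace from step 5. As a hedged illustration only (I cannot see the real driver code; every name below except media_intf_read is hypothetical), a blocking read of this shape typically parks the reader on a wait queue until the write path signals it:

#include <linux/fs.h>
#include <linux/sched.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(media_wq);   /* hypothetical wait queue */
static bool media_data_ready;               /* set by the write path */

static ssize_t media_intf_read(struct file *filp, char __user *buf,
                               size_t len, loff_t *off)
{
	/* Sleep until a writer sets media_data_ready and wakes the queue.
	 * Returns -ERESTARTSYS when interrupted, which is why sending a
	 * signal (or unplugging the card, which also wakes the queue)
	 * unblocks the process. */
	if (wait_event_interruptible(media_wq, media_data_ready))
		return -ERESTARTSYS;
	/* ... copy the buffered data to user space ... */
	return len;
}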
I generated a coredump to confirm where the function was stuck at failure time, then went over the upper-layer logic with the customer (we cannot see their code…) and pieced together the deadlock path below. Because the two flows run on different threads, with no synchronization mechanism guaranteeing their ordering, there is a window for deadlock. With the analysis done, the fix surfaced on its own: ensure that media_intf_read only runs after media_intf_write, that is, guarantee that hs_usbcam_create finishes before vhal_start is allowed to run, and the deadlock condition can never form.
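A minimal sketch of one way to enforce that ordering in user space, assuming the two flows are ordinary pthreads (the wrapper names are mine, not the customer's real API):

#include <pthread.h>
#include <stdbool.h>

void hs_usbcam_create(void);   /* assumed entry points from the text */
void vhal_start(void);

static pthread_mutex_t create_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  create_done = PTHREAD_COND_INITIALIZER;
static bool created;

/* Thread A: the create flow signals completion only when fully done. */
static void *creator_thread(void *arg)
{
	hs_usbcam_create();
	pthread_mutex_lock(&create_lock);
	created = true;
	pthread_cond_broadcast(&create_done);
	pthread_mutex_unlock(&create_lock);
	return NULL;
}

/* Thread B: vhal_start must wait until the create flow has completed. */
static void *starter_thread(void *arg)
{
	pthread_mutex_lock(&create_lock);
	while (!created)                 /* loop guards spurious wakeups */
		pthread_cond_wait(&create_done, &create_lock);
	pthread_mutex_unlock(&create_lock);
	vhal_start();
	return NULL;
}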
2. 100% CPU usage problem
After the fix above, a new low-probability symptom appeared: a customer middleware process pinned one CPU core at 100%. Analyzing it the same way, we could see an unknown thread holding the CPU and never releasing it. My instinct was that the customer had modified the thread's scheduling policy, and a customer code review confirmed the guess. Their implementation was similar to the following, i.e. it creates the thread as SCHED_FIFO at the highest priority. We later saw exactly these parameter settings on the failing thread via ICE.

#include <pthread.h>
#include <sched.h>

extern void *thread_start_task(void *);   /* thread entry point */

void create_rt_thread(void)
{
	pthread_attr_t attr;
	struct sched_param param;
	pthread_t threadId;
	int rs;

	rs = pthread_attr_init(&attr);
	/* Without PTHREAD_EXPLICIT_SCHED, pthread_create inherits the
	 * creator's policy and silently ignores the settings below. */
	pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
	pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
	param.sched_priority = 99;        /* highest real-time priority */
	pthread_attr_setschedparam(&attr, &param);
	pthread_create(&threadId, &attr, thread_start_task, NULL);
	pthread_attr_destroy(&attr);
}
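Even without ICE, the same settings can be read back at runtime. A small sketch using standard POSIX calls (dump_sched is my own helper name):

#include <pthread.h>
#include <stdio.h>

/* Print a thread's scheduling policy and static priority. On Linux,
 * SCHED_OTHER is 0, SCHED_FIFO is 1 and SCHED_RR is 2. */
static void dump_sched(pthread_t tid)
{
	struct sched_param param;
	int policy;

	if (pthread_getschedparam(tid, &policy, &param) == 0)
		printf("policy=%d priority=%d\n", policy, param.sched_priority);
}

On a typical Linux userland, ps -eLo pid,tid,cls,rtprio,comm shows the same information from a shell: the failing thread appears with class FF and rtprio 99.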
The SCHED section of the Linux Programmer's Manual gives the description below. Once a SCHED_FIFO thread has the CPU it simply keeps running ("SCHED_FIFO is a simple scheduling algorithm without time slicing"), and it gives up the CPU only when it blocks on an I/O request, a higher-priority task becomes runnable, or it yields voluntarily (sched_yield). Like SCHED_RR, it outranks all normal-policy threads in scheduling priority. Code that modifies thread scheduling therefore has to be written very carefully, or problems like this one are easy to create.
SCHED_FIFO: First in-first out scheduling
SCHED_FIFO can be used only with static priorities higher than 0,
which means that when a SCHED_FIFO threads becomes runnable, it will
always immediately preempt any currently running SCHED_OTHER,
SCHED_BATCH, or SCHED_IDLE thread. SCHED_FIFO is a simple scheduling
algorithm without time slicing. For threads scheduled under the
SCHED_FIFO policy, the following rules apply:
1) A running SCHED_FIFO thread that has been preempted by another
thread of higher priority will stay at the head of the list for
its priority and will resume execution as soon as all threads of
higher priority are blocked again.
2) When a blocked SCHED_FIFO thread becomes runnable, it will be
inserted at the end of the list for its priority.
3) If a call to sched_setscheduler(2), sched_setparam(2),
sched_setattr(2), pthread_setschedparam(3), or
pthread_setschedprio(3) changes the priority of the running or
runnable SCHED_FIFO thread identified by pid the effect on the
thread's position in the list depends on the direction of the
change to threads priority:
· If the thread's priority is raised, it is placed at the end of
the list for its new priority. As a consequence, it may
preempt a currently running thread with the same priority.
· If the thread's priority is unchanged, its position in the run
list is unchanged.
· If the thread's priority is lowered, it is placed at the front
of the list for its new priority.
According to POSIX.1-2008, changes to a thread's priority (or
policy) using any mechanism other than pthread_setschedprio(3)
should result in the thread being placed at the end of the list
for its priority.
4) A thread calling sched_yield(2) will be put at the end of the
list.
No other events will move a thread scheduled under the SCHED_FIFO
policy in the wait list of runnable threads with equal static
priority.
A SCHED_FIFO thread runs until either it is blocked by an I/O
request, it is preempted by a higher priority thread, or it calls
sched_yield(2).
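One more knob worth checking on platforms like this, purely as an assumption on my part (we did not verify it on this case): the kernel's real-time throttling, controlled by /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us, by default caps RT tasks as a group at 950000 out of every 1000000 microseconds, leaving 5% of each CPU for normal tasks. A platform that sets sched_rt_runtime_us to -1 disables that safety net, which would allow a SCHED_FIFO thread to truly occupy 100% of a core as seen here.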