
Linux kernel deadlock

The system hung and the terminal stopped responding. Suspecting a deadlock, I enabled a few kernel options to help with debugging.

Linux kernel configuration

  1. General setup -> Configure standard kernel features: enable it, keep the default values, and make sure the sub-option Load all symbols for debugging/ksymoops is enabled
  2. Kernel hacking -> Panic on Oops: enable
  3. Kernel hacking -> Lock Debugging (spinlocks, mutexes, etc...): enable the following sub-options
    • RT Mutex debugging, deadlock detection
    • Spinlock and rw-lock debugging: basic checks
    • Mutex debugging: basic checks

Panic analysis

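The following was printed when the system hung. The "BUG: spinlock lockup suspected" line shows that a spinlock is at the root of the problem: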

[  304.883867] VPU_DecClose: success
[  309.036689] BUG: spinlock lockup suspected on CPU#0, thread-demux0/801
[  309.044130]  lock: _dmxdev+0x580/0xffe44a4c [av], .magic: dead4ead, .owner: dumpfilter_thre/10758, .owner_cpu: 1
[  309.054407] CPU: 0 PID: 801 Comm: thread-demux0 Tainted: G           O    4.9.y #7
[  309.061979] Hardware name: nationalchip sirius
[  309.066458] [<c010f290>] (unwind_backtrace) from [<c010b834>] (show_stack+0x10/0x14)
[  309.074220] [<c010b834>] (show_stack) from [<c0374618>] (dump_stack+0x94/0xa8)
[  309.081460] [<c0374618>] (dump_stack) from [<c015d860>] (do_raw_spin_lock+0xfc/0x1b8)
[  309.089309] [<c015d860>] (do_raw_spin_lock) from [<c0651830>] (_raw_spin_lock_irqsave+0x10/0x18)
[  309.098594] [<c0651830>] (_raw_spin_lock_irqsave) from [<bf004e9c>] (_fifo_put+0x110/0x144 [av])
[  309.108335] [<bf004e9c>] (_fifo_put [av]) from [<bf05ff9c>] (_tsw_dealwith+0x210/0x35c [av])
[  309.118203] [<bf05ff9c>] (_tsw_dealwith [av]) from [<bf060814>] (_tsw_isr+0x144/0x244 [av])
[  309.128405] [<bf060814>] (_tsw_isr [av]) from [<bf059da0>] (demux_irq_thread+0xa4/0xe0 [av])
[  309.137781] [<bf059da0>] (demux_irq_thread [av]) from [<c01385d4>] (kthread+0xdc/0xf4)
[  309.145720] [<c01385d4>] (kthread) from [<c0107678>] (ret_from_fork+0x14/0x3c)
[  309.152951] NMI backtrace for cpu 0

Tools used: objdump, addr2line

The backtrace line [<bf004e9c>] (_fifo_put [av]) from [<bf05ff9c>] (_tsw_dealwith+0x210/0x35c [av]) breaks down as follows:

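  • <bf004e9c>: code address inside the current function (not a stack pointer)
  • _fifo_put: the current function
  • [av]: the module name
  • from [<bf05ff9c>]: the return address in the calling function
  • _tsw_dealwith: the calling function
  • 0x210/0x35c: 0x210 is the offset of the return address within _tsw_dealwith, 0x35c is the size of the function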

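Use objdump -d -t -S linux/a.o to inspect the functions: -d disassembles the code, -t prints the symbol table, and -S interleaves source lines when debug information is available.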

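In addition, a module is placed at a load address when it is loaded, so the addresses in the backtrace are absolute. Use lsmod to look up the module's load address and convert between absolute addresses and offsets within the module.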

$ lsmod
av 2552573 3 - Live 0xbf000000 (O)

[<bf004e9c>] (_fifo_put [av]) from [<bf05ff9c>] (_tsw_dealwith+0x210/0x35c [av])

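An excerpt of the disassembly follows. After subtracting the load address, 0xbf05ff9c - 0xbf000000 = 0x5ff9c, and 0x5ff9c = 0x0005fd8c + 0x210, i.e. the start of _tsw_dealwith plus the offset shown in the backtrace: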

0005fd8c <_tsw_dealwith>:
    .
    .
    5ff98:   ebfffffe    bl  4d8c <_fifo_put>
    5ff9c:   eaffffac    b   5fe54 <_tsw_dealwith+0xc8>
    5ffa0:   e1a0200a    mov r2, sl
    .