
Using Disassembly to Locate a Problem in a Closed-Source Library

Background

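While integrating a release, we hit a crash that reproduced every time, and every occurrence happened during a call to one particular function in the library.
After some investigation, our initial suspicion was that either the library itself had a bug or it was interacting badly with the code that calls it.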

Locating the Problem

(gdb) set $pc=$epc
(gdb) i r r10
r10            0x9700             38656
(gdb) info registers
r0             0x90972c10         0x90972c10
r1             0x90486564         -1874303644
r2             0x9700             38656
r3             0xff               255
r4             0x904895de         -1874291234
r5             0xae               174
r6             0x0                0
r7             0x0                0
r8             0x904d03a0         -1874000992
r9             0x11110009         286326793
r10            0x9700             38656
r11            0x60               96
r12            0x1                1
r13            0x904d03a0         -1874000992
r14            0x904d02a0         -1874001248
r15            0x90142750         -1877727408
pc             0x9000060a         0x9000060a <__default_14>
epc            0x902abd30         0x902abd30 <inflate_fast+708>
psr            0x8f0e0010         -1894907888
epsr           0x800e0150         -2146565808
(gdb) pc
#0  0x9014279c in xxx_xxm() at xx.c:574
#1  0x90142750 in xxx_xxm() at xx.c:568
#2  0x9014144e in xxx_xx() at xx.c:2659
#3  0x90141446 in xxx_xx() at xx.c:2658
#4  0x9014143e in xxx_xx() at xx.c:2657
#5  0x901413f2 in xxx_xx() at xx.c:2657
#6  0x901413de in xxx_xx() at xx.c:2652
#7  0x901413d2 in xxx_xx() at xx.c:2652
(gdb) disassemble $pc-10,$pc+10
Dump of assembler code from 0x90000600 to 0x90000614:
   0x90000600 <__hardware_accelerator+2>:  br   0x90000600
   0x90000602 <__trap0+0>:                 bkpt
   0x90000604 <__trap0+2>:                 br   0x90000604
   0x90000606 <__default_13+0>:            bkpt
   0x90000608 <__default_13+2>:            br   0x90000608
=> 0x9000060a <__default_14+0>:            bkpt
   0x9000060c <__default_14+2>:            br   0x9000060c
   0x9000060e <__default_15+0>:            bkpt
   0x90000610 <__default_15+2>:            br   0x90000610
   0x90000612 <__default_17+0>:            bkpt
End of assembler dump.
(gdb) bt
#0  0x9000060a in __default_14 ()
#1  0x90142750 in xxx_xxm () at xx.c:568
#2  0x9013cfb8 in func () at c1.c:167
#3  0x9013f394 in main_entry (args=<value optimized out>) at init.c:14
#4  0x9008724c in default_thread_function (arg=0x907f32b8) at os/ecos/osapi.c:371
#5  0x901232f2 in pthread_entry(unsigned int) ()
#6  0x90127030 in Cyg_HardwareThread::thread_entry(Cyg_Thread*) ()
#7  0x9012701c in Cyg_Thread::exit() ()
#8  0xb0b6ccd0 in ?? ()
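
In the register dump above, GDB prints each register twice: once in hexadecimal and once in its natural format, which for these 32-bit general registers is a signed decimal. That is why addresses at or above 0x80000000, such as r1, show up as negative numbers. A minimal Python sketch of the conversion, for checking values by hand; the helper name to_signed32 is hypothetical and not part of GDB:

```python
def to_signed32(value: int) -> int:
    """Interpret a 32-bit register value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    # If bit 31 is set, the signed interpretation is value - 2**32.
    return value - (1 << 32) if value & 0x80000000 else value


# Raw values copied from the register dump above.
for name, raw in [("r1", 0x90486564), ("r10", 0x9700), ("psr", 0x8F0E0010)]:
    print(f"{name:<5}{raw:#012x} {to_signed32(raw)}")
```

Small values such as r10 = 0x9700 are unchanged by the conversion, so the hexadecimal and decimal columns simply show the same number in two bases.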

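From the GDB session above we can tell that the value in r10 is wrong, and that the problem lies in the function xxx_xxm() in xx.c. Restart the program and trace the code: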

(gdb) b xxx_xxm 
Breakpoint 3 at 0x901425f4: file xx.c, line 524.
(gdb) c
Continuing.

Breakpoint 3, xxx_xxm () at xx.c:524
524     xx.c: No such file or directory.
        in xx.c
(gdb) display /i $r10
1: x/i $r10
   0x1111000a:  ldm r3-r15, (r0)
(gdb) display /i $pc
2: x/i $pc
=> 0x901425f4 <xxx_xxm+20>:  lrw r7, 0x903BA9CC
(gdb) n
526     in xx.c
(gdb) 
574     in xx.c
2: x/i $pc
=> 0x9014279c <xxx_xxm+444>:  mov r7, r2
1: x/i $r10
   0x1111000a:  movi r4, 86
(gdb) 
575     in xx.c
2: x/i $pc
=> 0x901427a6 <xxx_xxm+454>:  mov r2, r10
1: x/i $r10
   0x9700:  movi r4, 86

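The problem is most likely at line 574 of xx.c: after that line executes, the value of r10 changes to 0x9700. We then asked the customer to review their code, and they confirmed that the cause was a wild pointer.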
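
The stepping session amounts to sampling r10 at every stop and looking for the first stop where the value differs from the previous one; the source line executed between those two stops is the suspect. The sketch below restates that reasoning in Python, using only the three stops at which the session shows r10. The function name line_that_changed and the samples list are illustrative assumptions, not a GDB API.

```python
from typing import Iterable, Optional, Tuple


def line_that_changed(samples: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the source line whose execution first changed the watched value.

    Each sample is (line_about_to_execute, register_value_at_that_stop).
    Returns None if the value never changes across the samples.
    """
    prev_line: Optional[int] = None
    prev_value: Optional[int] = None
    for line, value in samples:
        if prev_value is not None and value != prev_value:
            # The value was intact when we stopped at prev_line and is
            # different now, so prev_line is the line that modified it.
            return prev_line
        prev_line, prev_value = line, value
    return None


# (source line, r10) pairs read off the display output in the session.
samples = [(524, 0x1111000A), (574, 0x1111000A), (575, 0x9700)]
print(line_that_changed(samples))
```

Stepping with `n` samples at source-line granularity; stepping with `si` would apply the same comparison at instruction granularity, which is useful when no source is available for the library.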