Background
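While integrating a release, we hit a crash that reproduced every time. Each time, the failure occurred inside a call to a function in a library.
After some investigation, our initial suspicion was that either the library itself had a problem or it was interacting badly with the code that calls it.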
Locating the problem
(gdb) set $pc=$epc
(gdb) i r r10
r10            0x9700      38656
(gdb) info registers
r0             0x90972c10  0x90972c10
r1             0x90486564  -1874303644
r2             0x9700      38656
r3             0xff        255
r4             0x904895de  -1874291234
r5             0xae        174
r6             0x0         0
r7             0x0         0
r8             0x904d03a0  -1874000992
r9             0x11110009  286326793
r10            0x9700      38656
r11            0x60        96
r12            0x1         1
r13            0x904d03a0  -1874000992
r14            0x904d02a0  -1874001248
r15            0x90142750  -1877727408
pc             0x9000060a  0x9000060a <__default_14>
epc            0x902abd30  0x902abd30 <inflate_fast+708>
psr            0x8f0e0010  -1894907888
epsr           0x800e0150  -2146565808
(gdb) pc
#0 0x9014279c in xxx_xxm () at xx.c:574
#1 0x90142750 in xxx_xxm () at xx.c:568
#2 0x9014144e in xxx_xx () at xx.c:2659
#3 0x90141446 in xxx_xx () at xx.c:2658
#4 0x9014143e in xxx_xx () at xx.c:2657
#5 0x901413f2 in xxx_xx () at xx.c:2657
#6 0x901413de in xxx_xx () at xx.c:2652
#7 0x901413d2 in xxx_xx () at xx.c:2652
(gdb) disassemble $pc-10,$pc+10
Dump of assembler code from 0x90000600 to 0x90000614:
   0x90000600 <__hardware_accelerator+2>:  br    0x90000600
   0x90000602 <__trap0+0>:                 bkpt
   0x90000604 <__trap0+2>:                 br    0x90000604
   0x90000606 <__default_13+0>:            bkpt
   0x90000608 <__default_13+2>:            br    0x90000608
=> 0x9000060a <__default_14+0>:            bkpt
   0x9000060c <__default_14+2>:            br    0x9000060c
   0x9000060e <__default_15+0>:            bkpt
   0x90000610 <__default_15+2>:            br    0x90000610
   0x90000612 <__default_17+0>:            bkpt
End of assembler dump.
(gdb) bt
#0 0x9000060a in __default_14 ()
#1 0x90142750 in xxx_xxm () at xx.c:568
#2 0x9013cfb8 in func () at c1.c:167
#3 0x9013f394 in main_entry (args=<value optimized out>) at init.c:14
#4 0x9008724c in default_thread_function (arg=0x907f32b8) at os/ecos/osapi.c:371
#5 0x901232f2 in pthread_entry(unsigned int) ()
#6 0x90127030 in Cyg_HardwareThread::thread_entry(Cyg_Thread*) ()
#7 0x9012701c in Cyg_Thread::exit() ()
#8 0xb0b6ccd0 in ?? ()
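The dump and backtrace indicate that the value in r10 is wrong, and that the problem arises in the function xxx_xxm() in xx.c. Restart and trace the code: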
(gdb) b xxx_xxm
Breakpoint 3 at 0x901425f4: file xx.c, line 524.
(gdb) c
Continuing.
Breakpoint 3, xxx_xxm () at xx.c:524
524     xx.c: No such file or directory.
        in xx.c
(gdb) display /i $r10
1: x/i $r10
   0x1111000a:  ldm   r3-r15, (r0)
(gdb) display /i $pc
2: x/i $pc
=> 0x901425f4 <xxx_xxm+20>:  lrw   r7, 0x903BA9CC
(gdb) n
526     in xx.c
(gdb)
574     in xx.c
2: x/i $pc
=> 0x9014279c <xxx_xxm+444>:  mov   r7, r2
1: x/i $r10
   0x1111000a:  movi  r4, 86
(gdb)
575     in xx.c
2: x/i $pc
=> 0x901427a6 <xxx_xxm+454>:  mov   r2, r10
1: x/i $r10
   0x9700:  movi  r4, 86
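The problem most likely lies at line 574 of xx.c: once that line has executed, r10 has changed to 0x9700. We finally asked the customer to review their code, and they confirmed that the cause was a wild pointer.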