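While calling into the darknet library recently, I ran into a strange problem: values set inside the library could not be read back by the caller, even though printing them inside the library showed the correct results. After careful investigation, the cause turned out to be that we had not passed the -DGPU option when compiling the code that uses libdarknet.a, so the addresses of the struct members were shifted.
To make the issue clearer, here is a small program that reproduces it.
fun.h, with the following content: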
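    #include <cstdlib>   // malloc
    #include <iostream>

    struct RESULT
    {
        float *res;
    #ifdef GPU
        float *res_gpu;  // only present when compiled with -DGPU
    #endif
        int n;
        int c;
        int h;
        int w;
    };

    void get_res(struct RESULT *res_);

fun.cpp, whose main job is to assign values to the members of the struct: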
    #include "fun.h"

    void get_res(RESULT *res_)
    {
        res_->res = (float*)malloc(10 * sizeof(float));
        for (int i = 0; i < 10; i++)
        {
            res_->res[i] = i;  // fill the buffer with 0..9
        }
        res_->n = 1;
        res_->c = 2;
        res_->h = 3;
        res_->w = 4;
    }

Compile the shared library:

    g++ -fPIC -shared -g -DGPU -o libfun.so fun.cpp
Now write the code that calls the shared library:

    #include "fun.h"

    int main(int argc, char **argv)
    {
        struct RESULT *result = (struct RESULT*)malloc(sizeof(struct RESULT));
        get_res(result);
        std::cout << "n:" << result->n << std::endl;
        std::cout << "c:" << result->c << std::endl;
        std::cout << "h:" << result->h << std::endl;
        std::cout << "w:" << result->w << std::endl;
        return 0;
    }

First, compile the test code with the -DGPU option:

    g++ main.cpp -g -Ilib -Llib -lfun -Wl,-rpath lib -DGPU -o main

Then look at the result of running it:
    ./main
    n:1
    c:2
    h:3
    w:4

The output matches what we expect.
Next, remove the -DGPU compile option:

    g++ main.cpp -g -Ilib -Llib -lfun -Wl,-rpath lib -o main
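    ./main
    n:0
    c:0
    h:1
    w:2

This is clearly wrong. Looking closely, h holds the value that should belong to n, and w holds the value that should belong to c. The root cause is the address shift introduced by -DGPU. Let's step through the code in a debugger to verify this guess.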
    gdb ./main
    GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
    Copyright (C) 2016 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from ./main...done.
    (gdb) b 6
    Breakpoint 1 at 0x400a14: file main.cpp, line 6.
    (gdb) r
    Starting program: /home/gcs/work/study/diy_cnn/code/compile_opt/main

    Breakpoint 1, main (argc=1, argv=0x7fffffffe398) at main.cpp:6
    6           get_res(result);
    (gdb) p &result->n
    $1 = (int *) 0x613c28
    (gdb) p &result->c
    $2 = (int *) 0x613c2c
    (gdb) p &result->h
    $3 = (int *) 0x613c30
    (gdb) p &result->w
    $4 = (int *) 0x613c34
    (gdb) s
    get_res (res_=0x613c20) at fun.cpp:5
    5           res_->res = (float*)malloc(10 * sizeof(float));
    (gdb) p &res_->n
    $5 = (int *) 0x613c30
    (gdb) p &res_->c
    $6 = (int *) 0x613c34
    (gdb) p &res_->h
    $7 = (int *) 0x613c38
    (gdb) p &res_->w
    $8 = (int *) 0x613c3c
    (gdb)

The printed addresses show that before entering the .so, the addresses of n, c, h, w are:
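    n:0x613c28
    c:0x613c2c
    h:0x613c30
    w:0x613c34

After entering the .so, the addresses of n, c, h, w have become:

    n:0x613c30
    c:0x613c34
    h:0x613c38
    w:0x613c3c

Every member sits 8 bytes higher inside the library. Printing a few more addresses and lining them up with the struct definition shows where the shift comes from.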
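Addresses of the result variable before entering the library function:

    // result = 0x613c20
    struct RESULT
    {
        float *res;        // 0x613c20
    #ifdef GPU
        float *res_gpu;    // GPU is not defined in the compile options, so this member has no address
    #endif
        int n;             // 0x613c28
        int c;             // 0x613c2c
        int h;             // 0x613c30
        int w;             // 0x613c34
    };

Addresses of the res_ variable after entering the library function:

    // res_ = 0x613c20
    struct RESULT
    {
        float *res;        // 0x613c20
    #ifdef GPU
        float *res_gpu;    // 0x613c28
    #endif
        int n;             // 0x613c30
        int c;             // 0x613c34
        int h;             // 0x613c38
        int w;             // 0x613c3c
    };

With the addresses side by side, the picture is clear. When the caller reads n, it reads the value at 0x613c28. Inside the library that address belongs to res_gpu, which get_res never assigns, so the value we get naturally differs from what we expect. The same goes for c, which at 0x613c2c falls in the second half of the res_gpu slot. As for h, the caller reads it at 0x613c30, which inside the library is the address of n, so the caller sees 1. Likewise w is read at 0x613c34, the library's c, giving 2. This matches the output above.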