本文介绍怎么实现一个可用的gofunclatency脚本。
起因
最近对bcc产生了一些兴趣,而我工作语言是golang,因此打算实现一个gofunclancy。有人会问bcc tools里面不是自带了一个funclatency吗?实际上这个工具不能直接应用于golang。我们接下来将从原生的funclatency的原理以及实现讲起,然后看下golang是哪里不适合这个工具,并逐步提出解决方案。
特别心急的小伙伴可以直接看我的实现源码:gofunclatency,中度心急的可以看每个小节的倒数一两段的总结,不急的可以从头到尾看一遍,帮忙找找错漏。
环境
软件 |
版本 |
操作系统 |
Arch Linux |
go |
1.12.7 |
gcc |
9.3.0 |
funclatency实现原理探索
以一个简单的C程序作为范例:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
|
#include <stdio.h>
#include <stdlib.h>
int wait4input(void){
int ch;
ch = getchar();
while('\n'!=getchar());
return ch;
}
void attack(int i)
{
i++;
}
int main()
{
printf("waiting for gdb\n");
wait4input();
printf("gdb ready\n");
for (int i = 0; i < 1000000; i++) {
attack(i);
}
return 0;
}
|
编译的时候记得禁止任何优化,否则attack会被内联:
1
|
gcc hello.c -O0 -o hello.bin
|
接着打开一个窗口A运行
另外一个窗口B执行
1
|
gdb -p `pgrep hello.bin`
|
在窗口B给i++
那行打个断点,然后continue,再在窗口A随便输入点东西让程序继续。
1
2
3
4
5
6
7
|
(gdb) b hello.c:15
Breakpoint 1 at 0x55c71f396170: file hello.c, line 15.
(gdb) c
Continuing.
Breakpoint 1, attack (i=0) at hello.c:15
15 i++;
|
完成上面的步骤后,在使用funclatency之前,我们先看看原来的代码是怎么样的:
1
2
3
4
5
6
7
8
9
10
|
(gdb) disass
Dump of assembler code for function attack:
0x000055d574cb7169 <+0>: push %rbp
0x000055d574cb716a <+1>: mov %rsp,%rbp
0x000055d574cb716d <+4>: mov %edi,-0x4(%rbp)
=> 0x000055d574cb7170 <+7>: addl $0x1,-0x4(%rbp)
0x000055d574cb7174 <+11>: nop
0x000055d574cb7175 <+12>: pop %rbp
0x000055d574cb7176 <+13>: retq
End of assembler dump.
|
我们反汇编了attack函数,可以看到经典的保存rbp指针,将rsp指针赋予rbp指针的函数栈操作。而edi是用于存储第一个函数的寄存器,我们把这个函数放到临时的栈位置上,然后加1,这和i++
的语义是一样的。最后是恢复rbp指针和返回上级函数。在断点,也就是上面小箭头的位置上,我们可以知道,栈上保存了rbp原来的值,以及返回地址。返回地址应该是main函数调用完attack函数后的下一个指令的地址,我们验证一下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
|
(gdb) x/8xg $rsp
0x7ffce76e1b40: 0x00007ffce76e1b60 0x000055d574cb71af
0x7ffce76e1b50: 0x00007ffce76e1c50 0x0000000000000000
0x7ffce76e1b60: 0x0000000000000000 0x00007f0fb550a023
0x7ffce76e1b70: 0x00007f0fb57005c0 0x00007ffce76e1c58
(gdb) disass main
Dump of assembler code for function main:
0x000055d574cb7177 <+0>: push %rbp
0x000055d574cb7178 <+1>: mov %rsp,%rbp
0x000055d574cb717b <+4>: sub $0x10,%rsp
0x000055d574cb717f <+8>: lea 0xe7e(%rip),%rdi # 0x55d574cb8004
0x000055d574cb7186 <+15>: callq 0x55d574cb7030 <puts@plt>
0x000055d574cb718b <+20>: callq 0x55d574cb7149 <wait4input>
0x000055d574cb7190 <+25>: lea 0xe7d(%rip),%rdi # 0x55d574cb8014
0x000055d574cb7197 <+32>: callq 0x55d574cb7030 <puts@plt>
0x000055d574cb719c <+37>: movl $0x0,-0x4(%rbp)
0x000055d574cb71a3 <+44>: jmp 0x55d574cb71b3 <main+60>
0x000055d574cb71a5 <+46>: mov -0x4(%rbp),%eax
0x000055d574cb71a8 <+49>: mov %eax,%edi
0x000055d574cb71aa <+51>: callq 0x55d574cb7169 <attack>
0x000055d574cb71af <+56>: addl $0x1,-0x4(%rbp)
0x000055d574cb71b3 <+60>: cmpl $0xf423f,-0x4(%rbp)
0x000055d574cb71ba <+67>: jle 0x55d574cb71a5 <main+46>
0x000055d574cb71bc <+69>: mov $0x0,%eax
0x000055d574cb71c1 <+74>: leaveq
0x000055d574cb71c2 <+75>: retq
End of assembler dump.
|
可以看到0x000055d574cb71af
确实是main函数调用完attack后的下一个指令的地址。接着,我们看下用funclatency测量attack函数的耗时后,程序会出现什么变化,打开窗口C,执行:
1
2
|
[root@archlinux perf]# funclatency `realpath hello.bin`:attack
Tracing 1 functions for "/data/workspace/perf/hello.bin:attack"... Hit Ctrl-C to end.
|
然后在窗口B的gdb终端里再看一下汇编代码:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
|
(gdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x000055d574cb7171 in attack (i=2) at hello.c:15
15 i++;
(gdb) disass
Dump of assembler code for function attack:
0x000055d574cb7169 <+0>: int3
0x000055d574cb716a <+1>: mov %rsp,%rbp
0x000055d574cb716d <+4>: mov %edi,-0x4(%rbp)
0x000055d574cb7170 <+7>: addl $0x1,-0x4(%rbp)
0x000055d574cb7174 <+11>: nop
0x000055d574cb7175 <+12>: pop %rbp
0x000055d574cb7176 <+13>: retq
End of assembler dump.
|
我们先continue一下,进入下一个attack循环断点,发现原来的代码位置0x000055d574cb7169
的push %rbp
指令变成了int3
。int3
是一个软中断,用于给gdb等程序作为断点使用,这里是被funclatency加入了一个断点
。
funclatency既然是计算函数延迟,不会只在函数开始的时候断点,结束的时候也应该有个断点。看管们别急,我们再看一下刚才说的栈返回地址的地方:
1
2
3
4
5
|
(gdb) x/8xg $rsp
0x7ffce76e1b40: 0x00007ffce76e1b60 0x00007fffffffe000
0x7ffce76e1b50: 0x00007ffce76e1c50 0x0000000200000000
0x7ffce76e1b60: 0x0000000000000000 0x00007f0fb550a023
0x7ffce76e1b70: 0x00007f0fb57005c0 0x00007ffce76e1c58
|
我们发现,应该返回的地址0x000055d574cb71af
变成了0x00007fffffffe000
。这个地址里面是什么呢?我们尝试gdb打印一下:
1
2
|
(gdb) x/1xb 0x00007fffffffe000
0x7fffffffe000: Cannot access memory at address 0x7fffffffe000
|
居然看不到,看不到也没关系,我们看下这块内存是干啥的:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
|
[root@archlinux perf]# cat /proc/`pgrep hello.bin`/maps
55d574cb6000-55d574cb7000 r--p 00000000 08:12 14554273 /data/workspace/perf/hello.bin
55d574cb7000-55d574cb8000 r-xp 00001000 08:12 14554273 /data/workspace/perf/hello.bin
55d574cb8000-55d574cb9000 r--p 00002000 08:12 14554273 /data/workspace/perf/hello.bin
55d574cb9000-55d574cba000 r--p 00002000 08:12 14554273 /data/workspace/perf/hello.bin
55d574cba000-55d574cbb000 rw-p 00003000 08:12 14554273 /data/workspace/perf/hello.bin
55d57530f000-55d575330000 rw-p 00000000 00:00 0 [heap]
7f0fb54e3000-7f0fb5508000 r--p 00000000 08:02 3935586 /usr/lib/libc-2.31.so
7f0fb5508000-7f0fb5654000 r-xp 00025000 08:02 3935586 /usr/lib/libc-2.31.so
7f0fb5654000-7f0fb569f000 r--p 00171000 08:02 3935586 /usr/lib/libc-2.31.so
7f0fb569f000-7f0fb56a2000 r--p 001bb000 08:02 3935586 /usr/lib/libc-2.31.so
7f0fb56a2000-7f0fb56a5000 rw-p 001be000 08:02 3935586 /usr/lib/libc-2.31.so
7f0fb56a5000-7f0fb56ab000 rw-p 00000000 00:00 0
7f0fb56d5000-7f0fb56d7000 r--p 00000000 08:02 3935574 /usr/lib/ld-2.31.so
7f0fb56d7000-7f0fb56f7000 r-xp 00002000 08:02 3935574 /usr/lib/ld-2.31.so
7f0fb56f7000-7f0fb56ff000 r--p 00022000 08:02 3935574 /usr/lib/ld-2.31.so
7f0fb5700000-7f0fb5701000 r--p 0002a000 08:02 3935574 /usr/lib/ld-2.31.so
7f0fb5701000-7f0fb5702000 rw-p 0002b000 08:02 3935574 /usr/lib/ld-2.31.so
7f0fb5702000-7f0fb5703000 rw-p 00000000 00:00 0
7ffce76c2000-7ffce76e3000 rw-p 00000000 00:00 0 [stack]
7ffce77d1000-7ffce77d5000 r--p 00000000 00:00 0 [vvar]
7ffce77d5000-7ffce77d7000 r-xp 00000000 00:00 0 [vdso]
7fffffffe000-7ffffffff000 --xp 00000000 00:00 0 [uprobes]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
|
可以看到,这块内存是在[uprobe]
这个区域,根据get_trampoline_vaddr,这是一个跳板区域,在函数返回之后,先执行这个跳板区域的程序,然后再返回原来应该返回的地址。
至此,我们知道了funclatency是:
- 在函数开始的地方加了个
int3
软中断,执行一段程序;
- 修改栈中保存的返回地址,改为
[uprobes]
里面的一个跳板区域,执行一段特殊程序后正常返回原地址。
第一点就是uprobe的原理,第二点就是uretprobe的原理。这里我们暂不探讨kprobe和kretprobe。
我们已经初步知道所谓断点
的机制,下面我们将通过阅读funclatency源码,了解程序在断点
中注入了什么(下面都是摘抄片段,有需要看完整源码请看funclatency):
1
2
3
4
|
b.attach_uprobe(name=library, sym_re=pattern, fn_name="trace_func_entry",
pid=args.pid or -1)
b.attach_uretprobe(name=library, sym_re=pattern,
fn_name="trace_func_return", pid=args.pid or -1)
|
其中library为程序路径或者库名,pattern是函数的正则表达式,我们往这个程序的这些函数里面注入uprobe和uretprobe。uprobe里面是执行了一个trace_func_entry
函数:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
|
BPF_HASH(start, u32);
STORAGE
int trace_func_entry(struct pt_regs *ctx)
{
u64 pid_tgid = bpf_get_current_pid_tgid();
u32 pid = pid_tgid;
u32 tgid = pid_tgid >> 32;
u64 ts = bpf_ktime_get_ns();
FILTER
ENTRYSTORE
start.update(&pid, &ts);
return 0;
}
|
其中bpf_get_current_pid_tgid是获取进程的pid和tgid,这是内核的概念,内核的pid相当于用户空间的线程id,内核的tgid才是用户空间的进程id,为了避免混乱,下面我们分别以用户角度简称线程id
和进程id
。可以看到,我们以线程id作为键,存下了开始的纳秒时间(bpf_ktime_get_ns)。
然后我们看下uretprobe的trace_func_return
函数干了啥:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
|
int trace_func_return(struct pt_regs *ctx)
{
u64 *tsp, delta;
u64 pid_tgid = bpf_get_current_pid_tgid();
u32 pid = pid_tgid;
u32 tgid = pid_tgid >> 32;
// calculate delta time
tsp = start.lookup(&pid);
if (tsp == 0) {
return 0; // missed start
}
delta = bpf_ktime_get_ns() - *tsp; //重点关注
start.delete(&pid);
FACTOR
// store as histogram
STORE
return 0;
}
|
可以看到,我们先用当前的线程id查到开始时间,然后用现在时间减去开始时间得到delta,以纳秒计算的时间差,这就是这个函数的延迟。有人可能会有疑惑,那delta没看到哪里用呀?其实STORE是一个宏,负责存储:
1
2
3
|
bpf_text = bpf_text.replace('STORAGE', 'BPF_HISTOGRAM(dist);')
bpf_text = bpf_text.replace('STORE',
'dist.increment(bpf_log2l(delta));')
|
我们把值存到了一个dist这个直方图类型里面,最后是打印出来:
1
|
dist.print_log2_hist(label)
|
至此,funclatency的uprobe部分在我们眼里再无秘密。总结一下就是,funclatency相当于在函数开始的地方加了一个uprobe,也就是替换第一个指令为int3,进入中断处理函数,在中断处理函数中用线程id
作为键,存了开始的时间t1
;然后加个uretprobe,也就是修改栈上保存的返回地址,指向一个跳板区域,跳板区域里面用线程id
查找到开始时间t1
, 用现在的时间t2-t1
,得到延迟时间t'
,最后打印时间分布直方图。
原生funclatency的原理我们很清楚了,下面我们将探索funclatency不适用于go的原因。
go的线程id之困
老规矩,先来个小程序提神醒脑,提出问题:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
|
package main
/*
#include <sys/types.h>
extern pid_t gettid(void);
*/
import "C"
import (
"fmt"
"time"
)
func bark(prefix string) {
fmt.Printf("[%s] tid is %d\n", prefix, C.gettid())
time.Sleep(time.Second)
fmt.Printf("[%s] tid is %d\n", prefix, C.gettid())
}
func main() {
go bark("b")
bark("a")
}
|
多跑几次
1
2
3
4
5
6
7
8
|
[root@archlinux perf]# go run gotid.go
[a] tid is 480820
[b] tid is 480822
[a] tid is 480823
[root@archlinux perf]# go run gotid.go
[a] tid is 480923
[b] tid is 480926
[a] tid is 480927
|
我们发现,线程id居然在函数a中间变了,这在普通的多线程C/C++函数里面难以想象,这也导致了funclatency脚本以tid作为键存储函数开始时间变得不可行。很自然的,为了测量函数的延迟,我们需要找另一个键,一个能表示这个函数的这次运行的唯一的键。比如goroutine id。那么,怎么获取goroutine id呢?(以下简称goid)
遗憾的是,经过一番查找,目前go官方没有暴露任何获得goid的方法,但是由于go是开源的,我们可以看源码。经过看源码,我们发现goid存储在G结构体中:
1
2
3
4
5
|
type g struct {
...省略N行无关代码...
goid int64
...省略N行无关代码...
}
|
而在runtime包里面,获取当前的g是通过如下方法:
1
2
3
4
|
// getg returns the pointer to the current g.
// The compiler rewrites calls to this function into instructions
// that fetch the g directly (from TLS or from the dedicated register).
func getg() *g
|
难以置信,不知道是不是老眼昏花,笔者我居然找不到这个函数的实现,看来只能是按照这个函数的注释,是编译器把这个调用直接转成指令了。不过既然能编译出来,那我们找个调用了getg函数的比较短的父函数反汇编一下。在这里我们找了runtime.showframe
函数:
1
2
3
4
5
6
7
8
9
|
// showframe reports whether the frame with the given characteristics should
// be printed during a traceback.
func showframe(f funcInfo, gp *g, firstFrame bool, funcID, childID funcID) bool {
g := getg()
if g.m.throwing > 0 && gp != nil && (gp == g.m.curg || gp == g.m.caughtsig.ptr()) {
return true
}
return showfuncinfo(f, firstFrame, funcID, childID)
}
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
|
(gdb) disass 'runtime.showframe'
Dump of assembler code for function runtime.showframe:
0x0000000000449df0 <+0>: mov %fs:0xfffffffffffffff8,%rcx
0x0000000000449df9 <+9>: cmp 0x10(%rcx),%rsp
0x0000000000449dfd <+13>: jbe 0x449ea8 <runtime.showframe+184>
0x0000000000449e03 <+19>: sub $0x28,%rsp
0x0000000000449e07 <+23>: mov %rbp,0x20(%rsp)
0x0000000000449e0c <+28>: lea 0x20(%rsp),%rbp
0x0000000000449e11 <+33>: mov %fs:0xfffffffffffffff8,%rax
0x0000000000449e1a <+42>: mov 0x30(%rax),%rax
0x0000000000449e1e <+46>: cmpl $0x0,0xf4(%rax)
0x0000000000449e25 <+53>: jle 0x449e43 <runtime.showframe+83>
0x0000000000449e27 <+55>: mov 0x40(%rsp),%rcx
0x0000000000449e2c <+60>: test %rcx,%rcx
0x0000000000449e2f <+63>: je 0x449e43 <runtime.showframe+83>
0x0000000000449e31 <+65>: cmp %rcx,0xc0(%rax)
0x0000000000449e38 <+72>: jne 0x449e98 <runtime.showframe+168>
0x0000000000449e3a <+74>: mov $0x1,%eax
0x0000000000449e3f <+79>: test %al,%al
0x0000000000449e41 <+81>: jne 0x449e89 <runtime.showframe+153>
0x0000000000449e43 <+83>: mov 0x30(%rsp),%rax
0x0000000000449e48 <+88>: mov %rax,(%rsp)
0x0000000000449e4c <+92>: mov 0x38(%rsp),%rax
0x0000000000449e51 <+97>: mov %rax,0x8(%rsp)
0x0000000000449e56 <+102>: movzbl 0x48(%rsp),%eax
0x0000000000449e5b <+107>: mov %al,0x10(%rsp)
0x0000000000449e5f <+111>: movzbl 0x49(%rsp),%eax
0x0000000000449e64 <+116>: mov %al,0x11(%rsp)
0x0000000000449e68 <+120>: movzbl 0x4a(%rsp),%eax
0x0000000000449e6d <+125>: mov %al,0x12(%rsp)
0x0000000000449e71 <+129>: callq 0x449ec0 <runtime.showfuncinfo>
0x0000000000449e76 <+134>: movzbl 0x18(%rsp),%eax
0x0000000000449e7b <+139>: mov %al,0x50(%rsp)
0x0000000000449e7f <+143>: mov 0x20(%rsp),%rbp
0x0000000000449e84 <+148>: add $0x28,%rsp
0x0000000000449e88 <+152>: retq
0x0000000000449e89 <+153>: movb $0x1,0x50(%rsp)
0x0000000000449e8e <+158>: mov 0x20(%rsp),%rbp
0x0000000000449e93 <+163>: add $0x28,%rsp
0x0000000000449e97 <+167>: retq
0x0000000000449e98 <+168>: mov 0xc8(%rax),%rdx
0x0000000000449e9f <+175>: cmp %rcx,%rdx
0x0000000000449ea2 <+178>: sete %al
0x0000000000449ea5 <+181>: nop
0x0000000000449ea6 <+182>: jmp 0x449e3f <runtime.showframe+79>
0x0000000000449ea8 <+184>: callq 0x451470 <runtime.morestack_noctxt>
0x0000000000449ead <+189>: jmpq 0x449df0 <runtime.showframe>
End of assembler dump.
|
哎,这哗啦啦的一堆汇编代码,搞得像电视里面特工的电脑画面一样。那么一堆,那么getg的汇编代码在哪里?别急,我们稍微扫一眼汇编,runtime.showfuncinfo
并没有被内联进来,这大大降低了我们的阅读难度;其次g.m.throwing > 0
这个真是令人感动,我们只要找到立即数0就好,果然,一眼就发现cmpl $0x0,0xf4(%rax)
,那么0xf4(%rax)
明显是throwing
,我们合理推测%rax
里面是g.m
的地址,看代码验证一下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
|
type m struct {
g0 *g // goroutine with scheduling stack
morebuf gobuf // gobuf arg to morestack
divmod uint32 // div/mod denominator for arm - known to liblink
// Fields not known to debuggers.
procid uint64 // for debuggers, but offset not hard-coded
gsignal *g // signal-handling g
goSigStack gsignalStack // Go-allocated signal handling stack
sigmask sigset // storage for saved signal mask
tls [6]uintptr // thread-local storage (for x86 extern register)
mstartfn func()
curg *g // current running goroutine
caughtsig guintptr // goroutine running during fatal signal
p puintptr // attached p for executing go code (nil if not executing go code)
nextp puintptr
oldp puintptr // the p that was attached before executing a syscall
id int64
mallocing int32
throwing int32
...此处省略N行代码...
}
|
根据内存对齐规则,我们算了一下throwing的偏移,确实是0xf4
。那么根据再前一行代码mov 0x30(%rax),%rax
,我们猜测这里%rax
里面存的是g
,一样的办法,我们检查g
结构体中m
的偏移:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
|
type g struct {
// Stack parameters.
// stack describes the actual stack memory: [stack.lo, stack.hi).
// stackguard0 is the stack pointer compared in the Go stack growth prologue.
// It is stack.lo+StackGuard normally, but can be StackPreempt to trigger a preemption.
// stackguard1 is the stack pointer compared in the C stack growth prologue.
// It is stack.lo+StackGuard on g0 and gsignal stacks.
// It is ~0 on other goroutine stacks, to trigger a call to morestackc (and crash).
stack stack // offset known to runtime/cgo
stackguard0 uintptr // offset known to liblink
stackguard1 uintptr // offset known to liblink
_panic *_panic // innermost panic - offset known to liblink
_defer *_defer // innermost defer
m *m // current m; offset known to arm liblink
...此处省略N行代码...
}
|
确实是0x30
,那我们很容易知道mov %fs:0xfffffffffffffff8,%rax
,这句代码就是getg。%fs
是x86的一个段寄存器,主要用来存储TLS(Thread Local Storage),就是线程本地变量。但是,对于funclatency程序来说,看上去我们能获取的只有struct pt_regs
变量:
1
2
3
4
|
int trace_func_entry(struct pt_regs *ctx)
{
...省略...
}
|
结构体定义在<uapi/asm/ptrace.h>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
|
struct pt_regs {
/*
* C ABI says these regs are callee-preserved. They aren't saved on kernel entry
* unless syscall needs a complete, fully filled "struct pt_regs".
*/
unsigned long r15;
unsigned long r14;
unsigned long r13;
unsigned long r12;
unsigned long rbp;
unsigned long rbx;
/* These regs are callee-clobbered. Always saved on kernel entry. */
unsigned long r11;
unsigned long r10;
unsigned long r9;
unsigned long r8;
unsigned long rax;
unsigned long rcx;
unsigned long rdx;
unsigned long rsi;
unsigned long rdi;
/*
* On syscall entry, this is syscall#. On CPU exception, this is error code.
* On hw interrupt, it's IRQ number:
*/
unsigned long orig_rax;
/* Return frame for iretq */
unsigned long rip;
unsigned long cs;
unsigned long eflags;
unsigned long rsp;
unsigned long ss;
/* top of stack page */
};
|
没有fs寄存器,但是经过查找,我们发现内核thread_struct结构体里面有:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
// arch/x86/include/asm/processor.h
struct thread_struct {
...省略N行无关代码...
unsigned long fsbase;
...省略N行无关代码...
}
// include/linux/sched.h
struct task_struct {
...省略N行无关代码...
/* CPU-specific state of this task: */
struct thread_struct thread;
...省略N行无关代码...
}
|
对于golang来说,根据go调度,参考golang的runtime包内的代码,我们可以找出fs每次启动一个新线程设置fs寄存器的路径是:
1
|
startm->newm->newm1->newosproc->clone->settls->[系统调用arch_prctl(ARCH_SET_FS, ...)]
|
而系统调用arch_prctl,会修改task_struct->thread.fsbase。那这个问题就简单了,我们可以写一个获取goid的函数:
1
2
3
4
5
6
7
8
9
10
|
static u64 get_goid()
{
struct task_struct *t = (struct task_struct *)bpf_get_current_task();
void* fsbase = (void*)t->thread.fsbase;
void* g;
bpf_probe_read(&g, sizeof(g), fsbase-8);
u64 goid;
bpf_probe_read(&goid, sizeof(goid), g+152);
return goid;
}
|
其中fsbase-8
就是%fs:0xfffffffffffffff8
,我们先读取g
的指针,然后g+152
是goid
在g
中的偏移。这里有一些风险,go以后的版本里面goid的位置可能不对,需要使用者注意并处理
目前为止,x86-64键的问题我们算是比较好地解决了。总而言之,go里面,一个函数运行过程中可能会切换到其它线程运行,不能以线程id作为键关联开始和结束时间。我们这里选择了goid,并探讨了在x86-64系统中获得goid的方法。
go的栈扩展之殇
虽然实现了goid的获取,的确让人松了一口气,但是革命尚未成功,问题并不止一个。先来一段程序引入这小节的问题:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
package main
import "fmt"
func joke(i int) {
var arr [2048]int
arr[i] = i
fmt.Println(arr)
}
func main() {
for i := 0; i < 3; i++ {
joke(i)
}
}
|
很简单的程序,简单到编译器都内联了,我们禁止内联:
1
|
go build -gcflags=-l stack.go
|
我们先试着跑一下原来的funclatency
脚本,试一下测量main.joke
函数的延迟:
1
|
./funclatency '/data/workspace/perf/stack/stack:main.joke'
|
然后启动我们的stack例子程序:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
|
[root@archlinux stack]# ./stack
runtime: unexpected return pc for main.joke called from 0x7fffffffe000
stack: frame={sp:0xc00009cf70, fp:0xc00009cf78} stack=[0xc00009c000,0xc00009d000)
000000c00009ce70: 0000000000000000 0000000000000000
000000c00009ce80: 0000000000000000 0000000000000000
000000c00009ce90: 0000000000000000 0000000000000000
000000c00009cea0: 0000000000000000 0000000000000000
000000c00009ceb0: 0000000000000000 0000000000000000
000000c00009cec0: 0000000000000000 0000000000000000
000000c00009ced0: 0000000000000000 0000000000000000
000000c00009cee0: 0000000000000000 0000000000000000
000000c00009cef0: 0000000000000000 0000000000000000
000000c00009cf00: 0000000000000000 0000000000000000
000000c00009cf10: 0000000000000000 0000000000000000
000000c00009cf20: 0000000000000000 0000000000000000
000000c00009cf30: 0000000000000000 0000000000000000
000000c00009cf40: 0000000000000000 0000000000000000
000000c00009cf50: 0000000000000000 0000000000000000
000000c00009cf60: 0000000000000000 0000000000000000
000000c00009cf70: <00007fffffffe000 >0000000000000000
000000c00009cf80: 0000000000000000 000000c000042790
000000c00009cf90: 00000000004298fc <runtime.main+524> 000000c000080000
000000c00009cfa0: 0000000000000000 000000c000080000
000000c00009cfb0: 0000000000000000 0000000000000000
000000c00009cfc0: 0000000000000000 000000c000000180
000000c00009cfd0: 0000000000000000 0000000000451181 <runtime.goexit+1>
000000c00009cfe0: 0000000000000000 0000000000000000
000000c00009cff0: 0000000000000000 0000000000000000
fatal error: unknown caller pc
runtime stack:
runtime.throw(0x4ba8d3, 0x11)
/usr/local/go/src/runtime/panic.go:617 +0x72
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000000180, 0x0, 0x0, 0x7fffffff, 0x4bff80, 0x7ffea3d65f78, 0x0, ...)
/usr/local/go/src/runtime/traceback.go:275 +0x1cd1
runtime.copystack(0xc000000180, 0x1000, 0x7fab4e901001)
/usr/local/go/src/runtime/stack.go:881 +0x25b
runtime.newstack()
/usr/local/go/src/runtime/stack.go:1050 +0x2fd
runtime.morestack()
/usr/local/go/src/runtime/asm_amd64.s:429 +0x8f
goroutine 1 [copystack]:
runtime: unexpected return pc for main.joke called from 0x7fffffffe000
stack: frame={sp:0xc00009cf70, fp:0xc00009cf78} stack=[0xc00009c000,0xc00009d000)
000000c00009ce70: 0000000000000000 0000000000000000
000000c00009ce80: 0000000000000000 0000000000000000
000000c00009ce90: 0000000000000000 0000000000000000
000000c00009cea0: 0000000000000000 0000000000000000
000000c00009ceb0: 0000000000000000 0000000000000000
000000c00009cec0: 0000000000000000 0000000000000000
000000c00009ced0: 0000000000000000 0000000000000000
000000c00009cee0: 0000000000000000 0000000000000000
000000c00009cef0: 0000000000000000 0000000000000000
000000c00009cf00: 0000000000000000 0000000000000000
000000c00009cf10: 0000000000000000 0000000000000000
000000c00009cf20: 0000000000000000 0000000000000000
000000c00009cf30: 0000000000000000 0000000000000000
000000c00009cf40: 0000000000000000 0000000000000000
000000c00009cf50: 0000000000000000 0000000000000000
000000c00009cf60: 0000000000000000 0000000000000000
000000c00009cf70: <00007fffffffe000 >0000000000000000
000000c00009cf80: 0000000000000000 000000c000042790
000000c00009cf90: 00000000004298fc <runtime.main+524> 000000c000080000
000000c00009cfa0: 0000000000000000 000000c000080000
000000c00009cfb0: 0000000000000000 0000000000000000
000000c00009cfc0: 0000000000000000 000000c000000180
000000c00009cfd0: 0000000000000000 0000000000451181 <runtime.goexit+1>
000000c00009cfe0: 0000000000000000 0000000000000000
000000c00009cff0: 0000000000000000 0000000000000000
main.joke(0x0)
/data/workspace/perf/stack/stack.go:5 +0x10e fp=0xc00009cf78 sp=0xc00009cf70 pc=0x48739e
|
这么简单的程序居然崩溃了,但我们只要不用funclatency,这个程序就不会崩溃。这个在生产环境中是无法接受的,也有悖于ebpf程序活体x光的名声,这相当于一个医生诊断一个病人,结果病人被诊断死了。废话少说,赶紧查查。根据go打印的内容,另外我们还知道问题出在go将旧栈复制到新栈的过程中:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
|
//文件在$GOROOT/src/runtime/stack.go
func copystack(gp *g, newsize uintptr, sync bool) {
...此处省略N行代码...
// Adjust pointers in the new stack.
gentraceback(^uintptr(0), ^uintptr(0), 0, gp, 0, nil, 0x7fffffff, adjustframe, noescape(unsafe.Pointer(&adjinfo)), 0)
...此处省略N行代码...
}
//文件在GOROOT/src/runtime/traceback.go
func gentraceback(pc0, sp0, lr0 uintptr, gp *g, skip int, pcbuf *uintptr, max int, callback func(*stkframe, unsafe.Pointer) bool, v unsafe.Pointer, flags uint) int {
...此处省略N行代码...
flr = findfunc(frame.lr)
if !flr.valid() {
// This happens if you get a profiling interrupt at just the wrong time.
// In that context it is okay to stop early.
// But if callback is set, we're doing a garbage collection and must
// get everything, so crash loudly.
doPrint := printing
if doPrint && gp.m.incgo && f.funcID == funcID_sigpanic {
// We can inject sigpanic
// calls directly into C code,
// in which case we'll see a C
// return PC. Don't complain.
doPrint = false
}
if callback != nil || doPrint {
print("runtime: unexpected return pc for ", funcname(f), " called from ", hex(frame.lr), "\n")
tracebackHexdump(gp.stack, &frame, lrPtr)
}
if callback != nil {
throw("unknown caller pc") //错误在这里
}
}
...此处省略N行代码...
}
|
copystack
通过gentraceback
调整栈变量的值,但是findfunc
函数找不到uretprobe修改的返回地址代表的函数,也就是上面代码的frame.lr
,go打印了这个地址0x7fffffffe000
,因此崩溃。所以golang在发生扩展栈操作的时候,使用funclatency
脚本测量函数延迟分布是有风险的。有没有更安全的方法,在这种情况下不会导致程序崩溃。
经过一番查找,我们发现issue1320,这里有人提出了一种新的思路,给函数的ret
指令设置uprobe来代替uretprobe。这里有两个问题:
- 不能用于进行了尾调用优化的函数,但是go目前还没;
- 需要在脚本中找到指令ret,这对使用了变长指令集的x86_64来说,需要一些功夫。
第1个问题庆幸的是不存在,但是需要留意有这么个未来的可能的坑。第2个我们可以借助外部工具如gdb解决。