调试挂起的Go程序的技巧
摘要
一份实用指南,涵盖了调试挂起的Go程序的三种方法:使用SIGQUIT打印堆栈跟踪、附加delve调试器以及保存核心转储供后续分析。
<p>我正帮助某人设置我的 <a href="https://github.com/gokrazy/rsync">gokrazy/rsync</a>
实现,以同步 <a href="https://en.wikipedia.org/wiki/Resource_Public_Key_Infrastructure">RPKI
数据</a>(用于保护 BGP 路由基础设施),这时我们发现,在正确的调用下,我的 rsync 接收器会无限期挂起。</p>
<p>这个问题解决起来很快,但在这个过程中,我意识到我应该写下一些多年来我逐渐欣赏的 Go 调试技巧!</p>
<h2 id="scenario-hanging-go-program">场景:挂起的 Go 程序</h2>
<p>如果你想跟着操作,可以通过构建 bug 修复提交之前的旧版本 gokrazy/rsync 来重现问题(你需要 <a href="https://go.dev/dl/">Go 1.22 或更新版本</a>):</p>
<pre tabindex="0"><code>git clone https://github.com/gokrazy/rsync
cd rsync
git reset --hard 6c89d4dda3be055f19684c0ed56d623da458194e^
go install ./cmd/...
</code></pre><p>现在我们可以尝试同步仓库:</p>
<pre tabindex="0"><code>% gokr-rsync \
-rtO \
--delete \
rsync://rsync.paas.rpki.ripe.net/repository/ \
/tmp/rpki-repo
[…]
2025/02/08 09:35:10 Opening TCP connection to rsync.paas.rpki.ripe.net:873
2025/02/08 09:35:10 rsync module "repo", path "repo/"
2025/02/08 09:35:10 (Client) Protocol versions: remote=31, negotiated=27
2025/02/08 09:35:10 Client checksum: md4
2025/02/08 09:35:10 sending daemon args: [--server --sender -tr . repo/]
2025/02/08 09:35:10 exclusion list sent
2025/02/08 09:35:10 receiving file list
2025/02/08 09:35:11 [Receiver] i=0 ? . mode=40755 len=4096 uid=0 gid=0 flags=?
[…]
2025/02/08 09:35:11 [Receiver] i=89 ? clonoth/1/3139332e33322e3130302e302f32342d3234203d3e203537313936.roa mode=100644 len=1747 uid=0 gid=0 flags=?
</code></pre><p>……然后程序就停在那里了。</p>
<h2 id="sigquit-stack-trace">技巧 1:按 Ctrl+\(SIGQUIT)打印堆栈跟踪</h2>
<p>查看 Go 程序挂起位置的最简单方法是按 <code>Ctrl+\</code>(反斜杠)让终端向其发送 <a href="https://en.wikipedia.org/wiki/Signal_(IPC)#SIGQUIT">SIGQUIT 信号</a>。当 Go 运行时收到 <code>SIGQUIT</code> 时,它会在退出进程之前向终端打印堆栈跟踪。此行为默认启用,可通过 <code>GOTRACEBACK</code> 环境变量自定义,请参阅 <a href="https://pkg.go.dev/runtime"><code>runtime</code> 包文档</a>。</p>
<p>以下是我们案例中的输出样子。我把字体调小了,这样你可以识别输出的形状(细节不重要,继续阅读下文):</p>
<div style="font-size: 60%">
<pre tabindex="0"><code>^\SIGQUIT: quit
PC=0x47664e m=0 sigcode=128
goroutine 0 gp=0x6e6020 m=0 mp=0x6e6ec0 [idle]:
internal/runtime/syscall.Syscall6()
/home/michael/sdk/go1.23.0/src/internal/runtime/syscall/asm_linux_amd64.s:36 +0xe fp=0x7ffc58665090 sp=0x7ffc58665088 pc=0x47664e
internal/runtime/syscall.EpollWait(0x586651e0?, {0x7ffc5866511c?, 0x3000000018?, 0x7ffc586651f0?}, 0x58665110?, 0x7ffc?)
/home/michael/sdk/go1.23.0/src/internal/runtime/syscall/syscall_linux.go:32 +0x45 fp=0x7ffc586650e0 sp=0x7ffc58665090 pc=0x4765e5
runtime.netpoll(0xc0000000c0?)
/home/michael/sdk/go1.23.0/src/runtime/netpoll_epoll.go:116 +0xd2 fp=0x7ffc58665768 sp=0x7ffc586650e0 pc=0x432332
runtime.findRunnable()
/home/michael/sdk/go1.23.0/src/runtime/proc.go:3580 +0x8c5 fp=0x7ffc586658e0 sp=0x7ffc58665768 pc=0x43f045
runtime.schedule()
/home/michael/sdk/go1.23.0/src/runtime/proc.go:3995 +0xb1 fp=0x7ffc58665918 sp=0x7ffc586658e0 pc=0x4405b1
runtime.park_m(0xc0000061c0)
/home/michael/sdk/go1.23.0/src/runtime/proc.go:4102 +0x1eb fp=0x7ffc58665970 sp=0x7ffc58665918 pc=0x4409cb
runtime.mcall()
/home/michael/sdk/go1.23.0/src/runtime/asm_amd64.s:459 +0x4e fp=0x7ffc58665988 sp=0x7ffc58665970 pc=0x470e2e
goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x452658?, 0x0?, 0x98?, 0xb3?, 0xb?)
/home/michael/sdk/go1.23.0/src/runtime/proc.go:424 +0xce fp=0xc0000eb358 sp=0xc0000eb338 pc=0x46bc0e
runtime.netpollblock(0x4a01b8?, 0x4058e6?, 0x0?)
/home/michael/sdk/go1.23.0/src/runtime/netpoll.go:575 +0xf7 fp=0xc0000eb390 sp=0xc0000eb358 pc=0x4318f7
internal/poll.runtime_pollWait(0x7ef586628808, 0x72)
/home/michael/sdk/go1.23.0/src/runtime/netpoll.go:351 +0x85 fp=0xc0000eb3b0 sp=0xc0000eb390 pc=0x46af05
internal/poll.(*pollDesc).wait(0xc0000ce180?, 0xc00020e99c?, 0x0)
/home/michael/sdk/go1.23.0/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000eb3d8 sp=0xc0000eb3b0 pc=0x4b0ce7
internal/poll.(*pollDesc).waitRead(...)
/home/michael/sdk/go1.23.0/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0000ce180, {0xc00020e99c, 0x4, 0x4})
/home/michael/sdk/go1.23.0/src/internal/poll/fd_unix.go:165 +0x27a fp=0xc0000eb470 sp=0xc0000eb3d8 pc=0x4b17da
net.(*netFD).Read(0xc0000ce180, {0xc00020e99c?, 0x6eeea0?, 0x1?})
/home/michael/sdk/go1.23.0/src/net/fd_posix.go:55 +0x25 fp=0xc0000eb4b8 sp=0xc0000eb470 pc=0x4f7e85
net.(*conn).Read(0xc000206000, {0xc00020e99c?, 0xc000212000?, 0x6e6ec0?})
/home/michael/sdk/go1.23.0/src/net/net.go:189 +0x45 fp=0xc0000eb500 sp=0xc0000eb4b8 pc=0x5001a5
net.(*TCPConn).Read(0x0?, {0xc00020e99c?, 0xc0000eb568?, 0x46d449?})
<autogenerated>:1 +0x25 fp=0xc0000eb530 sp=0xc0000eb500 pc=0x50bb25
io.ReadAtLeast({0x5d9640, 0xc000206000}, {0xc00020e99c, 0x4, 0x4}, 0x4)
/home/michael/sdk/go1.23.0/src/io/io.go:335 +0x90 fp=0xc0000eb578 sp=0xc0000eb530 pc=0x4957d0
io.ReadFull(...)
/home/michael/sdk/go1.23.0/src/io/io.go:354
encoding/binary.Read({0x5d9640, 0xc000206000}, {0x5da8b0, 0x7059a0}, {0x55e7c0, 0xc0000eb6a0})
/home/michael/sdk/go1.23.0/src/encoding/binary/binary.go:244 +0xa5 fp=0xc0000eb670 sp=0xc0000eb578 pc=0x5102a5
github.com/gokrazy/rsync/internal/rsyncwire.(*MultiplexReader).ReadMsg(0xc00020a100)
/home/michael/kr/rsync/internal/rsyncwire/wire.go:50 +0x48 fp=0xc0000eb6e8 sp=0xc0000eb670 pc=0x514428
github.com/gokrazy/rsync/internal/rsyncwire.(*MultiplexReader).Read(0x7ef5869b9a68?, {0xc000280000, 0x40000, 0x4dd4fb?})
/home/michael/kr/rsync/internal/rsyncwire/wire.go:72 +0x2f fp=0xc0000eb788 sp=0xc0000eb6e8 pc=0x5145af
bufio.(*Reader).Read(0xc0002020c0, {0xc00020e998, 0x4, 0x40ece5?})
/home/michael/sdk/go1.23.0/src/bufio/bufio.go:241 +0x197 fp=0xc0000eb7c0 sp=0xc0000eb788 pc=0x4d5a57
io.ReadAtLeast({0x5d93e0, 0xc0002020c0}, {0xc00020e998, 0x4, 0x4}, 0x4)
/home/michael/sdk/go1.23.0/src/io/io.go:335 +0x90 fp=0xc0000eb808 sp=0xc0000eb7c0 pc=0x4957d0
io.ReadFull(...)
/home/michael/sdk/go1.23.0/src/io/io.go:354
github.com/gokrazy/rsync/internal/rsyncwire.(*Conn).ReadInt32(0xc000208060)
/home/michael/kr/rsync/internal/rsyncwire/wire.go:163 +0x4a fp=0xc0000eb850 sp=0xc0000eb808 pc=0x51490a
github.com/gokrazy/rsync/internal/receiver.(*Transfer).recvIdMapping1(0xc000202120, 0x5a9b58)
/home/michael/kr/rsync/internal/receiver/uidlist.go:16 +0x3d fp=0xc0000eb8c0 sp=0xc0000eb850 pc=0x51fc7d
github.com/gokrazy/rsync/internal/receiver.(*Transfer).RecvIdList(0xc000202120)
/home/michael/kr/rsync/internal/receiver/uidlist.go:52 +0x1dd fp=0xc0000eba08 sp=0xc0000eb8c0 pc=0x51ffbd
github.com/gokrazy/rsync/internal/receiver.(*Transfer).ReceiveFileList(0xc000202120)
/home/michael/kr/rsync/internal/receiver/flist.go:229 +0x378 fp=0xc0000ebb10 sp=0xc0000eba08 pc=0x51c5b8
github.com/gokrazy/rsync/internal/receivermaincmd.clientRun({{0x5d9280, 0xc000078058}, {0x5d92a0, 0xc000078060}, {0x5d92a0, 0xc000078068}}, 0xc0000d0d90, {0x7ef53d47efc8, 0xc000206000}, {0x7ffc5866600e, ...}, ...)
/home/michael/kr/rsync/internal/receivermaincmd/receivermaincmd.go:341 +0x5cd fp=0xc0000ebc10 sp=0xc0000ebb10 pc=0x550c2d
github.com/gokrazy/rsync/internal/receivermaincmd.socketClient({{0x5d9280, 0xc000078058}, {0x5d92a0, 0xc000078060}, {0x5d92a0, 0xc000078068}}, 0xc0000d0d90, {0x7ffc58665ff4?, 0x1?}, {0x7ffc5866600e, ...})
/home/michael/kr/rsync/internal/receivermaincmd/clientserver.g
查看缓存全文
缓存时间: 2026/05/16 03:34
# 调试挂起的Go程序的技巧
来源:https://michael.stapelberg.ch/posts/2025-02-27-debug-hanging-go-programs/
目录
- [场景:挂起的Go程序](#场景挂起的go程序)
- [技巧1:按Ctrl+\(SIGQUIT)打印堆栈跟踪](#技巧1按ctrl-sigquit打印堆栈跟踪)
- [技巧2:将delve调试器附加到进程](#技巧2将delve调试器附加到进程)
- [技巧3:保存核心转储以供以后分析](#技巧3保存核心转储以供以后分析)
- [结论](#结论)
我之前在帮别人配置我的[gokrazy/rsync](https://github.com/gokrazy/rsync)实现,用于同步[RPKI数据](https://en.wikipedia.org/wiki/Resource_Public_Key_Infrastructure)(用于保护BGP路由基础设施),这时我们发现,使用正确的调用方式时,我的rsync接收端会无限期挂起。这个问题解决得很快,但在此过程中,我意识到应该记录下一些多年来积累的Go调试技巧!
## 场景:挂起的Go程序
如果你想跟着操作,可以构建一个旧版本的gokrazy/rsync,刚好在bug修复提交之前(你需要[Go 1.22或更新版本](https://go.dev/dl/)):
```
git clone https://github.com/gokrazy/rsync
cd rsync
git reset --hard 6c89d4dda3be055f19684c0ed56d623da458194e^
go install ./cmd/...
```
现在我们可以尝试同步该仓库:
```
% gokr-rsync \
-rtO \
--delete \
rsync://rsync.paas.rpki.ripe.net/repository/ \
/tmp/rpki-repo
[...]
2025/02/08 09:35:10 Opening TCP connection to rsync.paas.rpki.ripe.net:873
2025/02/08 09:35:10 rsync module "repo", path "repo/"
2025/02/08 09:35:10 (Client) Protocol versions: remote=31, negotiated=27
2025/02/08 09:35:10 Client checksum: md4
2025/02/08 09:35:10 sending daemon args: [--server --sender -tr . repo/]
2025/02/08 09:35:10 exclusion list sent
2025/02/08 09:35:10 receiving file list
2025/02/08 09:35:11 [Receiver] i=0 ? . mode=40755 len=4096 uid=0 gid=0 flags=?
[...]
2025/02/08 09:35:11 [Receiver] i=89 ? clonoth/1/3139332e33322e3130302e302f32342d3234203d3e203537313936.roa mode=100644 len=1747 uid=0 gid=0 flags=?
```
……然后程序就停在那里了。
## 技巧1:按Ctrl+\(SIGQUIT)打印堆栈跟踪
查看Go程序挂起位置最简单的方法是按下`Ctrl+\`(反斜杠),让终端向其发送`SIGQUIT`信号。当Go运行时收到`SIGQUIT`后,会在退出进程前向终端打印堆栈跟踪。该行为默认启用,可通过`GOTRACEBACK`环境变量定制,详见[`runtime`包文档](https://pkg.go.dev/runtime)。
以下是本例的输出(我将字体调小以便你识别输出的形状,细节不重要,继续往下看):
```
^\SIGQUIT: quit PC=0x47664e m=0 sigcode=128
goroutine 0 gp=0x6e6020 m=0 mp=0x6e6ec0 [idle]:
internal/runtime/syscall.Syscall6()
/home/michael/sdk/go1.23.0/src/internal/runtime/syscall/asm_linux_amd64.s:36 +0xe fp=0x7ffc58665090 sp=0x7ffc58665088 pc=0x47664e
internal/runtime/syscall.EpollWait(0x586651e0?, {0x7ffc5866511c?, 0x3000000018?, 0x7ffc586651f0?}, 0x58665110?, 0x7ffc?)
/home/michael/sdk/go1.23.0/src/internal/runtime/syscall/syscall_linux.go:32 +0x45 fp=0x7ffc586650e0 sp=0x7ffc58665090 pc=0x4765e5
runtime.netpoll(0xc0000000c0?)
/home/michael/sdk/go1.23.0/src/runtime/netpoll_epoll.go:116 +0xd2 fp=0x7ffc58665768 sp=0x7ffc586650e0 pc=0x432332
runtime.findRunnable()
/home/michael/sdk/go1.23.0/src/runtime/proc.go:3580 +0x8c5 fp=0x7ffc586658e0 sp=0x7ffc58665768 pc=0x43f045
runtime.schedule()
/home/michael/sdk/go1.23.0/src/runtime/proc.go:3995 +0xb1 fp=0x7ffc58665918 sp=0x7ffc586658e0 pc=0x4405b1
runtime.park_m(0xc0000061c0)
/home/michael/sdk/go1.23.0/src/runtime/proc.go:4102 +0x1eb fp=0x7ffc58665970 sp=0x7ffc58665918 pc=0x4409cb
runtime.mcall()
/home/michael/sdk/go1.23.0/src/runtime/asm_amd64.s:459 +0x4e fp=0x7ffc58665988 sp=0x7ffc58665970 pc=0x470e2e
goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x452658?, 0x0?, 0x98?, 0xb3?, 0xb?)
/home/michael/sdk/go1.23.0/src/runtime/proc.go:424 +0xce fp=0xc0000eb358 sp=0xc0000eb338 pc=0x46bc0e
runtime.netpollblock(0x4a01b8?, 0x4058e6?, 0x0?)
/home/michael/sdk/go1.23.0/src/runtime/netpoll.go:575 +0xf7 fp=0xc0000eb390 sp=0xc0000eb358 pc=0x4318f7
internal/poll.runtime_pollWait(0x7ef586628808, 0x72)
/home/michael/sdk/go1.23.0/src/runtime/netpoll.go:351 +0x85 fp=0xc0000eb3b0 sp=0xc0000eb390 pc=0x46af05
internal/poll.(*pollDesc).wait(0xc0000ce180?, 0xc00020e99c?, 0x0)
/home/michael/sdk/go1.23.0/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0000eb3d8 sp=0xc0000eb3b0 pc=0x4b0ce7
internal/poll.(*pollDesc).waitRead(...)
/home/michael/sdk/go1.23.0/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0000ce180, {0xc00020e99c, 0x4, 0x4})
/home/michael/sdk/go1.23.0/src/internal/poll/fd_unix.go:165 +0x27a fp=0xc0000eb470 sp=0xc0000eb3d8 pc=0x4b17da
net.(*netFD).Read(0xc0000ce180, {0xc00020e99c?, 0x6eeea0?, 0x1?})
/home/michael/sdk/go1.23.0/src/net/fd_posix.go:55 +0x25 fp=0xc0000eb4b8 sp=0xc0000eb470 pc=0x4f7e85
net.(*conn).Read(0xc000206000, {0xc00020e99c?, 0xc000212000?, 0x6e6ec0?})
/home/michael/sdk/go1.23.0/src/net/net.go:189 +0x45 fp=0xc0000eb500 sp=0xc0000eb4b8 pc=0x5001a5
net.(*TCPConn).Read(0x0?, {0xc00020e99c?, 0xc0000eb568?, 0x46d449?})
<autogenerated>:1 +0x25 fp=0xc0000eb530 sp=0xc0000eb500 pc=0x50bb25
io.ReadAtLeast({0x5d9640, 0xc000206000}, {0xc00020e99c, 0x4, 0x4}, 0x4)
/home/michael/sdk/go1.23.0/src/io/io.go:335 +0x90 fp=0xc0000eb578 sp=0xc0000eb530 pc=0x4957d0
io.ReadFull(...)
/home/michael/sdk/go1.23.0/src/io/io.go:354
encoding/binary.Read({0x5d9640, 0xc000206000}, {0x5da8b0, 0x7059a0}, {0x55e7c0, 0xc0000eb6a0})
/home/michael/sdk/go1.23.0/src/encoding/binary/binary.go:244 +0xa5 fp=0xc0000eb670 sp=0xc0000eb578 pc=0x5102a5
github.com/gokrazy/rsync/internal/rsyncwire.(*MultiplexReader).ReadMsg(0xc00020a100)
/home/michael/kr/rsync/internal/rsyncwire/wire.go:50 +0x48 fp=0xc0000eb6e8 sp=0xc0000eb670 pc=0x514428
github.com/gokrazy/rsync/internal/rsyncwire.(*MultiplexReader).Read(0x7ef5869b9a68?, {0xc000280000, 0x40000, 0x4dd4fb?})
/home/michael/kr/rsync/internal/rsyncwire/wire.go:72 +0x2f fp=0xc0000eb788 sp=0xc0000eb6e8 pc=0x5145af
bufio.(*Reader).Read(0xc0002020c0, {0xc00020e998, 0x4, 0x40ece5?})
/home/michael/sdk/go1.23.0/src/bufio/bufio.go:241 +0x197 fp=0xc0000eb7c0 sp=0xc0000eb788 pc=0x4d5a57
io.ReadAtLeast({0x5d93e0, 0xc0002020c0}, {0xc00020e998, 0x4, 0x4}, 0x4)
/home/michael/sdk/go1.23.0/src/io/io.go:335 +0x90 fp=0xc0000eb808 sp=0xc0000eb7c0 pc=0x4957d0
io.ReadFull(...)
/home/michael/sdk/go1.23.0/src/io/io.go:354
github.com/gokrazy/rsync/internal/rsyncwire.(*Conn).ReadInt32(0xc000208060)
/home/michael/kr/rsync/internal/rsyncwire/wire.go:163 +0x4a fp=0xc0000eb850 sp=0xc0000eb808 pc=0x51490a
github.com/gokrazy/rsync/internal/receiver.(*Transfer).recvIdMapping1(0xc000202120, 0x5a9b58)
/home/michael/kr/rsync/internal/receiver/uidlist.go:16 +0x3d fp=0xc0000eb8c0 sp=0xc0000eb850 pc=0x51fc7d
github.com/gokrazy/rsync/internal/receiver.(*Transfer).RecvIdList(0xc000202120)
/home/michael/kr/rsync/internal/receiver/uidlist.go:52 +0x1dd fp=0xc0000eba08 sp=0xc0000eb8c0 pc=0x51ffbd
github.com/gokrazy/rsync/internal/receiver.(*Transfer).ReceiveFileList(0xc000202120)
/home/michael/kr/rsync/internal/receiver/flist.go:229 +0x378 fp=0xc0000ebb10 sp=0xc0000eba08 pc=0x51c5b8
github.com/gokrazy/rsync/internal/receivermaincmd.clientRun({{0x5d9280, 0xc000078058}, {0x5d92a0, 0xc000078060}, {0x5d92a0, 0xc000078068}}, 0xc0000d0d90, {0x7ef53d47efc8, 0xc000206000}, {0x7ffc5866600e, ...}, ...)
/home/michael/kr/rsync/internal/receivermaincmd/receivermaincmd.go:341 +0x5cd fp=0xc0000ebc10 sp=0xc0000ebb10 pc=0x550c2d
github.com/gokrazy/rsync/internal/receivermaincmd.socketClient({{0x5d9280, 0xc000078058}, {0x5d92a0, 0xc000078060}, {0x5d92a0, 0xc000078068}}, 0xc0000d0d90, {0x7ffc58665ff4?, 0x1?}, {0x7ffc5866600e, ...})
/home/michael/kr/rsync/internal/receivermaincmd/clientserver.go:44 +0x425 fp=0xc0000ebcd0 sp=0xc0000ebc10 pc=0x54c205
github.com/gokrazy/rsync/internal/receivermaincmd.rsyncMain({{0x5d9280, 0xc000078058}, {0x5d92a0, 0xc000078060}, {0x5d92a0, 0xc000078068}}, 0xc0000d0d90, {0xc00007e440, 0x1, 0x2}, ...)
/home/michael/kr/rsync/internal/receivermaincmd/receivermaincmd.go:160 +0x5d7 fp=0xc0000ebdf0 sp=0xc0000ebcd0 pc=0x54f697
github.com/gokrazy/rsync/internal/receivermaincmd.Main({0xc0000160a0, 0x5, 0x5}, {0x5d9280?, 0xc000078058?}, {0x5d92a0?, 0xc000078060?}, {0x5d92a0?, 0xc000078068?})
/home/michael/kr/rsync/internal/receivermaincmd/receivermaincmd.go:394 +0x272 fp=0xc0000ebee8 sp=0xc0000ebdf0 pc=0x5510d2
main.main()
/home/michael/kr/rsync/cmd/gokr-rsync/rsync.go:12 +0x4e fp=0xc0000ebf50 sp=0xc0000ebee8 pc=0x5515ae
runtime.main()
/home/michael/sdk/go1.23.0/src/runtime/proc.go:272 +0x28b fp=0xc0000ebfe0 sp=0xc0000ebf50 pc=0x438d4b
runtime.goexit({})
/home/michael/sdk/go1.23.0/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000ebfe8 sp=0xc0000ebfe0 pc=0x472e61
goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/home/michael/sdk/go1.23.0/src/runtime/proc.go:424 +0xce fp=0xc000074fa8 sp=0xc000074f88 pc=0x46bc0e
runtime.goparkunlock(...)
/home/michael/sdk/go1.23.0/src/runtime/proc.go:430
runtime.forcegchelper()
/home/michael/sdk/go1.23.0/src/runtime/proc.go:337 +0xb3 fp=0xc000074fe0 sp=0xc000074fa8 pc=0x439093
runtime.goexit({})
/home/michael/sdk/go1.23.0/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000074fe8 sp=0xc000074fe0 pc=0x472e61
created by runtime.init.7 in goroutine 1
/home/michael/sdk/go1.23.0/src/runtime/proc.go:325 +0x1a
```
呼!这个输出相当密集。我们可以使用 [https://github.com/maruel/panicparse](https://github.com/maruel/panicparse) 程序来以更丰富多彩且更简短的方式呈现这个堆栈跟踪:
[](https://michael.stapelberg.ch/posts/2025-02-27-debug-hanging-go-programs/2025-02-08-panicparse.jpg)
红色高亮显示的函数正是问题所在:我的rsync接收端实现错误地期望服务器发送uid/gid列表,尽管PreserveUid和PreserveGid选项并未启用。
提交 [`6c89d4d`](https://github.com/gokrazy/rsync/commit/6c89d4dda3be055f19684c0ed56d623da458194e) 修复了这个问题。
## 技巧2:将delve调试器附加到进程
如果仅仅打印堆栈跟踪不足以诊断问题,你可以进一步使用交互式调试器。最著名的Linux调试器大概是GDB,但在处理Go程序时,我推荐使用 [delve 调试器](https://github.com/go-delve/delve),因为它通常工作得更好。
如果还没安装,请先安装 delve:
```
% go install github.com/go-delve/delve/cmd/dlv@latest
```
本文使用 delve v1.24.0。你可以用调试器运行新的子进程(使用 `dlv exec`),无需特殊权限;但在 Linux 中,为了安全原因,默认禁止附加到现有进程([参见文档](https://www.kernel.org/doc/Documentation/security/Yama.txt))。我们可以允许这个特性(记得之后关闭!)通过:
```
% sudo sysctl -w kernel.yama.ptrace_scope=0
kernel.yama.ptrace_scope = 0
```
然后我们可以直接用 `dlv attach` 附加到挂起的 `gokr-rsync` 进程:
```
% dlv attach $(pidof gokr-rsync)
Type 'help' for list of commands.
(dlv)
```
很好。但如果只是打印堆栈跟踪,我们只能看到 `runtime` 包的函数:
```
(dlv) bt
0 0x000000000047bb83 in runtime.futex
at /home/michael/sdk/go1.23.6/src/runtime/sys_linux_amd64.s:558
1 0x00000000004374d0 in runtime.futexsleep
at /home/michael/sdk/go1.23.6/src/runtime/os_linux.go:69
2 0x000000000040d89d in runtime.notesleep
at /home/michael/sdk/go1.23.6/src/runtime/lock_futex.go:170
3 0x000000000044123e in runtime.mPark
at /home/michael/sdk/go1.23.6/src/runtime/proc.go:1866
4 0x000000000044290d in runtime.stopm
at /home/michael/sdk/go1.23.6/src/runtime/proc.go:2886
5 0x00000000004433d0 in runtime.findRunnable
at /home/michael/sdk/go1.23.6/src/runtime/proc.go:3623
6 0x0000000000444e1d in runtime.schedule
at /home/michael/sdk/go1.23.6/src/runtime/proc.go:3996
7 0x00000000004451cb in runtime.park_m
at /home/michael/sdk/go1.23.6/src/runtime/proc.go:4103
8 0x0000000000477eee in runtime.mcall
at /home/michael/sdk/go1.23.6/src/runtime/asm_amd64.s:459
```
原因是当前没有 goroutine 在运行(程序在无限等待从服务器接收数据),所以我们看到的是 Go 调度器中的一个 OS 线程在等待。我们首先需要切换到感兴趣的 goroutine(`grs` 打印所有 goroutine),然后堆栈跟踪看起来就会是我们期望的样子:
```
(dlv) gr 1
Switched from 0 to 1 (thread 414327)
(dlv) bt
0 0x0000000000474ebc in runtime.gopark
at /home/michael/sdk/go1.23.6/src/runtime/proc.go:425
1 0x000000000043819e in runtime.netpollblock
at /home/michael/sdk/go1.23.6/src/runtime/netpoll.go:575
2 0x000000000047435c in internal/poll.runtime_pollWait
at /home/michael/sdk/go1.23.6/src/runtime/netpoll.go:351
3 0x00000000004ed15a in internal/poll.(*pollDesc).wait
at /home/michael/sdk/go1.23.6/src/internal/poll/fd_poll_runtime.go:84
4 0x00000000004ed1f1 in internal/poll.(*pollDesc).waitRead
at /home/michael/sdk/go1.23.6/src/internal/poll/fd_poll_runtime.go:89
5 0x00000000004ee351 in internal/poll.(*FD).Read
at /home/michael/sdk/go1.23.6/src/internal/poll/fd_unix.go:165
6 0x0000000000569bb3 in net.(*netFD).Read
at /home/michael/sdk/go1.23.6/src/net/fd_posix.go:55
7 0x000000000057a025 in net.(*conn).Read
at /home/michael/sdk/go1.23.6/src/net/net.go:189
8 0x000000000058fcc5 in net.(*TCPConn).Read
at <autogenerated>:1
9 0x00000000004b72e8 in io.ReadAtLeast
at /home/michael/sdk/go1.23.6/src/io/io.go:335
10 0x00000000004b74d3 in io.ReadFull
at /home/michael/sdk/go1.23.6/src/io/io.go:354
11 0x0000000000598d5f in encoding/binary.Read
at /home/michael/sdk/go1.23.6/src/encoding/binary/binary.go:244
12 0x00000000005a0b7a in github.com/gokrazy/rsync/internal/rsyncwire.(*MultiplexReader).ReadMsg
at /home/michael/kr/rsync/internal/rsyncwire/wire.go:50
13 0x00000000005a0f17 in github.com/gokrazy/rsync/internal/rsyncwire.(*MultiplexReader).Read
at /home/michael/kr/rsync/internal/rsyncwire/wire.go:72
14 0x0000000000528de8 in bufio.(*Reader).Read
at /home/michael/sdk/go1.23.6/src/bufio/bufio.go:241
15 0x00000000004b72e8 in io.ReadAtLeast
at /home/michael/sdk/go1.23.6/src/io/io.go:335
16 0x00000000004b74d3 in io.ReadFull
at /home/michael/sdk/go1.23.6/src/io/io.go:354
17 0x00000000005a19ef in github.com/gokrazy/rsync/internal/rsyncwire.(*Conn).ReadInt32
at /home/michael/kr/rsync/internal/rsyncwire/wire.go:163
18 0x00000000005b77d2 in github.com/gokrazy/rsync/internal/receiver.(*Transfer).recvIdMapping1
at /home/michael/kr/rsync/internal/receiver/uidlist.go:16
19 0x00000000005b7ea8 in github.com/gokrazy/rsync/internal/receiver.(*Transfer).RecvIdList
at /home/michael/kr/rsync/internal/receiver/uidlist.go:52
20 0x00000000005b18db in github.com/gokrazy/rsync/internal/receiver.(*Transfer).ReceiveFileList
at /home/michael/kr/rsync/internal/receiver/flist.go:229
21 0x0000000000605390 in github.com/gokrazy/rsync/internal/receivermaincmd.clientRun
at /home/michael/kr/rsync/internal/receivermaincmd/receivermaincmd.go:341
22 0x00000000005fe572 in github.com/gokrazy/rsync/internal/receivermaincmd.socketClient
at /home/michael/kr/rsync/internal/receivermaincmd/clientserver.go:44
23 0x0000000000602f10 in github.com/gokrazy/rsync/internal/receivermaincmd.rsyncMain
at /home/michael/kr/rsync/internal/receivermaincmd/receivermaincmd.go:160
24 0x0000000000605e7e in github.com/gokrazy/rsync/internal/receivermaincmd.Main
at /home/michael/kr/rsync/internal/receivermaincmd/receivermaincmd.go:394
25 0x0000000000605c85 in main.main
at /home/michael/kr/rsync/cmd/gokr-rsync/rsync.go:12
26 0x000000000043d5db in runtime.main
at /home/michael/sdk/go1.23.6/src/runtime/proc.go:272
```
现在我们可以检查当前上下文中的变量:
```
(dlv) args
t = (*github.com/gokrazy/rsync/internal/receiver.Transfer)(0xc000202120)
rf = (*bufio.Reader)(0xc0002020c0)
...
(dlv) print t
*github.com/gokrazy/rsync/internal/receiver.Transfer {
client: (*github.com/gokrazy/rsync/internal/rsyncwire.Conn)(0xc000208060),
...
uidMap: []github.com/gokrazy/rsync/internal/receiver.uidMapEntry nil,
gidMap: []github.com/gokrazy/rsync/internal/receiver.uidMapEntry nil, ...}
```
`uidMap` 和 `gidMap` 为 nil,但在函数内部,代码试图从连接中读取这些值。调用栈显示我们从 `recvIdMapping1` 开始,它期望读取一个 uid 映射条目,但我们却卡在从 TCP 套接字读取数据上。服务器显然没有发送我们期望的数据。
Delve 提供了许多其他有用的特性,比如逐步执行代码、设置断点、修改变量等。在处理挂起程序时,能检查任意运行时状态已经非常有帮助了。
## 技巧3:保存核心转储以供以后分析
如果问题间歇性发生,或者你无法立即连接到生产机器,你可以保存核心转储并在之后分析。Go 1.23 引入了通过 `GOTRACEBACK=crash` 环境变量生成核心转储的支持(参阅 [Go 1.23 发布说明](https://go.dev/doc/go1.23#runtime))。此外,Go 运行时还提供了一个实验性的 `runtime/debug.WriteHeapDump()` 函数。
不过,对于大多数挂起场景,最灵活的跨版本方法是使用 `gcore`(来自 gdb 包)来生成核心转储文件。你需要安装 gdb(仅用于生成转储,之后仍然可以用 delve 分析):
```
% gcore -o /tmp/hang.core $(pidof gokr-rsync)
```
之后你可以将核心文件复制到另一台机器,并在那里用 delve 加载它:
```
% dlv core /path/to/gokr-rsync /tmp/hang.core
```
接着你就可以像在实时调试中一样检查 goroutine 和变量。
## 结论
调试挂起的 Go 程序时,你可以使用以下技巧:
1. 按下 `Ctrl+\` 快速获取堆栈跟踪。
2. 使用 delve 进行交互式检查。
3. 保存核心转储供以后分析。
这些技巧多年来为我节省了大量时间。希望它们也能帮到你!
相似文章
优化CPU密集型Go热路径的笔记
本文讨论了CPU密集型Go代码的性能优化技术,指出了泛型和接口抽象因无法内联而产生的局限性,并主张在热路径中使用代码复制。文章通过一个Brotli移植示例和深入基准测试进行了说明。
就用Go
一篇带有强烈观点的开发者文章倡导使用Go编程语言,强调其简洁的语法、强大的标准库、高效的并发模型以及单二进制部署,作为对过于复杂的现代技术栈的实用替代方案。
用户更改键盘布局时程序挂起的问题
一个调试故事,讲述了当用户更改键盘布局(例如使用 Win+Space 快捷键)时,Windows 程序挂起的原因,是由于一个后台线程创建了窗口但没有泵送消息。修复方法是要么泵送消息,要么销毁窗口。
Go 工具链的 Fuzzing 分支
Trail of Bits 推出了 gosentry,这是一个面向模糊测试的 Go 工具链分支,集成 LibAFL 以增强路径约束求解、结构化模糊测试及漏洞检测能力,同时保留标准测试工作流程。
Qwen3.6 35B + 合适脚手架,本地跑真实 Go 任务 9/10 通过
开发者用 Qwen3.6 35B 为核心,配合 little-coder 脚手架搭建路由本地环境,在 10 个真实 Go 任务中拿到 9/10 通过率,证明本地模型搭配合适工具链也能逼近前沿代码模型。