Chasing down why installing the kernel segfaulted

Lobsters Hottest News

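Summary

The author debugs a segfault encountered while installing a kernel, traces it to the `hardlink` command invoked by dracut, and investigates the crash with gdb.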

<p><a href="https://lobste.rs/s/ihqekl/chasing_down_why_installing_kernel">Comments</a></p>

Cached: 2026/05/19 22:46

# Chasing down why installing the kernel segfaulted - sporks space

Source: https://sporks.space/2026/05/19/chasing-down-why-installing-the-kernel-segfaulted/

I've been running a continuous integration server for a particular architecture. If you've been maintaining servers lately, you're surely familiar with the endless rotation of kernel patches (https://www.gentoo.org/news/2026/05/19/copy-fail-fragnesia-vulnerabilities.html), thanks to widely reported security issues such as Copy Fail. Usually these updates go smoothly (except for the previous kernel, which had a PowerPC-specific build regression (https://lore.kernel.org/stable/[email protected]/)). This time, while running the kernel's `make install`, I saw a very strange message:

```
# make install
  INSTALL /boot
/usr/bin/dracut: line 3125: 3644490 Segmentation fault hardlink "$initdir" 2>&1
 3644491 Done | ddebug
Generating grub configuration file ...
[...]
```

That's worrying. Now I'm concerned about whether rebooting into the new kernel will work at all. Let's try to get more detail; the kernel's Makefile uses `V=1` to show extra information.

```
# make install V=1
make --no-print-directory -C /usr/src/linux-6.18.32-gentoo-r1 \
-f /usr/src/linux-6.18.32-gentoo-r1/Makefile install
# INSTALL /boot
  unset sub_make_done; ./scripts/install.sh
/usr/bin/dracut: line 3125: 3648755 Segmentation fault hardlink "$initdir" 2>&1
 3648756 Done | ddebug
Generating grub configuration file ...
```

OK, that doesn't tell us much, only that the actual installation step calls a script. Let's look at the relevant part.

```
# User/arch may have a custom install script
for file in "${HOME}/bin/${INSTALLKERNEL}" \
	"/sbin/${INSTALLKERNEL}" \
	"${srctree}/arch/${SRCARCH}/install.sh" \
	"${srctree}/arch/${SRCARCH}/boot/install.sh"
do
	if [ ! -x "${file}" ]; then
		continue
	fi

	# installkernel(8) says the parameters are like follows:
	#
	#   installkernel version zImage System.map [directory]
	exec "${file}" "${KERNELRELEASE}" "${KBUILD_IMAGE}" System.map "${INSTALL_PATH}"
done
```

This is really a wrapper around `installkernel`, a custom, distribution-specific program (if it's missing, the kernel provides a generic but less capable substitute) that is responsible for generating the initrd (delegated to dracut here) and updating the bootloader. In this case, Gentoo's version (https://wiki.gentoo.org/wiki/Installkernel) accepts a `-v` flag, so let's add it and see whether we get anything more interesting:

```
exec "${file}" -v "${KERNELRELEASE}" "${KBUILD_IMAGE}" System.map "${INSTALL_PATH}"
```

OK, running `make install V=1` again:

```
dracut[I]: *** Hardlinking files ***
/usr/bin/dracut: line 3125: 3403396 Segmentation fault hardlink "$initdir" 2>&1
 3403397 Done | ddebug
dracut[I]: *** Hardlinking files done ***
```

It looks like the problem happens in this specific step. We need to look at dracut itself. The relevant block (with line 3125 marked):

```
# Hardlink is mtime-sensitive; do it after the above clamp.
if [[ $do_hardlink == yes ]] && command -v hardlink > /dev/null; then
    dinfo "*** Hardlinking files ***"
    hardlink "$initdir" 2>&1 | ddebug
    dinfo "*** Hardlinking files done ***"

    # Hardlink itself breaks mtimes on directories as we may have added/removed
    # dir entries. Fix those up.
    if [[ ${SOURCE_DATE_EPOCH-} ]] && [[ $CPIO != 3cpio ]]; then
        clamp_mtimes "$initdir" -type d
    fi
fi # this is line 3125
```

Clearly `hardlink` is the culprit. The proper approach would be to substitute a different `hardlink` on the path, but here I just edited the dracut executable directly. (Don't try this at home!) I'll invoke it under gdb and drop the pipe into dracut's debug log:

```
#hardlink "$initdir" 2>&1 | ddebug
gdb --args hardlink "$initdir"
```

Run the make command again. When the gdb prompt appears, run the program:

```
dracut[I]: *** Hardlinking files ***
GNU gdb (Gentoo 17.1 vanilla) 17.1
Copyright (C) 2025 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "powerpc64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from hardlink...
Reading symbols from /usr/lib/debug/usr/bin/hardlink.debug...
(gdb) catch signal
Catchpoint 1 (standard signals)
(gdb) run
Starting program: /usr/bin/hardlink /var/tmp/dracut.dEcmlh1/initramfs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb)
```

Wait, where did the program go? gdb needs the process to still exist in order to inspect it, or... hold on, did the *kernel* kill it? If we check dmesg...

```
[1199626.054903] BUG: Unable to handle kernel data access at 0xc0403effffffffc8
[1199626.054921] Faulting instruction address: 0xc000000000396cb4
[1199626.054927] Oops: Kernel access of bad area, sig: 11 [#15]
[1199626.054932] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=32 NUMA pSeries
[1199626.054939] Modules linked in: vsock_diag vmx_crypto ibmveth pseries_rng rng_core fuse vsock_loopback vmw_vsock_virtio_transport_common vsock sr_mod cdrom nx_crypto
[1199626.054969] CPU: 22 UID: 0 PID: 3545486 Comm: hardlink Tainted: G S D 6.18.26-gentoo-ppc #1 VOLUNTARY
[1199626.054981] Tainted: [S]=CPU_OUT_OF_SPEC, [D]=DIE
[1199626.054984] Hardware name: IBM,8286-42A POWER8 (architected) 0x4b0201 0xf000004 of:IBM,FW860.90 (SV860_226) hv:phyp pSeries
[1199626.054992] NIP: c000000000396cb4 LR: c000000000396e68 CTR: c000000000396e40
[1199626.054998] REGS: c00000018a4a7840 TRAP: 0380 Tainted: G S D (6.18.26-gentoo-ppc)
[1199626.055006] MSR: 8000000000009032 CR: 44002242 XER: 20000000
[1199626.055023] CFAR: c000000000396e64 IRQMASK: 0
GPR00: c0003d00002b0acc c00000018a4a7ae0 c0000000018ad100 fffffffffffffff0
GPR04: fffffffffffffff0 0000000000000001 c000000637bc0c28 0000000000000000
GPR08: 0000001ffd4cd000 c0003f0000000000 0000000000000000 c0003d00002b48a8
GPR12: c000000000396e40 c00000002ec44800 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 00000001000300c8
GPR24: 000000010002efe0 0000000000000000 0000000000000001 c00000011d91a480
GPR28: 0000000000000000 c00000000271cd00 c0003d00002b80b8 c0403effffffffc0
[1199626.055106] NIP [c000000000396cb4] __ksize+0x34/0x190
[1199626.055117] LR [c000000000396e68] kfree_sensitive+0x28/0x80
[1199626.055124] Call Trace:
[1199626.055127] [c00000018a4a7ae0] [c00000018a4a7b20] 0xc00000018a4a7b20 (unreliable)
[1199626.055136] [c00000018a4a7b10] [c00000018a4a7b80] 0xc00000018a4a7b80
[1199626.055143] [c00000018a4a7b40] [c0003d00002b0acc] nx_crypto_ctx_shash_exit+0x24/0x60 [nx_crypto]
[1199626.055154] [c00000018a4a7b70] [c00000000097af78] crypto_shash_exit_tfm+0x28/0x40
[1199626.055165] [c00000018a4a7b90] [c00000000096f168] crypto_destroy_tfm+0x98/0x140
[1199626.055176] [c00000018a4a7bd0] [c000000000978d60] crypto_exit_ahash_using_shash+0x20/0x40
[1199626.055186] [c00000018a4a7bf0] [c00000000096f168] crypto_destroy_tfm+0x98/0x140
[1199626.055196] [c00000018a4a7c30] [c000000000998b5c] hash_release+0x1c/0x30
[1199626.055207] [c00000018a4a7c50] [c000000000996f58] alg_sock_destruct+0x38/0x60
[1199626.055216] [c00000018a4a7c80] [c0000000010bed98] __sk_destruct+0x48/0x2b0
[1199626.055227] [c00000018a4a7cc0] [c0000000009970a8] af_alg_release+0x58/0xb0
[1199626.055237] [c00000018a4a7cf0] [c0000000010b3918] __sock_release+0x68/0x150
[1199626.055247] [c00000018a4a7d70] [c0000000010b3a20] sock_close+0x20/0x40
[1199626.055257] [c00000018a4a7d90] [c0000000004549b0] __fput+0x110/0x3a0
[1199626.055265] [c00000018a4a7de0] [c00000000044df48] sys_close+0x48/0xa0
[1199626.055275] [c00000018a4a7e10] [c000000000029d40] system_call_exception+0x140/0x2d0
[1199626.055284] [c00000018a4a7e50] [c00000000000c354] system_call_common+0xf4/0x258
[1199626.055295] ---- interrupt: c00 at 0x3ffff7def394
[1199626.055300] NIP: 00003ffff7def394 LR: 00003ffff7def3f0 CTR: 0000000000000000
[1199626.055305] REGS: c00000018a4a7e80 TRAP: 0c00 Tainted: G S D (6.18.26-gentoo-ppc)
[1199626.055312] MSR: 800000000280f032 CR: 24002242 XER: 00000000
[1199626.055334] IRQMASK: 0
GPR00: 0000000000000006 00003fffffffb820 00003ffff7f87100 0000000000000003
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00003ffff7ff37e0 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 00000001000300c8
GPR24: 000000010002efe0 0000000000000000 0000000000000001 0000000100030070
GPR28: 00003fffffffc078 0000000000000002 00003ffff7f802d8 00000001000300c8
[1199626.055412] NIP [00003ffff7def394] 0x3ffff7def394
[1199626.055417] LR [00003ffff7def3f0] 0x3ffff7def3f0
[1199626.055421] ---- interrupt: c00
[1199626.055425] Code: 38426480 28230010 418200c4 3d2200df fbe1fff8 f821ffd1 787fa402 7c641b78 3929c820 7bff3664 e9290000 7fe9fa14 71480001 408200a4 895f0030
[1199626.055456] ---[ end trace 0000000000000000 ]---
[1199626.059073] note: hardlink[3545486] exited with irqs disabled
```

Oh no. Why would `hardlink` cause a kernel oops, and why in the *crypto subsystem* of all places? Enter `AF_ALG`, the kernel's newest nemesis, best known for... Copy Fail. This is almost certainly not Copy Fail itself, but I wouldn't be at all surprised if the Copy Fail fix introduced a regression. Let's figure out why `hardlink` uses this interface.

Instead of gdb, let's try running it under strace from our hacked dracut. In the (extremely long) syscall trace (https://gist.githubusercontent.com/NattyNarwhal/7a6c1411f542d737af3a7fc7b238b47b/raw/680e2595ea51899090b220234355a4921badf73b/oops2.txt):

```
close(5) = 0
close(0) = 0
close(0) = -1 EBADF (Bad file descriptor)
close(4) = 0
close(3) = ?
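+++ killed by SIGSEGV +++
/usr/bin/dracut: line 3128: 3561819 Segmentation fault (core dumped) strace hardlink "$initdir"
```

OK, the kernel crashed in the middle of the `close` system call, which produced a rather comical `SIGSEGV`. What was file descriptor #3 when it was last created?

```
socket(AF_ALG, SOCK_SEQPACKET, 0) = 3
bind(3, {sa_family=AF_ALG, salg_type="hash", salg_feat=0, salg_mask=0, salg_name="sha256"}, 88) = 0
accept(3, NULL, NULL) = 4
```

Oh great, it really *does* use `AF_ALG`. Why would a program that creates hard links use the kernel's vulnerability-prone crypto acceleration path? It's not IPsec, after all. Well, if we search `util-linux` (where `hardlink` comes from) for `AF_ALG`, we find a utility function for comparing files (in `lib/fileeq.c`). Let's take a look (the first big comment (https://github.com/util-linux/util-linux/blob/master/lib/fileeq.c#L23-L25)):

```
/*
 * Compare file contents
 *
 * The goal is to minimize the amount of data that needs to be read from the
 * files and to be able to compare large sets of files, which means reusing
 * previous data where possible. It never reads an entire file unless necessary.
 *
 * The other goal is to minimize the number of open files (imagine "hardlink /");
 * the code can open only two files and reopen a file later if necessary.
 *
 * This code supports multiple comparison methods. The most basic step, common
 * to all methods, is to read and compare an "intro" (a few bytes from the
 * beginning of the file). This intro buffer is always cached in
 * 'struct ul_fileeq_data' and is addressed as block=0. This primitive step can
 * greatly reduce ...
 *
 * The next steps depend on the selected method:
 *
 *  * memcmp method: always reads data into userspace, caches nothing, and
 *    compares file contents directly; fast for small sets of small files.
 *
 *  * Linux crypto API: a zero-copy method based on sendfile(); data blocks are
 *    sent to the kernel hash functions (sha1, ...), and only the hash digest is
 *    read and cached in userspace. Fast for large sets of (large) files.
 *
 * [...]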
 */
```

Nice. This is an optimized path that exposes a fragile kernel subsystem just to... do hashing. The code that actually sets up the socket lives in `init_crypto_api`, and the logic that uses it is guarded by the `USE_FILEEQ_CRYPTOAPI` macro. Since there is a fallback, can we easily disable it and get the `memcmp` behaviour instead? Surely that's no problem? Let's check `include/fileeq.h`, which exposes the API:

```
#if defined(__linux__) && defined(HAVE_LINUX_IF_ALG_H)
# define USE_FILEEQ_CRYPTOAPI 1
#endif
```

Great, it's hardcoded and always enabled on newer kernels; there is no build system option (and therefore no `USE` flag). Fine, we'll just turn it off ourselves. Since I'm using Gentoo on this CI server, that's easy to fix (https://wiki.gentoo.org/wiki//etc/portage/patches). Put the patch below in `/etc/portage/patches/sys-apps/util-linux/no-af-alg.patch`, then rebuild the package with `emerge -av sys-apps/util-linux`:

```
diff --git a/include/fileeq.h b/include/fileeq.h
index 90b8d5118..e4d2dfae2 100644
--- a/include/fileeq.h
+++ b/include/fileeq.h
@@ -11,7 +11,7 @@
 #include <unistd.h>
 
 #if defined(__linux__) && defined(HAVE_LINUX_IF_ALG_H)
-# define USE_FILEEQ_CRYPTOAPI 1
+//# define USE_FILEEQ_CRYPTOAPI 1
 #endif
 
 /* Number of bytes from the beginning of the file we always
```

With the strace invocation still in our hacked dracut, run it again:

```
close(3) = 0
close(0) = 0
close(0) = -1 EBADF (Bad file descriptor)
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0x2), ...}) = 0
write(1, "Mode: real\n", 31Mode: real
) = 31
write(1, "Method: memcmp"..., 33Method: memcmp
) = 33
write(1, "Files: 1038\n", 31Files: 1038
) = 31
write(1, "Linked: 3 file"..., 34Linked: 3 files
) = 34
write(1, "Compared: 0 xatt"..., 35Compared: 0 xattrs
) = 35
write(1, "Compared: 416 fi"..., 36Compared: 416 files
) = 36
write(1, "Saved: 5.74 K"..., 35Saved: 5.74 KiB
) = 35
write(1, "Duration: 1.0702"..., 43Duration: 1.070276 seconds
) = 43
exit_group(0) = ?
+++ exited with 0 +++
dracut[I]: *** Hardlinking files done ***
```

Great, it works. It also looks like the crash happened right near the end, during cleanup and the final report, so `hardlink` may actually have been doing its job all along. Now I need to figure out why the kernel blows up entirely... (I also suspect that, in the effort to neuter `AF_ALG`, kernel developers may stop making it zero-copy in the future, which would make util-linux's use of it pointless. Perhaps submitting a patch to util-linux would be a good idea.)
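For reference, the `socket`/`bind`/`accept` sequence from the strace output can be reproduced outside of `hardlink`. What follows is a minimal sketch in Python, not util-linux's actual code (which is C and, per the comment quoted above, feeds data in via `sendfile()`). The function names `sha256_via_af_alg` and `sha256_digest` are made up for illustration, and the `hashlib` fallback is only an assumption meant to mirror the spirit of the `memcmp` fallback. On a kernel affected by the oops above, closing the sockets could presumably trigger the same crash, so treat it with care.

```python
import hashlib
import socket


def sha256_via_af_alg(data: bytes) -> bytes:
    """Hash data through the kernel crypto API (hypothetical helper).

    Raises AttributeError where Python lacks socket.AF_ALG and OSError
    where the kernel refuses the socket or the algorithm.
    """
    # socket(AF_ALG, SOCK_SEQPACKET, 0): the "transform" socket (fd 3 in the trace)
    with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as tfm:
        # bind(..., salg_type="hash", salg_name="sha256")
        tfm.bind(("hash", "sha256"))
        # accept(): the per-operation socket (fd 4 in the trace)
        op, _ = tfm.accept()
        with op:
            op.sendall(data)    # feed the bytes to the kernel hash
            return op.recv(32)  # a SHA-256 digest is 32 bytes
    # Leaving the nested with-blocks closes op first, then tfm.


def sha256_digest(data: bytes) -> bytes:
    """Prefer AF_ALG, but fall back to userspace hashing if it is unavailable."""
    try:
        return sha256_via_af_alg(data)
    except (AttributeError, OSError):
        return hashlib.sha256(data).digest()
```

Calling `sha256_digest(b"abc")` should yield the same 32 bytes as `hashlib.sha256(b"abc").digest()` whichever path is taken. The nested `with` blocks close the operation socket before the transform socket, which matches the `close(4)` then `close(3)` order seen in the trace.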
