8 years later, I rewrote my open-source PyTorch curvature library

Hacker News Top 2026/05/14 07:36 Tools

pytorch hessian curvature eigendecomposition lanczos open-source

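Summary

Eight years later, the author has rewritten the open-source library pytorch-hessian-eigenthings, which uses iterative methods such as Lanczos to compute efficient eigendecompositions of the Hessian and other curvature matrices of PyTorch models.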

Cached: 2026/05/16 15:40

noahgolmant/pytorch-hessian-eigenthings Source: https://github.com/noahgolmant/pytorch-hessian-eigenthings

pytorch-hessian-eigenthings

PyPI (https://pypi.org/project/hessian-eigenthings/) | Docs (https://noahgolmant.github.io/pytorch-hessian-eigenthings/) | CI (https://github.com/noahgolmant/pytorch-hessian-eigenthings/actions/workflows/ci.yml) | License

The hessian-eigenthings module provides an efficient (and scalable!) way to compute the eigendecomposition of the Hessian, and of other curvature matrices such as the generalized Gauss-Newton (GGN) and the empirical Fisher, for any PyTorch model. You can get the top eigenvalues and eigenvectors with Lanczos or stochastic power iteration, trace estimates with Hutch++, and spectral densities with stochastic Lanczos quadrature.
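To make the Hutch++ idea concrete, here is a minimal numpy sketch over an explicit symmetric matrix standing in for a curvature operator. It is not the library's trace() implementation: the name hutchpp, the Rademacher sketches, and the equal three-way split of the matvec budget are assumptions taken from the standard Hutch++ recipe. The estimator computes the trace exactly on a low-rank subspace found with one third of the budget and runs plain Hutchinson on the remainder, which is why it needs far fewer matvecs than Hutchinson alone when the spectrum decays quickly.

```python
import numpy as np


def hutchpp(matvec, n, num_matvecs=99, seed=0):
    """Hutch++ trace estimate of a symmetric n x n operator.

    matvec(V) must return A @ V for an (n, k) block V; A is never formed here.
    """
    rng = np.random.default_rng(seed)
    k = num_matvecs // 3  # budget split: sketch, low-rank trace, residual
    S = rng.choice([-1.0, 1.0], size=(n, k))  # Rademacher sketch matrix
    G = rng.choice([-1.0, 1.0], size=(n, k))  # independent Hutchinson probes
    Q, _ = np.linalg.qr(matvec(S))  # orthonormal basis for the dominant range of A
    low_rank = np.trace(Q.T @ matvec(Q))  # exact trace of A restricted to span(Q)
    G_perp = G - Q @ (Q.T @ G)  # project probes onto the orthogonal complement
    residual = np.trace(G_perp.T @ matvec(G_perp)) / k  # Hutchinson on the rest
    return low_rank + residual


# Check on a synthetic matrix with a rapidly decaying spectrum.
rng = np.random.default_rng(1)
n = 200
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = 1.0 / np.arange(1, n + 1) ** 2
A = (U * lam) @ U.T  # A = U diag(lam) U^T
est = hutchpp(lambda V: A @ V, n, num_matvecs=99, seed=0)
print(f"Hutch++: {est:.4f}  exact: {lam.sum():.4f}")
```

In this sketch, a budget of num_matvecs=99 (the value in the usage example) gives each of the three stages 33 matvecs.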

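v1.0.0a1: alpha release. The old 0.x API has been removed; if you depend on it, pin hessian-eigenthings==0.0.2.

Why use this?

The eigenvalues and eigenvectors of the Hessian are thought to be related to many generalization properties of neural networks. It has been hypothesized that "flat minima" generalize better, that the Hessians of large models are very low-rank, that certain optimizers lead to flatter minima, and so on. But for a model with P parameters the full Hessian is a P × P matrix, so its memory grows with the square of the parameter count, which is infeasible for anything but toy models. Iterative methods such as Lanczos and power iteration only need matrix-vector products. A Hessian-vector product (HVP) provides exactly that, with memory overhead that is linear in the number of parameters. This library combines HVPs with iterative algorithms to compute eigendecompositions without the quadratic memory bottleneck, and it works on real models, including HuggingFace and TransformerLens transformers.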
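The point that only matrix-vector products are needed can be illustrated with plain power iteration. In this numpy sketch, the operator A = B B^T is applied as B @ (B.T @ x), so each product costs O(n * r) memory and the n x n matrix is never built; in the library, an HVP plays that role for the Hessian. The function top_eigenpair and the synthetic B are hypothetical, chosen only so the result can be checked against a dense eigensolver.

```python
import numpy as np


def top_eigenpair(matvec, n, iters=500, seed=0):
    """Power iteration for the largest-magnitude eigenpair of a symmetric
    operator, using only the map x -> A @ x."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = matvec(v)
        v = w / np.linalg.norm(w)  # renormalize to avoid overflow/underflow
    return v @ matvec(v), v  # Rayleigh quotient and unit eigenvector estimate


rng = np.random.default_rng(0)
n, r = 1000, 5
B = rng.standard_normal((n, r)) * np.array([3.0, 2.0, 1.0, 1.0, 1.0])
# Implicit A = B @ B.T: each product costs O(n * r); the n x n matrix is never built.
lam, u = top_eigenpair(lambda x: B @ (B.T @ x), n)
print(f"top eigenvalue: {lam:.6f}")
```

The nonzero eigenvalues of B B^T are those of the small r x r matrix B^T B, which is what makes the dense check cheap here.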

Installation

pip install hessian-eigenthings
# or, with the HuggingFace / TransformerLens helpers:
pip install "hessian-eigenthings[transformers,transformer-lens]"

Usage

Build a CurvatureOperator from your model, then run any of the algorithms on it.

import torch
from torch import nn
from hessian_eigenthings import (
    HessianOperator,
    lanczos,
    trace,
    spectral_density,
    supervised_loss,
)

model = nn.Sequential(nn.Linear(20, 32), nn.Tanh(), nn.Linear(32, 1)).to(torch.float64)
x, y = torch.randn(128, 20, dtype=torch.float64), torch.randn(128, 1, dtype=torch.float64)
data = [(x[i:i+32], y[i:i+32]) for i in range(0, 128, 32)]

H = HessianOperator(model, data, supervised_loss(nn.functional.mse_loss))

eig = lanczos(H, k=5, seed=0)   # top-5 eigenvalues + eigenvectors
t = trace(H, num_matvecs=99, seed=0)  # Hutch++ trace estimate
density = spectral_density(H, num_runs=8, lanczos_steps=40, seed=0)  # stochastic Lanczos quadrature
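For intuition about what a call like lanczos(H, k=5) does, here is a minimal numpy sketch of the Lanczos iteration with full reorthogonalization, driven only by a matvec callable. It is an illustration, not the library's implementation: the name lanczos_topk, the fixed num_steps=60, and the choice to reorthogonalize against every previous Lanczos vector are assumptions. Lanczos builds a small tridiagonal matrix T whose eigenvalues (Ritz values) approximate the extreme eigenvalues of the operator; full reorthogonalization costs O(n * num_steps) extra memory but prevents the spurious duplicate eigenvalues that plain Lanczos produces in floating point.

```python
import numpy as np


def lanczos_topk(matvec, n, k=5, num_steps=60, seed=0):
    """Top-k eigenpairs of a symmetric operator via Lanczos with full
    reorthogonalization, using only the map x -> A @ x."""
    rng = np.random.default_rng(seed)
    V = np.zeros((n, num_steps))  # Lanczos basis vectors, one per column
    alpha = np.zeros(num_steps)  # diagonal of the tridiagonal matrix T
    beta = np.zeros(num_steps)  # off-diagonal of T
    v0 = rng.standard_normal(n)
    V[:, 0] = v0 / np.linalg.norm(v0)
    m = num_steps
    for j in range(num_steps):
        w = matvec(V[:, j])
        alpha[j] = V[:, j] @ w
        basis = V[:, : j + 1]
        for _ in range(2):  # two Gram-Schmidt passes keep the basis orthogonal
            w = w - basis @ (basis.T @ w)
        if j + 1 == num_steps:
            break
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-12:  # invariant subspace reached: T is exact, stop early
            m = j + 1
            break
        V[:, j + 1] = w / beta[j]
    T = np.diag(alpha[:m]) + np.diag(beta[: m - 1], 1) + np.diag(beta[: m - 1], -1)
    theta, S = np.linalg.eigh(T)  # Ritz values in ascending order
    top = np.argsort(theta)[::-1][:k]
    return theta[top], V[:, :m] @ S[:, top]  # Ritz values and Ritz vectors


# Synthetic operator: five outlier eigenvalues above a bulk in [0, 1].
rng = np.random.default_rng(0)
n = 300
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.concatenate([[10.0, 9.0, 8.0, 7.0, 6.0], rng.uniform(0.0, 1.0, n - 5)])
A = (U * lam) @ U.T
vals, vecs = lanczos_topk(lambda x: A @ x, n, k=5)
print(vals)
```

Well-separated outliers like these converge in a few dozen steps; a flat, clustered top of the spectrum would need more steps for the same accuracy.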

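If you'd rather use the GGN (positive semi-definite whenever the loss is convex in the model outputs, and on classification losses often what people actually mean by "the Hessian"), swap in GGNOperator. For the outer product of per-sample gradients, use EmpiricalFisherOperator. All three operators share the same interface, so every algorithm above works with any of them.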
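A minimal numpy sketch of the two products, assuming scalar model outputs and that the Jacobian J and the per-sample gradients are available as dense arrays; a real implementation would instead compute J v and J^T u with forward- and reverse-mode autodiff. The names ggn_matvec and empirical_fisher_matvec are hypothetical, not the library's GGNOperator or EmpiricalFisherOperator, and the 1/N scaling of the empirical Fisher is one convention among several. Because h_out >= 0 for a loss that is convex in the outputs, v^T G v = sum_i h_i (J_i v)^2 >= 0, which is where the GGN's positive semi-definiteness comes from. For the linear model used in the check, the model's second derivatives vanish, so the GGN equals the true Hessian.

```python
import numpy as np


def ggn_matvec(J, h_out, v):
    """GGN-vector product G v = J^T diag(h_out) J v for scalar outputs.

    J: (N, P) Jacobian of the N model outputs w.r.t. the P parameters.
    h_out: (N,) second derivative of the loss w.r.t. each output.
    """
    return J.T @ (h_out * (J @ v))


def empirical_fisher_matvec(grads, v):
    """Empirical Fisher-vector product F v = (1/N) sum_i g_i (g_i^T v).

    grads: (N, P) per-sample loss gradients; F itself is never formed.
    """
    return grads.T @ (grads @ v) / grads.shape[0]


# Linear model f = X @ theta with loss mean((f - y)^2), so J = X.
rng = np.random.default_rng(0)
N, P = 16, 8
X = rng.standard_normal((N, P))
theta, y = rng.standard_normal(P), rng.standard_normal(N)
residual = X @ theta - y
h_out = np.full(N, 2.0 / N)  # d^2 loss / d f_i^2 for the mean-squared error
grads = 2.0 * residual[:, None] * X  # gradient of (f_i - y_i)^2 for each sample
v = rng.standard_normal(P)
Gv = ggn_matvec(X, h_out, v)
Fv = empirical_fisher_matvec(grads, v)
```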

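When double backprop isn't feasible (for example, under FSDP), there is also a finite-difference HVP path: HessianOperator(method="finite_difference"). For per-block analysis, you can restrict the analysis to a subset of parameters with param_filter=match_names("blocks.*.attn.*").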
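The finite-difference path rests on the identity H v ≈ (∇L(θ + h v) − ∇L(θ − h v)) / (2h), which needs only two ordinary gradient evaluations and no double backward pass. The numpy sketch below checks that central-difference formula against the exact Hessian of a logistic loss. The function fd_hvp, the step length eps=1e-4, and the normalization of the step are assumptions; the library's actual difference scheme and step size are not described here. A smaller step reduces truncation error but amplifies rounding error, so in float32 a noticeably larger step is usually needed.

```python
import numpy as np


def fd_hvp(grad_fn, theta, v, eps=1e-4):
    """Central finite-difference Hessian-vector product.

    H v ~= (grad(theta + h v) - grad(theta - h v)) / (2 h), with h chosen so
    the step h * v has length eps. Uses two gradient calls, no double backprop.
    """
    h = eps / max(np.linalg.norm(v), 1e-12)
    return (grad_fn(theta + h * v) - grad_fn(theta - h * v)) / (2.0 * h)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Test problem: L(theta) = sum(log(1 + exp(A @ theta))), with known derivatives.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
theta = 0.1 * rng.standard_normal(10)
v = rng.standard_normal(10)
grad = lambda th: A.T @ sigmoid(A @ th)  # gradient of L
s = sigmoid(A @ theta)
H = A.T @ (A * (s * (1.0 - s))[:, None])  # exact Hessian A^T diag(s(1-s)) A
approx, exact = fd_hvp(grad, theta, v), H @ v
```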

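For large language models with large vocabularies, hf_lm_loss_of_output() automatically picks a fused cross-entropy Hessian-vector kernel: Triton on CUDA (about 3.4× faster, with 2× lower peak memory), and torch.compile otherwise (about 2.6× faster, with 2× lower peak memory). To force the unfused reference implementation for debugging, pass fused="eager".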
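Some context for why vocabulary size matters: at the logit level, the softmax cross-entropy Hessian has a closed form, H = diag(p) − p p^T per token, so its product with a vector can be computed in O(V) memory per token instead of materializing a V x V matrix. With GPT-2's 50,257-token vocabulary, one dense float32 block would take about 10.1 GB. The numpy sketch below shows only that logit-level arithmetic. It is not the library's Triton or torch.compile kernel, the name ce_logit_hvp is hypothetical, and what exactly the library fuses is not described here.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift by row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ce_logit_hvp(logits, v):
    """Hessian-vector product of mean softmax cross-entropy w.r.t. the logits.

    Per token, H = diag(p) - p p^T, so H v = p * (v - <p, v>). This needs
    O(T * V) memory for (T, V) inputs instead of O(T * V^2) for explicit H.
    The result does not depend on the target tokens.
    """
    p = softmax(logits)
    hv = p * (v - (p * v).sum(axis=-1, keepdims=True))
    return hv / logits.shape[0]  # 1/T factor from averaging over T tokens


rng = np.random.default_rng(0)
T, V = 4, 7
logits = rng.standard_normal((T, V))
v = rng.standard_normal((T, V))
hv = ce_logit_hvp(logits, v)
# Reference: build each V x V Hessian block explicitly (only feasible for tiny V).
p = softmax(logits)
ref = np.stack([(np.diag(pt) - np.outer(pt, pt)) @ vt for pt, vt in zip(p, v)]) / T
```

Since every row of diag(p) − p p^T sums to zero, each row of the result also sums to zero, which is a cheap sanity check for any implementation of this product.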

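See the examples/ directory for runnable scripts covering a small MLP, HuggingFace tiny-GPT2, and a TransformerLens model. Full documentation is at https://noahgolmant.github.io/pytorch-hessian-eigenthings/.

Developing the library

With uv (https://docs.astral.sh/uv/):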

git clone https://github.com/noahgolmant/pytorch-hessian-eigenthings
cd pytorch-hessian-eigenthings
uv sync --group dev --group docs --extra transformers --extra transformer-lens --extra curvlinops
uv run pytest
uv run mkdocs serve

Citing this work

If you find this repository useful and want to cite it (as others (https://scholar.google.com/scholar?oi=bibs&hl=en&cites=18039594054930134223) have done, thank you!):

@misc{hessian-eigenthings,
  author = {Noah Golmant and Zhewei Yao and Amir Gholami and Michael Mahoney and Joseph Gonzalez},
  title = {pytorch-hessian-eigenthings: efficient PyTorch Hessian eigendecomposition},
  month = oct,
  year = 2018,
  version = {1.0},
  url = {https://github.com/noahgolmant/pytorch-hessian-eigenthings}
}

Acknowledgments

The original 2018 implementation was written together with Zhewei Yao, Amir Gholami, Michael Mahoney, and Joseph Gonzalez of UC Berkeley's RISELab (https://rise.cs.berkeley.edu). The deflated power iteration is based on code from HessianFlow (https://github.com/amirgholami/HessianFlow) (Z. Yao, A. Gholami, Q. Lei, K. Keutzer, M. Mahoney. "Hessian-based Analysis of Large Batch Training and Robustness to Adversaries", NeurIPS 2018, arXiv:1802.08241 (https://arxiv.org/abs/1802.08241)). Accelerated stochastic power iteration comes from C. De Sa et al., "Accelerated Stochastic Power Iteration", PMLR 2017 (arXiv:1707.02670 (https://arxiv.org/abs/1707.02670)). The v1 refresh borrows ideas from PyHessian (https://github.com/amirgholami/PyHessian), curvlinops (https://github.com/f-dangel/curvlinops), and HessFormer (https://github.com/PureStrength-AI/HessFormer).

License

MIT.
