Score Broadcast and Decorrelation: A General Framework for Broadcast-Based Credit Assignment

arXiv cs.LG paper

Summary

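Introduces Score Broadcast and Decorrelation (SBD), a principled framework for broadcast-based credit assignment that generalizes to families of differentiable losses, including cross-entropy, Bregman divergences, and proper scoring rules. The work provides a theoretical foundation for three-factor learning rules and reports performance gains over existing broadcast methods on CIFAR-10 and Tiny ImageNet.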

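arXiv:2605.30638v1 Announce Type: new Abstract: We propose Score Broadcast and Decorrelation (SBD), a principled framework for broadcast-based credit assignment for general families of differentiable losses. Error broadcast is a biologically plausible alternative to backpropagation that sends output information to hidden layers without weight transport. The recently proposed Error Broadcast and Decorrelation (EBD) framework, developed for the mean squared error (MSE) setting, grounds this mechanism in the stochastic orthogonality of optimal estimators: the optimal residual is orthogonal to functions of the input. We generalize this foundation by introducing an orthogonality principle between the output score (the gradient of the loss with respect to the final-layer output) and hidden-layer activations, which holds whenever the optimal score has zero conditional mean. This single principle unifies broadcast-based credit assignment across standard families of differentiable losses, including cross-entropy, Bregman divergences, proper scoring rules, and exponential-family negative log-likelihoods. The framework provides a theoretical foundation for three-factor learning rules under general losses, in which the neuromodulatory factor is derived as the broadcast loss score. We explicitly derive the cross-entropy case, characterize the class of admissible losses, and introduce a score vector expansion technique that enriches the broadcast signal while preserving the orthogonality framework. Experiments on CIFAR-10 and Tiny ImageNet show that SBD significantly outperforms existing broadcast methods, with score vector expansion yielding further gains. Overall, this work identifies the loss score as the signal to broadcast, provides an orthogonality theory and theoretical foundation for three-factor learning rules from neuroscience, and shows how score vector expansion enriches the decorrelation directions of the resulting objective.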

Cached: 2026/06/01 09:29

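# Score Broadcast and Decorrelation: A General Framework for Broadcast-Based Credit Assignment

Source: https://arxiv.org/html/2605.30638

Mustafa Uzun$^{1,2}$, Mete Erdogan$^{3}$, Cengiz Pehlevan$^{4,5,6}$, Alper T. Erdogan$^{1,2}$  
$^1$KUIS AI Center, Koc University, Turkey  
$^2$Department of Electrical and Electronics Engineering, Koc University, Turkey  
$^3$Department of Electrical Engineering, Stanford University, USA  
$^4$John A. Paulson School of Engineering and Applied Sciences, Harvard University, USA  
$^5$Kempner Institute, Harvard University, USA  
$^6$Center for Brain Science, Harvard University, USA  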
\{muzun22, alperdogan\}@ku.edu.tr, [email protected], [email protected]  

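###### Abstract  
We propose Score Broadcast and Decorrelation (SBD), a principled framework for broadcast-based credit assignment for families of differentiable loss functions. Error broadcast is a biologically plausible alternative to backpropagation that sends output information directly to hidden layers, with no need for weight transport. The recently proposed Error Broadcast and Decorrelation (EBD) framework, developed for the mean squared error (MSE) setting, grounds this mechanism in the stochastic orthogonality of optimal estimators: the optimal residual is orthogonal to functions of the input. We extend this foundation by introducing an orthogonality principle between the **output score** (the gradient of the loss with respect to the final-layer output) and hidden-layer activations, which holds whenever the optimal score has zero conditional mean. This single principle unifies broadcast-based credit assignment across standard families of differentiable losses, including cross-entropy, Bregman divergences (with MSE as a special case), proper scoring rules through unconstrained links, and exponential-family negative log-likelihoods. The framework provides a theoretical foundation for three-factor learning rules under general losses, in which the neuromodulatory factor is derived as the broadcast loss score rather than assumed. We explicitly derive the cross-entropy case, characterize the class of admissible losses, and introduce a **score vector expansion** technique that enriches the broadcast signal while preserving the orthogonality framework. Experiments on CIFAR-10 and Tiny ImageNet show that SBD significantly outperforms existing broadcast methods and that score vector expansion improves performance further. Overall, this work identifies the loss score as the signal to broadcast, provides an orthogonality theory and theoretical foundation for three-factor learning rules from neuroscience, and shows how score vector expansion enriches the decorrelation directions of the objective.  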

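## 1 Introduction  
Neural networks are the basic computational model of natural intelligence and the core engine of modern AI. In both domains, a central challenge is the credit assignment problem: how to update local synaptic weights so as to optimize a global performance measure. The dominant solution in machine learning is backpropagation (BP), which derives exact gradients by propagating output errors backward through the network (Rumelhart et al., 1986 (https://arxiv.org/html/2605.30638#bib.bib1)). Although BP is highly effective, it requires symmetric backward pathways and exact weight transport, architectural constraints long regarded as biologically implausible (Crick, 1989 (https://arxiv.org/html/2605.30638#bib.bib2); Lillicrap et al., 2020 (https://arxiv.org/html/2605.30638#bib.bib3)) and ill-suited to efficient hardware implementation. These constraints have motivated research into alternative credit assignment frameworks that relax BP's strict routing requirements (Whittington and Bogacz, 2019 (https://arxiv.org/html/2605.30638#bib.bib5)). One prominent class of methods relies on **error broadcast**: distributing global output information directly to hidden layers instead of propagating it sequentially. Although this approach is appealing and biologically plausible, it raises a basic question: *for a given loss function, what specific quantity should be broadcast, and what theoretical principle justifies that this decentralized mechanism can drive learning?*  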

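The Error Broadcast and Decorrelation (EBD) framework (Erdogan et al., 2025 (https://arxiv.org/html/2605.30638#bib.bib7)) provides a principled answer in the mean squared error (MSE) setting. Its starting point is the stochastic orthogonality of MMSE estimation: at the optimum, the residual error is orthogonal to suitable functions of the input. EBD turns this property into layer-wise decorrelation objectives between hidden activations and the broadcast output error, which yields local **three-factor learning rules**, a long-standing class of biologically inspired synaptic update rules in which a presynaptic activity term and a postsynaptic sensitivity term are gated by a third, neuromodulatory factor (Frémaux and Gerstner, 2016 (https://arxiv.org/html/2605.30638#bib.bib8); Gerstner et al., 2018 (https://arxiv.org/html/2605.30638#bib.bib9); Kuśmierz et al., 2017 (https://arxiv.org/html/2605.30638#bib.bib10); Schultz, 1998 (https://arxiv.org/html/2605.30638#bib.bib11)). However, this foundation applies only to squared error. In classification the standard objective is cross-entropy, and more generally one wants to optimize differentiable losses for which the Euclidean residual is not the natural error signal. A general theory of broadcast learning therefore needs a loss-dependent notion of what should be broadcast.  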
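
For reference, the MMSE orthogonality property mentioned above can be written out in a few lines. This is the standard textbook argument in generic notation: $\mathbf{x}$ is the input, $\mathbf{y}$ the target, and $\mathbf{g}$ any square-integrable function of the input. These symbols are chosen here for illustration and are not necessarily the ones used in the paper.

$$
\hat{\mathbf{y}}^{\star}(\mathbf{x})=\mathbb{E}[\mathbf{y}\mid\mathbf{x}],\qquad
\boldsymbol{\epsilon}^{\star}=\mathbf{y}-\hat{\mathbf{y}}^{\star}(\mathbf{x})
\;\;\Longrightarrow\;\;
\mathbb{E}[\boldsymbol{\epsilon}^{\star}\mid\mathbf{x}]=\mathbf{0},
$$

$$
\mathbb{E}\!\left[\mathbf{g}(\mathbf{x})\,\boldsymbol{\epsilon}^{\star\top}\right]
=\mathbb{E}\!\left[\mathbf{g}(\mathbf{x})\,\mathbb{E}\!\left[\boldsymbol{\epsilon}^{\star\top}\mid\mathbf{x}\right]\right]
=\mathbf{0}.
$$

The second line uses only the tower property of conditional expectation. Because a hidden-layer activation is itself a deterministic function of $\mathbf{x}$, it is one admissible choice of $\mathbf{g}$.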

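In this paper, we identify that quantity as the **output score**, defined as the gradient of the loss with respect to the final-layer output. For cross-entropy, the score is the probability residual $\boldsymbol{\delta}=\mathbf{p}-\mathbf{y}$. We show that, at the population cross-entropy optimum, this score has zero conditional mean and is therefore orthogonal to any deterministic function of the input, including hidden-layer activations. More generally, the same principle applies to any differentiable loss whose conditional risk minimizer is characterized by a zero-score condition. The MSE residual used by EBD and the cross-entropy residual used in classification are therefore both instances of a more general score-based orthogonality principle.  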
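
The zero-conditional-mean claim for the cross-entropy score can be checked numerically. The following is a minimal sketch on synthetic data, not the authors' code: the linear-softmax ground truth, the `tanh` feature map standing in for hidden activations, and all variable names are assumptions made for illustration. It compares the empirical cross-correlation $\mathbb{E}[\mathbf{h}\,\boldsymbol{\delta}^{\top}]$ when $\mathbf{p}$ equals the true conditional distribution with the same quantity for a uniform predictor.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_correlation(h, delta):
    """Empirical estimate of E[h delta^T], shape (hidden_dim, num_classes)."""
    return h.T @ delta / h.shape[0]

rng = np.random.default_rng(0)
n, d_in, d_hid, n_cls = 200_000, 5, 8, 3

# Synthetic data (assumption): the true P(y | x) is a softmax of a fixed linear map.
x = rng.normal(size=(n, d_in))
p_true = softmax(x @ rng.normal(size=(d_in, n_cls)))

# Draw one label per row from P(y | x) by inverse-CDF sampling, then one-hot encode.
u = rng.random((n, 1))
labels = np.minimum((u > p_true.cumsum(axis=1)).sum(axis=1), n_cls - 1)
y = np.eye(n_cls)[labels]

# Any deterministic function of x can stand in for hidden activations.
h = np.tanh(x @ rng.normal(size=(d_in, d_hid)))

delta_opt = p_true - y                           # score at the population optimum
delta_bad = np.full_like(p_true, 1.0 / n_cls) - y  # score of a uniform predictor

max_opt = np.abs(cross_correlation(h, delta_opt)).max()
max_bad = np.abs(cross_correlation(h, delta_bad)).max()
print(f"max |E[h delta^T]| at the optimum:        {max_opt:.4f}")
print(f"max |E[h delta^T]| for uniform predictor: {max_bad:.4f}")
```

At the optimum the entries should shrink toward zero at the usual $1/\sqrt{n}$ sampling rate, while the uniform predictor leaves a correlation that does not vanish with more samples. That residual correlation is what a decorrelation objective would push down.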

[Figure: schematic of the update at a single synapse. Recoverable labels: **presynaptic** activity $h_j^{(k-1)}$; **postsynaptic** activity $h_i^{(k)}$; **output score** $\boldsymbol{\delta}$ together with its expanded components $\phi_2(\mathbf{x})\odot\boldsymbol{\delta},\ \ldots,\ \phi_M(\mathbf{x})\odot\boldsymbol{\delta}$, collected in $\tilde{\boldsymbol{\delta}}\in\mathbb{R}^{M D_{\mathrm{out}}}$; the per-unit signal $q_i^{(k)}=\hat{\tilde{\mathbf{R}}}^{(k)}_i\,\tilde{\boldsymbol{\delta}}$; and the weight update $\Delta W_{ij}^{(k)}\;\propto\;\ldots$, whose right-hand side was lost in extraction.]
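
The figure labels suggest how the expanded score vector is assembled: the base score $\boldsymbol{\delta}$ is stacked with copies gated elementwise by functions $\phi_m(\mathbf{x})$. Below is a minimal sketch of that stacking under stated assumptions. The gates used here (the predicted probabilities and their squares) are hypothetical placeholders, and treating $\hat{\tilde{\mathbf{R}}}$ as a batch cross-correlation estimate between hidden activations and $\tilde{\boldsymbol{\delta}}$ is an assumption, not something this excerpt confirms.

```python
import numpy as np

def expand_score(delta, gates):
    """Stack delta with gated copies gate_m * delta along the feature axis.

    delta: array of shape (n, d_out).
    gates: list of arrays broadcastable to (n, d_out), playing the role of
           phi_2(x), ..., phi_M(x). Returns an array of shape (n, M * d_out).
    """
    blocks = [delta] + [np.broadcast_to(g, delta.shape) * delta for g in gates]
    return np.concatenate(blocks, axis=1)

rng = np.random.default_rng(0)
n, d_out, d_hid = 4, 3, 5

# Toy cross-entropy score delta = p - y on random logits and labels.
logits = rng.normal(size=(n, d_out))
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)
y = np.eye(d_out)[rng.integers(d_out, size=n)]
delta = p - y

gates = [p, p**2]                       # hypothetical phi_2, phi_3 (so M = 3)
delta_tilde = expand_score(delta, gates)
print(delta_tilde.shape)                # (4, 9)

# Assumed form of R-hat: batch cross-correlation between hidden activations
# and the expanded score; q holds one scalar signal per hidden unit and sample.
h = np.tanh(rng.normal(size=(n, d_hid)))  # stand-in hidden activations
R_hat = h.T @ delta_tilde / n             # shape (d_hid, M * d_out)
q = delta_tilde @ R_hat.T                 # shape (n, d_hid)
```

Each gate is a deterministic function of the input, so every gated block inherits the zero conditional mean of $\boldsymbol{\delta}$ at the optimum. This is consistent with the abstract's statement that the expansion preserves the orthogonality framework.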
