Supervised Training Rapidly Degrades Early Visual Cortex Alignment Across Biologically Plausible Learning Rules
Summary
This paper tracks how supervised training with different learning rules (backpropagation, feedback alignment, predictive coding, STDP) degrades alignment between neural network representations and early visual cortex fMRI data, finding that untrained networks often match or exceed trained ones in V1 alignment.
# Supervised Training Rapidly Degrades Early Visual Cortex Alignment Across Biologically Plausible Learning Rules
Source: [https://arxiv.org/html/2605.30556](https://arxiv.org/html/2605.30556)
\(May 2026\)
###### Abstract
Random, untrained neural networks consistently match or exceed trained networks in representational similarity to early visual cortex. This puzzling finding challenges the assumption that learning improves brain alignment. We investigate it by tracking representational similarity analysis (RSA) alignment to human fMRI data across training for four learning rules: backpropagation (BP), feedback alignment (FA), predictive coding (PC), and spike-timing-dependent plasticity (STDP). Using 720 object images from the THINGS database and fMRI data from three subjects across six visual ROIs, we measure Spearman correlations between model and brain representational dissimilarity matrices at eight training checkpoints (epochs 0–40). We find that (1) a single epoch of training reduces V1 alignment by 25–90%, depending on the learning rule; (2) backpropagation reduces V1 alignment most severely ($\Delta r=-0.080$), while predictive coding and STDP preserve substantially more ($\Delta r\approx-0.04$); and (3) a weaker, opposite tendency appears in object-selective cortex (LOC), where BP shows the largest increase in alignment during training, although the absolute change is small. These results suggest that untrained architectures capture low-level visual statistics through inductive biases alone, and that global error signals (BP) reshape early representations more aggressively than local learning rules (PC, STDP), which better preserve brain-like structure.
## 1 Introduction
A growing body of work demonstrates that deep neural networks trained on visual tasks develop internal representations that correlate with neural responses in primate visual cortex (Yamins et al., [2014](https://arxiv.org/html/2605.30556#bib.bib13); Khaligh-Razavi and Kriegeskorte, [2014](https://arxiv.org/html/2605.30556#bib.bib5); Cichy et al., [2016](https://arxiv.org/html/2605.30556#bib.bib2)). This alignment has been taken as evidence that optimising for ecologically relevant tasks produces brain-like computations. However, a surprising counter-finding has emerged: untrained, randomly initialised networks often match or outperform trained networks in alignment with early visual areas, particularly V1 (Leutenegger, [2026](https://arxiv.org/html/2605.30556#bib.bib6)).
This raises a fundamental question: does learning improve brain alignment, or does it erode it? Prior work has compared learning rules at a single trained endpoint (Leutenegger, [2026](https://arxiv.org/html/2605.30556#bib.bib6); Lillicrap et al., [2016](https://arxiv.org/html/2605.30556#bib.bib7)), but the dynamics of how alignment changes during training remain unexplored. Understanding these dynamics could reveal whether the random-weights advantage reflects an intrinsic property of network architectures or an active degradation caused by training.
We address this gap by measuring representational similarity to human fMRI across training for four learning rules spanning a spectrum of biological plausibility: backpropagation (BP), feedback alignment (FA) (Lillicrap et al., [2016](https://arxiv.org/html/2605.30556#bib.bib7)), predictive coding (PC) (Rao and Ballard, [1999](https://arxiv.org/html/2605.30556#bib.bib10); Whittington and Bogacz, [2017](https://arxiv.org/html/2605.30556#bib.bib12)), and spike-timing-dependent plasticity (STDP) (Bi and Poo, [1998](https://arxiv.org/html/2605.30556#bib.bib1); Masquelier and Thorpe, [2007](https://arxiv.org/html/2605.30556#bib.bib8)). By extracting model representational dissimilarity matrices (RDMs) at eight checkpoints during 40 epochs of CIFAR-10 training and comparing them against fMRI RDMs from the THINGS dataset (Hebart et al., [2019](https://arxiv.org/html/2605.30556#bib.bib3), [2023](https://arxiv.org/html/2605.30556#bib.bib4)), we track how each learning rule reshapes representational geometry relative to the visual cortex.
Our results reveal three key findings\. First, all learning rules degrade V1 alignment, but they do so at dramatically different rates: BP reduces V1 alignment by 90% within a single epoch, while PC and STDP preserve approximately 70%\. Second, the magnitude of degradation tracks the globality of the error signal: rules that propagate precise, layer\-specific error gradients \(BP\) are more destructive than rules relying on local computation \(PC, STDP\)\. Third, a weaker, opposite tendency appears in object\-selective cortex \(LOC\), where BP shows the largest gain in alignment during training, suggesting that the same mechanism that erodes V1 structure may also build task\-relevant representations in higher areas\.
## 2 Methods
### 2.1 Network Architecture
All learning rules were implemented on a shared convolutional architecture consisting of three convolutional blocks (Conv1: 32 filters, Conv2: 64 filters, Conv3: 128 filters; each with $3\times 3$ kernels, batch normalisation, ReLU, and $2\times 2$ max-pooling) followed by a fully connected layer (FC1: 512 units) and a classification head (10 classes). This architecture was chosen to match Leutenegger ([2026](https://arxiv.org/html/2605.30556#bib.bib6)) and to ensure that differences between conditions reflect the learning rule, not the architecture.
### 2.2 Learning Rules
Backpropagation (BP). Standard supervised training with cross-entropy loss, Adam optimiser ($\text{lr}=10^{-3}$, weight decay $10^{-4}$), cosine annealing schedule, gradient clipping at 1.0, and dropout (0.3).
Feedback Alignment (FA). Following Lillicrap et al. ([2016](https://arxiv.org/html/2605.30556#bib.bib7)), the backward pass uses fixed random feedback weights at all convolutional layers instead of the transpose of the forward weights. This replaces the symmetric weight transport required by BP with a biologically more plausible asymmetric pathway. SGD optimiser ($\text{lr}=5\times 10^{-4}$, momentum 0.9).
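The difference between FA and BP in the backward pass can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it assumes a tiny dense two-layer network, a squared-error loss, and plain SGD without momentum, whereas the paper uses convolutional layers, cross-entropy, and momentum. The names `fa_step` and `B2` and the layer sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 16, 3                 # hypothetical layer sizes
W1 = rng.normal(0.0, 0.1, (n_hid, n_in))      # forward weights, layer 1
W2 = rng.normal(0.0, 0.1, (n_out, n_hid))     # forward weights, layer 2
B2 = rng.normal(0.0, 0.1, (n_hid, n_out))     # fixed random feedback, never updated

def fa_step(x, y, W1, W2, B, lr=5e-4):
    """One SGD step; the hidden-layer error is routed through B instead of W2.T."""
    h_pre = W1 @ x
    h = np.maximum(h_pre, 0.0)                # ReLU
    out = W2 @ h
    err = out - y                             # gradient of 0.5 * ||out - y||^2 w.r.t. out
    delta_h = (B @ err) * (h_pre > 0)         # BP would compute (W2.T @ err) here
    W2_new = W2 - lr * np.outer(err, h)       # output layer: identical under FA and BP
    W1_new = W1 - lr * np.outer(delta_h, x)   # hidden layer: differs under FA
    return W1_new, W2_new

x = rng.normal(size=n_in)
y = np.array([1.0, 0.0, 0.0])
W1_fa, W2_fa = fa_step(x, y, W1, W2, B2)      # feedback alignment
W1_bp, W2_bp = fa_step(x, y, W1, W2, W2.T)    # passing W2.T recovers backpropagation
```

Passing `W2.T` as the feedback matrix recovers the BP update, which makes explicit that the two rules differ only in how the error reaches the lower layer.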
Predictive Coding (PC). Following Rao and Ballard ([1999](https://arxiv.org/html/2605.30556#bib.bib10)) and Whittington and Bogacz ([2017](https://arxiv.org/html/2605.30556#bib.bib12)), each layer maintains a prediction of the layer below via learned transpose convolutions. During inference, representations are iteratively refined for $T=10$ steps to minimise prediction errors. Feedforward weights are updated using local prediction error signals (learning rate $10^{-4}$), and a separate classifier head is trained with Adam.
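The iterative refinement step can be sketched for a single dense, linear layer. This is a generic Rao–Ballard-style illustration under stated assumptions, not the paper's convolutional implementation: the prediction is `W @ r`, the inference step size is chosen from the spectral norm of `W` so that the error cannot increase, and the weight update shown acts on the prediction weights. The function names are hypothetical.

```python
import numpy as np

def pc_infer(x, W, T=10):
    """Refine the latent r for T steps by gradient descent on 0.5 * ||x - W r||^2."""
    step = 1.0 / np.linalg.norm(W, 2) ** 2    # assumed step size (1 / largest eigenvalue of W.T W)
    r = np.zeros(W.shape[1])
    errors = []
    for _ in range(T):
        e = x - W @ r                         # prediction error at the layer below
        errors.append(float(e @ e))           # squared error before this update
        r = r + step * (W.T @ e)              # update driven only by the local error
    return r, errors

def pc_weight_update(x, r, W, lr=1e-4):
    """Local, Hebbian-like update: prediction error times latent activity."""
    e = x - W @ r
    return W + lr * np.outer(e, r)

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4))                  # hypothetical prediction weights
x = rng.normal(size=16)                       # hypothetical activity of the layer below
r, errors = pc_infer(x, W, T=10)
W_new = pc_weight_update(x, r, W)
```

Both updates use only quantities available at the layer itself (its input, its latent state, and its own prediction error), which is the sense in which the rule is local.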
STDP. Following Masquelier and Thorpe ([2007](https://arxiv.org/html/2605.30556#bib.bib8)), convolutional weights are updated using spike-timing correlations: input activations are converted to Poisson spike trains, and weight changes follow an exponential STDP kernel ($A_{+}=A_{-}=0.003$, $\tau_{+}=\tau_{-}=20$ ms, $T_{\text{sim}}=10$ steps). A supervised classifier head is trained separately with Adam.
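An exponential STDP kernel with the quoted amplitudes and time constants can be written in a few lines. The sketch below covers only the pairwise kernel, using the common convention that a presynaptic spike preceding a postsynaptic spike potentiates the synapse; the sign convention, the treatment of simultaneous spikes, and the function names are assumptions, and the Poisson spike generation and convolutional weight sharing are omitted.

```python
import math

A_PLUS = A_MINUS = 0.003        # amplitudes quoted in the text
TAU_PLUS = TAU_MINUS = 20.0     # time constants in ms, quoted in the text

def stdp_dw(dt_ms: float) -> float:
    """Weight change for one spike pair, with dt_ms = t_post - t_pre (assumed convention)."""
    if dt_ms > 0:               # pre fires before post: potentiation
        return A_PLUS * math.exp(-dt_ms / TAU_PLUS)
    if dt_ms < 0:               # post fires before pre: depression
        return -A_MINUS * math.exp(dt_ms / TAU_MINUS)
    return 0.0                  # simultaneous spikes: no change in this sketch

def total_dw(pre_times, post_times):
    """Sum the kernel over all pre/post spike pairs (all-to-all pairing, an assumption)."""
    return sum(stdp_dw(t_post - t_pre) for t_pre in pre_times for t_post in post_times)
```

Because the two amplitudes and time constants are equal, the kernel is antisymmetric: a pair separated by +5 ms and a pair separated by -5 ms cancel exactly.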
Random Weights. The untrained baseline uses the same architecture as BP (including batch normalisation and dropout) at initialisation (epoch 0).
### 2.3 Training
All models were trained on a random subset of 8,000 CIFAR\-10 training images \(batch size 128\) for 40 epochs\. Five random seeds \(42, 123, 456, 789, 1337\) were used per learning rule\. Model activations were extracted at eight checkpoints: epochs 0, 1, 2, 5, 10, 20, 30, and 40\. Epoch 0 corresponds to the untrained random\-weight baseline\.
### 2.4 fMRI Data
We use the THINGS-fMRI dataset (Hebart et al., [2023](https://arxiv.org/html/2605.30556#bib.bib4)), which provides blood-oxygen-level-dependent (BOLD) responses from three human subjects viewing naturalistic object images. We selected 720 images for which fMRI data were available across all subjects. Responses were extracted from six regions of interest (ROIs): V1, V2, V3, V4, LOC, and IT. Subject-level representational dissimilarity matrices (RDMs) were computed using correlation distance.
### 2.5 Representational Similarity Analysis
At each checkpoint, 720 THINGS images ($224\times 224$ pixels, ImageNet normalisation) were passed through the model. Layer activations were global-average-pooled to produce feature vectors, and model RDMs were computed using correlation distance. Brain–model alignment was quantified as the Spearman rank correlation between the upper triangles of the model and fMRI RDMs. We report the best-layer alignment per ROI (i.e., the layer yielding the highest Spearman $r$ for each ROI, evaluated independently at each epoch).
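The pipeline described above (pooling, correlation-distance RDMs, Spearman correlation of upper triangles, best-layer selection) can be sketched with NumPy alone. This is an illustrative reconstruction, not the released code: all function names are hypothetical, activations are assumed to have shape `(images, channels, height, width)`, and Spearman's correlation is computed as the Pearson correlation of average ranks.

```python
import numpy as np

def global_average_pool(acts):
    """(n, C, H, W) conv activations -> (n, C); 2-D inputs are returned unchanged."""
    return acts.mean(axis=(2, 3)) if acts.ndim == 4 else acts

def correlation_rdm(features):
    """Correlation-distance RDM: 1 - Pearson r between the feature vectors of each image pair."""
    return 1.0 - np.corrcoef(features)

def upper_triangle(rdm):
    """Off-diagonal upper triangle as a 1-D vector."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)                  # highest rank in each tie group
    first = last - counts + 1                 # lowest rank in each tie group
    return ((first + last) / 2.0)[np.ravel(inverse)]

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def best_layer_alignment(layer_acts, brain_rdm):
    """Return (layer name, Spearman r) for the layer best aligned with one ROI's RDM."""
    brain_vec = upper_triangle(brain_rdm)
    scores = {
        name: spearman(upper_triangle(correlation_rdm(global_average_pool(a))), brain_vec)
        for name, a in layer_acts.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Selecting the best layer separately at each epoch, as the text describes, means that the reported layer may change across checkpoints.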
### 2.6 Statistical Testing
All statistical comparisons use paired, one-sided permutation tests across the five seeds, with the seed serving as the pairing unit. With five seeds there are only $2^{5}=32$ possible sign assignments, so the smallest attainable one-sided $p$-value is $1/32\approx 0.031$; a comparison reaches this floor exactly when the effect is consistent in direction across all five seeds. We therefore report $p=0.031$ as evidence of a fully consistent directional effect rather than of a small tail probability, and we treat the five-seed design as a limit on statistical resolution (Section [4.4](https://arxiv.org/html/2605.30556#S4.SS4)). Significance was assessed at $\alpha=0.05$. Cohen’s $d$ effect sizes are reported for key comparisons.
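The $1/32$ floor can be checked by enumerating all sign assignments directly. The sketch below is one plausible form of an exact paired sign-flip test, using the sum of paired differences as the test statistic; the statistic, the function name, and the example values are assumptions rather than the paper's code or data.

```python
from itertools import product

def paired_sign_flip_pvalue(a, b):
    """Exact one-sided p-value for 'a exceeds b', flipping the sign of each paired difference."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = sum(diffs)
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):   # 2**n sign assignments
        total += 1
        permuted = sum(s * d for s, d in zip(signs, diffs))
        if permuted >= observed - 1e-12:                # small tolerance for float ties
            count += 1
    return count / total

# Hypothetical per-seed alignment values (not the paper's data): a > b for every seed.
local_rule = [0.070, 0.060, 0.065, 0.080, 0.050]
global_rule = [0.020, 0.030, 0.020, 0.025, 0.015]
p_floor = paired_sign_flip_pvalue(local_rule, global_rule)

# A single seed pointing the other way lifts the p-value above the floor.
p_mixed = paired_sign_flip_pvalue([1, 2, 3, 4, 5], [0, 0, 0, 0, 6])
```

When every seed shows the same direction, only the observed assignment reaches the observed statistic, so the test returns exactly $1/32$ regardless of how large the differences are.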
## 3 Results
### 3.1 Training Universally Degrades V1 Alignment
At initialisation (epoch 0), all models show comparable V1 alignment (Spearman $r\approx 0.09$–$0.10$; Figure [1](https://arxiv.org/html/2605.30556#S3.F1)A). After a single epoch of training, all learning rules show reduced V1 alignment, but the magnitude differs dramatically across rules (Figure [1](https://arxiv.org/html/2605.30556#S3.F1)B). Backpropagation shows the most severe degradation, losing 90% of its V1 alignment after one epoch ($r: 0.102\to 0.011$, paired permutation test $p=0.031$). Feedback alignment shows an intermediate drop of 49% ($r: 0.089\to 0.044$). Predictive coding and STDP show the least degradation, losing only 25% and 31% respectively (PC: $r: 0.093\to 0.070$; STDP: $r: 0.097\to 0.067$).
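The relative losses follow from the epoch-0 and epoch-1 correlations by simple arithmetic, as the short sketch below shows. It uses only the rounded values quoted in the text, so the recomputed percentages land within about two percentage points of the quoted ones; the remaining gap is presumably due to the quoted figures being derived from unrounded correlations, which is an assumption.

```python
# (epoch 0, epoch 1) V1 alignment per rule, as quoted in the text (rounded values)
reported = {
    "BP": (0.102, 0.011),
    "FA": (0.089, 0.044),
    "PC": (0.093, 0.070),
    "STDP": (0.097, 0.067),
}

def relative_loss(before, after):
    """Percentage of the epoch-0 alignment lost after one epoch."""
    return 100.0 * (before - after) / before

losses = {rule: relative_loss(*values) for rule, values in reported.items()}
for rule, pct in losses.items():
    print(f"{rule}: {pct:.1f}% of epoch-0 V1 alignment lost")
```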
Figure 1: V1 brain alignment across training. (A) Spearman $r$ between model RDMs (best layer) and V1 fMRI RDMs across training epochs. Shaded regions: $\pm 1$ SD across 5 seeds. Dashed grey line: untrained baseline. (B) Change in alignment relative to epoch 0. BP (blue) drops most steeply; PC (green) and STDP (orange) degrade least. (C) Final alignment at epoch 40. Stars indicate significant difference from untrained baseline (paired permutation test, $p<0.05$). Individual seed values shown as dots.
By epoch 40, the ordering stabilises at PC ($r=0.064\pm 0.012$) $>$ STDP ($0.059\pm 0.010$) $>$ BP ($0.022\pm 0.006$) $\approx$ FA ($0.019\pm 0.006$). Both PC and STDP retain significantly more V1 alignment than BP (paired permutation test, $p=0.031$, the resolution floor for five seeds; Cohen’s $d>5$ for both, reflecting the very low between-seed variance). All trained models show significantly lower V1 alignment than the untrained baseline ($p=0.031$ for all comparisons, that is, every seed showed the same ordering; Figure [1](https://arxiv.org/html/2605.30556#S3.F1)C).
### 3.2 The Degradation Pattern Generalises Across Early Visual Areas
The V1 degradation pattern extends to V2 and V3, with BP consistently showing the largest drop and PC/STDP preserving the most alignment \(Figure[2](https://arxiv.org/html/2605.30556#S3.F2)\)\. In V4, the pattern attenuates: all rules show moderate degradation with smaller differences between them\. Notably, in LOC and IT, the trends reverse or flatten \(see Section[3\.3](https://arxiv.org/html/2605.30556#S3.SS3)\)\.
Figure 2: fMRI alignment across training for all six ROIs. Same conventions as Figure [1](https://arxiv.org/html/2605.30556#S3.F1)A. The degradation pattern is strongest in early visual areas (V1–V3) and absent in higher areas (LOC, IT).
### 3.3 Opposing Trend in Object-Selective Cortex
While training degrades alignment with early visual cortex, the opposite tendency appears, weakly, in LOC (Figure [3](https://arxiv.org/html/2605.30556#S3.F3)). Backpropagation, the rule that degrades V1 alignment most severely, shows the largest increase in LOC alignment during training (epoch 0: $r=-0.001$; epoch 40: $r=0.011$; $\Delta r=+0.011$). The other rules show smaller or negligible changes (FA: $\Delta r=+0.005$; PC: $\Delta r=-0.001$; STDP: $\Delta r=+0.001$). These LOC changes are small in absolute terms and we did not subject them to a significance test, so they should be read as a suggestive tendency rather than an established effect.
Figure 3: Opposing trends in V1 and LOC. (A) V1 alignment decreases during training for all rules. (B) LOC alignment increases for BP only, while local rules (PC, STDP) show no change. Same y-axis scale highlights the magnitude difference.
This dissociation suggests a trade-off: BP’s global error signal reshapes representations throughout the network, degrading early visual structure while building task-relevant object representations in higher layers. Local learning rules, by contrast, lack the top-down pressure needed to sculpt higher-layer representations but consequently preserve early visual statistics.
### 3.4 Seed Variability
At epoch 0, all models cluster tightly around $r\approx 0.09$–$0.10$ with low seed variability (Figure [4](https://arxiv.org/html/2605.30556#S3.F4)A). After training, the rules separate into two groups: BP and FA converge to low alignment ($r\approx 0.02$) with low variance, while PC and STDP maintain higher alignment ($r\approx 0.06$) with moderate variance (Figure [4](https://arxiv.org/html/2605.30556#S3.F4)B). The consistency across seeds indicates that the observed differences reflect systematic properties of the learning rules rather than random fluctuations.
Figure 4: Seed variability. (A) V1 alignment at epoch 0 (untrained). All rules cluster near $r\approx 0.10$. (B) V1 alignment at epoch 40 (trained). BP and FA collapse to $r\approx 0.02$; PC and STDP retain $r\approx 0.06$. Box plots with individual seed values (circles).
## 4 Discussion
### 4.1 Why Does Training Degrade V1 Alignment?
Our central finding, that supervised training degrades V1 alignment across all learning rules, suggests that untrained networks capture low-level visual statistics through their architectural inductive biases (convolutional filters, pooling, normalisation) rather than through learning. Training then reshapes these representations toward task-relevant features, moving them away from the general-purpose visual statistics encoded by V1. This view is consistent with work showing that explicitly aligning a network’s early layers to primate V1 reshapes its representations and improves robustness (Safarani et al., [2021](https://arxiv.org/html/2605.30556#bib.bib11)), which indicates that V1-like structure is a specific, shapeable property rather than a generic by-product of training.
The key insight is that the *degree* of degradation depends on the learning rule. BP, which computes exact gradients and propagates precise error signals across all layers, reshapes representations most aggressively. FA, which substitutes random feedback weights, delivers noisier error signals and degrades V1 alignment less rapidly (though it converges to similar levels by epoch 40). PC and STDP, which rely on local computation without top-down error propagation, preserve substantially more V1-like structure throughout training.
### 4.2 A Trade-Off Between Early and Higher-Level Alignment
The opposing trends in V1 \(degradation\) and LOC \(a small increase for BP\) suggest a representational trade\-off\. BP’s global gradient signal may simultaneously degrade early visual structure and build category\-selective representations\. Local learning rules avoid this trade\-off: they preserve V1 alignment but fail to develop LOC\-aligned representations, suggesting that local credit assignment is insufficient for building hierarchical object representations\.
This finding has implications for theories of cortical learning\. The brain appears to maintain strong V1 representations while also developing object selectivity in LOC and IT, a combination that none of our tested rules achieves\. This may point to a learning mechanism that combines local representational preservation with a form of hierarchical credit assignment, potentially more nuanced than any single rule tested here\.
### 4.3 Relation to Prior Work
Our results complement and extend Leutenegger ([2026](https://arxiv.org/html/2605.30556#bib.bib6)), which compared trained learning rules at a single endpoint and found that random weights outperform all trained models at V1. The present work reveals the *dynamics* of this phenomenon: the random-weights advantage is not merely a static comparison but reflects an active, rapid degradation of V1-aligned structure by training. The finding that PC and STDP partially preserve V1 alignment is novel and connects to a broader literature on the biological plausibility of learning rules (Whittington and Bogacz, [2017](https://arxiv.org/html/2605.30556#bib.bib12); Payeur et al., [2021](https://arxiv.org/html/2605.30556#bib.bib9)).
### 4.4 Limitations
Several limitations warrant consideration. First, all rules share a common simple architecture (3 conv + 1 FC). Deeper architectures (e.g., ResNets) may show different dynamics. Second, the training dataset (CIFAR-10, 8,000 images) is small relative to natural vision, and the networks are trained on $32\times 32$ CIFAR-10 images but evaluated on $224\times 224$ THINGS images; this resolution and domain shift could itself affect the extracted representations, so the absolute alignment values should be interpreted with care. Third, the STDP and PC implementations are simplified approximations of their biological counterparts. Fourth, the fMRI dataset comprises only three subjects, and the five-seed design caps the resolution of the permutation tests at $p\approx 0.031$, limiting statistical power for subject- and seed-level analyses. Fifth, the LOC increases were small in absolute terms and were not tested for significance, so the proposed early-versus-higher trade-off should be regarded as suggestive rather than established. Finally, the effect sizes, while large in relative terms, operate on small absolute Spearman correlations ($r<0.10$), so the practical significance of these representational differences for downstream neural processing remains an open question.
### 4.5 Conclusion
Supervised training universally degrades early visual cortex alignment, but the magnitude depends systematically on the learning rule\. Backpropagation, the least biologically plausible rule, is the most destructive, while local learning rules \(predictive coding, STDP\) preserve substantially more V1\-like structure\. This pattern reveals a fundamental tension between task optimisation and representational brain\-likeness that any theory of cortical learning must resolve\.
## Data and Code Availability
All code is available at [https://github.com/nilsleut](https://github.com/nilsleut). The THINGS-fMRI dataset is publicly available (Hebart et al., [2023](https://arxiv.org/html/2605.30556#bib.bib4)). Training dynamics data (model RDMs, checkpoints, and RSA results at all milestones) will be released upon publication.
## Acknowledgements
This work was conducted independently\. Compute was provided by Modal \(GPU cloud\) and Kaggle Notebooks\.
## References
- Bi and Poo [1998] Bi, G.-q. and Poo, M.-m. (1998). Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. *Journal of Neuroscience*, 18(24):10464–10472.
- Cichy et al. [2016] Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A., and Oliva, A. (2016). Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. *Scientific Reports*, 6:27755.
- Hebart et al. [2019] Hebart, M. N., Dickter, A. H., Kidder, A., Kwok, W. Y., Corriveau, A., Van Wicklin, C., and Baker, C. I. (2019). THINGS: A database of 1,854 object concepts and more than 26,000 naturalistic object images. *PLOS ONE*, 14(10):e0223792.
- Hebart et al. [2023] Hebart, M. N., Contier, O., Teichmann, L., Rockter, A. H., Zheng, C. Y., Kidder, A., Corriveau, A., Vaziri-Pashkam, M., and Baker, C. I. (2023). THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. *eLife*, 12:e82580.
- Khaligh-Razavi and Kriegeskorte [2014] Khaligh-Razavi, S.-M. and Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. *PLOS Computational Biology*, 10(11):e1003915.
- Leutenegger [2026] Leutenegger, N. (2026). Untrained CNNs match backpropagation at V1: A systematic RSA comparison of four learning rules against human fMRI. *arXiv preprint arXiv:2604.16875*.
- Lillicrap et al. [2016] Lillicrap, T. P., Cownden, D., Tweed, D. B., and Akerman, C. J. (2016). Random synaptic feedback weights support error backpropagation for deep learning. *Nature Communications*, 7:13276.
- Masquelier and Thorpe [2007] Masquelier, T. and Thorpe, S. J. (2007). Unsupervised learning of visual features through spike timing dependent plasticity. *PLOS Computational Biology*, 3(2):e31.
- Payeur et al. [2021] Payeur, A., Guerguiev, J., Zenke, F., Richards, B. A., and Naud, R. (2021). Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits. *Nature Neuroscience*, 24(7):1010–1019.
- Rao and Ballard [1999] Rao, R. P. and Ballard, D. H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. *Nature Neuroscience*, 2(1):79–87.
- Safarani et al. [2021] Safarani, S., Nix, A., Willeke, K., Cadena, S. A., Restivo, K., Denfield, G., Tolias, A. S., and Sinz, F. H. (2021). Towards robust vision by multi-task learning on monkey visual cortex. *Advances in Neural Information Processing Systems*, 34:739–751.
- Whittington and Bogacz [2017] Whittington, J. C. and Bogacz, R. (2017). An approximation of the error backpropagation algorithm in a predictive coding network with local Hebbian synaptic plasticity. *Neural Computation*, 29(5):1229–1262.
- Yamins et al. [2014] Yamins, D. L., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., and DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. *Proceedings of the National Academy of Sciences*, 111(23):8619–8624.