@che_shr_cat: 1/ What if you could train a model on totally benign-looking Wikipedia articles, but secretly force its internal weight…
Summary
This thread presents a technique to encode a functional QR code into neural network weights using natural language text during training, enabling hidden information embedding in models trained on benign data.
Cached at: 06/16/26, 01:09 AM
1/ What if you could train a model on totally benign-looking Wikipedia articles, but secretly force its internal weights to encode a fully functional QR code?
This is now possible. We can program neural network weights using natural words.
2/ In “Synthetic Data for any Differentiable Target”, Tristan Thrush, Christopher Potts, Tatsunori Hashimoto, and team introduce Dataset Policy Gradient (DPG).
It is a new RL primitive that optimizes synthetic text generators for downstream model targets.
3/ Training a downstream model from scratch on every candidate dataset, just to obtain a single dataset-level RL reward, is computationally prohibitive.
DPG bypasses this. It treats individual synthetic examples as actions and computes example-level metagradient rewards through the training trajectory.
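To make "examples as actions" concrete, here is a minimal policy-gradient sketch, not the paper's implementation. It assumes a tiny categorical policy over three placeholder texts and a hand-written reward table standing in for the metagradient rewards; `CANDIDATES`, `EXAMPLE_REWARDS`, and `train_policy` are hypothetical names, and the update is plain REINFORCE with a batch-mean baseline.

```python
import math
import random

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_policy(rewards, steps=300, batch=16, lr=0.5, seed=0):
    """REINFORCE over a categorical policy; each sampled text is one action."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        actions = rng.choices(range(len(rewards)), weights=probs, k=batch)
        # The batch-mean reward serves as a simple variance-reducing baseline.
        baseline = sum(rewards[a] for a in actions) / batch
        grad = [0.0] * len(logits)
        for a in actions:
            advantage = rewards[a] - baseline
            for j in range(len(logits)):
                indicator = 1.0 if j == a else 0.0
                # Score-function gradient of log softmax with respect to logit j.
                grad[j] += advantage * (indicator - probs[j]) / batch
        logits = [l + lr * g for l, g in zip(logits, grad)]
    return softmax(logits)

# Hypothetical stand-ins: three candidate texts and made-up example-level rewards.
CANDIDATES = ["text A", "text B", "text C"]
EXAMPLE_REWARDS = [1.0, -1.0, 0.0]

final_probs = train_policy(EXAMPLE_REWARDS)
```

In this toy, probability mass shifts toward the text with the highest reward. In the setting the thread describes, the generator would be a language model and the rewards would come from metagradients rather than a fixed table.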
4/ The core trick: virtual loss weights.
The algorithm applies a virtual weight to each synthetic text. By taking the gradient of the downstream target metric with respect to these weights, it gets a precise reward signal for every single generated token sequence.
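The chain rule behind virtual weights can be sketched on a toy problem. This is an illustration under loud assumptions, not the paper's setup: a two-parameter linear model, a single plain SGD inner step (chosen only because it gives a closed form), and a hypothetical target that measures squared distance between the updated weights and a `goal` vector. Each example's reward is the negative derivative of the target with respect to its virtual weight, evaluated at weight 1.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def example_grad(theta, x, y):
    # Gradient of 0.5 * (theta . x - y)^2 with respect to theta.
    err = dot(theta, x) - y
    return [err * xi for xi in x]

def inner_step(theta, data, weights, lr):
    # One SGD step on the virtually weighted loss sum_i w_i * loss_i.
    new_theta = list(theta)
    for (x, y), w in zip(data, weights):
        g = example_grad(theta, x, y)
        new_theta = [t - lr * w * gi for t, gi in zip(new_theta, g)]
    return new_theta

def target(theta, goal):
    # Hypothetical differentiable target: squared distance to goal weights.
    return 0.5 * sum((t - g) ** 2 for t, g in zip(theta, goal))

def example_rewards(theta, data, goal, lr):
    weights = [1.0] * len(data)
    theta_new = inner_step(theta, data, weights, lr)
    target_grad = [t - g for t, g in zip(theta_new, goal)]
    rewards = []
    for x, y in data:
        g = example_grad(theta, x, y)
        # d target / d w_i = target_grad . (d theta_new / d w_i) = -lr * target_grad . g_i
        d_target_d_w = -lr * dot(target_grad, g)
        # Reward is positive when upweighting the example lowers the target.
        rewards.append(-d_target_d_w)
    return rewards

theta0 = [0.0, 0.0]
goal = [1.0, -1.0]
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 0.0)]
rewards = example_rewards(theta0, data, goal, lr=0.1)
```

Here the first example pulls the weights toward `goal` and gets a positive reward, the second pushes them away and gets a negative one, and the third has zero gradient and so zero reward.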
5/ A fascinating technical insight from the paper: standard SGD completely fails here.
The meta-optimization only succeeds when using Adam for the inner loop. The metagradients must track Adam’s running second-moment states to get high-fidelity training signals.
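A small numerical sketch of why the second-moment state matters for the metagradient. Everything here is a toy assumption rather than the paper's configuration: one scalar parameter, two per-example gradients, a warm Adam state, no bias correction, and `B2` set unusually low so the effect is easy to see. It compares the metagradient through the full Adam step with one that wrongly treats the second moment as a constant.

```python
import math

# Hypothetical toy setup: scalar parameter, two examples, warm Adam state.
THETA, GOAL = 0.0, 1.0
EXAMPLE_GRADS = [-1.0, 2.0]   # per-example loss gradients at THETA
M0, V0 = 0.5, 1.0             # pre-existing first and second moments
LR, B1, B2, EPS = 0.1, 0.9, 0.9, 1e-8  # B2 is low on purpose (assumption)

def target_after_adam_step(weights, frozen_v=None):
    grad = sum(w * g for w, g in zip(weights, EXAMPLE_GRADS))
    m = B1 * M0 + (1 - B1) * grad
    v = B2 * V0 + (1 - B2) * grad * grad
    if frozen_v is not None:
        # Ignore how the virtual weights change the second moment.
        v = frozen_v
    theta_new = THETA - LR * m / (math.sqrt(v) + EPS)
    return 0.5 * (theta_new - GOAL) ** 2

def metagradient(index, frozen_v=None, h=1e-4):
    # Central finite difference with respect to one virtual weight.
    up, down = [1.0, 1.0], [1.0, 1.0]
    up[index] += h
    down[index] -= h
    diff = target_after_adam_step(up, frozen_v) - target_after_adam_step(down, frozen_v)
    return diff / (2 * h)

# Second moment at the unperturbed weights, used by the "frozen" variant.
V_BASE = B2 * V0 + (1 - B2) * sum(EXAMPLE_GRADS) ** 2

full = [metagradient(i) for i in range(2)]
frozen = [metagradient(i, frozen_v=V_BASE) for i in range(2)]
```

In this toy, freezing the second moment more than doubles the metagradient for both examples, so a reward that skips the second-moment dependence would be noticeably miscalibrated.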
6/ The results are wild.
DPG achieves 100% accuracy in programming a target model’s language model head weights to reconstruct a QR code.
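The thread does not say how weights are read out as a QR code, so the following is a guess for illustration only: each weight decodes to a dark module if it is positive, accuracy is the fraction of matching modules, and a logistic surrogate makes the target differentiable. `PATTERN` is a hypothetical 5x5 grid, not a real QR code, and the loop descends the weights directly just to check the target behaves; in DPG as described above, the weights would move only through training on generated text.

```python
import math

# Hypothetical 5x5 pattern standing in for QR modules (1 = dark, 0 = light).
PATTERN = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 0, 1],
]

def bit_accuracy(weights, pattern):
    # A weight decodes to "dark" when it is positive (assumed readout rule).
    cells = [(w > 0) == bool(p)
             for wr, pr in zip(weights, pattern) for w, p in zip(wr, pr)]
    return sum(cells) / len(cells)

def surrogate_loss(weights, pattern):
    # Mean logistic loss on sign * weight; differentiable proxy for accuracy.
    total, count = 0.0, 0
    for wr, pr in zip(weights, pattern):
        for w, p in zip(wr, pr):
            sign = 1.0 if p else -1.0
            total += math.log1p(math.exp(-sign * w))
            count += 1
    return total / count

def surrogate_grad(weights, pattern):
    n = sum(len(row) for row in pattern)
    grads = []
    for wr, pr in zip(weights, pattern):
        row = []
        for w, p in zip(wr, pr):
            sign = 1.0 if p else -1.0
            # d/dw log(1 + exp(-sign * w)) = -sign / (1 + exp(sign * w))
            row.append(-sign / (1.0 + math.exp(sign * w)) / n)
        grads.append(row)
    return grads

weights = [[0.1] * 5 for _ in range(5)]   # every module starts out "dark"
acc_before = bit_accuracy(weights, PATTERN)
loss_before = surrogate_loss(weights, PATTERN)
for _ in range(5):
    grads = surrogate_grad(weights, PATTERN)
    # Step size 10.0 is an arbitrary choice for this toy.
    weights = [[w - 10.0 * g for w, g in zip(wr, gr)]
               for wr, gr in zip(weights, grads)]
acc_after = bit_accuracy(weights, PATTERN)
loss_after = surrogate_loss(weights, PATTERN)
```

The surrogate gradient is the kind of signal that would be passed back through the training trajectory to score each synthetic example.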
It also optimized synthetic data that drastically improved multilingual performance over naive baselines.
7/ The catch? Massive memory overhead.
Backpropagating through multi-step training means storing the entire computational graph.
For large LLMs, the authors had to restrict the inner loop to a single step. Also, abstract targets can cause text quality to decay.
8/ This is a double-edged sword.
For alignment, it allows high-precision steering of model behavior through curated data.
For security, it is a nightmare. It enables clean-label data poisoning that is virtually impossible to detect by human inspection.
9/ If synthetic data can be optimized to program weights directly, we need to rethink how we audit training runs.
Read the full technical breakdown: https://arxiviq.substack.com/p/synthetic-data-for-any-differentiable…
Paper: https://arxiv.org/abs/2604.08423
What are your thoughts on this optimization vector?
10/ I also illustrated these concepts in a short visual comic. Sometimes seeing the loop makes the math click instantly.
#MachineLearning