reparameterization

#reparameterization

@murage_kibicho: I added a neural sorting algorithm. It builds on the reparameterization trick from Stable Diffusion! It's called a Gumb…

X AI KOLs Timeline ↗ · 3d ago Cached

A Python implementation of the Gumbel-Sinkhorn neural network for sorting lists of numbers, based on the 2018 paper by Mena et al.

#reparameterization

@HanGuo97: LLM training is built on fast MatMuls. But many surrounding ops still run as memory-bound kernels. CODA reparameterizes…

X AI KOLs Following ↗ · 2026-05-21 Cached

CODA reparameterizes memory-bound operations in LLM training to fuse them into the matmul epilogue, achieving near state-of-the-art performance with LLM-generated kernels.

#reparameterization

Are Flat Minima an Illusion?

arXiv cs.LG ↗ · 2026-05-08 Cached

This paper challenges the common belief that flat minima cause better generalization in neural networks, arguing that 'weakness'—a reparameterization-invariant measure of function simplicity—is the true driver. Empirical results on MNIST and Fashion-MNIST show that weakness predicts generalization while sharpness anticorrelates, and the large-batch generalization advantage vanishes as training data increases.
