Tag
This paper introduces CDLinear, a block-circulant neural network layer that reduces parameter count and improves Hessian conditioning via FFT diagonalization; the construction is supported by theoretical proofs and validated empirically on MNIST.
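The paper's exact layer isn't reproduced here, but the core idea generalizes: a matrix built from b×b circulant blocks is stored as one length-b vector per block, and each block-vector product becomes an elementwise multiply in the FFT domain. A minimal sketch under those assumptions (the class name `BlockCirculantLinear` and the init scaling are illustrative, not CDLinear's specification):

```python
import torch
import torch.nn as nn

class BlockCirculantLinear(nn.Module):
    """Weight matrix partitioned into b x b circulant blocks, each stored
    as a single length-b defining vector, so parameters drop from
    in_features * out_features to in_features * out_features / b.
    Each block matvec is computed via FFT diagonalization:
    C @ x = irfft(rfft(c) * rfft(x))."""
    def __init__(self, in_features, out_features, block_size):
        super().__init__()
        assert in_features % block_size == 0 and out_features % block_size == 0
        self.b = block_size
        self.p = in_features // block_size   # number of input blocks
        self.q = out_features // block_size  # number of output blocks
        # one defining vector per (output block, input block) pair
        self.w = nn.Parameter(torch.randn(self.q, self.p, block_size)
                              / block_size ** 0.5)  # heuristic init scale

    def forward(self, x):                    # x: (batch, in_features)
        xb = x.view(-1, self.p, self.b)      # split input into blocks
        Xf = torch.fft.rfft(xb, dim=-1)      # (batch, p, b//2 + 1)
        Wf = torch.fft.rfft(self.w, dim=-1)  # (q, p, b//2 + 1)
        # circular convolution per block, summed over input blocks
        Yf = torch.einsum('qpk,npk->nqk', Wf, Xf)
        y = torch.fft.irfft(Yf, n=self.b, dim=-1)
        return y.reshape(-1, self.q * self.b)

# Usage: a 512 -> 512 layer with b = 64 stores 8 * 8 * 64 = 4096 weights
# instead of 512 * 512 = 262144, a 64x parameter reduction.
layer = BlockCirculantLinear(512, 512, block_size=64)
print(layer(torch.randn(3, 512)).shape)  # torch.Size([3, 512])
```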
A user asks how Qwen's 27B dense model can outperform its 397B MoE variant, sparking discussion of MoE efficiency versus dense-model quality.
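The efficiency side of that discussion usually turns on active rather than total parameters: a common rule of thumb puts forward FLOPs per token at roughly 2× the parameters actually used. A rough sketch of that comparison (the active-parameter figure for the MoE is hypothetical, not the actual Qwen config):

```python
# Rule of thumb: forward FLOPs per token ~= 2 * active parameters.
def flops_per_token(active_params: float) -> float:
    return 2 * active_params

dense_active = 27e9                   # dense model: every parameter is active
moe_total, moe_active = 397e9, 30e9   # MoE: only a few experts fire per token
                                      # (30B active is an assumed figure)

print(f"dense 27B : {flops_per_token(dense_active):.2e} FLOPs/token")
print(f"MoE 397B  : {flops_per_token(moe_active):.2e} FLOPs/token")
# Under these assumptions the two cost about the same compute per token,
# so the quality gap comes down to how well total capacity is used.
```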
In an interview with Dwarkesh Patel, Andrej Karpathy claimed that a 1B-parameter model trained on ultra-clean data could match today's 1.8T-parameter frontier models, implying roughly 1,800× effective compression.
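The compression figure is just the parameter ratio:

```latex
% Back-of-envelope for the implied compression ratio:
\frac{1.8\times 10^{12}\ \text{parameters}}{1\times 10^{9}\ \text{parameters}} = 1800
```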
ShadowPEFT introduces a centralized parameter-efficient fine-tuning method that uses a depth-shared shadow module to refine transformer layer representations, matching or outperforming LoRA/DoRA with a comparable trainable-parameter budget.
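The paper's exact module design isn't given here; below is a minimal sketch of the depth-sharing idea under stated assumptions: one small bottleneck MLP (`SharedShadow`, `d_bottleneck`, and the residual placement are all illustrative) applied at every layer of a frozen base model, so trainable parameters stay constant regardless of depth.

```python
import torch
import torch.nn as nn

class SharedShadow(nn.Module):
    """One bottleneck MLP reused at every depth (hypothetical design,
    not the paper's specification). Applied as a residual correction
    to each layer's hidden states while the base model stays frozen."""
    def __init__(self, d_model: int, d_bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(d_bottleneck, d_model)
        nn.init.zeros_(self.up.weight)  # start as an identity mapping,
        nn.init.zeros_(self.up.bias)    # so fine-tuning begins at the base model

    def forward(self, h):               # h: (batch, seq, d_model)
        return h + self.up(self.act(self.down(h)))

# Usage sketch: only the shared module is trained; the base transformer
# layers (omitted here) would be frozen and interleaved with it.
shadow = SharedShadow(d_model=768)
h = torch.randn(2, 16, 768)
for _ in range(12):                     # the same module at every layer
    # h = frozen_layer(h)               # frozen base layer would go here
    h = shadow(h)
print(h.shape)  # torch.Size([2, 16, 768])
```

Depth-sharing is what keeps this "centralized": unlike LoRA, which adds per-layer adapters, one module's parameters serve the whole stack.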