model-architecture

#model-architecture

Mix, Don't Pick: Why Synthetic Corpus Composition Matters for Time Series Foundation Model Pretraining

arXiv cs.LG · 5d ago

This paper systematically evaluates 11 synthetic time-series generators for foundation model pretraining and finds that generator rankings are not stable across architectures, but that an equal-weight mixture of all generators matches or beats the best individual generator. Blending this mixture with real data yields the strongest pretraining corpora, which reframes synthetic pretraining as a corpus composition problem rather than a generator selection problem.
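
To make "equal-weight mixture" concrete, here is a minimal sketch of drawing a synthetic corpus in which every generator is equally likely for each series. It is only an illustration: the three toy generators, the `GENERATORS` table, and `sample_equal_mixture` are hypothetical names invented here, not the paper's 11 generators or its sampling code.

```python
import math
import random

# Hypothetical toy generators (assumptions for illustration only).
def sine_series(rng, n):
    freq = rng.uniform(0.05, 0.5)
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return [math.sin(freq * t + phase) for t in range(n)]

def random_walk(rng, n):
    value, out = 0.0, []
    for _ in range(n):
        value += rng.gauss(0.0, 1.0)
        out.append(value)
    return out

def white_noise(rng, n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

GENERATORS = {
    "sine": sine_series,
    "random_walk": random_walk,
    "white_noise": white_noise,
}

def sample_equal_mixture(num_series, length, seed=0):
    """Draw each series from a generator chosen uniformly at random."""
    rng = random.Random(seed)
    names = sorted(GENERATORS)  # fixed order keeps runs reproducible
    corpus = []
    for _ in range(num_series):
        name = rng.choice(names)  # equal weight for every generator
        corpus.append((name, GENERATORS[name](rng, length)))
    return corpus

if __name__ == "__main__":
    corpus = sample_equal_mixture(num_series=6, length=8)
    for name, series in corpus:
        print(name, [round(x, 2) for x in series[:3]])
```

Blending with real data, as the summary describes, would amount to adding one more source to the draw; the mixing ratio the paper uses is not stated here.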

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Hugging Face Daily Papers · 5d ago

Researchers propose a novel router redesign for Mixture-of-Experts models that aligns router rows with principal singular directions using Manifold Power Iteration, improving model effectiveness.
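
For readers unfamiliar with the term, a "principal singular direction" can be found with ordinary power iteration. The sketch below is plain power iteration on a small matrix, not the paper's Manifold Power Iteration, and the function name `top_right_singular_vector` is a hypothetical one chosen here; it only illustrates the underlying idea.

```python
import math

def top_right_singular_vector(matrix, iters=200):
    """Approximate the top right singular vector of `matrix`
    by repeatedly applying A^T A and renormalizing."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols  # unit-norm starting vector
    for _ in range(iters):
        # u = A v
        u = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        # w = A^T u
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # v lies in the null space; nothing more to do
        v = [x / norm for x in w]
    return v

if __name__ == "__main__":
    A = [[3.0, 0.0], [0.0, 1.0]]
    print(top_right_singular_vector(A))
```

The iteration converges toward the direction with the largest singular value as long as the starting vector is not orthogonal to it.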

[Opinion] Gemma4-12B means that Google is going hard after the market of IoT and mobile and we're helping them

Reddit r/LocalLLaMA · 2026-06-05

An opinion piece argues that Google's Gemma4-12B model is strategically designed for IoT and mobile devices within the Android ecosystem, not just for laptops as marketed, and that it prioritizes low-latency speech and video processing over output quality.

What is the point of MoE models, beyond being faster?

Reddit r/LocalLLaMA · 2026-05-19

A discussion about the advantages of Mixture of Experts (MoE) models over dense models beyond speed, considering RAM constraints and scaling limits.

Delta Attention Residuals

Hugging Face Daily Papers · 2026-05-13

Delta Attention Residuals improve layer-wise routing in transformer models by attending to feature changes (deltas) rather than cumulative hidden states, achieving 1.7–8.2% validation perplexity gains across scales from 220M to 7.6B parameters.

Interfaze: A new model architecture built for high accuracy at scale

Hacker News Top · 2026-05-11

Interfaze introduces a hybrid AI model architecture combining CNN/DNN specialization with transformer capabilities, achieving superior accuracy on deterministic tasks like OCR and translation while maintaining cost efficiency at scale.

Why there isn't any top LLM providers investing on diffusion LLM?

Reddit r/singularity · 2026-05-11

This post asks why major LLM providers are not investing in diffusion LLMs despite recent advances such as Mercury 2, and explores whether fundamental issues or hardware bottlenecks are hindering broader adoption.

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

Hugging Face Daily Papers · 2026-05-07

UniPool introduces a shared expert pool architecture for Mixture-of-Experts models, reducing parameter growth with depth while improving efficiency and performance over standard MoE baselines.

Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures

arXiv cs.CL · 2026-04-20

A comprehensive survey of recent advances in intrinsic interpretability for Large Language Models, categorizing approaches into five design paradigms: functional transparency, concept alignment, representational decomposability, explicit modularization, and latent sparsity induction. The paper addresses the challenge of building transparency directly into model architectures rather than relying on post-hoc explanation methods.
