model-architecture

#model-architecture

What is the point of MoE models, beyond being faster?

Reddit r/LocalLLaMA · 2026-05-19

A discussion of what Mixture-of-Experts (MoE) models offer over dense models beyond inference speed, in light of RAM constraints and scaling limits.

Delta Attention Residuals

Hugging Face Daily Papers · 2026-05-13

Delta Attention Residuals improve layer-wise routing in transformer models by attending to feature changes (deltas) rather than to cumulative hidden states, reporting validation perplexity improvements of 1.7–8.2% across model scales from 220M to 7.6B parameters.

Interfaze: A new model architecture built for high accuracy at scale

Hacker News Top · 2026-05-11

Interfaze introduces a hybrid model architecture that combines specialized CNN/DNN components with transformer capabilities, claiming higher accuracy on deterministic tasks such as OCR and translation while remaining cost-efficient at scale.

Why aren't any top LLM providers investing in diffusion LLMs?

Reddit r/singularity · 2026-05-11

This post asks why major LLM providers are not investing in diffusion LLMs despite recent advances such as Mercury 2, and considers whether fundamental limitations or hardware bottlenecks are holding back broader adoption.

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

Hugging Face Daily Papers · 2026-05-07

UniPool introduces a globally shared expert pool for Mixture-of-Experts models, so that parameter count grows more slowly with depth, and reports better efficiency and performance than standard MoE baselines.

Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures

arXiv cs.CL · 2026-04-20

A comprehensive survey reviewing recent advances in intrinsic interpretability for Large Language Models, categorizing approaches into five design paradigms: functional transparency, concept alignment, representational decomposability, explicit modularization, and latent sparsity induction. The paper addresses the challenge of building transparency directly into model architectures rather than relying on post-hoc explanation methods.