transformers

#transformers

We are hitting a wall trying to force transformers to do actual logic [D]

Reddit r/MachineLearning · 4h ago

The author expresses frustration with the industry's reliance on prompt engineering and scaling to fix logical reasoning deficits in transformer-based LLMs, arguing that these probabilistic models fundamentally lack the architecture for deterministic logic.

Attractor Geometry of Transformer Memory: From Conflict Arbitration to Confident Hallucination

arXiv cs.AI · yesterday

This paper presents a unified geometric framework for understanding transformer memory failures, distinguishing between conflict arbitration and hallucination through hidden-state attractor basins. It demonstrates that geometric margin is a superior diagnostic for detecting these failures compared to output entropy, particularly as model scale increases.

Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning

arXiv cs.LG · yesterday

This research paper investigates how shortcut solutions learned by Transformer models, specifically BERT, impair their ability to perform continual compositional reasoning. It contrasts BERT with ALBERT, finding that ALBERT's recurrent nature offers better inductive bias for continual learning tasks.

TIDE: Every Layer Knows the Token Beneath the Context

arXiv cs.CL · yesterday

This paper introduces TIDE, a method that addresses the Rare Token and Contextual Collapse problems in LLMs by injecting token identity into every layer via Embedding Memory. The authors demonstrate theoretical and empirical improvements across language modeling and downstream tasks.

On Semantic Loss Fine-Tuning Approach for Preventing Model Collapse in Causal Reasoning

arXiv cs.LG · yesterday

This paper identifies a critical 'model collapse' issue in standard fine-tuning for causal reasoning and proposes a semantic loss function with graph-based logical constraints to prevent it.

@YouJiacheng: > Directly applying RoPE rotation to KV will leak positional information into value matrix V. This is also documented on 科学空间 (Scientific Spaces): https://kexue.fm/…

X AI KOLs Timeline · 2d ago

A social media post notes that applying RoPE rotation directly to KV leaks positional information into the value matrix V.

Transformer Math Explorer [P]

Reddit r/MachineLearning · 2d ago

This interactive tool visualizes the mathematical underpinnings of transformer models through dataflow graphs, covering architectures from GPT-2 to Qwen 3.6 and various attention mechanisms.

huggingface/transformers Release 5.8.0

GitHub Releases Watchlist · 4d ago

Hugging Face has released version 5.8.0 of the Transformers library, a widely used open-source framework for natural language processing and deep learning.

The Scaling Properties of Implicit Deductive Reasoning in Transformers

Hugging Face Daily Papers · 4d ago

This research examines how deep Transformers with bidirectional masking achieve implicit deductive reasoning comparable to explicit chain-of-thought methods. The study demonstrates that algorithmically aligned models can scale reasoning capabilities across diverse graph topologies and problem widths.

huggingface/transformers Patch release v5.6.2

GitHub Releases Watchlist · 2026-04-23

Hugging Face Transformers library released patch version 5.6.2, containing minor bug fixes and maintenance updates.

@ramin_m_h: Shopify CTO: “I think in its hybrid form with Transformers, they [Liquid models] are probably the best architecture I’m…

X AI KOLs Following · 2026-04-22

The Shopify CTO endorses hybrid Liquid-Transformer models as the best architecture currently available, and a Microsoft executive discusses real-world use cases.

huggingface/transformers Release v5.6.0

GitHub Releases Watchlist · 2026-04-22

Hugging Face released version 5.6.0 of its popular transformers library.

@simpreetkaur_19: Research papers you must read for AI Engineer interviews: 1. Attention is all you need (Transformers) 2. LoRA (Low rank…

X AI KOLs Timeline · 2026-04-22

A curated list of foundational AI papers recommended for interview prep, covering transformers, efficient fine-tuning, vision models, and generative networks.

@reach_vb: Attention truly is all you need

X AI KOLs Following · 2026-04-22

A playful tweet referencing the famous "Attention Is All You Need" transformer paper.

Product-of-Experts Training Reduces Dataset Artifacts in Natural Language Inference

arXiv cs.CL · 2026-04-22

This paper proposes Product-of-Experts (PoE) training to reduce dataset artifacts in Natural Language Inference, downweighting examples where biased models are overconfident. PoE nearly preserves accuracy on SNLI (89.10% vs. 89.30%) while reducing bias reliance by ~4.85 percentage points.

KV Cache Compression 900000x Beyond TurboQuant and Per-Vector Shannon Limit

Hacker News Top · 2026-04-21

A new paper proposes sequential KV cache compression using probabilistic language tries and predictive delta coding, achieving theoretical compression ratios of ~914,000× beyond TurboQuant by exploiting the sequential structure of language model tokens rather than treating vectors independently.

The PR you would have opened yourself

Hugging Face Blog · 2026-04-16

Hugging Face releases a new 'Skill' and test harness designed to help port language models from the transformers library to mlx-lm, leveraging code agents to streamline open-source contributions.

huggingface/transformers Patch release v5.5.4

GitHub Releases Watchlist · 2026-04-13

Hugging Face releases transformers library patch version v5.5.4, a routine maintenance update to the widely-used NLP/deep learning framework.

Mixture of Experts (MoEs) in Transformers

Hugging Face Blog · 2026-02-26

Hugging Face blog post explaining Mixture of Experts (MoEs) architecture in Transformers, covering the shift from dense to sparse models, weight loading optimizations, expert parallelism, and training techniques for MoE-based language models.

How is it so good ? (DALL-E Explained Pt. 2)

ML at Berkeley · 2021-04-07

This article explains the architecture of DALL-E, focusing on its transformer component that correlates language with discrete image representations to generate high-quality images from text prompts.
