This interactive tool visualizes the mathematical underpinnings of transformer models through dataflow graphs, covering architectures from GPT-2 to Qwen 3.6 and various attention mechanisms.
Hugging Face has released version 5.8.0 of the Transformers library, a widely used open-source framework for natural language processing and deep learning.
This research examines how deep Transformers with bidirectional masking achieve implicit deductive reasoning comparable to explicit chain-of-thought methods. The study demonstrates that algorithmically aligned models can scale reasoning capabilities across diverse graph topologies and problem widths.
Hugging Face Transformers library released patch version 5.6.2, containing minor bug fixes and maintenance updates.
Shopify's CTO endorses hybrid Liquid-Transformer models as the best architecture currently available, while a Microsoft executive discusses real-world use cases.
Hugging Face released version 5.6.0 of its popular transformers library.
A curated list of foundational AI papers recommended for interview prep, covering transformers, efficient fine-tuning, vision models, and generative networks.
A playful tweet referencing the famous "Attention Is All You Need" transformer paper.
This paper proposes Product-of-Experts (PoE) training to reduce dataset artifacts in Natural Language Inference, downweighting training examples on which a biased model is overconfident. PoE nearly preserves accuracy on SNLI (89.10% vs. 89.30% for the baseline) while reducing reliance on bias by ~4.85 percentage points.
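A minimal sketch of the PoE objective as it is commonly implemented for debiasing (the paper's exact formulation may differ; `main_logits`, `bias_logits`, and `labels` are placeholder names):

```python
import torch.nn.functional as F

def poe_loss(main_logits, bias_logits, labels):
    # Product of Experts: adding log-probabilities multiplies the two
    # distributions; cross_entropy's internal log_softmax renormalizes.
    combined = F.log_softmax(main_logits, dim=-1) \
             + F.log_softmax(bias_logits, dim=-1).detach()
    # Where the frozen biased model is already confidently correct, the
    # combined prediction is close to the label, so the main model gets
    # little gradient: biased examples are effectively downweighted.
    return F.cross_entropy(combined, labels)
```

At test time the main model is used alone, which is why its SNLI accuracy stays close to the baseline.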
A new paper proposes sequential KV cache compression using probabilistic language tries and predictive delta coding, achieving theoretical compression ratios of ~914,000× beyond TurboQuant by exploiting the sequential structure of language model tokens rather than treating vectors independently.
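The paper's probabilistic-trie predictor is not reproduced here; the sketch below substitutes the simplest possible predictor, the previous cache entry, purely to illustrate predictive delta coding on a KV cache (`numpy` stands in for whatever tensor library the authors use):

```python
import numpy as np

def delta_encode(kv):
    # kv: (seq_len, dim) array of cached key or value vectors. Store
    # the first vector verbatim and, for each later position, only the
    # residual against its predecessor; when neighboring entries are
    # correlated, residuals are small and compress far better than the
    # raw vectors.
    return kv[0], np.diff(kv, axis=0)

def delta_decode(first, deltas):
    # Invert the encoding by cumulatively summing the residuals.
    return np.vstack([first, first + np.cumsum(deltas, axis=0)])
```

A better predictor (such as the paper's language trie) shrinks the residuals further, which is where the claimed compression headroom would come from.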
Hugging Face releases a new 'Skill' and test harness designed to help port language models from the transformers library to mlx-lm, leveraging code agents to streamline open-source contributions.
Hugging Face releases v5.5.4 of the transformers library, a routine maintenance update to the widely used NLP/deep learning framework.
Hugging Face blog post explaining Mixture of Experts (MoEs) architecture in Transformers, covering the shift from dense to sparse models, weight loading optimizations, expert parallelism, and training techniques for MoE-based language models.
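As a companion to the post, a minimal sparse-MoE layer with top-k routing; this is a generic textbook formulation, not the blog's code, and the expert count, k, and single-linear experts are illustrative choices:

```python
import torch
import torch.nn.functional as F

class TopKMoE(torch.nn.Module):
    """Sparse MoE layer: each token is processed by only its top-k
    experts, so compute scales with k, not with the expert count."""
    def __init__(self, d_model, n_experts=8, k=2):
        super().__init__()
        self.router = torch.nn.Linear(d_model, n_experts)
        self.experts = torch.nn.ModuleList(
            torch.nn.Linear(d_model, d_model) for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):  # x: (tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)
        topw, topi = weights.topk(self.k, dim=-1)
        topw = topw / topw.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += topw[mask, slot, None] * expert(x[mask])
        return out
```

The per-expert loop is written for clarity; production implementations batch tokens per expert and shard experts across devices, which is the expert parallelism the post describes.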
This article explains the architecture of DALL-E, focusing on its transformer component that correlates language with discrete image representations to generate high-quality images from text prompts.
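A conceptual sketch of that pipeline (not OpenAI's code; the vocabulary and grid sizes match the published description of DALL-E but are hard-coded here for illustration):

```python
import torch

TEXT_VOCAB, IMAGE_VOCAB = 16384, 8192                 # BPE text codes, dVAE image codes
text_tokens = torch.randint(TEXT_VOCAB, (1, 256))     # caption, padded to 256 tokens
image_tokens = torch.randint(IMAGE_VOCAB, (1, 1024))  # 32x32 grid of dVAE codes
# One shared vocabulary covers both modalities: image codes are offset
# past the text range, the sequences are concatenated, and a decoder-only
# transformer is trained to predict the next token left to right. At
# sampling time, image tokens are generated conditioned on the text
# prefix and decoded back to pixels by the dVAE.
sequence = torch.cat([text_tokens, TEXT_VOCAB + image_tokens], dim=-1)
```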
OpenAI presents GPT-f, a transformer-based automated theorem prover for the Metamath formalization language, which discovered new short proofs accepted into the main Metamath library — marking the first time a deep-learning system contributed proofs adopted by a formal mathematics community.
OpenAI presents a two-stage approach for improving language understanding: pretraining a transformer model on large unsupervised datasets using language modeling, then fine-tuning on smaller supervised datasets for specific tasks. The method achieves state-of-the-art results across diverse tasks including commonsense reasoning, semantic similarity, and reading comprehension with minimal hyperparameter tuning.
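The recipe reduces to three objectives: the language-modeling likelihood used for pretraining, the task likelihood used for fine-tuning, and their combination, since the paper keeps the LM loss as an auxiliary term during fine-tuning:

```latex
% Stage 1: unsupervised pretraining over the unlabeled corpus U
L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)

% Stage 2: supervised fine-tuning on labeled examples (x, y),
% with the LM objective retained at weight \lambda
L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m)
L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})
```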
A highly optimized version of OpenAI's Whisper Large v3 using Transformers, Optimum, and Flash Attention 2, capable of transcribing 150 minutes of audio in under 2 minutes on Replicate.
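A sketch of the equivalent setup with the stock `transformers` pipeline; the model id and `attn_implementation` flag are the library's real API, while the file name, chunk length, and batch size are illustrative and the Replicate deployment's exact settings may differ:

```python
import torch
from transformers import pipeline

# Whisper large-v3 in half precision with Flash Attention 2 (requires
# the flash-attn package and a supported GPU); chunking lets the
# pipeline batch long audio through the model in 30 s pieces.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
    chunk_length_s=30,
    batch_size=16,
    model_kwargs={"attn_implementation": "flash_attention_2"},
)
print(asr("meeting.wav")["text"])  # hypothetical input file
```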