This interactive tool visualizes the mathematical underpinnings of transformer models through dataflow graphs, covering architectures from GPT-2 to Qwen 3.6 and various attention mechanisms.
Hugging Face has released version 5.8.0 of the Transformers library, a widely used open-source framework for natural language processing and deep learning.
This research examines how deep Transformers with bidirectional masking achieve implicit deductive reasoning comparable to explicit chain-of-thought methods. The study demonstrates that algorithmically aligned models can scale reasoning capabilities across diverse graph topologies and problem widths.
Hugging Face Transformers library released patch version 5.6.2, containing minor bug fixes and maintenance updates.
Shopify's CTO endorses hybrid Liquid-Transformer models as the best architecture currently available, while a Microsoft executive discusses real-world use cases.
Hugging Face released version 5.6.0 of its popular transformers library.
A curated list of foundational AI papers recommended for interview prep, covering transformers, efficient fine-tuning, vision models, and generative networks.
A playful tweet referencing the famous "Attention Is All You Need" transformer paper.
This paper proposes Product-of-Experts (PoE) training to reduce dataset artifacts in Natural Language Inference, downweighting training examples on which a biased model is overconfident. PoE nearly preserves accuracy on SNLI (89.10% vs. 89.30% for the baseline) while reducing reliance on bias by ~4.85 percentage points.
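A minimal sketch of the PoE objective as it is commonly implemented for debiasing (the paper's exact formulation may differ; `main_logits`, `bias_logits`, and `labels` are placeholder names):

```python
import torch.nn.functional as F

def poe_loss(main_logits, bias_logits, labels):
    # Product of Experts: adding log-probabilities multiplies the two
    # distributions; cross_entropy's internal log_softmax renormalizes.
    combined = F.log_softmax(main_logits, dim=-1) \
             + F.log_softmax(bias_logits, dim=-1).detach()
    # Where the frozen biased model is already confidently correct, the
    # combined prediction is close to the label, so the main model gets
    # little gradient: biased examples are effectively downweighted.
    return F.cross_entropy(combined, labels)
```

At test time the main model is used alone, which is why its SNLI accuracy stays close to the baseline.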
A new paper proposes sequential KV cache compression using probabilistic language tries and predictive delta coding, achieving theoretical compression ratios of ~914,000× beyond TurboQuant by exploiting the sequential structure of language model tokens rather than treating vectors independently.
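The paper's probabilistic-trie predictor is not reproduced here; the sketch below substitutes the simplest possible predictor, the previous cache entry, purely to illustrate predictive delta coding on a KV cache (`numpy` stands in for whatever tensor library the authors use):

```python
import numpy as np

def delta_encode(kv):
    # kv: (seq_len, dim) array of cached key or value vectors. Store
    # the first vector verbatim and, for each later position, only the
    # residual against its predecessor; when neighboring entries are
    # correlated, residuals are small and compress far better than the
    # raw vectors.
    return kv[0], np.diff(kv, axis=0)

def delta_decode(first, deltas):
    # Invert the encoding by cumulatively summing the residuals.
    return np.vstack([first, first + np.cumsum(deltas, axis=0)])
```

A better predictor (such as the paper's language trie) shrinks the residuals further, which is where the claimed compression headroom would come from.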
Hugging Face releases a new 'Skill' and test harness designed to help port language models from the transformers library to mlx-lm, leveraging code agents to streamline open-source contributions.
Hugging Face releases v5.5.4 of the transformers library, a routine maintenance update to the widely used NLP/deep learning framework.
Hugging Face blog post explaining Mixture of Experts (MoEs) architecture in Transformers, covering the shift from dense to sparse models, weight loading optimizations, expert parallelism, and training techniques for MoE-based language models.
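As a companion to the post, a minimal sparse-MoE layer with top-k routing; this is a generic textbook formulation, not the blog's code, and the expert count, k, and single-linear experts are illustrative choices:

```python
import torch
import torch.nn.functional as F

class TopKMoE(torch.nn.Module):
    """Sparse MoE layer: each token is processed by only its top-k
    experts, so compute scales with k, not with the expert count."""
    def __init__(self, d_model, n_experts=8, k=2):
        super().__init__()
        self.router = torch.nn.Linear(d_model, n_experts)
        self.experts = torch.nn.ModuleList(
            torch.nn.Linear(d_model, d_model) for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):  # x: (tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)
        topw, topi = weights.topk(self.k, dim=-1)
        topw = topw / topw.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += topw[mask, slot, None] * expert(x[mask])
        return out
```

The per-expert loop is written for clarity; production implementations batch tokens per expert and shard experts across devices, which is the expert parallelism the post describes.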
This article explains the architecture of DALL-E, focusing on its transformer component that correlates language with discrete image representations to generate high-quality images from text prompts.
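A conceptual sketch of that pipeline (not OpenAI's code; the vocabulary and grid sizes match the published description of DALL-E but are hard-coded here for illustration):

```python
import torch

TEXT_VOCAB, IMAGE_VOCAB = 16384, 8192                 # BPE text codes, dVAE image codes
text_tokens = torch.randint(TEXT_VOCAB, (1, 256))     # caption, padded to 256 tokens
image_tokens = torch.randint(IMAGE_VOCAB, (1, 1024))  # 32x32 grid of dVAE codes
# One shared vocabulary covers both modalities: image codes are offset
# past the text range, the sequences are concatenated, and a decoder-only
# transformer is trained to predict the next token left to right. At
# sampling time, image tokens are generated conditioned on the text
# prefix and decoded back to pixels by the dVAE.
sequence = torch.cat([text_tokens, TEXT_VOCAB + image_tokens], dim=-1)
```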
OpenAI presents GPT-f, a transformer-based automated theorem prover for the Metamath formalization language, which discovered new short proofs accepted into the main Metamath library — marking the first time a deep-learning system contributed proofs adopted by a formal mathematics community.
OpenAI presents a two-stage approach for improving language understanding: pretraining a transformer model on large unsupervised datasets using language modeling, then fine-tuning on smaller supervised datasets for specific tasks. The method achieves state-of-the-art results across diverse tasks including commonsense reasoning, semantic similarity, and reading comprehension with minimal hyperparameter tuning.
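The recipe reduces to three objectives: the language-modeling likelihood used for pretraining, the task likelihood used for fine-tuning, and their combination, since the paper keeps the LM loss as an auxiliary term during fine-tuning:

```latex
% Stage 1: unsupervised pretraining over the unlabeled corpus U
L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)

% Stage 2: supervised fine-tuning on labeled examples (x, y),
% with the LM objective retained at weight \lambda
L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m)
L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})
```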
A highly optimized version of OpenAI's Whisper Large v3 using Transformers, Optimum, and Flash Attention 2, capable of transcribing 150 minutes of audio in under 2 minutes on Replicate.
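A sketch of the equivalent setup with the stock `transformers` pipeline; the model id and `attn_implementation` flag are the library's real API, while the file name, chunk length, and batch size are illustrative and the Replicate deployment's exact settings may differ:

```python
import torch
from transformers import pipeline

# Whisper large-v3 in half precision with Flash Attention 2 (requires
# the flash-attn package and a supported GPU); chunking lets the
# pipeline batch long audio through the model in 30 s pieces.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
    chunk_length_s=30,
    batch_size=16,
    model_kwargs={"attn_implementation": "flash_attention_2"},
)
print(asr("meeting.wav")["text"])  # hypothetical input file
```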