transformers

#transformers

@AlphaSignalAI: https://x.com/AlphaSignalAI/status/2069424192274252094

X AI KOLs Timeline · 6h ago

Microsoft's NextLat introduces a training objective that rewards belief-state representations instead of relying solely on next-token prediction, pushing models toward compact world models for better generalization.

@TheAhmadOsman: INCREDIBLE RESOURCE The MOST COMPLETE GUIDE for understanding LLMs from first principles is now available online to rea…

X AI KOLs Timeline · yesterday

A comprehensive free guide explaining LLMs from first principles, covering tokens, transformers, attention, fine-tuning, and local deployment.

@omershapira: TIL Jurafsky & Martin, the textbook I used for Computational Linguistics in undergrad many years ago (when TAU didn't o…

X AI KOLs Following · 2d ago

The third edition of the Speech and Language Processing textbook by Jurafsky and Martin was released in January 2026, featuring a clear explanation of Transformers and various updates including new chapters on ASR, TTS, and DPO.

@antoniolupetti: "Transformers" by Daniel Jurafsky and James H. Martin is one of the clearest and most mathematically grounded introduct…

X AI KOLs Timeline · 5d ago

A tweet highlights the Transformer architecture chapter from Jurafsky and Martin's textbook, praising its clear and mathematically grounded explanation of self-attention, multi-head attention, and related mechanisms.

Dual Dimensionality for Local and Global Attention

arXiv cs.CL · 5d ago

Proposes Distance-Adaptive Representation (DAR) which reduces key-value dimensionality for distant tokens while preserving full dimensionality for nearby tokens, improving KV cache efficiency without performance loss.

@markchen90: Very excited to welcome @NoamShazeer to OpenAI as our new lead for architecture research! His work on transformers, MoE…

X AI KOLs Timeline · 5d ago

Noam Shazeer, a key researcher behind transformers and MoE, is joining OpenAI as head of architecture research, moving from Google.

Is it agentic enough? Benchmarking open models on your own tooling

Hugging Face Blog · 5d ago

This blog post introduces a benchmark methodology for evaluating how well open models perform on agentic coding tasks, focusing not just on accuracy but on the efficiency of the agent's process. It provides a customizable tooling harness using the pi coding agent and tests across models and library revisions.

Next-Latent Prediction Transformers [R]

Reddit r/MachineLearning · 6d ago

Microsoft Research introduces Next-Latent Prediction (NextLat), a self-supervised method that trains transformers to predict their own next latent state, enabling compact world models for reasoning and planning and achieving up to 3.3x faster inference via self-speculative decoding.

An expressivity analysis of hierarchical modelling in deep transformers via bounded-depth grammars

arXiv cs.CL · 6d ago

This paper provides a theoretical analysis of deep transformers' ability to model hierarchical structures using bounded-depth context-free grammars, constructing explicit positional-attention transformers that encode grammatical states in linearly separable subspaces.

MorphStrata: Layer-Specific Perturbations for Generating Morphence Students in Time-Series Moving Target Defense

arXiv cs.LG · 6d ago

MorphStrata introduces a layer-specific stochastic noise injection strategy for generating diverse student models in a Moving Target Defense framework to enhance adversarial robustness in time-series forecasting, achieving up to 97.97% improvement in RMSE under BIM attacks with minimal training overhead.

Models Take Notes at Prefill: KV Cache Can Be Editable and Composable

arXiv cs.LG · 6d ago

This paper proposes that the KV cache in transformers acts as a notebook of memoized conclusions, enabling surgical editing and composition without full recomputation. The method achieves significant latency reductions while preserving decision equivalence across model scales.

@machinestein: ICML 2026: Latent Reasoning in TRMs is Secretly a Policy Improvement Operator Why does recursive reasoning, especially …

X AI KOLs Timeline · 2026-06-16

The paper argues that latent reasoning in Tiny Recursive Models (TRMs) functions as a policy improvement operator, and proposes an algorithm that improves learning and inference efficiency by up to 18x.

Recurrent Reasoning on Symbolic Puzzles with Sequence Models

arXiv cs.AI · 2026-06-16

This paper introduces RecurrReason, a difficulty-controlled benchmark of four symbolic logic puzzles to evaluate multi-step reasoning in sequence models. Fine-tuning experiments on T5 and GPT-2 show that architecture determines success more than scale, and that pre-training transfer depends on local transition structure.

Transformers Learn the Mestre-Nagao Heuristic

arXiv cs.LG · 2026-06-16

This paper trains a two-layer transformer encoder to classify rational elliptic curves by rank from Frobenius traces, achieving >99% accuracy. Mechanistic interpretability reveals the model learns the Mestre-Nagao heuristic and concentrates attention on prime positions, demonstrating that transformers can learn number-theoretic algorithms.

Leveraging Physiological Signals to Predict Exam Outcomes with Machine Learning

arXiv cs.LG · 2026-06-16

This study investigates machine learning models to predict exam outcomes using physiological data such as electrodermal activity, heart rate, and skin temperature, finding that both deep learning approaches and simpler models like random forests can be effective.

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

Hugging Face Daily Papers · 2026-06-16

LoopCoder-v2 proposes Parallel Loop Transformers (PLT) for efficient test-time computation scaling in code generation, showing that two loops yield significant gains while more loops cause diminishing returns and positional mismatch costs.

Variable-Width Transformers

Hugging Face Daily Papers · 2026-06-16

Proposes a nonuniform width allocation transformer (hourglass shape) that outperforms uniform baselines in language modeling, reducing FLOPs and KV cache size.

@che_shr_cat: 1/ Standard transformers have a fundamental topological flaw: they cannot track dynamic states over time without runnin…

X AI KOLs Timeline · 2026-06-15

This thread argues that standard transformers have a topological flaw: once a state representation reaches the top layer, they cannot update beliefs over time, causing collapse as depth increases.

The Transformer Pill

Reddit r/ArtificialInteligence · 2026-06-12

A reflection on the broad implications of transformer architectures beyond LLMs, including potential impacts on linguistics, genetics, and causal modeling, comparing their significance to the Haber-Bosch process.

An End-to-End Hybrid Framework for Rumour Detection in Low-Resources Algerian Dialect

arXiv cs.CL · 2026-06-12

This paper presents an end-to-end hybrid framework for rumour detection in low-resource Algerian dialect social media content, achieving an F1-score of 0.84 by combining transformer embeddings with a classical classifier.
