transformer

#transformer

The KV-cache wall: why fixed-size memory sequence models keep coming back

Reddit r/ArtificialInteligence · 4d ago

Explores the growing memory bottleneck of KV-cache in transformer inference, explaining why alternative architectures with fixed-size memory like Mamba and RWKV are gaining renewed attention.

#transformer

Optimizing Abstractive Summarization With Fine-Tuned PEGASUS

arXiv cs.CL · 4d ago

This paper presents fine-tuning of PEGASUS on the XL-Sum English corpus, achieving state-of-the-art results with significant improvements over the baseline mT5 model across ROUGE scores.

#transformer

Multi-Stream Temporal Fusion for Financial Fraud Detection

arXiv cs.LG · 4d ago

Proposes the Multi-Stream Fraud Transformer (MSFT) for financial fraud detection, which independently encodes transaction, login, and risk event streams using Transformers and fuses them with time-aware positional encoding and gated fusion, achieving 0.9961 AUROC on a large dataset.

#transformer

Why Do Accumulated Transformations Extrapolate?

arXiv cs.LG · 4d ago

This paper investigates why accumulated token-dependent orthogonal transformations, such as those used in PaTH Attention and a simplified variant with SO(2) rotations, enable length extrapolation in transformers. It proves that such transformations become incoherent after a finite number of steps, suppressing attention to distant tokens, and shows both theoretically and experimentally that this mechanism improves extrapolation but eventually degrades at extreme context lengths.

#transformer

Towards Scalable Multi-Task Reinforcement Learning with Large Decision Models

arXiv cs.LG · 4d ago

This paper introduces LDM-v0, a large decision model trained offline on trajectories from thousands of diverse reinforcement learning environments, demonstrating that a single transformer policy can match the performance of task-specific policies across robotics, autonomous driving, inventory management, cybersecurity, trading, and video games.

#transformer

@Phoenixyin13: I think this is a top-notch work in ICML 2026. The attention mechanism of traditional Transformers is essentially point-to-point matching: it cuts input into a bunch of tokens (discrete points), computes similarity between Query and Key, and then weights the Value. In NLP...

X AI KOLs Timeline · 4d ago

Introduces the ICML 2026 paper Functional Attention, which treats functions as first-class citizens and replaces softmax point-to-point similarity with structured linear operators. It addresses the discretization, resolution sensitivity, and high computational complexity that traditional Transformers face when handling continuous functions. It matches or surpasses SOTA on tasks such as PDE solving and 3D segmentation, and exhibits strong OOD generalization.

#transformer

@agisummitai: Speaker Spotlight: Christopher Manning If you've used an LLM, you've used his research. Christopher Manning is one of t…

X AI KOLs Following · 4d ago

Christopher Manning is spotlighted as a keynote speaker at the AGI Summit, highlighting his pioneering research in NLP, including GloVe and the attention mechanism, as well as his role at Stanford.

#transformer

High Dimensional, Dynamic Rotary Positional Embedding [P]

Reddit r/MachineLearning · 4d ago

Introduces HDD-RoPE, an extension of rotary positional embeddings that uses high-dimensional chunks and data-dependent rotation rates, showing faster convergence on TinyStories compared to xPos.

#transformer

@s_scardapane: The Transformer Cookbook by @pentagonalize @davidweichiang et al. A beautiful introduction to "hardcoding" algorithms…

X AI KOLs Timeline · 5d ago

A tweet introducing 'The Transformer Cookbook', a paper that provides a beautiful introduction to hardcoding algorithms (addition, lookup, branching) inside transformer weights, following the RASP paper.

#transformer

An LLM-based Two-Stage Transformer Framework for Cross-Domain Bearing Fault Diagnosis with Limited Data

arXiv cs.LG · 5d ago

Proposes a knowledge-guided two-stage transfer learning framework using a lightweight GPT-2-style Transformer for cross-domain bearing fault diagnosis with limited data, achieving 92.61% accuracy with only 10% labeled data.

#transformer

Parallel Manifold Steering: Efficient Adaptation of Large Associative Memories via Residual Energy Shaping

arXiv cs.LG · 5d ago

This paper proposes H-Res, a method to adapt large transformer models by shaping the energy landscape of associative memories without modifying weights or adding prompts, preserving memory capacity and outperforming LoRA.

#transformer

Lightweight Transformer Models for On-Device Fault Detection: A Benchmark Study on Resource-Constrained Deployment

arXiv cs.LG · 5d ago

A benchmark study comparing traditional machine learning methods (Random Forest, XGBoost, SVM, Logistic Regression) against lightweight transformer variants (DistilBERT, TinyBERT, MobileBERT) for on-device fault detection across three public datasets. Traditional ML offers competitive accuracy at far smaller resource footprints, while TinyBERT-4L is the most deployment-friendly transformer.

#transformer

NeuroSonic: Conditional Flow Matching for EEG-to-Speech Reconstruction

arXiv cs.LG · 5d ago

NeuroSonic introduces a conditional flow-matching framework for reconstructing continuous speech from EEG signals, addressing the structural mismatch between neural and acoustic data by learning a deterministic probability-flow velocity field. It achieves up to 26.3% improvement in perceptual quality over existing GAN, diffusion, and mean-flow baselines on cross-subject benchmarks.

#transformer

Deciphering Fingerprints of 3D Molecular Surfaces for Accurate Epitope Prediction

arXiv cs.LG · 5d ago

SurfBind, a surface-centric learning framework for epitope prediction, uses a Transformer-based architecture with patch-level surface modeling and binder-aware cross-attention to achieve state-of-the-art performance on epitope identification benchmarks.

#transformer

AutoSpecNER: A Fine-Grained Named Entity Recognition Dataset for Vehicle Specification Extraction

arXiv cs.CL · 5d ago

Introduces AutoSpecNER, an expert-annotated dataset for fine-grained named entity recognition in vehicle listings, with 659 advertisements annotated across 15 entity types. Benchmark results show DeBERTa achieves 90% micro-F1, outperforming rule-based and LLM approaches.

#transformer

Transformer-Based Language Models Across Domain Verticals: Architectures, Applications and Critical Assessment

arXiv cs.CL · 5d ago

A comprehensive survey of transformer-based language models covering architectures, applications across domain verticals (healthcare, finance, legal, etc.), and critical assessment of trade-offs including compute cost, alignment, and data provenance.

#transformer

SURGELLM: Rethinking Multi-Task Evaluation through Task-Aware Feature Gating with Class-Balanced Normalization

arXiv cs.CL · 5d ago

SURGeLLM introduces a unified transformer framework with surgical feature gates, task-conditioned prefix tokens, and instance-weighted normalization to address mismatched inductive biases, class imbalance, and lexical knowledge injection in multi-task learning, achieving significant gains across four diverse NLP tasks.

#transformer

AI Bubble about to Burst? Nvidia quietly acquihires Essential AI team, including Transformer coauthor Ashish Vaswani. Vaswani was struggling to raise money for his AI company.

Reddit r/ArtificialInteligence · 5d ago

Nvidia has quietly acquihired the team from Essential AI, including Transformer paper coauthor Ashish Vaswani, who was struggling to raise funds for his startup. Vaswani will work on Nvidia's Nemotron open-source models.

#transformer

Certainty Is All You Need

Reddit r/artificial · 5d ago

This paper introduces a new approach leveraging certainty in transformer models, building on the 'Attention Is All You Need' paradigm.

#transformer

@JustinAngel: https://x.com/JustinAngel/status/2069482255312195980

X AI KOLs Timeline · 5d ago

Release of free workshop recordings and materials (23 videos, 250 slides, 50 exercises) for building your own LLM from fundamentals to transformer architecture, with no math or ML prerequisites.
