transformer

#transformer

Parallel Manifold Steering: Efficient Adaptation of Large Associative Memories via Residual Energy Shaping

arXiv cs.LG · 4d ago

This paper proposes H-Res, a method to adapt large transformer models by shaping the energy landscape of associative memories without modifying weights or adding prompts, preserving memory capacity and outperforming LoRA.
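
The summary does not spell out H-Res itself. As background on what the "energy landscape of associative memories" refers to, here is a minimal sketch of retrieval in a modern (continuous) Hopfield network. Its update ξ ← X softmax(β Xᵀ ξ) has the same form as transformer attention and moves the state downhill on the energy below. The helper names, the value of β and the toy patterns are illustrative assumptions; this is not the paper's method.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hopfield_energy(patterns, xi, beta=8.0):
    """E(xi) = -(1/beta) * logsumexp(beta * X^T xi) + 0.5 * xi^T xi (constants dropped)."""
    scores = [beta * dot(p, xi) for p in patterns]
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return -lse / beta + 0.5 * dot(xi, xi)


def hopfield_retrieve(patterns, query, beta=8.0, steps=3):
    """Iterate xi <- X softmax(beta * X^T xi); each stored pattern is a column of X."""
    xi = list(query)
    for _ in range(steps):
        weights = softmax([beta * dot(p, xi) for p in patterns])
        xi = [sum(w * p[d] for w, p in zip(weights, patterns)) for d in range(len(xi))]
    return xi


patterns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.2, 0.1]  # noisy version of the first stored pattern
retrieved = hopfield_retrieve(patterns, query)
print([round(v, 3) for v in retrieved])
print(hopfield_energy(patterns, query) > hopfield_energy(patterns, retrieved))
```

A larger β makes the energy minima sharper, so retrieval snaps onto a single stored pattern rather than a blend of several.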

#transformer

Lightweight Transformer Models for On-Device Fault Detection: A Benchmark Study on Resource-Constrained Deployment

arXiv cs.LG · 4d ago

A benchmark study comparing traditional machine learning methods (Random Forest, XGBoost, SVM, Logistic Regression) against lightweight transformer variants (DistilBERT, TinyBERT, MobileBERT) for on-device fault detection across three public datasets. Traditional ML offers competitive accuracy at far smaller resource footprints, while TinyBERT-4L is the most deployment-friendly transformer.

#transformer

NeuroSonic: Conditional Flow Matching for EEG-to-Speech Reconstruction

arXiv cs.LG · 4d ago

NeuroSonic introduces a conditional flow-matching framework for reconstructing continuous speech from EEG signals, addressing the structural mismatch between neural and acoustic data by learning a deterministic probability-flow velocity field. It achieves up to a 26.3% improvement in perceptual quality over existing GAN, diffusion, and mean-flow baselines on cross-subject benchmarks.
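
NeuroSonic's specific probability path, EEG encoder and network are not described in the summary. As a rough illustration of conditional flow matching in general, the sketch below uses the common straight-line path x_t = (1 − t)·x0 + t·x1, whose regression target is the velocity x1 − x0, and then samples deterministically by Euler integration of the velocity field. It is one-dimensional, and `decode` is a made-up stand-in for what would really be a trained, EEG-conditioned network.

```python
import random


def linear_path(x0, x1, t):
    """Straight-line path from noise x0 to data x1 and its conditional velocity."""
    xt = (1.0 - t) * x0 + t * x1
    return xt, x1 - x0


def cfm_loss(velocity_fn, cond, x1, n_samples=256, seed=0):
    """Monte Carlo estimate of E_{t, x0} (v(x_t, t, c) - (x1 - x0))^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x0 = rng.gauss(0.0, 1.0)  # Gaussian source sample
        t = rng.random()          # t ~ U[0, 1)
        xt, target = linear_path(x0, x1, t)
        total += (velocity_fn(xt, t, cond) - target) ** 2
    return total / n_samples


def euler_sample(velocity_fn, cond, x0, steps=10):
    """Deterministic sampling: integrate dx/dt = v(x, t, c) from t = 0 to t = 1."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        x += dt * velocity_fn(x, k * dt, cond)
    return x


# Hypothetical toy stand-in for an EEG-to-speech decoder: condition c maps to target 2c + 1.
def decode(c):
    return 2.0 * c + 1.0


def ideal_velocity(x, t, c):
    # For a fixed target x1, (x1 - x_t) / (1 - t) equals x1 - x0 along the straight path.
    return (decode(c) - x) / (1.0 - t)


c = 0.5
print(cfm_loss(ideal_velocity, c, decode(c)))    # ~0: matches the regression target
print(euler_sample(ideal_velocity, c, x0=-1.3))  # ~decode(c) = 2.0
```

In practice the velocity field is a neural network trained by minimizing `cfm_loss`; `ideal_velocity` is written out by hand here only because the toy target is known exactly.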

#transformer

Deciphering Fingerprints of 3D Molecular Surfaces for Accurate Epitope Prediction

arXiv cs.LG · 4d ago

SurfBind, a surface-centric learning framework for epitope prediction, uses a Transformer-based architecture with patch-level surface modeling and binder-aware cross-attention to achieve state-of-the-art performance on epitope identification benchmarks.
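
The summary does not detail SurfBind's binder-aware cross-attention. As a generic illustration, the sketch below shows single-head scaled dot-product cross-attention in which surface-patch embeddings act as queries and binder embeddings supply keys and values, so each patch's output is a weighted mix of binder features. The learned query/key/value projections and multiple heads are omitted, and the toy vectors are made up.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each query (e.g. a surface patch) attends over all keys/values (e.g. binder residues)."""
    d = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


patch_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical patch features
binder_keys = [[2.0, 0.0], [0.0, 2.0]]                    # hypothetical binder features
binder_values = [[1.0, 0.0], [0.0, 1.0]]
for row in cross_attention(patch_embeddings, binder_keys, binder_values):
    print([round(x, 3) for x in row])
```

The third patch is equally similar to both binder entries, so it receives an even 50/50 mix of their values; the first two lean toward the binder entry they align with.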

#transformer

AutoSpecNER: A Fine-Grained Named Entity Recognition Dataset for Vehicle Specification Extraction

arXiv cs.CL · 4d ago

This paper introduces AutoSpecNER, an expert-annotated dataset for fine-grained named entity recognition in vehicle listings, with 659 advertisements annotated across 15 entity types. In benchmark results, DeBERTa reaches 90% micro-F1, outperforming rule-based and LLM approaches.
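
Micro-F1 pools true positives, false positives and false negatives across all entity types before computing precision and recall, so frequent types carry more weight than rare ones. A minimal sketch, assuming exact span-and-label matching (the paper's matching criterion is not stated in the summary); the entity labels below are hypothetical, not AutoSpecNER's actual 15 types.

```python
def micro_f1(gold, predicted):
    """gold, predicted: sets of (doc_id, start, end, label) entity spans.

    A prediction counts as a true positive only if span and label both match exactly.
    """
    tp = len(gold & predicted)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = {(0, 0, 2, "MAKE"), (0, 3, 5, "MODEL"), (0, 8, 10, "MILEAGE"), (1, 0, 1, "YEAR")}
pred = {(0, 0, 2, "MAKE"), (0, 3, 5, "MODEL"), (0, 8, 10, "PRICE"), (1, 0, 1, "YEAR")}
print(micro_f1(gold, pred))  # 3 TP, 1 FP, 1 FN -> P = R = 0.75 -> F1 = 0.75
```

A right span with the wrong label counts twice against the score (one false positive and one false negative), which is why fine-grained label sets tend to be harder than coarse ones.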

#transformer

Transformer-Based Language Models Across Domain Verticals: Architectures, Applications and Critical Assessment

arXiv cs.CL · 4d ago

A comprehensive survey of transformer-based language models covering architectures, applications across domain verticals (healthcare, finance, legal, etc.), and critical assessment of trade-offs including compute cost, alignment, and data provenance.

#transformer

SURGELLM: Rethinking Multi-Task Evaluation through Task-Aware Feature Gating with Class-Balanced Normalization

arXiv cs.CL · 4d ago

SURGeLLM introduces a unified transformer framework with surgical feature gates, task-conditioned prefix tokens, and instance-weighted normalization. It targets mismatched inductive biases, class imbalance, and the injection of lexical knowledge in multi-task learning, and reports significant gains across four diverse NLP tasks.

#transformer

AI Bubble about to Burst? Nvidia quietly acquihires Essential AI team, including Transformer coauthor Ashish Vaswani. Vaswani was struggling to raise money for his AI company.

Reddit r/ArtificialInteligence · 4d ago

Nvidia has quietly acquihired the team from Essential AI, including Transformer paper coauthor Ashish Vaswani, who was struggling to raise funds for his startup. Vaswani will work on Nvidia's Nemotron open-source models.

#transformer

Certainty Is All You Need

Reddit r/artificial · 4d ago

This paper introduces a new approach leveraging certainty in transformer models, building on the 'Attention Is All You Need' paradigm.

#transformer

@JustinAngel: https://x.com/JustinAngel/status/2069482255312195980

X AI KOLs Timeline · 4d ago

Free recordings and materials from a workshop on building your own LLM (23 videos, 250 slides, 50 exercises), covering everything from the fundamentals to the transformer architecture, with no math or ML prerequisites.

#transformer

@thtrkim: Visual deep dive on FlashAttention by hand (drawn with Excalidraw) https://winterrykim.github.io/blog/2026/training-lm-…

X AI KOLs Timeline · 4d ago

A visual deep dive into FlashAttention, explaining memory optimization and operator fusion for efficient attention computation in language model training.
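
The core trick that lets FlashAttention process keys and values tile by tile, without materializing the full attention-score matrix, is an online softmax: keep a running maximum and a running denominator, and rescale the partial output whenever the maximum grows. A minimal pure-Python sketch for one query row follows; it illustrates only the arithmetic, not the GPU memory hierarchy, kernel fusion or backward pass, and the block size is an arbitrary choice.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reference_attention_row(q, keys, values):
    """Standard softmax(q K^T / sqrt(d)) V for one query, computed all at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / z for j in range(len(values[0]))]


def tiled_attention_row(q, keys, values, block=2):
    """Same result, visiting keys/values one block at a time with an online softmax."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    denom = 0.0                   # running sum of exp(score - running_max)
    acc = [0.0] * len(values[0])  # running unnormalized output
    for start in range(0, len(keys), block):
        k_blk, v_blk = keys[start:start + block], values[start:start + block]
        scores = [dot(q, k) * scale for k in k_blk]
        new_max = max(running_max, max(scores))
        correction = math.exp(running_max - new_max)  # rescales earlier blocks; 0.0 on the first
        p = [math.exp(s - new_max) for s in scores]
        denom = denom * correction + sum(p)
        acc = [a * correction + sum(pi * v[j] for pi, v in zip(p, v_blk))
               for j, a in enumerate(acc)]
        running_max = new_max
    return [a / denom for a in acc]  # normalize once at the end


rng = random.Random(0)
q = [rng.gauss(0, 1) for _ in range(4)]
keys = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
values = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]
ref = reference_attention_row(q, keys, values)
tiled = tiled_attention_row(q, keys, values, block=2)
print(max(abs(a - b) for a, b in zip(ref, tiled)))
```

The printed difference sits at floating-point rounding level: the tiled version only ever holds one block of scores plus a few running statistics, which is what allows a fused kernel to keep its working set in fast on-chip memory.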

#transformer

Overfitted a 900KB Transformer to Compress a 100MB CSV into 7MB

Hacker News Top · 4d ago

A developer overfits a small 900KB transformer model to compress a 100MB CSV file down to 7MB, demonstrating a novel approach to data compression using overfitted neural networks.
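
The summary does not describe the post's exact pipeline. The general principle behind compressing with an overfitted model is that an entropy coder (such as an arithmetic coder) driven by the model's next-symbol probabilities needs roughly −log₂ p bits per symbol, and the model's weights must be stored alongside the coded payload. The sketch below estimates that ideal cost with a simple order-1 byte-count model, standing in for the transformer and deliberately fitted to the very data it compresses; it computes sizes only and does not implement the coder itself.

```python
import math
from collections import Counter, defaultdict


def fit_order1(data: bytes):
    """Count next-byte frequencies for each previous byte (context 0 at the start)."""
    counts = defaultdict(Counter)
    prev = 0
    for b in data:
        counts[prev][b] += 1
        prev = b
    return counts


def ideal_code_length_bits(data: bytes, counts) -> float:
    """Sum of -log2 p(byte | previous byte): the cost an arithmetic coder approaches.

    Assumes `counts` was fitted on `data`, so no probability is zero.
    """
    totals = {ctx: sum(c.values()) for ctx, c in counts.items()}
    bits = 0.0
    prev = 0
    for b in data:
        bits -= math.log2(counts[prev][b] / totals[prev])
        prev = b
    return bits


def total_size_bytes(data: bytes, model_size_bytes: int) -> float:
    """What actually has to be stored or shipped: the model plus the coded payload."""
    payload = ideal_code_length_bits(data, fit_order1(data)) / 8
    return model_size_bytes + payload


data = b"id,value\n" + b"".join(b"%d,%d\n" % (i, i % 7) for i in range(2000))
print(len(data), ideal_code_length_bits(data, fit_order1(data)) / 8)
```

Only a decoder holding the same model can reverse the coding, which is why `model_size_bytes` belongs in the total: overfitting pays off when the payload savings exceed the cost of shipping the model.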

#transformer

Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models

Hugging Face Daily Papers · 5d ago

Wan-Streamer is a unified end-to-end multimodal model for real-time audio-visual interaction using causal attention and integrated processing of visual, audio, and text modalities, achieving sub-second latency.
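
The summary does not say how Wan-Streamer implements its causal attention, but the general property that makes causal attention suit streaming is easy to show: position i attends only to positions j ≤ i, so outputs for earlier frames never change when later frames arrive and can be emitted (and their keys and values cached) immediately. A minimal single-head sketch without learned projections; the toy frames are made up:

```python
import math


def causal_self_attention(xs):
    """Single-head self-attention where token i sees only tokens 0..i."""
    d = len(xs[0])
    outputs = []
    for i, q in enumerate(xs):
        scores = [sum(a * b for a, b in zip(q, xs[j])) / math.sqrt(d) for j in range(i + 1)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        outputs.append([sum(weights[j] * xs[j][k] for j in range(i + 1)) / z for k in range(d)])
    return outputs


stream = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
later = stream[:2] + [[-3.0, 2.0]]  # same prefix, different newest frame
a, b = causal_self_attention(stream), causal_self_attention(later)
print(a[:2] == b[:2])  # True: earlier outputs are unaffected by the newest frame
```

With bidirectional attention the first two outputs would shift whenever a new frame arrived, forcing the model to wait or recompute, which is at odds with sub-second interactive latency.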

#transformer

@li9292: How to join OpenAI? Just master the following courses: 1. Stanford's "Language Modeling from Scratch" course: http://cs336.stanford.edu/spring2025/ 2. After gaining breadth, she dives deep into each concept, using blogs, papers, and ChatGPT…

X AI KOLs Timeline · 5d ago

This tweet recommends Stanford's CS336 course and a series of learning resources as a preparation path for joining OpenAI.

#transformer

nvidia/GLM-5.2-NVFP4

Hugging Face Models Trending · 5d ago

NVIDIA released GLM-5.2-NVFP4, a version of ZAI's GLM-5.2 MoE language model quantized to NVFP4 with Model Optimizer for inference on NVIDIA Blackwell GPUs.
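
As NVIDIA describes the format, NVFP4 stores each value as a 4-bit E2M1 float (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) and gives every block of 16 values a shared FP8 (E4M3) scale, with an additional per-tensor FP32 scale. The sketch below simulates ("fake-quantizes") only the block-scaling idea: it keeps the block scale in full precision, skips the per-tensor scale and the E4M3 rounding of the scale, and breaks rounding ties toward the smaller magnitude. It is not Model Optimizer's implementation or API.

```python
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # representable FP4 magnitudes
BLOCK_SIZE = 16


def quantize_block(block):
    """Map a block to (shared scale, FP4 values) so the largest |x| lands on 6.0."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        # nearest representable magnitude; min() keeps the first (smaller) one on ties
        mag = min(E2M1_MAGNITUDES, key=lambda g: abs(abs(x) / scale - g))
        codes.append(mag if x >= 0 else -mag)
    return scale, codes


def fake_quantize(values, block_size=BLOCK_SIZE):
    """Quantize then dequantize, block by block, to expose the rounding error."""
    out = []
    for start in range(0, len(values), block_size):
        scale, codes = quantize_block(values[start:start + block_size])
        out.extend(c * scale for c in codes)
    return out


weights = [0.2, 0.7, 2.6, -4.4, 6.0]
print(fake_quantize(weights))
```

Small per-block groups matter because a single outlier only inflates the scale (and thus the rounding step) of its own 16 neighbours rather than of the whole tensor.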

#transformer

Attention Is All You Need

Reddit r/ArtificialInteligence · 5d ago

A reflection on the landmark 'Attention Is All You Need' paper, highlighting how removing recurrence and relying solely on attention mechanisms revolutionized AI and led to modern LLMs like GPT and Claude.
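
For reference, the scaled dot-product attention at the heart of that paper, for queries Q ∈ ℝ^{n×d_k}, keys K ∈ ℝ^{m×d_k} and values V ∈ ℝ^{m×d_v}, and the multi-head form built from it, are:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O},
\qquad \mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q}, K W_i^{K}, V W_i^{V})
```

The 1/√d_k factor keeps dot products from growing with the key dimension and pushing the softmax into regions with vanishingly small gradients.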

#transformer

The Hitchhiker's Guide to Agentic AI: From Foundations to Systems

Hugging Face Daily Papers · 6d ago

A comprehensive practitioner's guide covering the full stack of building autonomous AI systems, from foundational transformer architecture to advanced agentic topics like multi-agent coordination and production deployment.

#transformer

Tapered Language Models

Hugging Face Daily Papers · 6d ago

This paper introduces Tapered Language Models (TLMs), an architecture principle that allocates more parameters to earlier layers and fewer to later layers, consistently improving perplexity and downstream performance across multiple architectures without extra cost.
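
The summary does not give the paper's allocation rule. As a purely hypothetical illustration of "more parameters early, fewer late" at a fixed budget, the sketch below tapers each layer's feed-forward width linearly around a uniform baseline so that the mean width, and hence roughly the total FFN parameter count, stays the same; `taper` and the rounding `multiple` are made-up knobs.

```python
def tapered_ffn_widths(num_layers, base_width, taper=0.5, multiple=64):
    """Linearly decreasing FFN widths whose mean is base_width before rounding.

    Layer 0 gets base_width * (1 + taper); the last layer gets base_width * (1 - taper).
    Widths are rounded to a hardware-friendly multiple.
    """
    if num_layers == 1:
        return [base_width]
    widths = []
    for i in range(num_layers):
        frac = i / (num_layers - 1)  # 0.0 at the first layer, 1.0 at the last
        w = base_width * (1.0 + taper * (1.0 - 2.0 * frac))
        widths.append(int(round(w / multiple)) * multiple)
    return widths


uniform = [3072] * 12
tapered = tapered_ffn_widths(12, 3072)
print(tapered)
print(sum(uniform), sum(tapered))  # FFN parameters scale with width, so the totals are comparable
```

Because the linear offsets cancel across layers, the budget is preserved up to the rounding to multiples of 64; any other monotone schedule with the same mean would fit the same description.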

#transformer

[NEW MODEL] SupraLabs started the Any2Any model family!

Reddit r/LocalLLaMA · 2026-06-21

SupraLabs released Supra-A2A-Nano-Exp, a small any-to-any autoregressive model that unifies text and image tokenization into a single Transformer, serving as an educational prototype rather than a production-ready system.
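
The summary does not give Supra-A2A-Nano-Exp's tokenizer details. One common way to let a single autoregressive Transformer model both text and images is to append an image tokenizer's discrete codebook (for example from a VQ-style encoder) to the text vocabulary as an offset ID range, bracketed by special begin/end-of-image tokens. A minimal sketch of that bookkeeping, where the class, its sizes and the special tokens are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedVocab:
    text_vocab_size: int      # IDs [0, text_vocab_size) are text tokens
    image_codebook_size: int  # the next block of IDs holds image codes

    @property
    def boi(self) -> int:     # hypothetical begin-of-image token
        return self.text_vocab_size + self.image_codebook_size

    @property
    def eoi(self) -> int:     # hypothetical end-of-image token
        return self.boi + 1

    @property
    def size(self) -> int:    # embedding / output-layer size of the shared model
        return self.eoi + 1

    def encode_image(self, codes):
        """Shift image codebook indices into the shared ID space."""
        if any(not 0 <= c < self.image_codebook_size for c in codes):
            raise ValueError("image code out of range")
        return [self.boi] + [self.text_vocab_size + c for c in codes] + [self.eoi]

    def split(self, ids):
        """Route a mixed sequence back to text token IDs and image codes."""
        text, image = [], []
        for t in ids:
            if t < self.text_vocab_size:
                text.append(t)
            elif t < self.boi:
                image.append(t - self.text_vocab_size)
        return text, image


vocab = UnifiedVocab(text_vocab_size=100, image_codebook_size=16)
sequence = [5, 7] + vocab.encode_image([0, 3, 15]) + [9]
print(sequence, vocab.size)
print(vocab.split(sequence))
```

With one shared ID space, the same embedding table and next-token softmax cover both modalities, so generating text or an image is just a matter of which IDs the model emits.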

#transformer

In the span of 3 days: Noam Shazeer (Transformer co-author) leaves Google for OpenAI, and John Jumper (Nobel laureate, AlphaFold lead) leaves Google DeepMind for Anthropic

Reddit r/singularity · 2026-06-19

In three days, Noam Shazeer (co-author of the Transformer paper) left Google for OpenAI, and Nobel laureate John Jumper (AlphaFold lead) left Google DeepMind for Anthropic, marking significant talent shifts in AI.
