Tag
This paper proposes H-Res, a method to adapt large transformer models by shaping the energy landscape of associative memories without modifying weights or adding prompts, preserving memory capacity and outperforming LoRA.
A benchmark study comparing traditional machine learning methods (Random Forest, XGBoost, SVM, Logistic Regression) against lightweight transformer variants (DistilBERT, TinyBERT, MobileBERT) for on-device fault detection across three public datasets. Traditional ML offers competitive accuracy at far smaller resource footprints, while TinyBERT-4L is the most deployment-friendly transformer.
NeuroSonic introduces a conditional flow-matching framework for reconstructing continuous speech from EEG signals, addressing the structural mismatch between neural and acoustic data by learning a deterministic probability-flow velocity field. It achieves up to 26.3% improvement in perceptual quality over existing GAN, diffusion, and mean-flow baselines on cross-subject benchmarks.
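As background for the flow-matching idea, here is a minimal sketch of the generic conditional flow-matching objective with a straight-line (rectified-flow-style) path. It assumes `x0` is Gaussian noise and `x1` is a target vector such as acoustic features. EEG conditioning is omitted, and the helper names (`cfm_pair`, `cfm_loss`, `euler_sample`) are hypothetical. This is not NeuroSonic's implementation. The model is trained to regress the path velocity `x1 - x0` at an interpolated point, and generation is deterministic ODE integration of that velocity field.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]

def cfm_pair(x0: Vector, x1: Vector, t: float) -> Tuple[Vector, Vector]:
    """Straight-line conditional path x_t = (1 - t) * x0 + t * x1.
    Returns x_t and the regression target u_t = x1 - x0 (the path's velocity)."""
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    ut = [b - a for a, b in zip(x0, x1)]
    return xt, ut

def cfm_loss(v: VelocityFn, x0: Vector, x1: Vector, t: float) -> float:
    """Mean squared error between the model velocity v(x_t, t) and u_t."""
    xt, ut = cfm_pair(x0, x1, t)
    pred = v(xt, t)
    return sum((p - u) ** 2 for p, u in zip(pred, ut)) / len(ut)

def euler_sample(v: VelocityFn, x0: Vector, steps: int = 10) -> Vector:
    """Deterministic generation: integrate dx/dt = v(x, t) from t = 0 to t = 1."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        vel = v(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, vel)]
    return x

# Demo with an "oracle" velocity for one known (noise, target) pair.
random.seed(0)
x0 = [random.gauss(0.0, 1.0) for _ in range(4)]  # Gaussian noise sample
x1 = [0.5, -1.0, 2.0, 0.0]                        # hypothetical target features

def oracle(x: Vector, t: float) -> Vector:
    return [b - a for a, b in zip(x0, x1)]

print(cfm_loss(oracle, x0, x1, t=0.3))  # 0.0
```

In a real system, `v` would be a neural network conditioned on EEG features, and `t` would be sampled uniformly for each training example. The oracle only shows that a perfect velocity field gives zero loss and carries noise to the target.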
SurfBind, a surface-centric learning framework for epitope prediction, uses a Transformer-based architecture with patch-level surface modeling and binder-aware cross-attention to achieve state-of-the-art performance on epitope identification benchmarks.
This paper introduces AutoSpecNER, an expert-annotated dataset for fine-grained named entity recognition in vehicle listings, with 659 advertisements annotated across 15 entity types. Benchmark results show that DeBERTa achieves 90% micro-F1, outperforming rule-based and LLM approaches.
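The 90% figure is a micro-averaged F1 score. Here is a minimal sketch of entity-level micro-F1 that assumes the common exact-match convention, where a prediction counts only if its span boundaries and entity type both match a gold entity. The entry does not state AutoSpecNER's exact evaluation protocol. The entity labels used below (`MAKE`, `MODEL`, `YEAR`, `MILEAGE`) are hypothetical.

```python
from typing import Iterable, Set, Tuple

Entity = Tuple[int, int, str]  # (start token, end token exclusive, entity type)

def micro_f1(gold_docs: Iterable[Set[Entity]], pred_docs: Iterable[Set[Entity]]) -> float:
    """Micro-averaged F1: pool true positives, false positives and false
    negatives over all documents and entity types, then compute F1 once."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)  # exact span + type matches
        fp += len(pred - gold)  # predicted entities absent from gold
        fn += len(gold - pred)  # gold entities that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

gold = [{(0, 2, "MAKE"), (3, 4, "YEAR")}, {(1, 3, "MODEL")}]
pred = [{(0, 2, "MAKE"), (3, 4, "MILEAGE")}, {(1, 3, "MODEL")}]
print(micro_f1(gold, pred))  # tp=2, fp=1, fn=1, so F1 = 2/3
```

Micro-averaging weights every entity equally. Frequent entity types therefore dominate the score, unlike macro-F1, which averages per-type F1 values.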
A comprehensive survey of transformer-based language models covering architectures, applications across domain verticals (healthcare, finance, legal, etc.), and a critical assessment of trade-offs including compute cost, alignment, and data provenance.
SURGeLLM introduces a unified transformer framework with surgical feature gates, task-conditioned prefix tokens, and instance-weighted normalization to address mismatched inductive biases, class imbalance, and lexical knowledge injection in multi-task learning, achieving significant gains across four diverse NLP tasks.
Nvidia has quietly acquihired the team from Essential AI, including Transformer paper coauthor Ashish Vaswani, who was struggling to raise funds for his startup. Vaswani will work on Nvidia's Nemotron open-source models.
This paper introduces a new approach leveraging certainty in transformer models, building on the 'Attention Is All You Need' paradigm.
Release of free workshop recordings and materials (23 videos, 250 slides, 50 exercises) for building your own LLM from fundamentals to transformer architecture, with no math or ML prerequisites.
A visual deep dive into FlashAttention, explaining memory optimization and operator fusion for efficient attention computation in language model training.
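FlashAttention avoids materializing the full N×N score matrix. It streams keys and values in tiles and keeps a running maximum and a running softmax denominator for each query (the "online softmax" recurrence). The pure-Python sketch below shows only that math for a single query vector, checked against a naive reference. It does not model the GPU kernel, SRAM tiling, or the backward pass, and `block_size` is an illustrative parameter.

```python
import math
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def naive_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference: compute every score, apply softmax, take the weighted sum of values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]

def tiled_attention(q: Vector, keys: List[Vector], values: List[Vector],
                    block_size: int = 2) -> Vector:
    """Online-softmax attention: visit K/V one block at a time, keeping only a
    running max m, a running denominator l and an unnormalized output acc."""
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf
    l = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        s_blk = [dot(q, k) * scale for k in k_blk]
        m_new = max(m, max(s_blk))
        correction = math.exp(m - m_new)  # rescales earlier partial sums; 0.0 on the first block
        p_blk = [math.exp(s - m_new) for s in s_blk]
        l = l * correction + sum(p_blk)
        acc = [a * correction + sum(p * v[j] for p, v in zip(p_blk, v_blk))
               for j, a in enumerate(acc)]
        m = m_new
    return [a / l for a in acc]  # normalize once at the end

q = [0.3, -1.2, 0.8]
keys = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0], [3.0, 1.0, -1.0], [0.0, 0.2, 0.4], [-2.0, 1.5, 0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
print(max(abs(a - b) for a, b in zip(naive_attention(q, keys, values),
                                     tiled_attention(q, keys, values))))
```

Each block is read once, and only O(block_size) scores exist at any time. This is the memory saving that the fused kernel exploits on the GPU.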
A developer overfits a small 900 KB transformer model to compress a 100 MB CSV file down to 7 MB, demonstrating a novel approach to data compression using overfitted neural networks.
Wan-Streamer is a unified end-to-end multimodal model for real-time audio-visual interaction using causal attention and integrated processing of visual, audio, and text modalities, achieving sub-second latency.
This tweet recommends Stanford's CS336 course and a series of learning resources as a preparation path for joining OpenAI.
NVIDIA released GLM-5.2-NVFP4, an NVFP4-quantized version of ZAI's GLM-5.2 MoE language model, produced with Model Optimizer and optimized for inference on NVIDIA Blackwell GPUs.
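NVFP4, as NVIDIA describes the format, stores each value as a 4-bit E2M1 float and shares one scale factor across a small block of 16 values. The sketch below shows block-scaled quantize/dequantize on the E2M1 magnitude grid {0, 0.5, 1, 1.5, 2, 3, 4, 6}, with each block's scale set to its absolute maximum divided by 6. It makes several simplifications. The scale is kept as a Python float rather than FP8 E4M3, the second-level per-tensor scale is omitted, and rounding goes to the nearest grid point with ties toward the smaller magnitude. It is not Model Optimizer's implementation.

```python
from typing import List, Tuple

# Non-negative magnitudes representable by FP4 E2M1 (the sign is a separate bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0

Block = Tuple[float, List[float]]  # (scale, signed E2M1 codes)

def quantize_block(block: List[float]) -> Block:
    """Scale the block so its largest magnitude maps to 6.0, then snap each
    value to the nearest E2M1 grid point (min() breaks ties toward smaller)."""
    amax = max(abs(x) for x in block)
    scale = amax / E2M1_MAX if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes

def quantize(weights: List[float], block_size: int = 16) -> List[Block]:
    return [quantize_block(weights[i:i + block_size])
            for i in range(0, len(weights), block_size)]

def dequantize(blocks: List[Block]) -> List[float]:
    return [scale * c for scale, codes in blocks for c in codes]

weights = [0.1 * i - 0.8 for i in range(32)]
blocks = quantize(weights)
restored = dequantize(blocks)
print(max(abs(w - r) for w, r in zip(weights, restored)))
```

The worst-case rounding error in a block is one scale unit, which is half the largest grid gap (between 4 and 6). A per-block scale keeps that unit small when one block's values are much smaller than another's.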
A reflection on the landmark 'Attention Is All You Need' paper, highlighting how removing recurrence and relying solely on attention mechanisms revolutionized AI and led to modern LLMs like GPT and Claude.
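The mechanism that replaced recurrence in that paper is scaled dot-product attention, softmax(QKᵀ/√d_k)V. The sketch below is a minimal single-head version with no mask and no learned projections, so Q, K and V are passed in directly. Every position reads from every other position in a single step, and no hidden state is carried from one time step to the next.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for one head, without masking.
    Each output row is a weighted average of the rows of V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# Identical keys give uniform weights, so the output is the mean of the V rows.
print(scaled_dot_product_attention([[1.0, 0.0]],
                                   [[1.0, 1.0], [1.0, 1.0]],
                                   [[1.0, 2.0], [3.0, 4.0]]))  # [[2.0, 3.0]]
```

The rows of Q are independent of each other, so all positions can be computed in parallel. Removing the sequential recurrence is what allowed this parallelism.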
A comprehensive practitioner's guide covering the full stack of building autonomous AI systems, from foundational transformer architecture to advanced agentic topics like multi-agent coordination and production deployment.
This paper introduces Tapered Language Models (TLMs), an architecture principle that allocates more parameters to earlier layers and fewer to later layers, consistently improving perplexity and downstream performance across multiple architectures without extra cost.
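One way to picture "more parameters early, fewer later, without extra cost" is to redistribute a fixed parameter budget across depth. The sketch below does this with a hypothetical linear taper of per-layer FFN hidden width, correcting the total so it equals that of a uniform model with the same base width. The linear schedule, the `taper` ratio and the choice to taper only the FFN width are illustrative assumptions, not the schedule used in the TLM paper.

```python
from typing import List

def tapered_ffn_widths(num_layers: int, base_width: int, taper: float) -> List[int]:
    """Hypothetical linear taper of per-layer FFN hidden width.

    Layer 0 gets about (1 + taper) * base_width and the last layer about
    (1 - taper) * base_width. The sum is then corrected to equal
    num_layers * base_width, i.e. the same total FFN width (and so the same
    FFN parameter count, for a fixed model dimension) as a uniform model.
    """
    if not 0 <= taper < 1:
        raise ValueError("taper must be in [0, 1)")
    if num_layers == 1:
        return [base_width]
    widths = []
    for i in range(num_layers):
        frac = i / (num_layers - 1)              # 0.0 at the first layer, 1.0 at the last
        factor = (1 + taper) - 2 * taper * frac  # decreases linearly from 1+taper to 1-taper
        widths.append(round(base_width * factor))
    # Absorb rounding drift into the last layer so the budget matches exactly.
    widths[-1] += num_layers * base_width - sum(widths)
    return widths

print(tapered_ffn_widths(4, 1024, 0.5))  # [1536, 1195, 853, 512]
```

Setting `taper=0` recovers the uniform baseline. Because each FFN layer has about 2 × d_model × width weights, equal width sums mean equal FFN parameter counts.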
SupraLabs released Supra-A2A-Nano-Exp, a small any-to-any autoregressive model that unifies text and image tokenization into a single Transformer, serving as an educational prototype rather than a production-ready system.
Within three days, Noam Shazeer (co-author of the Transformer paper) left Google for OpenAI, and Nobel laureate John Jumper (AlphaFold lead) left Google DeepMind for Anthropic, marking significant talent shifts in AI.