transformer

Tag

Cards List
#transformer

RNNs vs Transformers vs SSMs: where should AI memory live for continual learning?

Reddit r/artificial · 2026-06-18

A technical analysis comparing memory designs in RNNs, Transformers, and SSMs, arguing that the key question is where to store sequence state rather than which architecture is better. Discusses trade-offs between compressed hidden states, growing KV caches, and synaptic-like memory in model connectivity.
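
The trade-off between a growing KV cache and a compressed hidden state can be made concrete with a rough size count. The sketch below is only illustrative: the function names and the layer, head, and state sizes are hypothetical and do not come from the linked analysis.

```python
def kv_cache_floats(seq_len, n_layers, n_kv_heads, head_dim):
    """Floats held in a Transformer KV cache: one key and one value
    vector per layer, per KV head, per token seen so far."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len


def recurrent_state_floats(n_layers, state_dim):
    """Floats held in a fixed-size recurrent (RNN/SSM-style) state.
    The size does not depend on how many tokens have been processed."""
    return n_layers * state_dim


# Hypothetical model sizes, chosen only to show the scaling behaviour.
N_LAYERS, N_KV_HEADS, HEAD_DIM, STATE_DIM = 12, 8, 64, 4096

for seq_len in (1_000, 10_000, 100_000):
    kv = kv_cache_floats(seq_len, N_LAYERS, N_KV_HEADS, HEAD_DIM)
    rec = recurrent_state_floats(N_LAYERS, STATE_DIM)
    print(f"tokens={seq_len:>7}  kv_cache={kv:>13,}  recurrent_state={rec:,}")
```

The cache grows linearly with the number of tokens, while the recurrent state stays constant, which is why the recurrent design has to compress what it remembers.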


@MosiAI_Official: MOSS-TTS Local Transformer v1.5 is here. Clone any voice. Speak any language. Hear every detail. 30+ languages, 48 kHz …

X AI KOLs Following · 2026-06-18 Cached

MosiAI has released MOSS-TTS Local Transformer v1.5, a text-to-speech model that supports voice cloning, over 30 languages, and high-quality 48 kHz output.


@jbhuang0604: Huge! It’s amazing how often Noam’s papers end up at the center of the field. In many tutorial videos I’ve made, they’v…

X AI KOLs Following · 2026-06-18 Cached

The article provides a detailed explanation of Mixture of Experts (MoE) in transformers, covering routing, load balancing, and recent innovations like fine-grained experts. It also highlights the significance of Noam Shazeer's research contributions and his move from Google to OpenAI.
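
As a rough illustration of the routing step mentioned in the summary, here is a minimal top-k router in plain Python. It is a sketch of one common variant (softmax renormalized over the selected experts only), not the scheme described in the linked article; `top_k_route` and the example logits are hypothetical.

```python
import math


def top_k_route(logits, k):
    """Pick the k experts with the largest router logits for one token.

    Returns a list of (expert_index, weight) pairs, where the weights are
    a softmax taken over the selected experts only and therefore sum to 1.
    """
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for a single token over four experts.
for expert, weight in top_k_route([0.0, 1.0, 3.0, 2.0], k=2):
    print(f"expert {expert}: weight {weight:.3f}")
```

Only the selected experts run for the token, and their outputs are combined using these weights. Load balancing, which the summary also mentions, is a separate training-time concern that this sketch does not cover.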


@ns123abc: “Sir… Noam Shazeer, the legend who invented the Transformer… who Sundar paid $2.7 billion to bring back and led Gemini……

X AI KOLs Timeline · 2026-06-18 Cached

Noam Shazeer, co-inventor of the Transformer architecture and key figure behind Gemini, is leaving Google to join OpenAI, marking his second departure from Google after being brought back in a $2.7 billion deal.


@0xLogicrw: Noam Shazeer, Google AI key figure and Gemini model technical lead, leaves Google again and officially joins rival OpenAI. OpenAI announced to employees that Shazeer will focus on finding entirely new underlying architectures for large models and advancing the Transformer...

X AI KOLs Timeline · 2026-06-18 Cached

Noam Shazeer, co-author of the Transformer architecture and technical lead of Google's Gemini model, has left Google again and officially joined OpenAI. He will focus on discovering new underlying architectures for large models and driving research into the evolution of Transformers.


QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging

Hugging Face Daily Papers · 2026-06-18 Cached

This paper introduces QG-MIL, a gated transformer aggregator that mitigates attention concentration in multiple instance learning for medical imaging, achieving domain-agnostic performance without auxiliary losses.


Grouped Query Experts: Mixture-of-Experts on GQA Self-Attention

Hugging Face Daily Papers · 2026-06-18 Cached

Grouped Query Experts (GQE) improves Transformer efficiency by applying a mixture-of-experts layer on top of grouped-query attention, selectively activating query heads per token while keeping key-value cache benefits, matching baseline accuracy with half the query-head compute at 250M parameter scale.


@DanKornas: Building an LLM from scratch is easier when each layer has its own notebook. EveryonesLLM is a Google Colab-based tutor…

X AI KOLs Timeline · 2026-06-17 Cached

EveryonesLLM is an open-source Google Colab-based tutorial repository for building a nanoGPT-style LLM from scratch, with step-by-step chapters covering dataloading, embeddings, attention, training, and instruction tuning.


Multilingual-Multimodal-NLP/LoopCoder-V2 · Hugging Face

Reddit r/LocalLLaMA · 2026-06-17 Cached

LoopCoder-V2 is a 7B instruction-tuned code model built on the Parallel Loop Transformer (PLT), demonstrating non-monotonic test-time scaling with two loops providing the best gain-cost trade-off and significant improvements over baselines on code generation and reasoning benchmarks.


@retr0sushi_: looped transformer -> hyper-looped transformer -> looped world model ??

X AI KOLs Timeline · 2026-06-17 Cached

Speculates on a progression from looped transformers to hyper-looped transformers to looped world models, hinting at a new research direction.


Reconfigurable Computing Challenge: Transformer for Jet Tagging on Versal AI Engines

arXiv cs.LG · 2026-06-17 Cached

This paper presents a quantized, integer-only transformer implementation for jet tagging on AMD Versal AI Engines, including a reusable open-source framework that maps transformer layers to AIE tiles for low-latency trigger systems at CERN LHC.


Discrete Autoregressive Transformer for Generative Mechanism Synthesis

arXiv cs.LG · 2026-06-17 Cached

This paper presents a discrete autoregressive transformer that generates planar mechanisms from target coupler curves, using variational autoencoder latents and tokenized joint coordinates to achieve diverse, accurate designs across multiple topologies.


The Discrete-Log Clock: How a Transformer Learns Modular Multiplication

arXiv cs.LG · 2026-06-17 Cached

This paper demonstrates that when transformers grok modular multiplication, the dense Fourier spectrum observed in previous work is an artifact of using the additive Fourier transform; using the multiplicative character transform reveals a sparse representation, leading to a reverse-engineered 'Discrete-Log Clock' algorithm analogous to the clock algorithm for modular addition.
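
The name "Discrete-Log Clock" points at a standard number-theoretic fact: for a prime p, taking discrete logarithms with respect to a primitive root turns multiplication mod p into addition mod p - 1. The sketch below checks only that identity by brute force; it is not the circuit reverse-engineered in the paper, and the helper names and the choice p = 113 are assumptions made for illustration.

```python
def primitive_root(p):
    """Smallest generator of the multiplicative group mod an odd prime p,
    found by brute force (fine for small p)."""
    for g in range(2, p):
        seen = set()
        x = 1
        for _ in range(p - 1):
            x = x * g % p
            seen.add(x)
        if len(seen) == p - 1:  # g reaches every nonzero residue
            return g
    raise ValueError("no primitive root found; is p an odd prime?")


def log_tables(p):
    """Build discrete-log and exponent lookup tables for prime p."""
    g = primitive_root(p)
    log, exp = {}, {}
    x = 1
    for k in range(p - 1):
        exp[k] = x  # g**k mod p
        log[x] = k  # discrete log of x, base g
        x = x * g % p
    return log, exp


def mul_via_logs(a, b, p, log, exp):
    """Multiply nonzero residues a and b mod p by adding their logs mod p - 1."""
    return exp[(log[a] + log[b]) % (p - 1)]


p = 113  # hypothetical modulus chosen for the demo
log, exp = log_tables(p)
print(mul_via_logs(5, 7, p, log, exp) == (5 * 7) % p)
```

Once the problem is expressed in log space, modular multiplication becomes modular addition, which is the setting in which the clock algorithm for modular addition was described in earlier work.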


Counterfactual Optimization of Baseball Pitch Sequences and Estimation of Its Impact on Season-Level Statistics

arXiv cs.LG · 2026-06-17 Cached

This paper uses a Transformer-based model on MLB Statcast data to counterfactually optimize baseball pitch sequences, finding that optimizing both final and setup pitches can improve season-level statistics like K/9 by over 1.0.


The Critical Role of Model Selection in Causal Inference: A Comparative Analysis of Classification Models within the InferBERT Framework for Pharmacovigilance

arXiv cs.LG · 2026-06-17 Cached

This paper systematically evaluates the impact of classification model selection within the InferBERT framework for causal adverse drug event detection, finding that domain-specific pre-training (BioBERT) outperforms both simpler models and larger LLMs like Med-LLaMA.


@Phoenixyin13: On a 10-point scale, I would honestly give the SMT idea and the writing in this MIT paper an 8. The paper proposes Supervised Memory Training, using a Transformer as a super teacher to first distill, in parallel, the most important things to remember at each moment…

X AI KOLs Timeline · 2026-06-16 Cached

This paper proposes Supervised Memory Training (SMT), which uses a Transformer as a "super teacher" to distill memory states in parallel and then trains an RNN with one-step supervised learning. Training becomes fully parallel and the gradient path length drops from O(T) to O(1), which significantly improves long-range dependency learning.


GateGPT: 56k tokens per second Transformer (KV cache) on FPGA at 80 MHz

Hacker News Top · 2026-06-16 Cached

A custom FPGA implementation of a Transformer with KV cache achieves 56,000 tokens per second at 80 MHz, running microGPT on a tiny LCD.
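
The two headline numbers imply a per-token cycle budget, which a quick back-of-the-envelope calculation makes explicit. This sketch assumes the quoted throughput is sustained for a single token stream; `cycles_per_token` is a hypothetical helper, not part of the project.

```python
def cycles_per_token(clock_hz, tokens_per_second):
    """Average clock cycles available to produce one token."""
    return clock_hz / tokens_per_second


budget = cycles_per_token(80e6, 56_000)  # 80 MHz clock, 56,000 tokens/s
print(f"about {budget:.0f} cycles per token")
```

At these figures the design has roughly 1,429 clock cycles for each generated token.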


@NFTCPS: You keep talking about AI, but can't even explain what a Transformer is? There's a repo that goes all out — builds a GPT from scratch without using any high-level libraries. It lays out exactly how Attention, Multi-Head, Feed-Forward, Embedding, Residual connections, and Layer Norm are pieced together. And it's not just the model; the entire pipeline is covered…

X AI KOLs Timeline · 2026-06-16 Cached

An open-source GitHub project that implements the complete GPT training pipeline from scratch in native PyTorch, including data preprocessing, pretraining, SFT, and RLHF post-training. It is aimed at developers who want to understand the Transformer architecture in depth.


@GitHub_Daily: Want to understand the underlying principles of large language models? Most resources only cover theory or provide source code, leaving you still confused. Stumbled upon this open-source tutorial, EveryonesLLM, which guides us step by step to build a complete large language model from scratch on Google Colab, writing code throughout. The whole tutorial is divided into...

X AI KOLs Timeline · 2026-06-16 Cached

EveryonesLLM is an open-source tutorial of 29 Colab notebook chapters that walks users step by step through building a complete large language model from scratch on Google Colab, including pre-training and instruction fine-tuning. It also supports Chinese.


Hierarchical Modeling of ICD Codes in EHR Foundation Models

arXiv cs.AI · 2026-06-16 Cached

This paper investigates explicit encoding of ICD-10-CM hierarchy in EHR foundation models, using hierarchical token augmentation and graph-based code representations. Experiments on MIMIC-IV and eICU show improvements over flat code representations for in-domain and cross-dataset prediction tasks.
