transformer

#transformer

RNNs vs Transformers vs SSMs: where should AI memory live for continual learning?

Reddit r/artificial · 2026-06-18

A technical analysis comparing memory designs in RNNs, Transformers, and SSMs, arguing that the key question is where to store sequence state rather than which architecture is better. It discusses the trade-offs between compressed hidden states, growing KV caches, and synaptic-like memory in model connectivity.
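
The constant-versus-growing memory trade-off can be made concrete by counting the per-sequence state each design carries. This is a toy illustration, not taken from the linked post. It assumes one layer and a hidden width `d`, that an RNN or SSM keeps a single fixed-size state vector, and that a Transformer caches one key and one value vector of width `d` per past token. The function names are hypothetical.

```python
# Toy accounting of per-layer, per-sequence memory, measured in stored floats.
# Assumptions (not from the linked post): one layer, hidden width d,
# an RNN/SSM keeps a single fixed-size state vector, and a Transformer
# caches one key vector and one value vector of width d per past token.


def recurrent_state_floats(seq_len: int, d: int) -> int:
    """Compressed memory: the state is overwritten each step, so seq_len does not matter."""
    return d


def kv_cache_floats(seq_len: int, d: int) -> int:
    """Growing memory: every processed token adds one key and one value of width d."""
    return 2 * d * seq_len


if __name__ == "__main__":
    d = 512
    for n in (1_000, 10_000, 100_000):
        print(f"tokens={n:>7}  recurrent={recurrent_state_floats(n, d):>6}  "
              f"kv_cache={kv_cache_floats(n, d):>11}")
```

At 100,000 tokens the toy KV cache holds 102,400,000 floats against the recurrent state's 512. Real models multiply both by layer count and numeric precision, but the constant-versus-linear shape is the trade-off the summary describes.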

#transformer

@MosiAI_Official: MOSS-TTS Local Transformer v1.5 is here. Clone any voice. Speak any language. Hear every detail. 30+ languages, 48 kHz …

X AI KOLs Following · 2026-06-18

MosiAI has released MOSS-TTS Local Transformer v1.5, a text-to-speech model that supports voice cloning, over 30 languages, and high-quality 48 kHz output.

#transformer

@jbhuang0604: Huge! It’s amazing how often Noam’s papers end up at the center of the field. In many tutorial videos I’ve made, they’v…

X AI KOLs Following · 2026-06-18

The article provides a detailed explanation of Mixture of Experts (MoE) in transformers, covering routing, load balancing, and recent innovations like fine-grained experts. It also highlights the significance of Noam Shazeer's research contributions and his move from Google to OpenAI.
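
Routing and load balancing, the two MoE mechanics named above, fit in a few lines. This is a minimal sketch assuming a softmax router with top-k selection and a Switch-Transformer-style auxiliary loss (number of experts times the sum over experts of the fraction of tokens whose top-1 choice is that expert, multiplied by its mean router probability). Experts are stubbed as plain functions, and every name here is hypothetical rather than any particular codebase's API.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k most probable experts and renormalize their weights to sum to 1."""
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]


def moe_forward(x: float, logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int = 2) -> float:
    """Only the selected experts run; their outputs are mixed by router weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(logits, k))


def load_balance_loss(all_logits: Sequence[Sequence[float]], num_experts: int) -> float:
    """Switch-style auxiliary loss: N * sum_i f_i * P_i (equals 1.0 when perfectly balanced).

    f_i: fraction of tokens whose top-1 expert is i; P_i: mean router probability for i.
    """
    n_tokens = len(all_logits)
    counts = [0] * num_experts
    prob_sums = [0.0] * num_experts
    for logits in all_logits:
        probs = softmax(logits)
        counts[max(range(num_experts), key=lambda i: probs[i])] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    return num_experts * sum(
        (counts[i] / n_tokens) * (prob_sums[i] / n_tokens) for i in range(num_experts)
    )
```

Adding this loss (scaled by a small coefficient) to the training objective pushes the router toward spreading tokens across experts, so no single expert becomes a bottleneck.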

#transformer

@ns123abc: “Sir… Noam Shazeer, the legend who invented the Transformer… who Sundar paid $2.7 billion to bring back and led Gemini……

X AI KOLs Timeline · 2026-06-18

Noam Shazeer, co-inventor of the Transformer architecture and key figure behind Gemini, is leaving Google to join OpenAI, marking his second departure from Google after being brought back in a $2.7 billion deal.

#transformer

@0xLogicrw: Noam Shazeer, Google AI key figure and Gemini model technical lead, leaves Google again and officially joins rival OpenAI. OpenAI announced to employees that Shazeer will focus on finding entirely new underlying architectures for large models and advancing the Transformer...

X AI KOLs Timeline · 2026-06-18

Noam Shazeer, co-author of the Transformer architecture and technical lead of Google's Gemini model, has left Google again and officially joined OpenAI. He will focus on discovering new underlying architectures for large models and driving research into the evolution of Transformers.

#transformer

QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging

Hugging Face Daily Papers · 2026-06-18

This paper introduces QG-MIL, a gated transformer aggregator that mitigates attention concentration in multiple instance learning for medical imaging, achieving domain-agnostic performance without auxiliary losses.
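
For context on what a "gated" aggregator does in multiple instance learning, here is the classic gated-attention pooling of Ilse et al. (2018). Each instance embedding is scored by a tanh branch multiplied elementwise by a sigmoid gate, and the bag embedding is the softmax-weighted average of the instances. This is a swapped-in reference technique, not the QG-MIL architecture, whose details the summary does not give; the small matrices `V`, `U` and vector `w` are hypothetical stand-ins for learned parameters.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_attention_pool(
    bag: Sequence[Vector], V: Matrix, U: Matrix, w: Vector
) -> Tuple[Vector, Vector]:
    """Gated attention MIL pooling (Ilse et al., 2018), NOT the QG-MIL aggregator.

    score_k = w . (tanh(V h_k) * sigmoid(U h_k)); weights = softmax(scores).
    Returns the bag embedding (attention-weighted mean) and the per-instance weights.
    V, U, w are hypothetical placeholders for learned parameters.
    """
    scores = []
    for h in bag:
        t = [math.tanh(z) for z in matvec(V, h)]  # content branch
        g = [sigmoid(z) for z in matvec(U, h)]    # gate branch, values in (0, 1)
        scores.append(sum(wi * ti * gi for wi, ti, gi in zip(w, t, g)))
    m = max(scores)                               # stabilize the softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(bag[0])
    pooled = [sum(a * h[j] for a, h in zip(attn, bag)) for j in range(dim)]
    return pooled, attn
```

The per-instance weights returned here are the quantity whose concentration on a few instances the paper sets out to mitigate; the pooled vector would feed a bag-level classifier.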

#transformer

Grouped Query Experts: Mixture-of-Experts on GQA Self-Attention

Hugging Face Daily Papers · 2026-06-18

Grouped Query Experts (GQE) improves Transformer efficiency by applying a mixture-of-experts layer on top of grouped-query attention (GQA). It selectively activates query heads per token while keeping GQA's key-value cache benefits, and at the 250M-parameter scale it matches baseline accuracy with half the query-head compute.
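
The mechanism in the summary, a router that activates only some query heads per token while keys and values stay shared per GQA group, can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: one query token, hypothetical router logits, top-k selection over query heads, and each query head reading the K/V of its group (`h // heads_per_group`). The sketch does not scale active heads by router weight, since the summary does not say whether GQE does; all names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(q: Vec, keys: Sequence[Vec], values: Sequence[Vec]) -> Vec:
    """Scaled dot-product attention for one query over cached keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def gqe_step(
    queries: Sequence[Vec],
    router_logits: Sequence[float],
    kv_keys: Sequence[Sequence[Vec]],
    kv_values: Sequence[Sequence[Vec]],
    k: int,
) -> List[Vec]:
    """One token's attention with only the top-k query heads active (toy sketch).

    queries: one vector per query head; kv_keys/kv_values: one cached list per
    KV group, shared by the query heads in that group (the GQA layout).
    router_logits: hypothetical per-head scores from an assumed router.
    """
    n_heads, n_groups = len(queries), len(kv_keys)
    heads_per_group = n_heads // n_groups
    ranked = sorted(range(n_heads), key=lambda h: router_logits[h], reverse=True)
    active = set(ranked[:k])
    dim_v = len(kv_values[0][0])
    out: List[Vec] = []
    for h, q in enumerate(queries):
        if h in active:
            group = h // heads_per_group  # query head h reads its group's shared K/V
            out.append(attend(q, kv_keys[group], kv_values[group]))
        else:
            out.append([0.0] * dim_v)  # inactive head: attention is skipped entirely
    return out
```

With 4 query heads, 2 KV groups, and `k=2`, half the heads run attention while the cache still stores only 2 groups of keys and values, mirroring at toy scale the "half the query-head compute" setting in the summary.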

#transformer

@DanKornas: Building an LLM from scratch is easier when each layer has its own notebook. EveryonesLLM is a Google Colab-based tutor…

X AI KOLs Timeline · 2026-06-17

EveryonesLLM is an open-source Google Colab-based tutorial repository for building a nanoGPT-style LLM from scratch, with step-by-step chapters covering dataloading, embeddings, attention, training, and instruction tuning.

#transformer

Multilingual-Multimodal-NLP/LoopCoder-V2 · Hugging Face

Reddit r/LocalLLaMA · 2026-06-17

LoopCoder-V2 is a 7B instruction-tuned code model built on the Parallel Loop Transformer (PLT). It shows non-monotonic test-time scaling in the number of loops: two loops give the best gain-cost trade-off, and the model improves significantly over baselines on code generation and reasoning benchmarks.
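
The "loop" in a looped transformer means reapplying the same weight-tied block several times, so effective depth at inference grows without adding parameters; the loop count is the test-time knob the summary refers to. The sketch below shows only that generic idea with a hypothetical toy block (`toy_block`, a fixed linear map standing in for attention plus MLP). It does not reproduce the Parallel Loop Transformer, whose parallelization scheme the summary does not describe.

```python
from typing import Callable, List

Vec = List[float]


def looped_forward(x: Vec, block: Callable[[Vec], Vec], n_loops: int) -> Vec:
    """Apply the SAME block (same weights) n_loops times with a residual connection."""
    h = list(x)
    for _ in range(n_loops):
        update = block(h)
        h = [hi + ui for hi, ui in zip(h, update)]
    return h


def toy_block(h: Vec) -> Vec:
    # Hypothetical stand-in for a Transformer block: a fixed 2x2 linear map.
    return [0.5 * h[1], -0.5 * h[0]]
```

Running `looped_forward(x, toy_block, 1)` versus `looped_forward(x, toy_block, 2)` changes compute but not the parameter count; per the summary, LoopCoder-V2 found two loops gave the best gain-cost trade-off.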

#transformer

@retr0sushi_: looped transformer -> hyper-looped transformer -> looped world model ??

X AI KOLs Timeline · 2026-06-17

The post speculates on a progression from looped transformers to hyper-looped transformers to looped world models, hinting at a new research direction.

#transformer