transformer

#transformer

Controlled Dynamics Attractor Transformer

arXiv cs.LG · 2026-06-16

The Controlled Dynamics Attractor Transformer (CDAT) combines a mixture von Mises-Fisher attention energy with a Hopfield refinement energy and CANN-inspired excitation-inhibition modulation, providing topology-constrained dynamical systems for stable inference. It achieves state-of-the-art performance on graph anomaly detection and classification benchmarks.

Adaptive Volumetric Mechanical Property Fields Invariant to Resolution

Hugging Face Daily Papers · 2026-06-16

AdaVoMP uses a sparse adaptive voxel structure and a transformer encoder-decoder to predict spatially varying mechanical properties for 3D objects, enabling high-resolution deformable simulations with improved accuracy and efficiency.

Looped World Models

Hugging Face Daily Papers · 2026-06-16

Looped World Models introduce iterative latent state refinement through shared transformer blocks, achieving 100x parameter efficiency while adapting computational depth to prediction complexity.

@qinzytech: https://x.com/qinzytech/status/2066585405479371092

X AI KOLs Timeline · 2026-06-15

A technical analysis of two approaches to building self-evolving AI agents: model-based (through architectures such as SSMs or transformers with fast-weight updates, together with training methods) and harness-based (through memory or a meta-harness that can rewrite itself). The author gives practical recommendations for different audiences.

@sairahul1: Nobody tells you what's actually inside GPT or Claude. They say "transformer" and move on. This repo builds one from sc…

X AI KOLs Timeline · 2026-06-15

A repository that builds a transformer from scratch without high-level libraries, explaining attention mechanisms and the full training pipeline; the model can be trained in a day on free Colab.

@akshay_pachaar: Train your own LLM from scratch. This repo builds a GPT-style transformer from the ground up, without using any high-le…

X AI KOLs Following · 2026-06-15

A repository that builds a GPT-style transformer from scratch without high-level libraries, covering everything from data preprocessing to generation, and includes guides for SFT and RLHF.

@tanzhengmc97: https://x.com/tanzhengmc97/status/2066531753762656730

X AI KOLs Timeline · 2026-06-15

Explains how large models work in plain language, covering word vectors, the Transformer attention mechanism, next-word prediction training, and emergent abilities. Aimed at beginners who want to understand basic AI concepts.

@freeman1266: You don't need math to understand most AI papers—just understand this chain: token → embedding → position encoding → attention → FFN → residual stream → next-token prediction. LLMs essentially stack Transf…

X AI KOLs Timeline · 2026-06-15

A Chinese-language explainer tweet that gives an intuitive account of the core chain inside LLMs (Large Language Models): token, embedding, position encoding, attention, FFN, residual stream, and next-token prediction. It is aimed at readers without a math background who want to follow AI papers.
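
The chain named in this summary can be traced in a few dozen lines. The following is a minimal sketch, not the tweet's own material: it assumes a four-word toy vocabulary, random untrained weights, a single attention head, and no layer normalization. Every name in it (`VOCAB`, `next_token_probs`, and so on) is hypothetical.

```python
import math
import random

random.seed(0)

VOCAB = ["<pad>", "the", "cat", "sat"]  # toy vocabulary (assumption)
D = 8                                    # model width (assumption)

def rand_matrix(rows, cols, scale=0.1):
    """Random, untrained weights; a real model would learn these."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def vec_mat(vec, mat):
    """Row vector (length r) times matrix (r x c) gives a vector of length c."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def position_encoding(pos, d):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd ones."""
    return [math.sin(pos / 10000 ** (i / d)) if i % 2 == 0
            else math.cos(pos / 10000 ** ((i - 1) / d)) for i in range(d)]

E = rand_matrix(len(VOCAB), D)                       # embedding table
Wq, Wk, Wv, Wo = (rand_matrix(D, D) for _ in range(4))
W1, W2 = rand_matrix(D, 4 * D), rand_matrix(4 * D, D)
W_out = rand_matrix(D, len(VOCAB))                   # projection to vocabulary

def next_token_probs(token_ids):
    # token -> embedding -> + position encoding
    xs = [add(E[t], position_encoding(p, D)) for p, t in enumerate(token_ids)]
    # causal attention: the last position attends to itself and everything before it
    q = vec_mat(xs[-1], Wq)
    ks = [vec_mat(x, Wk) for x in xs]
    vs = [vec_mat(x, Wv) for x in xs]
    weights = softmax([dot(q, k) / math.sqrt(D) for k in ks])
    mixed = [sum(w * v[c] for w, v in zip(weights, vs)) for c in range(D)]
    h = add(xs[-1], vec_mat(mixed, Wo))              # residual stream after attention
    # FFN with ReLU, added back into the residual stream
    hidden = [max(0.0, z) for z in vec_mat(h, W1)]
    h = add(h, vec_mat(hidden, W2))
    # next-token prediction: a distribution over the vocabulary
    return softmax(vec_mat(h, W_out))

probs = next_token_probs([1, 2, 3])  # "the cat sat"
```

Only the last position is pushed through attention and the FFN here, because that is all a single next-token prediction needs; a full model would process every position and stack many such blocks.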

@Fluyeporlaweb: This genius published a step-by-step guide on GitHub for building and training your own model from scratch. No magic. N…

X AI KOLs Timeline · 2026-06-15

A GitHub guide shared by @Fluyeporlaweb shows how to build and train a Transformer model from scratch without high-level libraries. It implements attention, multi-head attention, embeddings, and post-training algorithms (SFT, PPO, DPO, GRPO), and trains on The Pile dataset.

DRIVE: Distributional and Retrieval-Augmented Bidding with Value Evaluation

arXiv cs.LG · 2026-06-15

This paper introduces DRIVE, a unified Transformer-based framework for offline auto-bidding that decouples candidate action generation from decision making, combining distributional action modeling, retrieval-augmented candidate generation, and value-based evaluation to improve bidding performance under budget and cost constraints.

Zeta: Dual Whitening for Matrix Optimization via Coordinate-Adaptive Preconditioning

arXiv cs.LG · 2026-06-15

Zeta proposes a dual whitening optimizer that applies coordinate whitening before spectral whitening to resolve scale heterogeneity in momentum matrices, reducing orthogonalization error and improving convergence and generalization in large-scale neural network training.

A Deep Reinforcement Learning (DRL)-Based Transformer Method for Solving the Open Shop Scheduling Problem

arXiv cs.AI · 2026-06-15

Presents a Transformer-based scheduling policy trained with reinforcement learning for the open shop scheduling problem, showing that a model trained on small instances can generalize to much larger problems and compete with classical dispatching heuristics.

Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation

Hugging Face Daily Papers · 2026-06-15

Taylor-Calibrate proposes a principled initialization method for hybrid linear attention models that significantly improves the efficiency of distilling pretrained Transformers into Gated DeltaNet students, achieving up to 88x improvement and reducing training tokens by 4.9x-9.2x.

MiniMax Sparse Attention for Million-Token Contexts (GitHub Repo)

TLDR AI · 2026-06-15

MiniMaxAI releases MSA, a library for dense and sparse attention kernels optimized for NVIDIA SM100 GPUs, enabling efficient processing of million-token contexts with FlashAttention and sparse top-k attention.
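
To illustrate what "sparse top-k attention" means in general, here is a minimal sketch for a single query. It is not MSA's kernel or API: `topk_attention` is a hypothetical name, the code is plain Python rather than a GPU kernel, and it assumes the common formulation in which each query keeps only its k highest-scoring keys and applies softmax over those.

```python
import math

def topk_attention(query, keys, values, k):
    """Attend to only the k keys with the highest scaled dot-product score."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * x for q, x in zip(query, key)) for key in keys]
    # Indices of the k best-scoring keys; all other keys are ignored entirely.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected keys.
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    total = sum(exps.values())
    dim = len(values[0])
    out = [sum(exps[i] / total * values[i][c] for i in top) for c in range(dim)]
    return out, sorted(top)

keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [5.0, 5.0]]
out, selected = topk_attention([1.0, 0.0], keys, values, k=2)
```

With `k` equal to the number of keys this reduces to ordinary dense attention, so the cost saving comes entirely from how small `k` is relative to the context length.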

@CamilleRoux: A well-made explanation of the inner workings of LLMs: tokens, embeddings, positional encoding, attention, fee…

X AI KOLs Timeline · 2026-06-14

This tweet shares a well-made explanation of the internal workings of LLMs, covering tokens, embeddings, positional encoding, attention, and feed-forward networks, via a blog post by 0xkato.

The Curse of Depth in Large Language Models

Lobsters Hottest · 2026-06-13

This paper introduces the Curse of Depth in LLMs: deep layers become ineffective because Pre-Layer Normalization (Pre-LN) causes output variance to explode with depth. The authors propose LayerNorm Scaling to mitigate this and report consistent improvements in pre-training and fine-tuning across model sizes up to 7B.
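
The idea behind LayerNorm Scaling can be sketched briefly. The summary above does not give the scaling rule, so this sketch assumes the factor is 1/sqrt(l) for the l-th layer (counting from 1); the function names are hypothetical, and the learnable gain and bias of a full LayerNorm are omitted.

```python
import math

def layer_norm(x, eps=1e-5):
    """Plain layer normalization without learnable gain or bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def scaled_layer_norm(x, layer_index, eps=1e-5):
    """LayerNorm output multiplied by 1/sqrt(layer_index) (assumed rule)."""
    scale = 1.0 / math.sqrt(layer_index)
    return [scale * v for v in layer_norm(x, eps)]

def variance(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

x = [1.0, 2.0, 3.0, 4.0]
shallow = scaled_layer_norm(x, layer_index=1)  # same as plain layer_norm
deep = scaled_layer_norm(x, layer_index=4)     # output variance cut to about 1/4
```

Under this assumption the variance of the normalized output shrinks in proportion to 1/l, which is what counteracts the growth of variance in deeper Pre-LN layers.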

@rasbt: Cool new open-weight model by Cohere: a new lightweight 30B open-weight model for agentic coding tasks. This one builds…

X AI KOLs Timeline · 2026-06-13

Cohere released a new lightweight 30B open-weight model for agentic coding tasks. It is built on Command A+ with a parallel transformer design and shows strong performance on agentic benchmarks such as Terminal-Bench and SWE-Bench.

Otters++: A Time-to-first-spike Based Energy Efficient Optical Spiking Transformer

arXiv cs.AI · 2026-06-12

Otters++ is an optical spiking Transformer that uses time-to-first-spike coding and physical hardware decay for energy-efficient inference, reaching 84.17% on GLUE while maintaining a clear energy advantage over prior spiking Transformer baselines.

@PierceZhang34: Train a Small Model in 10 Seconds! First Look at the LLM Training Tool: http://llm.istanbul Recently discovered a super fun open-source-style tool website, http://llm.istanbul, which claims to be a WebGPU LLM Workbench, meaning it fully...

X AI KOLs Timeline · 2026-06-12

Introduces llm.istanbul, a WebGPU LLM workbench for training small models, training tokenizers, and generating text entirely in the browser: fully local, with no server required.

Adaptive Tokenisation Via Temporal Redundancy Masking And Latent Inpainting [R]

Reddit r/MachineLearning · 2026-06-11

This paper introduces an adaptive video tokenisation method that exploits temporal redundancy in latent space to allocate tokens dynamically, achieving efficient compression without auxiliary networks. The proposed Latent Inpainting Transformer reconstructs dropped positions, delivering a 31x speedup over ElasticTok-CV and a 2x speedup over InfoTok.
