The article shares a personal experience using Cursor and Opus 4.7 to generate videos, highlighting the agent's rigorous self-correction process. It then transitions into a technical discussion on 'Agent = Model + Harness,' arguing that engineering systems like ratchets and context management are more critical to AI agent performance than the underlying model alone.
The author describes improving AI agent reliability by replacing a single general-purpose agent with a workflow of four agents specializing in intake, research, action, and review. This shift prioritized system predictability and easier debugging over raw autonomy.
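As a toy illustration of the staged hand-off described above, the sketch below wires four narrow stages into a fixed pipeline. The stage names come from the summary; the `Ticket` fields and stage bodies are hypothetical placeholders, not the author's code.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical four-stage pipeline: each stage is a narrow agent with one job,
# so failures are localized and the hand-offs are easy to inspect.
@dataclass
class Ticket:
    request: str
    findings: str = ""
    action_log: str = ""
    approved: bool = False

def intake(t: Ticket) -> Ticket:
    t.request = t.request.strip()          # normalize/triage the incoming request
    return t

def research(t: Ticket) -> Ticket:
    t.findings = f"context gathered for: {t.request}"   # placeholder for retrieval/LLM call
    return t

def act(t: Ticket) -> Ticket:
    t.action_log = f"executed plan based on: {t.findings}"
    return t

def review(t: Ticket) -> Ticket:
    t.approved = "executed" in t.action_log             # cheap, checkable gate
    return t

PIPELINE: list[Callable[[Ticket], Ticket]] = [intake, research, act, review]

ticket = Ticket(request="  rotate the API keys  ")
for stage in PIPELINE:
    ticket = stage(ticket)                 # predictable, debuggable hand-offs between agents
print(ticket.approved)
```

The point of the fixed pipeline over a single autonomous loop is that each hand-off produces an inspectable intermediate state, which is where the debugging win comes from.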
This paper introduces TIDE, a method that addresses the Rare Token and Contextual Collapse problems in LLMs by injecting token identity into every layer via Embedding Memory. The authors demonstrate theoretical and empirical improvements across language modeling and downstream tasks.
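One possible reading of "injecting token identity into every layer via Embedding Memory" is sketched below: a projection of the original token embedding is re-added to the hidden state at each layer. This interpretation is an assumption on my part; the paper's actual mechanism may differ.

```python
import torch
import torch.nn as nn

class ToyBlockWithEmbeddingMemory(nn.Module):
    """One transformer block plus a hypothetical embedding-memory re-injection.

    Assumption: 'injecting token identity' is read here as adding a learned
    projection of the original token embedding back into each layer's hidden
    state; the real TIDE mechanism may work differently.
    """
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.inject = nn.Linear(d_model, d_model)  # hypothetical embedding-memory projection
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, h, tok_emb):
        # Re-inject the frozen token embedding so rare tokens keep their identity
        # even after many layers of contextual mixing.
        h = h + self.inject(tok_emb)
        a, _ = self.attn(self.norm1(h), self.norm1(h), self.norm1(h))
        h = h + a
        return h + self.ff(self.norm2(h)), tok_emb

emb = nn.Embedding(1000, 64)
ids = torch.randint(0, 1000, (2, 16))
tok = emb(ids)                      # token-identity embeddings, kept around for every layer
h = tok
for blk in [ToyBlockWithEmbeddingMemory() for _ in range(4)]:
    h, tok = blk(h, tok)
```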
A social media post discusses the technical implication of applying RoPE rotation directly to KV caches, noting that it leaks positional information into the value matrix V.
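To make the implication concrete, here is a minimal PyTorch sketch of standard RoPE applied to queries and keys only; the comment marks where rotating V as well would push position-dependent signal into the value pathway. The rotate-half layout and the 10000 base are the common convention, not details taken from the post.

```python
import torch

def rotate_half(x):
    # Split the last dim in half and rotate: (x1, x2) -> (-x2, x1)
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x, cos, sin):
    # Standard RoPE: element-wise rotation by position-dependent angles.
    return x * cos + rotate_half(x) * sin

# toy shapes: (batch, heads, seq, head_dim)
b, h, t, d = 1, 2, 8, 16
q = torch.randn(b, h, t, d)
k = torch.randn(b, h, t, d)
v = torch.randn(b, h, t, d)

# position-dependent cos/sin tables (conventional 10000 base)
pos = torch.arange(t).float()
inv_freq = 1.0 / (10000 ** (torch.arange(0, d, 2).float() / d))
angles = torch.einsum("t,f->tf", pos, inv_freq)
cos = torch.cat((angles.cos(), angles.cos()), dim=-1)
sin = torch.cat((angles.sin(), angles.sin()), dim=-1)

q_rot = apply_rope(q, cos, sin)   # rotate queries: positions enter the attention scores
k_rot = apply_rope(k, cos, sin)   # rotate keys (this is what typically sits in the cache)
# v stays unrotated: if apply_rope(v, cos, sin) were applied as well, the attention
# output would mix position-dependent rotations into the value pathway itself,
# i.e. positional information would leak into V.
out = torch.softmax(q_rot @ k_rot.transpose(-1, -2) / d**0.5, dim=-1) @ v
```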
This interactive tool visualizes the mathematical underpinnings of transformer models through dataflow graphs, covering architectures from GPT-2 to Qwen 3.6 and various attention mechanisms.
StageMem proposes a lifecycle-managed memory framework for language models that organizes memory into transient, working, and durable stages with explicit confidence and strength metrics. It treats memory as a stateful process rather than a static store, which lets it manage retention and forgetting under bounded capacity.
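To make the lifecycle idea concrete, here is a small illustrative sketch of what staged memory items with confidence and strength fields might look like. All class names, field names, thresholds, and decay rates are hypothetical, not taken from the StageMem paper.

```python
from dataclasses import dataclass, field
from enum import Enum
import time

class Stage(Enum):
    TRANSIENT = "transient"
    WORKING = "working"
    DURABLE = "durable"

@dataclass
class MemoryItem:
    content: str
    stage: Stage = Stage.TRANSIENT
    confidence: float = 0.5   # hypothetical: how much the system trusts the item
    strength: float = 0.5     # hypothetical: retention strength, decays if not reinforced
    last_access: float = field(default_factory=time.time)

class StagedMemory:
    def __init__(self, capacity=100):
        self.capacity = capacity
        self.items: list[MemoryItem] = []

    def reinforce(self, item: MemoryItem):
        # Repeated use raises strength and can promote the item to a more durable stage.
        item.strength = min(1.0, item.strength + 0.2)
        item.last_access = time.time()
        if item.stage is Stage.TRANSIENT and item.strength > 0.6:
            item.stage = Stage.WORKING
        elif item.stage is Stage.WORKING and item.strength > 0.9:
            item.stage = Stage.DURABLE

    def decay_and_evict(self):
        # Unreinforced items lose strength; weak transient items are forgotten first,
        # keeping the store within bounded capacity.
        for it in self.items:
            it.strength = max(0.0, it.strength - 0.05)
        self.items = [it for it in self.items
                      if not (it.stage is Stage.TRANSIENT and it.strength < 0.1)]
        self.items = sorted(self.items, key=lambda it: it.strength,
                            reverse=True)[: self.capacity]
```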
Stanford University offers a 1.5-hour lecture covering the fundamental concepts and design principles of large language model architecture.
A Hugging Face blog post explains the Mixture of Experts (MoE) architecture in Transformers, covering the shift from dense to sparse models, weight-loading optimizations, expert parallelism, and training techniques for MoE-based language models.
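For readers new to the topic, the core routing idea the post builds on can be sketched in a few lines of PyTorch. This toy top-k router is illustrative only and is not the Transformers-library implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Minimal top-k Mixture-of-Experts layer (illustrative sketch)."""
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)        # routing logits per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                                   # x: (tokens, d_model)
        logits = self.router(x)
        weights, idx = torch.topk(F.softmax(logits, dim=-1), self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                       # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * expert(x[mask])
        return out

x = torch.randn(10, 64)
y = TinyMoE()(x)        # each token only activates 2 of the 8 experts
```

The sparsity is the whole point: the parameter count grows with the number of experts, but each token only pays the compute cost of its top-k experts.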