Tag
A comprehensive guide to building AI agent harnesses, covering tool execution, context management, state/memory, and guardrails, based on lessons from building Claude Code and other harnesses for enterprise.
Karpathy describes a new paradigm for interacting with Claude where it becomes a persistent, async team member with org-wide tools and context, similar to Claude Tag in Slack, marking a third major redesign of LLM user interfaces.
Explains that super weights in large language models arise from the SoftMax-Attention interaction creating a 'Nothing Dump' token that serves as a stable reference point; removing these weights cripples performance.
Modal introduces Auto Endpoints, a self-serve service for optimized, production-grade LLM inference with full code ownership, transparent metrics, and autoscaling, built on their serverless GPU infrastructure.
This paper investigates an alignment vulnerability in instruction-tuned LLMs, specifically Gemma-3-12B, by showing that pre-token hidden state shifts can act as an alignment policy traversal vector, potentially enabling bypass of safety measures.
A discussion on the methodologies and challenges involved in evaluating AI features once they are deployed in production environments.
A benchmark of 8 LLMs for medical scribing found hallucinations rare but omissions a concern.
Discusses real-world experiences with GLM 5.2 in complex production business workloads, focusing on practical performance beyond benchmark scores.
Filippo Valsorda argues that LLMs have stripped vulnerability reports of their special status: AI can now generate insights that were once exclusive to human researchers, shifting the bottleneck from discovery to triage.
The article reports a potential alignment vulnerability in LLMs where processing a structured passage before an unrelated question can alter the model's response, with mechanistic evidence from Gemma-3-12B showing hidden-state separation.
A brief prediction that in 2025 engineers will integrate LLM APIs into their test harnesses, and in 2026 they will design harnesses to work within their agents.
An LLM was given access to a thermal camera pointing at the Raspberry Pi it runs on, and it began conducting experiments by toggling the fan to observe temperature changes.
NVIDIA's NVFP4-quantized version of Qwen 3.6 35B is recommended over the Unsloth variant for its better performance. The model is available on HuggingFace.
This paper presents a theory that prompt injection attacks on LLMs stem from a fundamental flaw in how models perceive roles, treating roles as a type system for language. It explains existing attacks, predicts new ones, and proposes a research agenda for a science of roles.
A reflection on the landmark 'Attention Is All You Need' paper, highlighting how removing recurrence and relying solely on attention mechanisms revolutionized AI and led to modern LLMs like GPT and Claude.
This article argues that traditional prompt engineering is obsolete; modern LLMs are intent reconstructors, and interactions should be through natural, rich conversation rather than condensed instructions.
This article explores how AI agents can automatically write and optimize their skill files using techniques like SkillOpt from Microsoft Research, which treats skill documents as trainable state and delivers significant performance improvements. It addresses the challenge of manual skill tuning and presents frameworks like GEPA and EvoSkill as evolutionary approaches.
The article argues that human code reviewers should let AI handle large diffs and focus on contributing their own out-of-distribution knowledge and high-level context.
Alisa Liu is joining OpenAI next week and shared a blog with job-search notes, including LLM and math resources.
Introduces Test-Time Reinforcement Learning (TTRL), a method that uses majority voting on unlabeled data to create pseudo-labels for RL training, enabling self-improvement of LLMs without ground-truth answers. It reports significant gains (e.g., +159% to +211% on AIME 2024 for Qwen-2.5-Math-7B).
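
The majority-voting idea in the TTRL summary above can be illustrated with a minimal sketch. It assumes the sampled answers for one prompt are already extracted as strings and that a sample earns a reward of 1.0 only when it matches the most common answer. The function name `majority_vote_rewards` and the binary reward are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Return (pseudo_label, rewards) for answers sampled for one unlabeled prompt.

    Hypothetical helper: the pseudo-label is the most frequent answer, and each
    sample is rewarded 1.0 if it agrees with that label, else 0.0.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # most_common(1) yields the most frequent answer; ties go to the answer seen first.
    pseudo_label, _ = counts.most_common(1)[0]
    rewards = [1.0 if answer == pseudo_label else 0.0 for answer in answers]
    return pseudo_label, rewards


# Five sampled answers to the same question; no ground-truth label is used.
samples = ["42", "41", "42", "7", "42"]
label, rewards = majority_vote_rewards(samples)
print(label, rewards)
```

In an RL loop, these per-sample rewards would stand in for the ground-truth signal. How they feed a policy update is outside the scope of this sketch.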