Garry Tan describes using a personal AI agent system, termed 'Book Mirror', to deeply integrate reading material with his life context via Meta-Meta-Prompting. He shares insights on building real AI systems that work like an operating system rather than just a chat interface.
The article details an expanded 12-rule CLAUDE.md configuration template that builds upon Andrej Karpathy's original 4 rules to further reduce AI coding errors and handle complex agent orchestration issues.
The user shares progress on commercial testing involving Codex and HyperFrames to generate a Nike promotional video, stating that the results will be open-sourced if successful.
This article outlines a 2026 roadmap for LLM engineering, detailing eight key pillars including prompt engineering, RAG systems, and context management, while providing curated free and open-source resources for each.
Y Combinator president Garry Tan shares a set of system prompts for OpenClaw designed to transform AI assistants from disposable tools into persistent automated systems, automating repetitive tasks through a self-evolving skill library.
GitHub's 'spec-kit' repository has gained 92k+ stars by offering a structured 6-command workflow that transforms vague ideas into executable specifications for AI coding agents, positioning itself as an alternative to unstructured 'vibe coding'. It supports Claude Code, Copilot, Cursor, Codex, Gemini, and 25+ other AI agents.
Introduces Garry Tan's 'Plan-Eng-Review' skill: before having AI write code, first use an agent to generate ASCII diagrams of data flows and state machines, so the implementation does not drift from the intended design.
This article promotes the open-source book 'Foundations of LLMs', which systematically explains knowledge about large language models, and introduces the multi-agent development framework Agent-Kernel.
A developer shares their experience of a single system prompt change degrading LLM response quality without triggering traditional monitoring alerts, and describes internal tooling they built to monitor semantic quality in production LLM applications.
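The kind of semantic-quality monitor described can be sketched crudely: compare each live response against a reference answer and alert on drift. This sketch uses token overlap as a stand-in similarity measure (the developer's actual tooling is not described; function names and the threshold are illustrative assumptions, and real systems would likely use embeddings):

```python
# Crude sketch of a semantic-quality check: compare a live response against
# a reference answer via token overlap. Names and threshold are illustrative.

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity, a rough proxy for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def quality_alert(response: str, reference: str, threshold: float = 0.3) -> bool:
    """Return True if the response drifted below the similarity threshold."""
    return jaccard(response, reference) < threshold
```

A check like this runs on every production response, so a prompt change that silently degrades answers surfaces as a rising alert rate even when no traditional error metric fires.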
Marc Andreessen faced online mockery after sharing a custom AI prompt that demonstrated a fundamental misunderstanding of how large language models work, particularly regarding hallucinations and knowledge limits.
The article introduces Prober.ai, a web-based writing environment that uses LLM-constrained personas to provide inquiry-based feedback for argumentative writing, aiming to prevent cognitive outsourcing. Developed as a hackathon prototype, the system gates revision suggestions behind student reflection to preserve critical thinking skills.
The user asks about the internal processes ChatGPT uses to generate essays, specifically whether it synthesizes information and structures arguments like a human or simply copies existing text.
The article argues that reliable AI agents require deterministic control flow and programmatic verification in software, rather than relying solely on complex prompt chains.
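The control pattern the article advocates, deterministic loops with programmatic checks instead of ever-longer prompt chains, can be sketched minimally (all names here, such as `call_llm` and `verify_output`, are illustrative placeholders, not from the article):

```python
# Minimal sketch: deterministic agent step with programmatic verification.
# The verification (valid JSON) is an illustrative stand-in for any
# machine-checkable property of the model's output.
import json

def verify_output(text: str) -> bool:
    """Deterministic check the surrounding code can enforce."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

def run_step(call_llm, prompt: str, max_retries: int = 3) -> str:
    """Retry until the output passes verification; fail loudly otherwise."""
    for _ in range(max_retries):
        out = call_llm(prompt)
        if verify_output(out):
            return out
    raise RuntimeError("verification failed after retries")
```

The point is that control flow and acceptance criteria live in ordinary code, so failures are explicit and retryable rather than buried in a prompt.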
The Zhejiang University team open-sourced 'Foundations of Large Models', an accessible textbook covering everything from architectural evolution to key technologies such as RAG, accompanied by the Agent-Kernel multi-agent framework.
This paper introduces LoPE, a training framework that uses prompt-space perturbations to address the zero-advantage problem in reinforcement learning with verifiable rewards, thereby enhancing reasoning exploration in large language models.
A developer building an AI legal assistant for a German law firm details seven specific LLM citation failure modes and the prompt-engineering fixes used to meet strict legal citation standards.
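One programmatic guard against citation failure modes like these is to reject any citation that is not backed by a document in the retrieved source set. A hedged sketch (the bracketed `[Quelle: ...]` citation format and function names are assumptions for illustration, not the firm's actual convention):

```python
# Sketch: flag citations in an answer that have no backing retrieved source.
# The [Quelle: ...] citation format is an illustrative assumption.
import re

def unbacked_citations(answer: str, allowed_sources: set[str]) -> list[str]:
    """Return citations in the answer absent from the retrieved source set."""
    cited = re.findall(r"\[Quelle:\s*([^\]]+)\]", answer)
    return [c.strip() for c in cited if c.strip() not in allowed_sources]
```

A post-hoc check like this catches hallucinated references deterministically, complementing the prompt-level fixes the article describes.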
A viral hot take argues that today's "AI engineers" are mostly prompt engineers rebranded, questioning whether API-chaining and guardrails count as true engineering versus just using AI effectively.
Empirical study on LLM formal-math reasoning finds a single-prompt ceiling: accuracy plateaus around 60–79% regardless of prompt size, driven by undecidability, model fragility, and distribution mismatch.
A 2026 blog post revisits how prompt tone and context depth shift LLM responses, showing richer gamer-style prompts yield deeper, stat-backed answers than bare questions.
A developer describes how French text in retrieved contexts caused their multilingual RAG system to unpredictably switch languages mid-answer, ultimately solved with a regex-based German detector and explicit negative prompts.
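A regex-based German detector of the kind described could be as simple as matching German-specific characters and frequent function words (a hedged sketch; the article's actual regex, word list, and thresholds are not given):

```python
# Sketch: treat text as German if it contains umlauts/ß or common German
# function words. The hint list is an illustrative assumption.
import re

GERMAN_HINTS = re.compile(r"[äöüÄÖÜß]|\b(der|die|das|und|nicht|ist|ein|eine)\b")

def looks_german(text: str) -> bool:
    return bool(GERMAN_HINTS.search(text))
```

Such a detector can gate a fallback, e.g. re-prompting with an explicit "answer only in German" instruction, whenever a generated answer fails the check.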