Tag
A developer shares their workflow using Cursor's subagent harness with Opus 4.8 Max Thinking for long context understanding and implementing large features in Swift, emphasizing hands-on planning and phased acceptance testing.
DecomposeR introduces a planner-centric reinforcement learning framework that represents research plans as typed DAGs, enabling finer-grained optimization of planning and execution for deep research tasks, achieving 5.1–8.0 point improvements over open baselines.
This paper proposes three rerooter designs for Levin Tree Search that leverage state-space structure and learned heuristics to improve search efficiency without explicit subgoal generation, achieving state-of-the-art online training efficiency.
This paper investigates how to encode planning tasks represented as factored transition systems (FTS) into SAT, proposing multiple encoding strategies and analyzing the impact of task transformations on SAT-based planning performance. It aims to extend SAT solving to more compact planning representations beyond heuristic search.
This article discusses an anti-pattern in AI agent systems where agents appear busy but fail to complete tasks. The author suggests separating responsibilities and requiring proof of completion as a solution.
A developer tested over 30 Claude Code repositories and found 5 that genuinely improve Claude's building capabilities, such as Superpowers, which forces structured planning before coding.
Introduces Thoughts-as-Planning, a framework that models chain-of-thought optimization as sequential decision-making using latent world models and reinforcement learning, outperforming existing methods in efficiency and generalization.
Fox Issue Tracker 4 is a tool for tracking, planning, and releasing software projects.
Introduces SVI-Bench, a large-scale benchmark for strategic video intelligence using team sports, designed to evaluate models on dynamic scene understanding, causal reasoning, strategic simulation, and agentic synthesis. The benchmark reveals a capability cliff where models perform well on perceptual tasks but sharply degrade on higher-level strategic reasoning.
RabbitTravel is a smart travel planning tool that makes trip organization effortless.
RePoT improves Program-of-Thought by enabling deterministic recovery from invalid actions through checkpoint-based repair, achieving higher success rates across multiple models and benchmarks.
This paper presents a prototype framework for managing uncertainty in LLM-generated procedural knowledge for virtual laboratory planning, using structured domain representations to repair uncertain procedural steps.
This paper introduces a neuro-inspired framework called Inverter that uses Inverse Learning (IL) for fast and efficient planning and control, achieving significant improvements on D4RL benchmarks and quantum gate synthesis with orders of magnitude less inference computation.
Explains how to use Claude to perform a premortem, a technique devised by Gary Klein and popularized by Daniel Kahneman, to stress-test plans by imagining they have already failed.
A tweet highlights the effectiveness of using /goal with coding agents, emphasizing planning before setting the goal for better context and results.
This paper identifies a failure mode in LLM-based multi-agent systems where plans fail due to agents misjudging their knowledge (epistemic miscalibration) and proposes EPC-AW, a workflow that uses information-consistency and epistemic state refinement to improve system-level success by 9.75%.
A tweet introduces a workflow where GPT-5.5 xhigh plans and delegates implementation to Composer 2.5 subagents via the pi-cursor-sdk, claiming it outperforms using either model alone. The linked GitHub repo is an open-source SDK that integrates Cursor models into the pi agent runtime.
Discusses two papers, Cola DLM from ByteDance Seed and ELF from Kaiming He's group at MIT, which move past the limitations of discrete tokens through a continuous diffusion paradigm, achieving better global planning and multimodal alignment.
PlanningBench is a framework for generating scalable, diverse, and verifiable planning data to evaluate and train large language models, featuring a constraint-driven synthesis pipeline with adaptive difficulty control and quality filtering. Experiments show that frontier LLMs struggle with coupled constraints, and reinforcement learning on PlanningBench data improves performance on unseen planning tasks.
This article compares software development without planning to building a house without blueprints, emphasizing the importance of design and documentation.