planning

#planning

@GitTrend0x: Hermes Aesthetics + Planning + Fantasy Triple Threat Plugins! Hermes Skins custom themes, Planning-with-Files persistent planning, Draw.io automatic flowchart skill, Litprog literate programming, Wizards-of-th…

X AI KOLs Timeline ↗ · 2d ago

Introduces several Hermes plugins (theme skins, persistent planning, automatic Draw.io flowcharts, a literate programming skill pack, a fantasy skill lab, and others) that turn Hermes into a versatile terminal and planning tool.

#planning

Qwen 27B for planning, Qwen 35B-A3B for execution?

Reddit r/LocalLLaMA ↗ · 3d ago

Discusses using Qwen 27B for planning tasks and Qwen 35B-A3B for execution tasks, suggesting a specialized model approach.

#planning

PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems

Hugging Face Daily Papers ↗ · 3d ago

PlanBench-XL is a new benchmark that evaluates LLM agents' ability to plan and adapt in large tool ecosystems with limited visibility and dynamic disruptions. Experiments show GPT-5.4 achieves only 51.9% accuracy in block-free settings and collapses to 11.36% under severe blocking, highlighting significant challenges in long-horizon planning.

#planning

@kentcdodds: More on planning with real business context:

X AI KOLs Following ↗ · 4d ago

A discussion between Kent C. Dodds and Sean Roberts on product engineering, planning with real business context, and the importance of conversations and curiosity over pure data.

#planning

How Should World Models Be Evaluated? A Decision-Making-Centric Position

arXiv cs.LG ↗ · 2026-06-16

This paper surveys evaluation methods for world models and argues for a decision-making-centric framework that prioritizes counterfactual reasoning, planning, and policy optimization over visual quality. It introduces an L0–L7 evaluation ladder and a benchmark protocol to align evaluation with claimed utility.

#planning

CEO-Bench: Can Agents Play the Long Game?

Hugging Face Daily Papers ↗ · 2026-06-16

CEO-Bench introduces a simulation benchmark that evaluates language model agents' ability to manage a startup over 500 days, testing long-term planning, noise handling, adaptability, and multi-task coordination. Results show that even the strongest models struggle, with only Claude Opus 4.8 and GPT-5.5 finishing above the starting balance.

#planning

@mattpocockuk: Cooking a /decision-mapping skill, for splitting planning into multiple sessions Kind of like /to-issues, but for plann…

X AI KOLs Following ↗ · 2026-06-15

Matt Pocock introduces a decision-mapping skill to split planning into multiple sessions, similar to /to-issues, aiming to streamline greenfield and brownfield builds.

#planning

An agent that plans with a frontier model but runs most of tokens locally (built it for my own dual-3090 rig)

Reddit r/LocalLLaMA ↗ · 2026-06-15

The author built a personal AI agent that uses a frontier model (Codex) for high-level planning while running most token processing locally on a dual RTX 3090 system, enabling long-duration tasks with deterministic validation. The agent supports three swappable tiers: planner, local, and senior, and is available as an open-source repository.

#planning

Causal Object-Centric Models for Planning with Monte Carlo Tree Search

arXiv cs.AI ↗ · 2026-06-15

COMET is a model-based reinforcement learning algorithm that combines a frozen object-centric encoder with a transformer-based world model and Monte Carlo Tree Search, using causal attention to focus on task-relevant objects, achieving higher scores on visual RL benchmarks.

#planning

Deep Work Plan

Product Hunt ↗ · 2026-06-15

Deep Work Plan is a product that helps users provide their AI agents with a structured plan, emphasizing the importance of context over models.

#planning

@omarsar0: Same here. Happy with Opus 4.8 (planning) and GPT-5.5 (execution). Also, breaking steps into smaller ones for increasin…

X AI KOLs Following ↗ · 2026-06-11

A developer shares satisfaction with Opus 4.8 for planning and GPT-5.5 for execution, emphasizing that breaking tasks into smaller steps improves quality and that dynamic workflows are underrated.

#planning

SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning

arXiv cs.AI ↗ · 2026-06-11

The paper proposes SVoT, a reinforcement learning framework that generates interleaved, verifiable intermediate states and visualizations for multi-hop spatial reasoning in MLLMs, achieving significant accuracy gains on new benchmarks involving multi-object interactions and numerical reasoning.

#planning

Has anyone deployed a multi-agent AI employee in production?

Reddit r/AI_Agents ↗ · 2026-06-10

A discussion about deploying multi-agent AI systems in production, where different agents handle planning, execution, communication, and project management, asking about real-world experiences and bottlenecks.

#planning

Beyond APIs: Probing the Limits of MLLMs in Physical Tool Use

arXiv cs.CL ↗ · 2026-06-10

This paper introduces PhysTool-Bench, a benchmark for evaluating multimodal large language models' ability to recognize and plan the use of physical tools in real-world scenes. The authors find that even the best model identifies only 58.7% of tools and completes just 21.0% of queries end-to-end, revealing a two-level deficit in perception and functional commonsense.

#planning

Front-to-Attractors: Modifying the Front-to-Front Heuristic in Bidirectional Search

arXiv cs.AI ↗ · 2026-06-08

Introduces front-to-attractors (F2A), a new heuristic class for bidirectional search that reduces computational cost by evaluating distances to a small set of attractors instead of the full opposite frontier, achieving up to 11.2x fewer pairwise evaluations and 4.8x fewer node expansions than existing methods.
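
As a rough illustration of why scoring against a small attractor set is cheaper than scoring against the whole opposite frontier, here is a minimal Python sketch. It is not the paper's algorithm: the summary does not say how attractors are chosen, so the `front_to_attractors` function, its "take the `k` frontier nodes with the smallest g-value" rule, the grid setting, and the Manhattan distance are all assumptions made for this example.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    """Grid distance estimate between two points (assumed for this example)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def front_to_front(node: Point, opposite: Dict[Point, int]) -> Tuple[int, int]:
    """Score `node` against every node m on the opposite frontier.

    `opposite` maps each frontier node to its g-value (cost from the other
    search's start). Returns (estimate, number of pairwise evaluations).
    """
    values = [manhattan(node, m) + g for m, g in opposite.items()]
    return min(values), len(values)


def front_to_attractors(
    node: Point, opposite: Dict[Point, int], k: int
) -> Tuple[int, int]:
    """Score `node` against only k attractors.

    ASSUMPTION: attractors are the k frontier nodes with the smallest
    g-value. The paper may select them differently.
    """
    attractors = sorted(opposite.items(), key=lambda item: item[1])[:k]
    values = [manhattan(node, m) + g for m, g in attractors]
    return min(values), len(values)


opposite_frontier = {(5, 5): 3, (6, 4): 2, (9, 9): 1, (4, 6): 4}
f2f_estimate, f2f_evals = front_to_front((0, 0), opposite_frontier)
f2a_estimate, f2a_evals = front_to_attractors((0, 0), opposite_frontier, k=2)
```

In this toy setup the attractor version performs `k` pairwise evaluations per node instead of one per frontier node. Because it minimizes over a subset, its estimate can be larger than the full front-to-front value, so this sketch makes no claim about admissibility.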

#planning

Bridging the Agent-World Gap: Text World Models for LLM-based Agents

Hugging Face Daily Papers ↗ · 2026-06-08

This paper systematically reviews text world models for LLM-based agents, covering foundations, construction paradigms, applications in planning and training, and evaluation methods.

#planning

Stride

Product Hunt ↗ · 2026-06-06

Stride is an AI-powered workspace that assists with planning, designing, and shipping projects.

#planning

we stopped letting agents plan 3 steps ahead, reliability got better fast

Reddit r/AI_Agents ↗ · 2026-06-02

A practitioner observes that limiting AI agents to plan only one step ahead instead of multiple steps significantly improves reliability in real-world automation workflows involving CRM and lead qualification, as long-range plans become brittle when external state changes.
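
The one-step-ahead pattern described in this post can be sketched as a loop that re-reads external state before choosing each action. This is a minimal sketch and not the poster's code: the `next_action` rules, the `lead_status` field, and the action names are hypothetical stand-ins for a CRM workflow.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, str]


def next_action(state: State) -> Optional[str]:
    """Hypothetical one-step planner: choose a single action from fresh state."""
    status = state.get("lead_status")
    if status == "new":
        return "qualify"
    if status == "qualified":
        return "schedule_call"
    return None  # nothing left to do, or a state this sketch does not handle


def run_one_step_at_a_time(
    read_state: Callable[[], State],
    execute: Callable[[str], None],
    max_steps: int = 10,
) -> List[str]:
    """Plan exactly one action per iteration, re-reading state every time."""
    taken: List[str] = []
    for _ in range(max_steps):
        action = next_action(read_state())
        if action is None:
            break
        execute(action)
        taken.append(action)
    return taken


# Simulated CRM record where a human closes the lead right after qualification.
crm: State = {"lead_status": "new"}


def execute(action: str) -> None:
    if action == "qualify":
        crm["lead_status"] = "closed_by_rep"  # external change mid-workflow


actions_taken = run_one_step_at_a_time(lambda: dict(crm), execute)
```

A plan fixed up front as `["qualify", "schedule_call"]` would still try to schedule a call on the closed lead. The loop above stops after `qualify` because it sees the changed status before picking the next action.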

#planning

Efficient Test-time Inference for Generative Planning Models

arXiv cs.AI ↗ · 2026-06-02

This paper introduces OCLGen, a compute-efficient test-time search algorithm that integrates generative planning models with a classical Open-Closed List framework, improving solution quality across combinatorial planning domains.

#planning

World Models: A Comprehensive Survey of Architectures, Methodologies, Reasoning Paradigms, and Applications

arXiv cs.LG ↗ · 2026-06-02

A comprehensive survey of world models that provides a multi-axis taxonomy covering architectures, methodologies, reasoning strategies, and applications across AI domains, including key systems like Dreamer, MuZero, and Sora.
