This paper presents an automated pipeline for optimizing natural language skill descriptions in enterprise AI agents to resolve skill collisions, achieving performance matching manual tuning with a 32× speedup. Ablation studies show that a single LLM rewrite using error cases captures most improvements, while other design choices have minimal impact.
A Twitter thread highlights key takeaways from a Latent.Space podcast episode with Databricks co-founders, covering why Databricks beat Snowflake, the rise of metaharners, Neon's success, HTAP via LTAP, MosaicML's fate, and maintaining startup culture in a large company.
EnterpriseClawBench is a benchmark for enterprise agents built from real-world workplace sessions, offering 852 reproducible tasks and evaluation metrics that go beyond a single performance score.
This paper introduces Queen-Bee, a governed multi-agent architecture for enterprise MCP orchestration that separates planning and execution via a BeeSpec intermediate representation, achieving high task success rates with zero governance failures in prototype evaluations.
Tavily, Gradium, Nebius, and Cursor are hosting a full-day hackathon in Berlin on May 29th focused on building autonomous AI agents that can transact and execute. The event includes tech talks, building sessions, and prizes.
LangChain announced SmithDB, a distributed database for agent observability, Context Hub for managing agent context with an open memory standard, and Deep Agents v0.6 at Interrupt 2026, alongside enterprise case studies and keynotes by Andrew Ng and Harrison Chase.
An enterprise agent developer discusses the trade-offs of using open-source models like Ling 1T 2.6, highlighting the high overhead of optimization and benchmarking compared to proprietary APIs.