agents

#agents

@FinanceYF5: Someone went through all 196 companies and 395 founders of YC's 2026 spring batch. 95% of this batch use AI, 85% are AI-native: AI is not a feature added to the product, AI itself is the product. Out of 196, only 10 don't touch AI at all. But…

X AI KOLs Following ↗ · 2026-06-16 Cached

Someone analyzed the 196 startups from YC's 2026 spring batch and found that 95% use AI, 85% are AI-native, and the real keyword is agents rather than AI.

#agents

Towards Verifiable Agentic Data Science: Solving Irregular TSQA Via Tool-Grounded Reasoning

arXiv cs.AI ↗ · 2026-06-16 Cached

Introduces IRTS-ToolBench, a benchmark of 1,700 questions for evaluating LLMs and AI agents on irregular time series question answering via tool-grounded reasoning, covering 10 task types across 13 domains.

#agents

Kepler

Product Hunt ↗ · 2026-06-15

Kepler is an agentic development environment designed to run AI agents at scale, targeting developers who need to manage multiple agent workflows.

#agents

@ericosiu: https://x.com/ericosiu/status/2066625875622129767

X AI KOLs Timeline ↗ · 2026-06-15 Cached

An article explaining how to build AI-driven 'loops' to automate revenue-generating business processes, citing insights from Boris Cherny (Claude Code) and Peter Steinberger (OpenClaw).

#agents

@sidpalas: https://x.com/sidpalas/status/2066521471430574162

X AI KOLs Timeline ↗ · 2026-06-15 Cached

This post evaluates sandbox platforms for background agents, focusing on requirements like running real workloads, ingress, and cost. It outlines the Deputies sandbox provider interface and key considerations.

#agents

AI education still feels stuck in the chatbot era

Reddit r/artificial ↗ · 2026-06-15

The article argues that AI education remains focused on basic chatbot and prompt skills, while real-world AI development has shifted towards building agents, systems integration, and robust software engineering, creating a significant gap for learners.

#agents

WorkBench Revisited: Workplace Agents Two Years On

arXiv cs.CL ↗ · 2026-06-15 Cached

This paper revisits the WorkBench benchmark for workplace agents two years after its initial release, showing that the best agent (Claude Opus 4.8) now completes 89% of tasks with only 2.5% harmful side effects, compared to GPT-4's 43% completion and 26% harm rate in 2024. It finds that capability and safety improve together, open-weight models have drastically lowered costs, and some basic mistakes persist.

#agents

CacheRL: Multi-Turn Tool-Calling Agents via Cached Rollouts and Hybrid Reward

arXiv cs.CL ↗ · 2026-06-15 Cached

CacheRL trains small agent foundation models for multi-step tool-calling tasks, achieving 92% process accuracy (approaching GPT-5's 94%) with 100x less compute using cached rollouts and hybrid reward shaping, with innovations in knowledge transfer, cache-aware rewards, and iterative SFT/GRPO training.

#agents

@omarsar0: To use an LLM Council with your own agent, check out my llm-council skill. It works with Fireworks AI APIs, but you can…

X AI KOLs Timeline ↗ · 2026-06-14 Cached

DAIR Academy Plugins is an open-source marketplace of plugins for Claude Code, including an llm-council skill that orchestrates multiple open-weight LLMs via Fireworks AI.

#agents

@PierceZhang34: Sharing an open collaborative repository focused on AI-assisted research: Awesome Vibe Research. The core goal is to collect and curate reusable, verifiable, and evolvable AI-assisted components across the full research workflow (from idea generation to paper publication and dissemination), including: Agents, Skills...

X AI KOLs Timeline ↗ · 2026-06-14 Cached

Shared an open collaborative repository Awesome Vibe Research maintained by ModelScope. This repository collects and curates reusable, verifiable, and evolvable AI-assisted components across the full research workflow, including agents, skills, workflows, tools, and best practices. It aims to help researchers and developers leverage AI to improve research efficiency.

#agents

When your agent screws up in production, how do you figure out which step went wrong?

Reddit r/AI_Agents ↗ · 2026-06-14

A developer shares the challenge of debugging multi-step agents in production, where failures are hard to trace due to complex tool use and confident wrong answers, and asks the community for better monitoring and regression detection approaches.

#agents

@itsclelia: Had a lot of fun talking about retrieval in the age of agents at the Vector Space meetup in Berlin on Thursday! Toget…

X AI KOLs Following ↗ · 2026-06-12

Clelia enjoyed speaking about retrieval in agent systems at the Vector Space meetup in Berlin, organized by Qdrant, with deepset, cognee, and n8n.

#agents

@browser_use: Imagine thousands of agents working for you on the web This is why we rebuilt Browser Use: > Made for long-running task…

X AI KOLs Following ↗ · 2026-06-12 Cached

Browser Use 0.13.0 beta is rebuilt in Rust for long-running web agent tasks, featuring a custom LLM harness and a new terminal interface.

#agents

Mythos Begets Fable, Cursor's Composer 2.5, Agents Building Agents

The Batch ↗ · 2026-06-12 Cached

Andrew Ng discusses the rise of desktop AI agents and coding CLI tools, introduces the open-source OpenCoworker project, and examines agent harness designs where LLMs drive autonomous task execution.

#agents

BEAM benchmarks

Reddit r/AI_Agents ↗ · 2026-06-12

Midas achieves 0.56 recall@k on BEAM 100K and 0.51 on BEAM 500K with zero LLM calls and zero cost, demonstrating efficient long-term memory for agents.

#agents

TerraBench: Can Agents Reason Over Heterogeneous Earth-System Data?

arXiv cs.AI ↗ · 2026-06-12 Cached

TerraBench is a new benchmark for evaluating AI agents' ability to reason over heterogeneous Earth-system data, including gridded data, satellite imagery, and simulator outputs. It reveals significant limitations in current frontier models, with top performers achieving only 59.2% tool-use score on average.

#agents

Multi-Modal Agents for Power Distribution Defect Detection: An Evaluation of Foundation Models

arXiv cs.AI ↗ · 2026-06-12 Cached

This paper introduces a Multi-Modal Agent framework for power distribution defect detection, evaluating foundation models on perception, reasoning, and tool usage capabilities, with a new domain-specific dataset and benchmark.

#agents

I think long context agents are failing in a very boring way

Reddit r/artificial ↗ · 2026-06-12

An opinion piece arguing that long context windows don't equate to memory and that agent failures are often mundane, like forgetting constraints or rereading files, emphasizing that reliability depends on context architecture decisions.

#agents

@jeffreyliu_05: Maybe the best article on building good agents out there

X AI KOLs Following ↗ · 2026-06-11

A tweet recommends an article on building good AI agents, implying it is highly valuable for developers.

#agents

Novu Connect

Product Hunt ↗ · 2026-06-11

Novu Connect lets developers ship agents in the tools where their users already work.
