Tag
i10X has launched Superagent, an AI Chief of Staff that automates business goals by coordinating multiple tools and agents, with human approval gates for critical actions.
A guide on building a reusable Claude Code Agent loop that can be pointed at different tasks like bug fixing, speed optimization, or cost reduction by swapping check scripts.
DeepSeek-V4-Fable is a distilled variant of Claude-5-Fable built on DeepSeek-V4-Flash, designed for autonomous offensive security research, CTF problem solving, and controlled environment exploitation planning, with strict authorization requirements.
This article shares practical experience with using Codex /goal mode for long-running unattended programming, including how to write effective prompts, how to use persistent project memory to keep the agent from drifting off task, and key settings and precautions.
DeepSeek researcher Deli Chen has open-sourced Deli AutoResearch SKILL, a SKILL.md protocol file that defines operating rules for long-running autonomous AI research, including state persistence, stagnation detection, and a heartbeat mechanism. It aims to turn autonomous scientific research from a vision into a sustainable, closed-loop engineering process.
The author built an autonomous Codex agent loop runner based on GPT-5.5 for testing. It is currently in public beta and offers 50 free runs.
Skales is a private, local AI desktop agent for Windows, macOS, Linux, and Android that performs autonomous tasks, supports multiple AI providers, and emphasizes privacy with offline capabilities.
depthfirst's autonomous security agent discovered 21 zero-day vulnerabilities in FFmpeg, including several that had remained latent for 15-20 years, with a proof-of-concept demonstrating remote code execution. The findings highlight the capability of AI-driven security agents to uncover critical bugs that evaded previous intensive analyses by Google and Anthropic.
Microsoft Research introduces Arbor, a generalist autonomous research agent that uses persistent hypothesis-tree refinement for cumulative learning, outperforming Codex and Claude Code across six research tasks and achieving 86% Any-Medal on MLE-Bench Lite.
This paper presents Moonshine, an autonomous mathematical research agent that generates conjectures, exemplified by deriving the Neural Jacobian Conjecture from the classical Jacobian conjecture and proving a special case using LLMs.
Hermes Agent by Nous Research is an open-source autonomous AI agent that runs persistently on a server, remembers every conversation across sessions, and autonomously creates skill files, making it a fundamentally different category of agent compared to session-based coding tools like Claude Code and Cursor.
An AI agent named Annie autonomously recompiled a Pokémon Ruby GBA ROM into a full hybrid WASM recompiler and GBA runtime, completing a task that would normally take an expert team months and cost tens of thousands of dollars.
An autonomous research agent built by Weco became the top contributor by volume of merged records in OpenAI's Parameter Golf competition, demonstrating effective human-agent collaboration.
Oh My Hermes is a workflow layer for the Hermes AI agent that turns it into a development and operations partner. It provides 20 skills, including requirement clarification, coding, deployment, and operations, and supports 5 agents with clearly divided roles working collaboratively.
Introduces Benchmark Agent, a fully autonomous system for creating diverse benchmarks with minimal human intervention, enabling continuous model assessment across domains.
In OpenAI's Parameter Golf hiring challenge, an autonomous research agent named Aiden outperformed all 1,016 human participants after running for 22 days.
Nous Research releases a desktop update for Hermes Agent, transforming it into an always-on autonomous AI employee that can replace a human chief of staff, with 166 skills and memory persistence.
An agent from an Artificial Life simulation is adapted into a real autonomous agent that runs on a laptop with file system, code execution, browser control, and task management, exhibiting persistent internal drive.
EvoDS is a self-evolving autonomous data science agent that improves via reinforcement learning-driven skill acquisition and adaptive context compression, outperforming open-source agents by 28.9% on benchmarks.
The author used Claude Code with a browser extension to autonomously create an 18-minute tutorial of their app by walking through a shot list, with some steps requiring human intervention.