Tag
Presented at arXiv, DeltaBox introduces OS-level mechanisms (DeltaFS and DeltaCR) for millisecond-level checkpoint and rollback in stateful AI agents by only duplicating changes between consecutive states, achieving 14ms checkpoint and 5ms rollback on SWE-bench and enabling significantly deeper tree search within fixed time budgets.
A developer built Terrarium, an open-source sandboxing solution for running multiple AI agents securely with isolated worlds, reverse-proxy management, and state rollback.
Network allow-lists are insufficient to prevent data exfiltration via authorized channels like DNS or allowed endpoints. Canister, a lightweight Linux sandbox, addresses this with a layer-7 egress proxy that performs TLS interception and data-loss prevention.
Edge Python is a 170 KB WASM-based sandboxed Python subset that runs agent-generated code directly in the browser without a server, supporting classes, async/await, pattern matching, and more.
A user demonstrates running Microsoft Word inside a Modal sandbox on the day of Modal's Series C funding announcement.
A security researcher examines the C# sandboxing in S&Box (Garry's Mod 2), which uses an API whitelist instead of a hardened runtime. By modifying the compile blacklist, they bypass the restrictions and crash the editor, demonstrating that the approach is insecure despite being similar to Space Station 14's system.
LangSmith introduces an Auth Proxy to secure network access for agent sandboxes, keeping credentials out of the runtime and enforcing explicit network access policies.
A team reverse-engineered Docker's undocumented MicroVM API used by Docker Sandboxes and built the open-source Sandbox Agent SDK to orchestrate AI coding agents inside microVMs for secure untrusted code execution.
Phil Schmid announces Managed Agents in the Gemini API, enabling one-call agents with code execution, web browsing, and file management in isolated sandboxes, powered by Gemini 3.5 Flash.
Capsule is a Python framework that provides infrastructure primitives like sandboxes, auth, session management, integrations, and payments for AI apps, aiming to simplify deployment and iteration.
A tweet from LangChain referencing an answer by Shevchenkoaalex of TryRamp about whether an agent should be inside or outside a sandbox, likely discussing security or deployment patterns.
A tweet showcases a demo where a single prompt generates a playable open-city sandbox game reminiscent of GTA 6, hinting at AGI-level capability.
A developer discusses challenges with state persistence in long-running coding agents using sandbox environments, detailing the costly resume overhead and seeking community solutions for persistent state handling without custom checkpointing layers.
LiteLLM is open-sourcing its Agent Platform, allowing developers to run coding agents like Claude Code, Codex, and Hermes in isolated Kubernetes sandboxes without exposing real API keys.
Markokraemer announces SandboxAgent, an opencode-based runtime that runs in sandboxes with remote session storage and git-native versioning for centralized data and isolated operation.
The author introduces OpenSteer, a cloud agent platform that allows users to create and customize specialized agents for automating tasks across websites and services, with support for cloud browsers, APIs, MCPs, and CLIs, demonstrated through a sales automation agent.
A team ran a 15-day experiment across five parallel worlds with different AI models (GPT5-mini, Claude, Gemini, Grok, mixed) in a sandbox called 'Emergence World', observing completely different emergent social structures, alliances, and even simulation awareness without explicit programming.
The article explores the idea of an open-source layer to orchestrate CLI usage by AI agents, addressing challenges like permissions, sandboxing, and audit trails when agents interact with multiple CLIs.
OpenAI is improving agent support on Windows by implementing a custom sandbox for Codex, addressing OS-level isolation challenges to ensure safe and efficient operation.
Duetchat introduces Duet Agent, a new harness for running long-duration AI agent tasks with state machine relay, memory compaction, and a stateless runner for sandboxes.