agent-training


EnvSimBench: A Benchmark for Evaluating and Improving LLM-Based Environment Simulation

arXiv cs.AI · 2d ago

This paper introduces EnvSimBench, a benchmark for evaluating how well large language models can simulate environments for agent training. It identifies a "state change cliff" in current LLMs and proposes a constraint-driven pipeline to reduce hallucinations and costs.


EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

arXiv cs.CL · 2026-04-20

EnvScaler is an automated framework for scaling tool-interactive environments for LLM agents through programmatic synthesis, creating 191 diverse environments and 7K scenarios to improve agent performance on multi-turn, multi-tool interactions.


CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution

arXiv cs.CL · 2026-04-20

CoEvolve proposes an agent-data mutual evolution framework for training LLM agents through closed-loop, interaction-driven learning that adapts both the agent and its training data distribution. The method extracts feedback signals from rollout trajectories to guide LLM-based task synthesis, demonstrating significant improvements (15-19% absolute gains) across multiple Qwen models on AppWorld and BFCL benchmarks.


Mind DeepResearch Technical Report

Hugging Face Daily Papers · 2026-04-17

MindDR is a multi-agent deep research framework using a three-agent architecture (Planning, DeepSearch, Report) and a four-stage training pipeline, achieving competitive performance with ~30B-parameter models on multiple benchmarks. Developed by Li Auto and deployed as an online product, it also introduces MindDR Bench, a 500-query Chinese benchmark for evaluating deep research capabilities.


Universe

OpenAI Blog · 2016-12-05

OpenAI Universe is a software platform that lets RL agents interact with any existing computer program through a VNC-based interface. It supports running environments in parallel at 60 FPS, along with human demonstrations and automated reward extraction.
