Tag
Qwen releases Qwen-AgentWorld-35B-A3B, a native language world model that simulates agentic environments across seven domains via long chain-of-thought reasoning. The model is trained with a three-stage pipeline and supports MCP, Search, Terminal, SWE, Android, Web, and OS interactions.
LOGOS is a scientific generative language model that encodes diverse scientific objects and spatial interactions as token sequences, enabling a unified autoregressive framework for tasks across natural sciences. Models at 1B, 3B, and 8B parameters show consistent performance scaling and are released to facilitate research.
Count Anything is a generalist model for text-guided object counting that unifies multiple domains, supported by the new CLOC dataset with 220K images across six visual domains. It achieves strong accuracy and multi-domain generalization.
This paper introduces DoRA-RBAC, a framework for composing LLM adapters, and tests whether geometry-aware merging improves multi-domain performance. Results show no consistent advantage over standard averaging, suggesting adapter interference is not primarily driven by parameter-space geometry.
Arbor is an AI framework for autonomous scientific research that uses a coordinator, executors, and a persistent hypothesis tree to iteratively improve research outcomes across multiple domains, achieving strong results on six real research tasks.
SoCRATES introduces a realistic multi-domain benchmark for evaluating proactive LLM mediators, showing that top models resolve only about one-third of the consensus gap in conflict resolution.
This paper proposes a local perturbation theory to explain cross-domain interference in multi-domain RL for LLMs, showing that interference is driven by a second-order damage term in a low-dimensional conflict subspace, and demonstrates that brief domain refresh or training-free rollback can selectively recover lost capabilities.
Count Anything is a generalist vision model for text-guided object counting across multiple domains, using dual-granularity instance enumeration and complementary counting fusion. It achieves strong accuracy and cross-domain generalization, outperforming existing open-world counting methods.
This paper proposes TopoPrior, a framework that learns transferable topology priors from offline reference collaboration graphs to generate initial topologies for multi-agent LLM collaboration across domains, significantly reducing online search overhead and token consumption.
Google is updating the Gemini Interactions API to replace strict user/model roles with a flexible step-based system (outputs + roles → steps), introducing agentic steps like user_input, thought, function_call, tool_call, and model_output. The update also consolidates response_format controls and requires SDK upgrades (Python/JS ≥2.0.0) or a new API header to opt-in.