Tag
Ultralytics YOLO26 introduces a unified real-time vision model family with NMS-free inference, improved training strategies, and multi-task capabilities for detection, segmentation, and pose estimation, achieving state-of-the-art accuracy-latency trade-offs.
This paper presents a new framework for model merging that casts the problem as a convex quadratic program over residual updates, minimizing a squared-output calibration objective. It subsumes existing heuristic methods and provides a closed-form diagnostic to predict merge quality, showing consistent gains on language and vision benchmarks.
Qwen-VLA is a unified vision-language-action model for embodied decision-making, integrating manipulation, navigation, and trajectory prediction across different robot platforms. It uses a DiT-based action decoder and embodiment-aware prompt conditioning, achieving strong performance and out-of-distribution generalization.
This paper introduces AsyncTool, a benchmark for evaluating LLM-based agents' asynchronous function calling abilities in multi-task scenarios with delayed tool responses. It proposes efficiency-oriented metrics and identifies key failure modes of current tool-using agents.
A self-taught developer asks for advice on choosing between 3B and 7B models for a first multi-task fine-tuning project focused on deeper reasoning about underlying questions.