Tag
Proposes BindingSubspace (BSU), a representation-level framework that isolates and attenuates intent-conditioned directions in end-to-end spoken language understanding models. Its goal is to prevent capability persistence, the failure mode in which a suppressed intent can still produce slot generation when the model is given a forced prefix. The method reduces forced-prefix recoverability while preserving retained performance on SLU benchmarks.
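The summary does not say how BSU finds or attenuates its subspace. As a rough illustration only, a generic way to "attenuate directions" in a representation is to project hidden states onto an orthonormal basis of the subspace and shrink that component. The function name, the `basis` input and the scalar `alpha` below are hypothetical and are not taken from the paper.

```python
import numpy as np


def attenuate_subspace(h, basis, alpha):
    """Scale the component of hidden states h lying in span(basis) by alpha.

    Hypothetical sketch, not the BSU method itself.
    h:     (n, d) array of hidden states.
    basis: (d, k) array whose columns span the (assumed) intent-conditioned subspace.
    alpha: attenuation factor in [0, 1]; 0 removes the subspace component entirely.
    """
    h = np.asarray(h, dtype=float)
    # Orthonormalize the basis so q @ q.T is a proper orthogonal projector.
    q, _ = np.linalg.qr(np.asarray(basis, dtype=float))
    # Component of each row of h inside the subspace.
    proj = h @ q @ q.T
    # Keep the orthogonal complement unchanged and shrink the in-subspace part.
    return h - (1.0 - alpha) * proj
```

With `alpha=0` this is plain projection removal. An intermediate `alpha` trades suppression strength against damage to retained behavior, which is the kind of balance the summary describes.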
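This paper introduces ORAgentBench, an execution-based benchmark for evaluating LLM agents on end-to-end operations research tasks, comprising 107 manually reviewed tasks. Experiments show that the current best agent passes only 35.51% of the tasks, revealing major shortcomings in reliable decision-making.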
This paper investigates speech-driven features for fine-grained discrimination among Chinese dialects, using an end-to-end model that combines MFCC-based features with word-level embeddings via a CNN, outperforming text-driven methods.
Researchers at Nvidia Gear Lab report that eight Codex-AutoResearch agents autonomously controlled a robot fleet to solve a physical-world task without human intervention, which they present as a demonstration of self-improvement.
This paper presents The AI Scientist, a system that automates the entire research lifecycle, from idea generation to peer review, demonstrating AI's growing capacity for scientific contribution.
HyVLA-0.5 is an end-to-end robotic learning system that integrates data collection, model design, pre-training, fine-tuning, and reinforcement learning for real-world deployment.
SCAIL-2 is an open-source model for end-to-end controlled character animation that animates a reference character with a driving video, supporting character replacement and multi-character scenarios without intermediate pose representations.
SCAIL-2 is a framework that achieves end-to-end controlled character animation by directly transferring motion from driving videos without intermediate representations, using unified task decomposition, synthetic data (MotionPair-60K), and novel conditioning techniques like in-context mask conditioning and Bias-Aware DPO.
LLMBridge introduces an LLM-based pipeline for end-to-end referential bridging resolution, achieving state-of-the-art performance on three English datasets. The system combines heuristic pre/post-processing with LLM natural language inference.
LELA is an LLM-based entity linking framework that combines zero-shot NER and entity disambiguation into an end-to-end Python library, validated across diverse settings.
FormalASR presents two compact end-to-end models that directly transcribe spoken Chinese into formal written text, achieving significant error reduction and eliminating the need for a separate LLM post-processing stage, enabling lightweight on-device deployment.
RankE introduces an end-to-end post-training framework for discrete text-to-image generation that jointly optimizes both the generator and decoder to address the latent covariate shift problem, improving alignment and fidelity simultaneously.
Reflecting on the fragmented AI tool landscape of 2023-24, the user highlights the arrival of Higgsfield AI's Supercomputer, a cloud-native AI agent that consolidates 40+ tools for end-to-end task execution.
MetaAgent-X introduces an end-to-end reinforcement learning framework that jointly optimizes the design and execution of automatic multi-agent systems, overcoming the frozen-executor ceiling and achieving up to 21.7% gains over existing baselines.
This paper presents a calculus-based framework that uses first and second derivative tests to estimate the optimal vocabulary size hyper-parameter for end-to-end ASR systems, improving performance on the Librispeech corpus.
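The summary names only first- and second-derivative tests, so the following is a hedged sketch of that generic idea and not the paper's model. It assumes error rate is fit as a quadratic in log(vocabulary size). Setting the first derivative to zero gives the candidate optimum, and a positive second derivative confirms that it is a minimum. The function name and the quadratic-in-log form are illustrative assumptions.

```python
import numpy as np


def optimal_vocab_size(vocab_sizes, error_rates):
    """Estimate the error-minimizing vocabulary size via derivative tests.

    Assumption (not from the paper): error ~ a*x^2 + b*x + c with x = log(V).
    """
    x = np.log(np.asarray(vocab_sizes, dtype=float))
    y = np.asarray(error_rates, dtype=float)
    # np.polyfit returns coefficients from highest degree down: [a, b, c].
    a, b, _ = np.polyfit(x, y, 2)
    # Second-derivative test: f''(x) = 2a must be positive for a minimum.
    if 2 * a <= 0:
        raise ValueError("fitted curve has no interior minimum")
    # First-derivative test: f'(x) = 2a*x + b = 0  =>  x* = -b / (2a).
    x_star = -b / (2 * a)
    # Map back from log-space to a vocabulary size.
    return float(np.exp(x_star))
```

Fitting in log space reflects the common practice of sweeping vocabulary sizes geometrically (e.g. 250, 500, 1000, ...). A real system would use measured error rates from such a sweep in place of a synthetic curve.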
Higgsfield is an all-in-one AI video platform handling character consistency, generation, audio, and distribution, contrasting with single-model specialists like Kling, Runway, and Veo. The discussion questions whether vertical integration or specialized quality will dominate AI video production.
EVA-Bench introduces a comprehensive end-to-end framework for evaluating voice agents, simulating realistic multi-turn conversations and measuring performance across voice-specific failure modes with novel accuracy (EVA-A) and experience (EVA-X) metrics. The benchmark includes 213 scenarios across enterprise domains and a perturbation suite for accent and noise robustness, revealing substantial gaps in current systems.
This post shares a curated GitHub repository of over 30 practical AI projects spanning domains from regression to generative AI, many of them end-to-end, aimed at learners and developers.
Announces liquid-audio, an open-source repository for Liquid AI's end-to-end speech-to-speech LFM models (LFM2-Audio-1.5B and LFM2.5-Audio-1.5B) with interleaved and sequential generation modes and fine-tuning support.
L2P proposes an efficient transfer paradigm that leverages pre-trained latent diffusion models to build pixel-space diffusion models, enabling high-quality generation with minimal computational overhead and data requirements, and supporting native 4K resolution.