Tag
Introduces SFL-MTSC, a structured aggregation framework for robust multi-intent spoken language understanding using LLM self-consistency at the semantic frame level, showing improved slot F1 and overall accuracy on the MAC-SLU benchmark.
VeryTrace is a zero-shot verification-and-repair framework that formalizes LLM reasoning traces into a compilable representation using a DSL, enabling step-level error localization through a hybrid of deterministic checks and LLM audits. It improves accuracy across math, robotics, and relational reasoning without domain-specific training.
This paper evaluates 42 large language models on their ability to measure item discrimination in reading comprehension assessments, finding weak alignment with human-calibrated measures and highlighting item-discrimination estimation as an open challenge for psychometric evaluation.
NAVI-Orbital demonstrates the first in-orbit deployment of a zero-shot vision-language model (Gemma 3) on a LEO satellite, enabling autonomous scene classification and semantic compression of Earth observation data without fine-tuning.
Google has released TimesFM, a time series forecasting model trained on 100 billion real-world data points that supports zero-shot prediction. It is free, open-source, and can run locally on ordinary computers.
JanusMesh is a fast, training-free framework that generates text-driven 3D visual illusions—a single mesh revealing different semantics from different viewing angles—by decoupling generation into cross-space dual-branch denoising and view-conditioned texture synthesis, achieving high realism in just 3-5 minutes.
Google has released TimesFM, an AI model for zero-shot time series forecasting, trained on 100 billion real data points, free and open-source.
This paper adapts AI Safety Gridworlds to text-based evaluation and finds that language model agents exhibit zero-shot reward hacking across scales, which is not corrected by standard RL mitigations.
MV3DT is a fully distributed multi-view 3D tracking framework. Peer-to-peer coordination eliminates the compute bottleneck of centralized fusion, so it runs at 30 FPS on 100 cameras with only 2.2% communication overhead. It can be deployed with zero-shot calibration and matches or surpasses centralized methods.
A flow-matching model generates diverse human grasps from RGB-D images, enabling zero-shot robotic grasping with improved performance over existing methods. The model, trained on a large egocentric dataset, significantly outperforms state-of-the-art baselines on a new benchmark.
This paper introduces SP³, a method using Spherical Encoder priors for Plug-and-Play image restoration, achieving perceptual quality comparable to zero-shot diffusion priors while being 3–630× faster across tasks.
Introduces Flow Reversal Steering (FRS), a method to refine coarse actions from semantic reasoning into precise robot actions by reversing and re-denoising through a flow-matching generalist policy, improving zero-shot control and enabling policy learning.
This study reveals that LLM text embeddings are dominated by high-frequency tokens (e.g., periods, articles) and proposes EmbedFilter, which performs SVD on the unembedding matrix and subtracts the projection component to recover the underlying semantics, achieving training-free dimensionality reduction and retrieval efficiency gains.
This paper introduces MVEB, a large-scale benchmark for evaluating video embeddings across 23 tasks, finding that no single model dominates and that audio's contribution depends on dataset annotation provenance. It integrates into the MTEB ecosystem for unified multimodal evaluation.
The article introduces a technique that extracts hidden states from an LLM at the last prompt token to perform classification without text generation, using a small MLP to read the model's internal decision, enabling fast and cheap zero-shot classifiers.
This paper introduces Sim2Schedule, a simulator-guided LLM framework for autonomous open-pit mine scheduling that achieves 94-99% of the optimal NPV from MILP while scaling linearly in computation time, operating zero-shot without fine-tuning.
World Pilot enhances Vision-Language-Action models by incorporating dynamic scene evolution and trajectory priors from a World-Action Model, achieving state-of-the-art zero-shot performance on manipulation tasks.
This paper proposes SRT (Super-Resolution for Time Series), a framework that reconstructs high-resolution temporal patterns from low-resolution inputs using a disentangled rectified flow approach. The method decomposes input into trend and seasonal components, applies implicit neural representation for resolution alignment, and introduces cross-resolution attention to generate fine-grained details, achieving state-of-the-art performance on multiple datasets.
This paper investigates how reasoning models perform zero-shot multi-label classification over millions of candidate labels. The authors characterize a two-phase process of shortlisting and fine-grained reasoning, and propose a mechanistic distillation method that outperforms standard distillation for transferring these capabilities to smaller models.
This paper introduces Zero-Shot Embedding Drift Detection (ZEDD), a lightweight framework that detects prompt injection attacks in LLMs by measuring semantic shifts in embedding space, achieving over 93% accuracy with less than 3% false positive rate across multiple architectures.
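The embedding-drift idea in the ZEDD summary above can be illustrated with a minimal sketch. The summary does not say which distance or threshold the paper uses, so cosine distance, the function names, and the `threshold=0.3` default are all assumptions made for illustration, not the authors' method. The embeddings are assumed to come from any sentence encoder and are passed in as plain lists of floats.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / (norm_a * norm_b)


def flag_drift(clean_emb, suspect_emb, threshold=0.3):
    """Flag a prompt whose embedding drifts too far from its clean counterpart.

    The threshold is a hypothetical value; in practice it would be
    calibrated on held-out clean and injected prompts.
    """
    return cosine_distance(clean_emb, suspect_emb) > threshold
```

A caller would embed the trusted prompt and the received prompt with the same encoder, then pass both vectors to `flag_drift`; a larger distance suggests that injected text has shifted the prompt's meaning.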