This paper introduces GATHER, a convergence-centric retrieval method for zero-shot cell-type annotation using knowledge graphs, which improves accuracy and reduces LLM costs compared to existing KG-RAG baselines.
This paper introduces DiGSeg, a framework that repurposes pretrained diffusion models for state-of-the-art semantic and open-vocabulary segmentation by leveraging latent space conditioning and text-guided alignment.
Researchers present a zero-shot LLM system that assesses depression risk from Reddit posts, achieving competitive F1 scores and demonstrating the feasibility of scalable mental-health monitoring.
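The summary doesn't name the underlying model or prompt, so as a rough illustration of what zero-shot risk assessment looks like in practice, here is a minimal sketch using the OpenAI chat API; the model name, prompt wording, and risk labels are all assumptions, not the paper's actual setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

POST = "I haven't left my room in days and nothing feels worth doing anymore."

# Hypothetical zero-shot prompt: no fine-tuning, just instructions plus the post.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; substitute whichever LLM the study used
    messages=[
        {
            "role": "system",
            "content": (
                "You are a mental-health screening assistant. Classify the Reddit "
                "post as LOW, MODERATE, or HIGH depression risk and give one "
                "sentence of rationale."
            ),
        },
        {"role": "user", "content": POST},
    ],
    temperature=0,  # deterministic output for evaluation
)
print(response.choices[0].message.content)
```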
DeVI introduces a framework that turns text-conditioned synthetic videos into physically plausible dexterous robot control via a hybrid 3D-2D tracking reward, enabling zero-shot generalization to unseen objects.
Google open-sourced TimesFM 2.5, a 200M-parameter, 16K-context zero-shot time-series forecasting foundation model that produces forecasts from historical data out of the box.
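As a rough usage sketch: the timesfm package's API has shifted between releases, so the class and method names below are assumptions based on the project's published examples rather than a verified 2.5 interface; check the repository README before relying on them.

```python
import numpy as np
import timesfm  # pip install timesfm; names below are assumptions, see repo README

# Load the open checkpoint from Hugging Face (repo id assumed).
model = timesfm.TimesFM_2p5_200M_torch.from_pretrained("google/timesfm-2.5-200m-pytorch")
model.compile(
    timesfm.ForecastConfig(
        max_context=1024,  # the release supports contexts up to 16K points
        max_horizon=256,
        normalize_inputs=True,
    )
)

# Zero-shot forecast: no fine-tuning, just historical values in, predictions out.
history = [np.sin(np.arange(512) / 10.0)]  # one toy series
point_forecast, quantile_forecast = model.forecast(horizon=64, inputs=history)
print(point_forecast[0][:5])
```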
QuantAgent is a multi-agent LLM framework designed specifically for high-frequency trading, using four specialized agents (Indicator, Pattern, Trend, Risk) to make rapid, risk-aware decisions based on short-horizon signals. In zero-shot evaluations across ten financial instruments including Bitcoin and Nasdaq futures, it outperforms existing neural and rule-based baselines in predictive accuracy and cumulative return.
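The paper's implementation isn't reproduced here; as a schematic of the four-agent decision flow, the sketch below uses a hypothetical `ask` helper standing in for an LLM call, with all prompts, the data format, and the aggregation rule invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class MarketSnapshot:
    symbol: str
    ohlcv_window: str  # serialized recent bars; a real system would pass structured features


def ask(role: str, prompt: str) -> str:
    """Hypothetical LLM call; stands in for whatever backend QuantAgent uses."""
    raise NotImplementedError


def quantagent_decision(snap: MarketSnapshot) -> str:
    # Each specialized agent reads the same short-horizon window through its own lens.
    indicator = ask("Indicator", f"Summarize RSI/MACD-style readings for {snap.ohlcv_window}")
    pattern = ask("Pattern", f"Identify candlestick/chart patterns in {snap.ohlcv_window}")
    trend = ask("Trend", f"Characterize the short-horizon trend of {snap.ohlcv_window}")
    # The Risk agent sees the other agents' outputs and vetoes or sizes the trade.
    return ask("Risk", f"Given {indicator}; {pattern}; {trend}, decide LONG/SHORT/FLAT for {snap.symbol}")
```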
This paper presents TimesFM, a decoder-only time-series foundation model that adapts large-language-model techniques to achieve zero-shot accuracy approaching fully supervised models across diverse time-series datasets.
OpenAI introduces Whisper, an end-to-end encoder-decoder Transformer trained on large-scale, diverse audio data for robust multilingual speech recognition, language identification, and speech-to-English translation. Evaluated across diverse datasets, Whisper makes 50% fewer errors than specialized models and outperforms supervised baselines on speech translation, all without dataset-specific fine-tuning.
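For reference, the open-source whisper package exposes transcription and translation in a few lines; the checkpoint size and file name below are placeholders.

```python
import whisper

model = whisper.load_model("base")  # larger checkpoints trade speed for accuracy

# Multilingual transcription; Whisper auto-detects the spoken language.
result = model.transcribe("audio.mp3")
print(result["language"], result["text"])

# Speech-to-English translation uses the same model with a different task flag.
translated = model.transcribe("audio.mp3", task="translate")
print(translated["text"])
```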
CLIP is OpenAI's vision-language model that learns visual concepts from image-text pairs collected from the internet, enabling zero-shot visual classification without task-specific training data. It addresses major limitations of traditional computer vision by reducing dependence on expensive labeled datasets and improving real-world generalization.
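A minimal zero-shot classification example with OpenAI's released clip package follows; the image path and label set are placeholders. The key idea: the "classifier" is just a list of natural-language prompts, scored against the image by cosine similarity.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Zero-shot: candidate classes are expressed as text prompts, no training needed.
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)           # image-text similarity scores
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(dict(zip(labels, probs[0])))
```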