Tag
This paper compares fine-tuned BERT (gbert-large) with few-shot LLM prompting (Llama 4 Maverick) for detecting threat and solution framing in German climate news sentences. BERT achieves higher F1 scores (0.83 vs 0.78), and an ablation study shows that providing preceding sentence context improves performance.
Introduces AnySimLite, a lightweight similarity encoder for on-device speech-adjacent classification tasks, achieving state-of-the-art or competitive performance at less than 1/250th of the model size of the qLLaMA-LoRA-7B baseline.
This study investigates whether instruction-tuned LLMs (Llama-3.1-8B, Qwen2.5-7B, Mistral-7B, Phi-3-mini) can reliably classify Correct Information Units in aphasic discourse transcripts. Few-shot prompting yields competitive F1 scores (0.776–0.817) for three of the four models, but performance varies by aphasia severity, and agreement with human raters remains insufficient for fully autonomous use.
This paper investigates few-shot biomedical relation extraction using prompt-based learning with LLMs, comparing pairwise classification and joint generation approaches. The best model achieves micro-F1 of 0.44, outperforming previous few-shot results but remaining below supervised baselines, while macro-F1 surpasses the supervised baseline on rare relation types.
PrintGuard 2.0 is a major rewrite of a few-shot FDM fault detector built on a ShuffleNetV2 backbone and a prototypical network. It now uses a single Python engine that runs unmodified on both CPython and Pyodide in the browser via a platform abstraction layer, enabling per-printer sensitivity tuning and fair inference scheduling.
This paper proposes LLM-GNN Co-Teaching, a bidirectional framework for few-shot graph learning on text-attributed graphs. The LLM and GNN exchange confident pseudo-labels and use round-based preference optimization (RPL-PO) to mutually improve, outperforming prior methods on benchmarks.
Proposes Demo2Reward, a test-time prompt optimization technique for VLM reward models using a few expert demonstrations, significantly reducing false positives and improving policy learning in robotics without additional model training.
This paper presents a hybrid framework that combines structured clinical data with LLM-generated narratives for coronary artery disease prediction. The framework achieves high fidelity in variable extraction, and the paper compares ML models with LLM-based zero-shot and few-shot classification.
GraphARC is a new benchmark for abstract reasoning on graph-structured data, extending the ARC paradigm to graphs. Evaluations of state-of-the-art language models reveal a comprehension-execution gap and performance degradation on larger instances, highlighting scaling challenges.
This paper introduces ACIL, an automatic Chain-of-Thought framework to enhance In-Context Learning by generating and pruning reasoning chains, improving LLM performance on complex tasks.
This paper explores few-shot prompted LLMs for actionable triage categorization of online patient inquiries into four classes: self-care, schedule-visit, urgent-clinician-review, and emergency-referral. The best model (Claude Haiku 4.5 with 12-shot prompting) achieves a macro-F1 of 0.475, surpassing supervised baselines, but the authors conclude that LLMs are suited to supporting triage prioritization and selective human review rather than autonomous deployment.
FFAvatar proposes a feed-forward framework for reconstructing high-quality, animatable 3D Gaussian head avatars from a few unposed images in seconds, achieving a 5.5 dB PSNR improvement over the state of the art on the NeRSemble benchmark.
FEST is a few-shot demonstration-guided reinforcement learning algorithm that achieves strong performance with minimal supervised fine-tuning data by combining supervised signals, on-policy learning, and weighted training to prevent overfitting.
An independent study shows that a 227M-parameter hypernetwork adds no gain over well-crafted few-shot prompts for tool use in a 3B Llama model, achieving 79.7% of GPT-5 performance at 10× lower latency.
FSPO proposes a few-shot preference optimization algorithm for LLM personalization that reframes reward modeling as meta-learning, enabling models to quickly infer personalized reward functions from limited user preferences. The method achieves 87% personalization performance on synthetic users and 70% on real users through careful synthetic preference dataset construction.