Tag
Introduces OGCaReBench, a free-form retrieval benchmark for evaluating LLMs on clinical questions that require reasoning beyond standard guidelines. Experiments show that even the best model achieves only 56% accuracy, but retrieval augmentation boosts performance to 82%.
University of Florida Gators submission to the AmericasNLP 2026 shared task on cultural image captioning for Indigenous languages, using a two-stage pipeline with Qwen2.5-VL for Spanish captioning and retrieval-augmented Gemini 2.5 Flash for target-language translation, achieving significant improvements over the baseline.
A systematic study on detecting Schwartz values in political text, comparing context lengths, model sizes, and retrieval-augmented generation methods. Results show that full-document context improves supervised models but not zero-shot LLMs, while retrieved moral knowledge consistently helps via early fusion.
BELIEF is a structured evidence modeling and uncertainty-aware fusion framework for biomedical question answering that converts retrieved documents into evidence objects and combines symbolic Dempster-Shafer reasoning with LLM-based inference. Experiments on PubMedQA, MedQA, and MedMCQA show BELIEF achieves state-of-the-art results in the majority of settings.
Lean Refactor presents a retrieval-augmented agentic framework for multi-objective, controllable, and version-robust refactoring of Lean proofs, achieving significant compression and compilation-time reduction.
A memory retrieval system inspired by episodic memory theory achieves a state-of-the-art 96.4% top-50 accuracy on the LongMemEval benchmark using Gemini Flash, outperforming larger Pro-based baselines. The evaluation isolates retrieval quality from model capability.
This paper evaluates six open-weight LLMs on biomedical QA under conflicting evidence conditions, revealing accuracy drops and prediction flips, and proposes a conflict-aware abstention score that improves selective accuracy.
EviMem combines IRIS for evidence-gap detection and LaceMem for layered memory to improve long-term conversational memory retrieval, achieving higher accuracy on temporal and multi-hop questions with lower latency.
CoAuthorAI is a human-in-the-loop system that combines retrieval-augmented generation and hierarchical outlines to enable accurate, coherent scientific book writing, achieving 98% recall and 82% human satisfaction in evaluations.
This paper introduces a retrieval-augmented LLM framework for financial sentiment analysis, achieving a 15-48% improvement in accuracy and F1 score over traditional models and LLMs such as ChatGPT and LLaMA.