Tag
CaVe-VLM-CoT is a modular reflection-based agentic-RAG framework for vision-language models that enforces evidence-grounded reasoning through a five-stage pipeline, achieving 87.1% accuracy on ScienceQA and proposing a suite of 23 metrics for evaluation.
Introduces AgentSpec, a modular specification framework for systematically composing and analyzing embodied LLM agent scaffolds, revealing that performance depends on scaffold compatibility and interaction effects rather than isolated module strength.
Palette proposes a modular framework for selectively relaxing safety refusal behaviors in LLMs for authorized professional domains, using multi-objective search and lightweight adaptation to avoid costly retraining.
GeoStack introduces a geometric framework to compose independently trained domain experts in Vision-Language Models without catastrophic forgetting, achieving constant-time inference and a 10x reduction in geometric error.