lvlm

#lvlm

Implicit vs. Explicit Prompting Strategies for LVLMs in Referential Communication

arXiv cs.CL · 2026-06-17

This paper investigates seemingly contradictory findings on whether large vision-language models (LVLMs) can coordinate efficient referring expressions. The authors show that models can achieve efficiency when explicitly prompted, but fail to infer the need for efficiency from implicit prompts, revealing key differences between human and AI communication.


#lvlm

UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards

Hugging Face Daily Papers · 2026-04-16

UniDoc-RL presents a reinforcement learning framework for large vision-language models (LVLMs) that optimizes retrieval, reranking, and visual reasoning through hierarchical decision-making and dense multi-reward supervision, with reported improvements of up to 17.7% over prior RL-based methods on visual RAG tasks.

