Tag
A detailed technical query about building a local document RAG system, covering storage, ingestion, querying, and highlighting, and seeking advice on vector databases, GraphRAG feasibility, and document-highlighting implementations.
Unveil introduces a unified visual-textual embedding framework for multi-modal document retrieval, using knowledge distillation to transfer semantic understanding from a visual-textual model to a purely visual model, achieving robust and efficient retrieval.
UniDoc-RL presents a reinforcement learning framework for Large Vision-Language Models that optimizes retrieval, reranking, and visual reasoning through hierarchical decision-making and dense multi-reward supervision, achieving improvements of up to 17.7% over prior RL-based methods on visual RAG tasks.