Tag
This paper introduces Agentick, a unified benchmark for evaluating general sequential decision-making agents across RL, LLM, and VLM paradigms. It provides 37 procedurally generated tasks and reveals that no single approach currently dominates, highlighting significant room for improvement in agent autonomy.
This paper introduces PRISM, a framework that integrates Vision-Language Models and Large Language Models through a dynamic question-answering pipeline to improve sequential decision-making in embodied AI tasks.