Distilling Answer-Set Programming Rules from LLMs for Neurosymbolic Visual Question Answering
Summary
This paper presents a method for distilling answer-set programming rules from large language models to enhance neurosymbolic visual question answering, showing that only a few examples are needed to generate correct rules.
Cached at: 06/03/26, 09:43 AM
# Distilling Answer-Set Programming Rules from LLMs for Neurosymbolic Visual Question Answering

Source: [https://arxiv.org/abs/2606.03269](https://arxiv.org/abs/2606.03269) [View PDF](https://arxiv.org/pdf/2606.03269)

> Abstract: Visual Question Answering (VQA) is the task of answering questions about images, requiring the integration of multimodal input and reasoning. Modular approaches that incorporate logic-based representations into the reasoning component offer clear advantages over end-to-end trained systems, particularly in terms of interpretability. However, adapting or extending these representations when task requirements change can place a significant burden on developers. To address this challenge, we present an approach for distilling rules from Large Language Models (LLMs). Our method prompts an LLM to extend an initial VQA reasoning theory, expressed as an answer-set program, to meet new requirements of the task. Examples from VQA datasets guide the LLM, validate the results, and help correct erroneous rules by leveraging feedback from the ASP solver. We demonstrate that our approach is effective across diverse VQA datasets. Notably, only a few examples are needed to elicit correct rules from LLMs. Our experiments suggest that rule distillation from LLMs is a promising alternative to traditional data-driven rule learning approaches. Under consideration in Theory and Practice of Logic Programming (TPLP).

## Submission history

From: Nelson Higuera

**[v1]** Tue, 2 Jun 2026 07:35:31 UTC (4,544 KB)
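The abstract describes a loop of proposing rules, validating them on examples, and repairing them with solver feedback. The sketch below is a minimal illustration of that loop and is not the authors' implementation. It assumes two hypothetical callables: `propose`, standing in for an LLM prompt that returns new ASP rules as text, and `solve`, standing in for an ASP solver call that returns either an answer or an error message. All names (`Example`, `distill_rules`, `max_rounds`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Example:
    facts: str     # ASP facts encoding the scene and question (assumed format)
    expected: str  # gold answer from the VQA dataset


# Hypothetical interfaces, not a real library API:
# propose(theory, examples, feedback) -> new ASP rules as text (an LLM call)
# solve(program) -> (answer, error); exactly one is expected to be non-None
ProposeFn = Callable[[str, list[Example], str], str]
SolveFn = Callable[[str], tuple[Optional[str], Optional[str]]]


def distill_rules(
    theory: str,
    examples: list[Example],
    propose: ProposeFn,
    solve: SolveFn,
    max_rounds: int = 3,
) -> tuple[Optional[str], int]:
    """Extend `theory` with LLM-proposed rules until every example passes.

    Returns (extended_theory, rounds_used), or (None, max_rounds) if no
    candidate passed all examples within the round budget.
    """
    feedback = ""  # empty on the first round; filled with failures afterwards
    for round_idx in range(max_rounds):
        candidate = theory + "\n" + propose(theory, examples, feedback)
        failures = []
        for ex in examples:
            answer, error = solve(candidate + "\n" + ex.facts)
            if error is not None:
                # Solver-reported problem (e.g. a syntax error) becomes feedback.
                failures.append(f"solver error: {error}")
            elif answer != ex.expected:
                # Wrong answer on a validation example also becomes feedback.
                failures.append(f"expected {ex.expected!r}, got {answer!r}")
        if not failures:
            return candidate, round_idx + 1
        feedback = "\n".join(failures)
    return None, max_rounds
```

In practice, `solve` might wrap an ASP solver such as clingo and `propose` might format the current theory, the examples, and the accumulated feedback into a prompt. Keeping both as injected callables lets the validation-and-repair logic be tested without either component.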
Similar Articles
Neural Module Networks for Visual Question Answering
This article explains the Neural Module Networks (NMN) architecture from the paper 'Deep Compositional Question Answering with Neural Module Networks,' detailing how it handles the compositional structure of visual question answering tasks by decomposing questions into modular steps.
When No Answer Is Correct: Diagnosing Absent Answer Detection for MLLMs in Video Understanding
This paper studies the ability of multimodal large language models (MLLMs) to detect when the correct answer is absent in video understanding tasks, finding that models systematically fail by selecting plausible distractors instead of recognizing no valid option exists. The failure worsens in temporal reasoning and dense frame sampling, and chain-of-thought prompting only partially mitigates the issue.
Visual Graph Scaffolds for Structural Reasoning in Large Language Models
This paper explores using visual graph mind maps as reasoning scaffolds for LLMs, finding that visual guidance remains effective even without direct answer hints, while textual flattening of graphs loses benefits.
@neural_avb: https://x.com/neural_avb/status/2063907440509571354
Explores a common failure mode in recursive language models (RLMs) where free-text subagent responses cause issues, and presents a solution using structured outputs to improve reliability, illustrated with a long-context question-answering example from NarrativeQA.
Learning to reason with LLMs
OpenAI publishes an article exploring reasoning techniques with LLMs through cipher-decoding examples, demonstrating step-by-step problem-solving approaches and pattern recognition in language models.