Distilling Answer-Set Programming Rules from LLMs for Neurosymbolic Visual Question Answering

arXiv cs.AI Papers

Summary

This paper presents a method for distilling answer-set programming rules from large language models to enhance neurosymbolic visual question answering, showing that only a few examples are needed to generate correct rules.

arXiv:2606.03269v1. Abstract: Visual Question Answering (VQA) is the task of answering questions about images, requiring the integration of multimodal input and reasoning. Modular approaches that incorporate logic-based representations into the reasoning component offer clear advantages over end-to-end trained systems, particularly in terms of interpretability. However, adapting or extending these representations when task requirements change can place a significant burden on developers. To address this challenge, we present an approach for distilling rules from Large Language Models (LLMs). Our method prompts an LLM to extend an initial VQA reasoning theory, expressed as an answer-set program, to meet new requirements of the task. Examples from VQA datasets guide the LLM, validate the results, and help correct erroneous rules by leveraging feedback from the ASP solver. We demonstrate that our approach is effective across diverse VQA datasets. Notably, only a few examples are needed to elicit correct rules from LLMs. Our experiments suggest that rule distillation from LLMs is a promising alternative to traditional data-driven rule learning approaches. Under consideration in Theory and Practice of Logic Programming (TPLP).
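
The abstract describes a propose, validate, and repair loop: the LLM extends the theory, a handful of VQA examples check the extended program, and solver feedback is sent back to the LLM. The sketch below illustrates that loop only under stated assumptions and is not the authors' implementation. `propose` stands in for the LLM prompt, `solve` stands in for a call to an ASP solver such as clingo, and every name here (`distill_rules`, `Example`, `fake_propose`, `fake_solve`, and the predicates `object/1`, `count_query`, `color_query/1`, `color/2`) is hypothetical. The two `fake_*` functions are mocks so the loop runs without an LLM or a solver.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Example:
    facts: str     # scene + question encoded as ASP facts (hypothetical encoding)
    expected: str  # gold answer from the VQA dataset


@dataclass
class SolverResult:
    answer: Optional[str]  # answer read off the answer set, if any
    error: Optional[str]   # syntax/grounding error reported by the solver, if any


# Hypothetical interfaces: an LLM that proposes new rules, and an ASP solver wrapper.
ProposeFn = Callable[[str, List[Example], List[str]], str]
SolveFn = Callable[[str, Example], SolverResult]


def distill_rules(theory: str, examples: List[Example], propose: ProposeFn,
                  solve: SolveFn, max_rounds: int = 3) -> Optional[str]:
    """Ask the LLM to extend `theory`; accept the new rules once every example passes."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = propose(theory, examples, feedback)  # LLM sees previous feedback
        program = theory + "\n" + candidate
        feedback = []
        for ex in examples:
            result = solve(program, ex)
            if result.error is not None:
                feedback.append(f"solver error: {result.error}")
            elif result.answer != ex.expected:
                feedback.append(f"expected {ex.expected!r}, got {result.answer!r} "
                                f"for facts {ex.facts!r}")
        if not feedback:
            return candidate  # validated on all examples
    return None  # no correct extension within the round budget


def fake_solve(program: str, ex: Example) -> SolverResult:
    # Mock solver: rejects any line not ending in '.', and "answers" count
    # questions by counting object/1 facts only if a #count rule is present.
    for line in (program + "\n" + ex.facts).splitlines():
        if line.strip() and not line.strip().endswith("."):
            return SolverResult(None, f"syntax error near: {line.strip()}")
    if "#count" in program:
        return SolverResult(str(ex.facts.count("object(")), None)
    return SolverResult(None, None)


def fake_propose(theory: str, examples: List[Example], feedback: List[str]) -> str:
    # Mock LLM: first attempt forgets the final period, fixed after solver feedback.
    rule = "ans(N) :- count_query, N = #count{ O : object(O) }"
    return rule + "." if any("syntax error" in f for f in feedback) else rule


theory = "ans(C) :- color_query(O), color(O, C)."
examples = [Example("object(a). object(b). count_query.", "2"),
            Example("object(c). count_query.", "1")]
print(distill_rules(theory, examples, fake_propose, fake_solve))
# prints: ans(N) :- count_query, N = #count{ O : object(O) }.
```

Passing `propose` and `solve` as callables keeps the loop independent of any particular LLM client or solver binding; a real version would replace `fake_solve` with grounding and solving the program plus each example's facts and reading the answer atom from the resulting model.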

Cached at: 06/03/26, 09:43 AM

# Distilling Answer-Set Programming Rules from LLMs for Neurosymbolic Visual Question Answering
Source: [https://arxiv.org/abs/2606.03269](https://arxiv.org/abs/2606.03269)
[View PDF](https://arxiv.org/pdf/2606.03269)

## Submission history

From: Nelson Higuera

**[v1]** Tue, 2 Jun 2026 07:35:31 UTC (4,544 KB)

Similar Articles

Neural Module Networks for Visual Question Answering

ML at Berkeley

This article explains the Neural Module Networks (NMN) architecture from the paper 'Deep Compositional Question Answering with Neural Module Networks,' detailing how it handles the compositional structure of visual question answering tasks by decomposing questions into modular steps.

When No Answer Is Correct: Diagnosing Absent Answer Detection for MLLMs in Video Understanding

arXiv cs.AI

This paper studies the ability of multimodal large language models (MLLMs) to detect when the correct answer is absent in video understanding tasks, finding that models systematically fail by selecting plausible distractors instead of recognizing no valid option exists. The failure worsens in temporal reasoning and dense frame sampling, and chain-of-thought prompting only partially mitigates the issue.

@neural_avb: https://x.com/neural_avb/status/2063907440509571354

X AI KOLs Timeline

This post explores a common failure mode in recursive language models (RLMs), in which free-text subagent responses cause errors, and presents a fix that uses structured outputs to improve reliability, illustrated with a long-context question-answering example from NarrativeQA.

Learning to reason with LLMs

OpenAI Blog

OpenAI publishes an article exploring reasoning techniques with LLMs through cipher-decoding examples, demonstrating step-by-step problem-solving approaches and pattern recognition in language models.