A Logic-based Approach to Contrastive Explainability for Neurosymbolic Visual Question Answering

Authors: Thomas Eiter, Tobias Geibinger, Nelson Higuera, Johannes Oetsch

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our approach on the CLEVR dataset, which we extend by more sophisticated questions to further demonstrate the robustness of the modular architecture. While we achieve top performance compared to related approaches, we can also produce CEs for explanation, model debugging, and validation tasks, showing the versatility of the declarative approach to reasoning." and, from Section 4 (Evaluation): "Prior to an evaluation of our CE approach, we test the accuracy of NSVQASP on CLEVR and compare it against NS-VQA and other baseline approaches."
Researcher Affiliation | Academia | "Thomas Eiter, Tobias Geibinger, Nelson Higuera and Johannes Oetsch, Institute for Logic and Computation, TU Wien, Favoritenstraße 9-11, 1040 Vienna, Austria, {thomas.eiter, tobias.geibinger, nelson.ruiz, johannes.oetsch}@tuwien.ac.at"
Pseudocode | No | The paper describes its logic and ASP rules, but it contains no dedicated section, figure, or block explicitly labeled "Pseudocode" or "Algorithm".
Open Source Code | Yes | "Code and data are available from https://github.com/pudumagico/nsvqasp."
Open Datasets | Yes | "We validate our approach on the CLEVR dataset, which we extend by several more sophisticated questions to further demonstrate the robustness of the modular architecture of NSVQASP. In particular, we add 20 new question templates for different versions of the new spatial relation 'between', equality of objects, and counting. While we achieve top performance compared to related neural and neurosymbolic approaches, we can moreover produce CEs. We show this for model explanation, debugging, and validation tasks, demonstrating the versatility of the declarative approach to reasoning within modular neurosymbolic VQA architectures. Code and data are available from https://github.com/pudumagico/nsvqasp."
Dataset Splits | Yes | "The CLEVR dataset consists of 70k images plus 700k questions for training and 15k images plus 150k questions for validation. Questions are generated from templates which define the structure of a question. We extend the CLEVR dataset by introducing 20 new templates that include a new spatial relation 'between', questions regarding equality of objects, and new counting questions, respectively; consequently, they can be divided into three groups. We generated 200k new questions from the templates for training for each group and 150k questions for validation." (A sketch for inspecting these splits appears after the table.)
Hardware Specification | Yes | "We use an Intel Core i7-12700K, 32GB RAM, and an NVIDIA GeForce RTX 3080 Ti for training."
Software Dependencies | Yes | "We use clingo (v. 5.6.2) [Gebser et al., 2019] with unsatisfiable-core-guided optimisation [Andres et al., 2012]." (A clingo usage sketch appears after the table.)
Experiment Setup | No | The paper mentions using YOLOv5 and an LSTM, and states that "YOLOv5 was trained with the CLEVR mini dataset", but it does not specify concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations for these models. (An inference-only sketch appears after the table.)
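
The split sizes quoted in the Dataset Splits row can be checked directly against the CLEVR question files. Below is a minimal sketch, assuming the standard CLEVR v1.0 JSON layout (a top-level "questions" list with a "question_family_index" per entry); the file paths are hypothetical, and the paper's extended-template files are assumed to follow the same schema:

```python
import json
from collections import Counter

def summarise_split(path):
    """Report how many questions a split file holds and how many
    template families they were generated from."""
    with open(path) as f:
        data = json.load(f)
    questions = data["questions"]
    families = Counter(q["question_family_index"] for q in questions)
    print(f"{path}: {len(questions)} questions from {len(families)} template families")

# Hypothetical paths; CLEVR v1.0 ships one question file per split.
summarise_split("CLEVR_v1.0/questions/CLEVR_train_questions.json")
summarise_split("CLEVR_v1.0/questions/CLEVR_val_questions.json")
```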
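
The Software Dependencies row can be made concrete via clingo's Python API, where the unsatisfiable-core-guided optimisation of Andres et al. (2012) is selected with the `--opt-strategy=usc` flag. A minimal sketch, assuming a pip-installed clingo; the facts and rules below are illustrative placeholders, not the authors' actual encoding:

```python
import clingo

# Toy scene/question encoding in the spirit of an ASP-based VQA pipeline.
PROGRAM = """
object(1). object(2). object(3).
color(1, red). color(2, blue). color(3, red).
{ selected(O) : object(O) }.        % guess a set of objects
:- selected(O), color(O, blue).     % e.g., the question excludes blue ones
#maximize { 1, O : selected(O) }.   % prefer answer sets selecting more objects
"""

# "usc" switches clingo to unsatisfiable-core-guided optimisation.
ctl = clingo.Control(["--opt-strategy=usc"])
ctl.add("base", [], PROGRAM)
ctl.ground([("base", [])])
ctl.solve(on_model=lambda m: print("Answer set:", m.symbols(shown=True)))
```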
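
Since the Experiment Setup row flags missing training details, only the inference side of the YOLOv5 perception module can be sketched with confidence. A hedged sketch using the public torch.hub entry point of the ultralytics/yolov5 repository; the checkpoint path and image name are hypothetical stand-ins for whatever the (unreported) training run would produce:

```python
import torch

# 'clevr_yolov5.pt' is a hypothetical checkpoint; the paper reports training
# YOLOv5 on the CLEVR-mini dataset but gives no hyperparameters.
model = torch.hub.load("ultralytics/yolov5", "custom", path="clevr_yolov5.pt")

results = model("CLEVR_val_000000.png")  # hypothetical CLEVR validation image
detections = results.pandas().xyxy[0]    # DataFrame: one row per detected object
print(detections[["name", "confidence", "xmin", "ymin", "xmax", "ymax"]])
```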