One Self-Configurable Model to Solve Many Abstract Visual Reasoning Problems

Authors: Mikołaj Małkiński, Jacek Mańdziuk

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments conducted on Raven's Progressive Matrices, Visual Analogy Problems, and Odd One Out problems show that SCAR (SAL-based models, in general) effectively solves diverse AVR tasks, and its performance is on par with the state-of-the-art task-specific baselines. The experimental evaluation of SCAR focuses on the following three settings: (1) single-task learning (STL), (2) multi-task learning (MTL), and (3) transfer learning (TL).
Researcher Affiliation | Academia | Mikołaj Małkiński (1), Jacek Mańdziuk (1, 2); (1) Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland; (2) Faculty of Computer Science, AGH University of Krakow, Krakow, Poland
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides architectural diagrams (Figure 2, Figure 3) and mathematical equations, but no procedural pseudocode.
Open Source Code | Yes | The code required to reproduce the experiments is available online at www.github.com/mikomel/sal. Extended results are provided in Appendix A (Małkiński and Mańdziuk 2023a).
Open Datasets | Yes | The experiments are conducted on three challenging AVR problems. Firstly, we consider RPMs belonging to three datasets: G-set (Mańdziuk and Żychowski 2019) with 1,100 matrices; PGM-S, a subset of the Neutral regime of PGM (Barrett et al. 2018)... I-RAVEN (Zhang et al. 2019a; Hu et al. 2021) with 70K samples. Secondly, we consider the VAP dataset (Hill et al. 2019) to construct the VAP-S dataset... Thirdly, the O3 tests (Mańdziuk and Żychowski 2019) with 1,000 instances are utilized.
Dataset Splits | Yes | In G-set, I-RAVEN, and O3 datasets we uniformly allocate 60% / 20% / 20% samples to train / val / test splits, resp. PGM-S contains 42K / 20K / 200K matrices and VAP-S has 42K / 10K / 100K instances in train / val / test splits, resp.
Hardware Specification | Yes | Each training job was run on a node with a single NVIDIA DGX A100 GPU.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide specific version numbers for any software components, libraries, or frameworks. For example: "We use the Adam (Kingma and Ba 2014) optimizer with β1 = 0.9, β2 = 0.999 and ϵ = 10⁻⁸".
Experiment Setup | Yes | We use batches of size 32 for G-set and O3, and 128 for the remaining datasets. Early stopping is performed after the model's performance stops improving on a validation set for 17 successive epochs. We use the Adam (Kingma and Ba 2014) optimizer with β1 = 0.9, β2 = 0.999 and ϵ = 10⁻⁸; the learning rate is initialized to 0.001 and reduced 10-fold if the validation loss doesn't improve for 5 subsequent epochs. Data augmentation is employed with 50% probability of being applied to a particular sample. When applied, a pipeline of transformations (vertical flip, horizontal flip, rotation by 90 degrees, rotation, transposition) is constructed, each with 25% probability of being adopted. The resultant pipeline is applied to each image in the matrix in the same way.
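
For context, the quoted experiment setup maps onto standard PyTorch components. The sketch below is an illustrative assumption, not the authors' implementation (which is available at www.github.com/mikomel/sal): the placeholder model, the use of ReduceLROnPlateau, and the torchvision-based rotation are choices made here solely to mirror the quoted hyper-parameters.

```python
# Illustrative sketch of the quoted training setup (assumes PyTorch/torchvision;
# the placeholder model and component choices are not taken from the paper's code).
import random

import torch
from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torchvision.transforms import functional as TF

# Placeholder module standing in for SCAR.
model = nn.Linear(80 * 80, 8)

# Adam with beta1 = 0.9, beta2 = 0.999, eps = 1e-8 and an initial LR of 0.001.
optimizer = Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)

# Learning rate reduced 10-fold when the validation loss stalls for 5 epochs;
# early stopping (patience 17) would be handled separately in the training loop.
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=5)


def augment_matrix(panels: torch.Tensor) -> torch.Tensor:
    """Augment all panels of one AVR matrix with a shared random pipeline.

    `panels` has shape (num_panels, H, W). With 50% probability the sample is
    left unchanged; otherwise each transformation below is added to the
    pipeline with 25% probability and applied identically to every panel.
    """
    if random.random() >= 0.5:
        return panels

    ops = []
    if random.random() < 0.25:
        ops.append(lambda x: torch.flip(x, dims=(-2,)))           # vertical flip
    if random.random() < 0.25:
        ops.append(lambda x: torch.flip(x, dims=(-1,)))           # horizontal flip
    if random.random() < 0.25:
        ops.append(lambda x: torch.rot90(x, k=1, dims=(-2, -1)))  # rotation by 90 degrees
    if random.random() < 0.25:
        angle = random.uniform(-180.0, 180.0)
        ops.append(lambda x, a=angle: TF.rotate(x, a))            # rotation by a random angle
    if random.random() < 0.25:
        ops.append(lambda x: x.transpose(-2, -1))                 # transposition

    for op in ops:
        panels = op(panels)
    return panels
```

In a training loop one would call scheduler.step(val_loss) after each validation epoch and apply augment_matrix per sample before batching, which matches the per-sample 50% augmentation probability described in the quote.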