One Self-Configurable Model to Solve Many Abstract Visual Reasoning Problems
Authors: Mikołaj Małkiński, Jacek Mańdziuk
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on Raven's Progressive Matrices, Visual Analogy Problems, and Odd One Out problems show that SCAR (SAL-based models, in general) effectively solves diverse AVR tasks, and its performance is on par with the state-of-the-art task-specific baselines. The experimental evaluation of SCAR focuses on the following three settings: (1) single-task learning (STL), (2) multi-task learning (MTL), and (3) transfer learning (TL). |
| Researcher Affiliation | Academia | Mikołaj Małkiński¹, Jacek Mańdziuk¹,² ¹Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland; ²Faculty of Computer Science, AGH University of Krakow, Krakow, Poland |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides architectural diagrams (Figure 2, Figure 3) and mathematical equations, but no procedural pseudocode. |
| Open Source Code | Yes | The code required to reproduce the experiments is available online at www.github.com/mikomel/sal. Extended results are provided in Appendix A (Małkiński and Mańdziuk 2023a). |
| Open Datasets | Yes | The experiments are conducted on three challenging AVR problems. Firstly, we consider RPMs belonging to three datasets: G-set (Mańdziuk and Żychowski 2019) with 1,100 matrices; PGM-S, a subset of the Neutral regime of PGM (Barrett et al. 2018)... I-RAVEN (Zhang et al. 2019a; Hu et al. 2021) with 70K samples. Secondly, we consider the VAP dataset (Hill et al. 2019) to construct the VAP-S dataset... Thirdly, the O3 tests (Mańdziuk and Żychowski 2019) with 1,000 instances are utilized. |
| Dataset Splits | Yes | In G-set, I-RAVEN, and O3 datasets we uniformly allocate 60% / 20% / 20% samples to train / val / test splits, resp. PGM-S contains 42K / 20K / 200K matrices and VAP-S has 42K / 10K / 100K instances in train / val / test splits, resp. (A split sketch follows the table.) |
| Hardware Specification | Yes | Each training job was run on a node with a single NVIDIA DGX A100 GPU. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide specific version numbers for any software components, libraries, or frameworks. For example: "We use the Adam (Kingma and Ba 2014) optimizer with β1 = 0.9, β2 = 0.999 and ϵ = 10⁻⁸" |
| Experiment Setup | Yes | We use batches of size 32 for G-set and O3, and 128 for the remaining datasets. Early stopping is performed after the model's performance stops improving on a validation set for 17 successive epochs. We use the Adam (Kingma and Ba 2014) optimizer with β1 = 0.9, β2 = 0.999 and ϵ = 10⁻⁸; the learning rate is initialized to 0.001 and reduced 10-fold if the validation loss doesn't improve for 5 subsequent epochs. Data augmentation is employed with 50% probability of being applied to a particular sample. When applied, a pipeline of transformations (vertical flip, horizontal flip, rotation by 90 degrees, rotation, transposition) is constructed, each with 25% probability of being adopted. The resultant pipeline is applied to each image in the matrix in the same way. |
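
To make the Dataset Splits row concrete, here is a minimal sketch of the uniform 60% / 20% / 20% allocation reported for G-set, I-RAVEN, and O3. The function name, shuffling strategy, and fixed seed are illustrative assumptions; the authors' actual split code lives in the linked repository.

```python
import random

def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=42):
    """Uniformly allocate samples to train/val/test splits.

    The 60%/20%/20% ratios follow the paper; the fixed seed and
    shuffle-then-slice strategy are illustrative assumptions.
    """
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_train = int(train_frac * len(samples))
    n_val = int(val_frac * len(samples))
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]  # remaining ~20%
    return train, val, test

# Example: G-set has 1,100 matrices -> 660 / 220 / 220
train, val, test = split_dataset(range(1100))
print(len(train), len(val), len(test))  # 660 220 220
```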
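
The optimizer, plateau-based learning-rate schedule, and early stopping in the Experiment Setup row map onto standard PyTorch components. The following is a hedged sketch under the assumption of a PyTorch training loop; the linear model and dummy validation pass are placeholders, not the SCAR architecture or the authors' code.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for SCAR; the real architecture is not reproduced here.
model = nn.Linear(10, 1)

# Adam with the hyper-parameters quoted in the table.
optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)

# Reduce the learning rate 10-fold when the validation loss stalls for 5 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5)

def validation_loss(epoch):
    """Stand-in for a real validation pass; plateaus so early stopping fires."""
    return max(0.3, 1.0 / (epoch + 1))

best_val, bad_epochs, patience = float("inf"), 0, 17
for epoch in range(100):
    # ... training step over the batch sizes quoted above would go here ...
    val_loss = validation_loss(epoch)
    scheduler.step(val_loss)
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # early stopping after 17 stagnant epochs
            break
```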
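
The augmentation description also translates into a short sketch. The 50% sample-level gate, the 25% per-transform inclusion, and the identical application of the sampled pipeline to every image in a matrix follow the quoted text; the concrete NumPy operations and the 90°-multiple stand-in for the unspecified "rotation" transform are assumptions.

```python
import random
import numpy as np

def build_augmentation():
    """Sample an augmentation pipeline as described in the paper.

    With 50% probability no augmentation is applied. Otherwise each of
    the five candidate transforms is independently included with 25%
    probability. The NumPy ops are illustrative assumptions.
    """
    if random.random() >= 0.5:
        return lambda img: img  # identity: this sample is not augmented
    candidates = [
        np.flipud,                                        # vertical flip
        np.fliplr,                                        # horizontal flip
        np.rot90,                                         # rotation by 90 degrees
        lambda im: np.rot90(im, k=random.randint(0, 3)),  # stand-in for "rotation"
        np.transpose,                                     # transposition
    ]
    pipeline = [t for t in candidates if random.random() < 0.25]
    def apply(img):
        for transform in pipeline:
            img = transform(img)
        return img
    return apply

# The sampled pipeline is applied identically to every image in a matrix.
matrix = [np.zeros((80, 80)) for _ in range(9)]
augment = build_augmentation()
matrix = [augment(img) for img in matrix]
```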