V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices

Authors: Damien Teney, Peng Wang, Jiewei Cao, Lingqiao Liu, Chunhua Shen, Anton van den Hengel. Pages 12071-12078.

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate a range of deep learning architectures, and find that existing models, including those popular for vision-and-language tasks, are unable to solve seemingly-simple instances. Models using relational networks fare better but leave substantial room for improvement. (A hedged relation-network sketch follows the table.)
Researcher Affiliation | Academia | 1 Australian Institute for Machine Learning, The University of Adelaide, Adelaide, Australia; 2 University of Wollongong, Wollongong, Australia
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It describes models such as an MLP, a GRU, a VQA-like architecture, and Relation Networks using mathematical formulas and text, but not in pseudocode format.
Open Source Code | Yes | The dataset will be publicly released to encourage the development of models with improved capabilities for abstract reasoning over visual data.
Open Datasets | Yes | The annotations of object counts are extracted from numbers 1-10 appearing in natural language descriptions (e.g. "five bowls of oatmeal"), manually excluding those unrelated to counts (e.g. "five o'clock" or "a 10 years old boy").
Dataset Splits | Yes | We held out 8,000 instances from the training set to serve as a validation set, to select the hyperparameters and to monitor for convergence and early-stopping.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments were provided.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) were mentioned in the paper.
Experiment Setup | Yes | Suitable hyperparameters for each model were coarsely selected by grid search (details in supplementary material). We held out 8,000 instances from the training set to serve as a validation set, to select the hyperparameters and to monitor for convergence and early-stopping. Unless noted, the nonlinear transformations within the networks below refer to a linear layer followed by a ReLU. ... trained with a softmax cross-entropy loss over ŝ, standard backpropagation and SGD, using AdaDelta as the optimizer. (A hedged training-setup sketch follows the table.)
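
To make the quoted experiment setup concrete, here is a minimal PyTorch sketch, not the authors' code, of the training recipe described above: a validation split held out from the training set, nonlinear transformations built as a linear layer followed by a ReLU, a softmax cross-entropy loss over the candidate scores ŝ, and AdaDelta as the optimizer. The `CandidateScorer` module, feature dimensions, batch size, and toy data are assumptions for illustration; the paper's actual architectures and its 8,000-instance validation split are described only at the level quoted in the table.

```python
# Hedged sketch of the described training setup (not the paper's implementation).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

class CandidateScorer(nn.Module):
    """Hypothetical scorer: maps each candidate's feature vector to a scalar score."""
    def __init__(self, in_dim=2048, hidden=512):
        super().__init__()
        # "Nonlinear transformation" = linear layer followed by a ReLU, per the quote.
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):               # x: (batch, 8 candidates, in_dim)
        return self.net(x).squeeze(-1)  # s_hat: (batch, 8) scores over the candidates

# Toy stand-in data: 1,000 instances, 8 answer candidates each (illustrative only).
features = torch.randn(1000, 8, 2048)
labels = torch.randint(0, 8, (1000,))
dataset = TensorDataset(features, labels)

# Hold out a validation set from the training data (the paper holds out 8,000
# instances of its much larger training set; the proportion here is arbitrary).
val_size = 200
train_set, val_set = random_split(dataset, [len(dataset) - val_size, val_size])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = CandidateScorer()
optimizer = torch.optim.Adadelta(model.parameters())
criterion = nn.CrossEntropyLoss()  # softmax cross-entropy over s_hat

for features_b, labels_b in train_loader:
    optimizer.zero_grad()
    scores = model(features_b)
    loss = criterion(scores, labels_b)
    loss.backward()
    optimizer.step()
```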
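
Since the table reports that models using relational networks fare best, the sketch below shows a generic Relation Network scoring module in the style of Santoro et al. It is an assumption-laden illustration, not the paper's V-PROM variant: the module name, dimensions, and the simple sum over panel pairs are placeholders.

```python
# Hedged sketch of a generic Relation Network scorer (not the paper's model):
# a shared MLP g embeds every pair of panel features, the pair embeddings are
# summed, and a second MLP f maps the aggregate to a single relational score.
from itertools import combinations
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    def __init__(self, dim=512, hidden=512):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, hidden), nn.ReLU())
        self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                               nn.Linear(hidden, 1))

    def forward(self, panels):  # panels: (batch, n_panels, dim)
        n = panels.size(1)
        pair_sum = 0
        for i, j in combinations(range(n), 2):
            pair = torch.cat([panels[:, i], panels[:, j]], dim=-1)
            pair_sum = pair_sum + self.g(pair)   # aggregate pairwise relations
        return self.f(pair_sum).squeeze(-1)      # one relational score per instance

# Usage: score one candidate by appending its embedding to the 8 context panels.
scores = RelationModule()(torch.randn(4, 9, 512))  # -> (4,) scores
```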