reproducibilityindex.ai

V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices

Authors: Damien Teney, Peng Wang, Jiewei Cao, Lingqiao Liu, Chunhua Shen, Anton van den Hengel
Pages: 12071-12078

AAAI 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate a range of deep learning architectures, and find that existing models, including those popular for vision-and-language tasks, are unable to solve seemingly-simple instances. Models using relational networks fare better but leave substantial room for improvement.
Researcher Affiliation	Academia	(1) Australian Institute for Machine Learning, The University of Adelaide, Adelaide, Australia; (2) University of Wollongong, Wollongong, Australia
Pseudocode	No	The paper does not contain any clearly labeled pseudocode or algorithm blocks. It describes its models (an MLP, a GRU, a VQA-like architecture, and Relation Networks) using mathematical formulas and text rather than pseudocode.
Open Source Code	Yes	The dataset will be publicly released to encourage the development of models with improved capabilities for abstract reasoning over visual data.
Open Datasets	Yes	The annotations of object counts are extracted from numbers 1-10 appearing in natural language descriptions (e.g. "five bowls of oatmeal"), manually excluding those unrelated to counts (e.g. "five o'clock" or "a 10 years old boy").
Dataset Splits	Yes	We held out 8,000 instances from the training set to serve as a validation set, to select the hyperparameters and to monitor for convergence and early-stopping.
Hardware Specification	No	No specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments were provided.
Software Dependencies	No	No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) were mentioned in the paper.
Experiment Setup	Yes	Suitable hyperparameters for each model were coarsely selected by grid search (details in supplementary material). We held out 8,000 instances from the training set to serve as a validation set, to select the hyperparameters and to monitor for convergence and early-stopping. Unless noted, the nonlinear transformations within the networks below refer to a linear layer followed by a ReLU. ... trained with a softmax cross-entropy loss over ŝ, standard backpropagation and SGD, using AdaDelta as the optimizer.
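The training recipe in the Experiment Setup row can be illustrated with a small pure-Python sketch. It is not the authors' code and does not reproduce any V-PROM model. The toy scorer (Scorer), the synthetic data (make_toy_data), the helpers split_train_val, mean_loss and train, the one-update-per-instance schedule, the early-stopping patience, and the AdaDelta constants rho=0.95 and eps=1e-6 are all hypothetical choices made for illustration. What the sketch does show is the combination the row describes: a linear layer followed by a ReLU applied to each candidate answer, a softmax cross-entropy loss over the candidate scores ŝ with its backpropagated gradient, AdaDelta updates (the update rule of Zeiler, 2012), and a held-out validation split used to keep the best checkpoint and stop early.

```python
import math
import random


def split_train_val(instances, n_val, seed=0):
    # Shuffle, then hold out n_val instances as the validation set
    # (the paper holds out 8,000; the toy run below uses far fewer).
    idx = list(range(len(instances)))
    random.Random(seed).shuffle(idx)
    return [instances[i] for i in idx[n_val:]], [instances[i] for i in idx[:n_val]]


def softmax(scores):
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


class Scorer:
    """Hypothetical toy model: s_hat_i = v . ReLU(W x_i + b) for candidate x_i."""

    def __init__(self, d_in, d_hid, seed=0):
        rng = random.Random(seed)
        self.d, self.h = d_in, d_hid
        # Flat parameter list laid out as [W (h*d) | b (h) | v (h)].
        self.params = [rng.gauss(0.0, 0.5) for _ in range(d_hid * d_in + 2 * d_hid)]

    def loss_and_grad(self, candidates, target):
        """Softmax cross-entropy over the candidate scores s_hat, and its gradient."""
        d, h, p = self.d, self.h, self.params
        W = [p[j * d:(j + 1) * d] for j in range(h)]
        b, v = p[h * d:h * d + h], p[h * d + h:]
        # Linear layer followed by a ReLU, applied to every candidate.
        pre = [[sum(w * x for w, x in zip(W[j], c)) + b[j] for j in range(h)]
               for c in candidates]
        hid = [[max(0.0, a) for a in row] for row in pre]
        s_hat = [sum(vj * hj for vj, hj in zip(v, row)) for row in hid]
        probs = softmax(s_hat)
        grad = [0.0] * len(p)
        for i, c in enumerate(candidates):
            g = probs[i] - (1.0 if i == target else 0.0)  # dL/ds_hat_i
            for j in range(h):
                grad[h * d + h + j] += g * hid[i][j]  # dL/dv_j
                if pre[i][j] > 0.0:  # the ReLU passes gradient only where active
                    grad[h * d + j] += g * v[j]  # dL/db_j
                    for k in range(d):
                        grad[j * d + k] += g * v[j] * c[k]  # dL/dW_jk
        return -math.log(probs[target]), grad


class AdaDelta:
    """AdaDelta update rule (Zeiler, 2012); rho and eps are assumed values."""

    def __init__(self, n, rho=0.95, eps=1e-6):
        self.rho, self.eps = rho, eps
        self.sq_grad = [0.0] * n  # running average of squared gradients
        self.sq_step = [0.0] * n  # running average of squared updates

    def step(self, params, grads):
        r, e = self.rho, self.eps
        for k, g in enumerate(grads):
            self.sq_grad[k] = r * self.sq_grad[k] + (1 - r) * g * g
            dx = -math.sqrt(self.sq_step[k] + e) / math.sqrt(self.sq_grad[k] + e) * g
            self.sq_step[k] = r * self.sq_step[k] + (1 - r) * dx * dx
            params[k] += dx


def mean_loss(model, data):
    return sum(model.loss_and_grad(c, t)[0] for c, t in data) / len(data)


def train(model, train_set, val_set, max_epochs=20, patience=3):
    # Assumed schedule: one AdaDelta step per training instance, and early
    # stopping once validation loss fails to improve for `patience` epochs.
    opt = AdaDelta(len(model.params))
    best, best_params, bad_epochs = float("inf"), list(model.params), 0
    for _ in range(max_epochs):
        for candidates, target in train_set:
            opt.step(model.params, model.loss_and_grad(candidates, target)[1])
        val_loss = mean_loss(model, val_set)
        if val_loss < best:
            best, best_params, bad_epochs = val_loss, list(model.params), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    model.params = best_params  # restore the best validation checkpoint
    return best


def make_toy_data(n, k=4, d=3, seed=1):
    # Hypothetical stand-in for V-PROM: the correct candidate has the largest feature 0.
    rng = random.Random(seed)
    cands = [[[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)] for _ in range(n)]
    return [(c, max(range(k), key=lambda i: c[i][0])) for c in cands]


train_set, val_set = split_train_val(make_toy_data(400), n_val=100)
model = Scorer(d_in=3, d_hid=8)
before = mean_loss(model, val_set)
after = train(model, train_set, val_set)
print(f"validation loss: {before:.3f} -> {after:.3f}")
```

On this toy data the best validation loss returned by train should end up below the initial one. The paper's own hyperparameters were chosen by grid search and are given in its supplementary material; none of the constants above are taken from it.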