V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices
Authors: Damien Teney, Peng Wang, Jiewei Cao, Lingqiao Liu, Chunhua Shen, Anton van den Hengel
AAAI 2020, pp. 12071-12078 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate a range of deep learning architectures, and find that existing models, including those popular for vision-and-language tasks, are unable to solve seemingly-simple instances. Models using relational networks fare better but leave substantial room for improvement. |
| Researcher Affiliation | Academia | ¹Australian Institute for Machine Learning, The University of Adelaide, Adelaide, Australia; ²University of Wollongong, Wollongong, Australia |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It describes models such as an MLP, a GRU, a VQA-like architecture, and Relation Networks using mathematical formulas and text, but not in pseudocode format (a hypothetical Relation Network sketch is given after this table). |
| Open Source Code | Yes | The dataset will be publicly released to encourage the development of models with improved capabilities for abstract reasoning over visual data. |
| Open Datasets | Yes | The annotations of object counts are extracted from numbers 1-10 appearing in natural language descriptions (e.g. "five bowls of oatmeal"), manually excluding those unrelated to counts (e.g. "five o'clock" or "a 10 years old boy"). |
| Dataset Splits | Yes | We held out 8,000 instances from the training set to serve as a validation set, to select the hyperparameters and to monitor for convergence and early-stopping. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments were provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) were mentioned in the paper. |
| Experiment Setup | Yes | Suitable hyperparameters for each model were coarsely selected by grid search (details in supplementary material). We held out 8,000 instances from the training set to serve as a validation set, to select the hyperparameters and to monitor for convergence and early-stopping. Unless noted, the nonlinear transformations within the networks below refer to a linear layer followed by a ReLU. ... trained with a softmax cross-entropy loss over ŝ, standard backpropagation and SGD, using AdaDelta as the optimizer. (A minimal training-loop sketch reflecting this setup follows the table.) |
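
Since the paper describes its models only in formulas and text, the following is a minimal PyTorch sketch of the best-performing family mentioned above, a Relation Network scorer over pre-extracted panel embeddings. The class name `RelationScorer`, the layer sizes, and the input shape are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a Relation Network scorer for one answer candidate,
# in the spirit of Santoro et al. (2017). Names and dimensions are assumptions.
import torch
import torch.nn as nn

class RelationScorer(nn.Module):
    def __init__(self, d_in: int = 512, d_hidden: int = 256):
        super().__init__()
        # g(.) processes every ordered pair of panel embeddings.
        self.g = nn.Sequential(
            nn.Linear(2 * d_in, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
        )
        # f(.) maps the summed pair representations to a scalar score.
        self.f = nn.Sequential(
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, 1),
        )

    def forward(self, panels: torch.Tensor) -> torch.Tensor:
        # panels: (batch, n, d) -- the 8 context panels plus 1 candidate answer.
        b, n, d = panels.shape
        left = panels.unsqueeze(2).expand(b, n, n, d)
        right = panels.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([left, right], dim=-1).reshape(b, n * n, 2 * d)
        pooled = self.g(pairs).sum(dim=1)   # aggregate over all panel pairs
        return self.f(pooled).squeeze(-1)   # (batch,) score for this candidate
```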
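The Experiment Setup row quotes a softmax cross-entropy loss over the candidate scores ŝ, optimized with AdaDelta. Below is a minimal training-loop sketch under those stated choices; the data layout (pre-computed embeddings of 8 context panels followed by 8 candidate answers per instance) and the `loader` placeholder are assumptions, not the authors' code.

```python
# Minimal training-loop sketch: softmax cross-entropy over the 8 candidate
# scores, AdaDelta optimizer. Data layout is an assumed placeholder.
import torch
import torch.nn as nn

def train(model, loader, epochs: int = 10, device: str = "cpu"):
    opt = torch.optim.Adadelta(model.parameters())
    loss_fn = nn.CrossEntropyLoss()
    model.to(device).train()
    for _ in range(epochs):
        for panels, target in loader:  # panels: (batch, 16, d); target: 0-7
            panels, target = panels.to(device), target.to(device)
            # Score each candidate by appending it to the 8 context panels.
            scores = torch.stack(
                [model(torch.cat([panels[:, :8], panels[:, 8 + i : 9 + i]], dim=1))
                 for i in range(8)],
                dim=1)                 # (batch, 8), one score per candidate
            loss = loss_fn(scores, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
```

The held-out 8,000-instance validation set mentioned in the Dataset Splits row would be evaluated with the same scoring pass to drive early stopping, as the paper describes.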