Acquiring Common Sense Spatial Knowledge Through Implicit Spatial Templates
Authors: Guillem Collell, Luc Van Gool, Marie-Francine Moens
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present two simple neural-based models that leverage annotated images and structured text to learn this task. The good performance of these models reveals that spatial locations are to a large extent predictable from implicit spatial language. Crucially, the models attain similar performance in a challenging generalized setting, where the object-relation-object combinations (e.g., man walking dog) have never been seen before. Next, we go one step further by presenting the models with unseen objects (e.g., dog). In this scenario, we show that leveraging word embeddings enables the models to output accurate spatial predictions, proving that the models acquire solid common sense spatial knowledge allowing for such generalization. |
| Researcher Affiliation | Academia | Guillem Collell, Department of Computer Science, KU Leuven, gcollell@kuleuven.be; Luc Van Gool, Computer Vision Laboratory, ETH Zurich, vangool@vision.ee.ethz.ch; Marie-Francine Moens, Department of Computer Science, KU Leuven, sien.moens@cs.kuleuven.be |
| Pseudocode | No | The paper describes algorithms but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | This evaluation set along with our Supplementary material are available at https://github.com/gcollell/spatial-commonsense. |
| Open Datasets | Yes | We use the Visual Genome dataset (Krishna et al. 2017) as our source of annotated images. The Visual Genome consists of 108K images containing 1.5M human-annotated (Subject, Relationship, Object) instances with bounding boxes for Subject and Object (Fig. 2). |
| Dataset Splits | Yes | We employ a 10-fold cross-validation (CV) setting. Data are randomly split into 10 disjoint parts and 10% is employed for testing and 90% for training, repeating this for each of the 10 folds. Reported results are averages over the 10 folds. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU model, CPU model, memory) used for running its experiments. |
| Software Dependencies | Yes | Our experiments are implemented in Python 2.7 and we use the Keras deep learning framework for our models (Chollet and others 2015). |
| Experiment Setup | Yes | Model hyperparameters are first selected in a 10-fold cross-validation setting and we report (averaged) results on 10 new splits. Models are trained for 10 epochs on batches of size 64 with the RMSprop optimizer using a learning rate of 0.0001 and 2 hidden layers with 100 ReLU units. |
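
The "Dataset Splits" row describes a standard 10-fold cross-validation protocol: the data are split into 10 disjoint parts, each fold uses 90% for training and 10% for testing, and results are averaged over the folds. A minimal sketch of that protocol, assuming scikit-learn's `KFold` (the paper does not name the splitting utility it used, and the placeholder feature and target arrays below are illustrative, not the paper's actual data):

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder data: X stands in for the model inputs (e.g., word embeddings
# of Subject/Relation/Object), y for the spatial targets. Shapes are assumed.
X = np.random.rand(1000, 300)
y = np.random.rand(1000, 2)

fold_scores = []
kf = KFold(n_splits=10, shuffle=True, random_state=0)  # 10 disjoint parts
for train_idx, test_idx in kf.split(X):
    X_train, X_test = X[train_idx], X[test_idx]  # 90% for training
    y_train, y_test = y[train_idx], y[test_idx]  # 10% for testing
    # ... train a model on (X_train, y_train), evaluate on (X_test, y_test),
    # and append the fold score:
    # fold_scores.append(score)

# Reported results are averages over the 10 folds:
# mean_score = np.mean(fold_scores)
```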
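The "Experiment Setup" row gives the training hyperparameters: 10 epochs, batch size 64, RMSprop with a learning rate of 0.0001, and 2 hidden layers of 100 ReLU units. A minimal Keras sketch of a network with exactly those hyperparameters is below; the input/output dimensions and the mean-squared-error loss are assumptions (the excerpt does not specify them), and the code targets the Keras 2-era API the paper would have used (newer versions spell the learning-rate argument `learning_rate`):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop

def build_model(input_dim=900, output_dim=2):
    # Two hidden layers with 100 ReLU units each, as reported.
    model = Sequential()
    model.add(Dense(100, activation="relu", input_dim=input_dim))
    model.add(Dense(100, activation="relu"))
    model.add(Dense(output_dim))  # output size is a placeholder, not from the paper
    # RMSprop with learning rate 0.0001, as reported; MSE loss is an assumption.
    model.compile(optimizer=RMSprop(lr=1e-4), loss="mse")
    return model

# Placeholder training data with the assumed dimensions.
X_train = np.random.rand(512, 900)
y_train = np.random.rand(512, 2)

model = build_model()
# Trained for 10 epochs on batches of size 64, as reported.
model.fit(X_train, y_train, epochs=10, batch_size=64, verbose=0)
```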