Learning Generalizable Visual Representations via Interactive Gameplay

Authors: Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our first set of experiments shows that our agents develop a low-level visual understanding of individual images, measured by their capacity to perform a collection of standard tasks from the computer vision literature; these tasks include pixel-to-pixel depth (Saxena et al., 2006) and surface normal (Fouhey et al., 2013) prediction from a single image. Our experiments are designed to address three questions: (1) has our cache agent learned to proficiently hide and seek objects? (2) how do the SIRs learned by playing cache compare to those learned using standard supervised approaches and when training on other interactive tasks? and (3) has the cache agent learned to integrate observations over time to produce general DIR representations?
Researcher Affiliation | Collaboration | (1) Allen Institute for Artificial Intelligence, (2) University of Washington
Pseudocode | Yes | Algorithm 1: Exploration and mapping episode reward structure; Algorithm 2: Object hiding episode reward structure; Algorithm 3: Object manipulation episode reward structure; Algorithm 4: Seeking episode reward structure
Open Source Code | No | No concrete access to source code is provided. The paper states, 'We direct anyone interested in exact reproduction of our VDR procedure to our code base.' and 'To fully reproduce our results please see our code base.', but no link or repository name is given for public access.
Open Datasets | Yes | For this we leverage AI2-THOR (Kolve et al., 2017), a near photo-realistic, interactive, simulated 3D environment of indoor living spaces (see Fig. 1a). We compare against a fully supervised model trained on ImageNet (Deng et al., 2009) for SUN scene classification (Xiao et al., 2010), as well as the NYU V2 depth (Silberman et al., 2012) and walkability (Mottaghi et al., 2016) tasks. (An illustrative AI2-THOR sketch follows the table.)
Dataset Splits | Yes | Excluding foyers, which are reserved for our dynamic image representation experiments and used nowhere else, we consider the first 20 scenes of each scene type to be train scenes, the next five of each type to be validation scenes, and the last five of each type to be test scenes. (A sketch of this split follows the table.)
Hardware Specification | No | No specific hardware models (e.g., GPU/CPU models) were mentioned. The paper only states: 'We train our cache agent using eight GPUs with one GPU reserved for running AI2-THOR processes, one reserved for VDR, and the other six dedicated to training with reinforcement and self-supervised learning.'
Software Dependencies | Yes | We use the ADAM optimizer (Kingma & Ba, 2015) with AMSGrad (Reddi et al., 2018), moving average parameters β1 = 0.99, β2 = 0.999, a learning rate of 10^-3 for VDR, and varying learning rates for the different cache stages (10^-4 for the E&M, OH, OM, and S stages, 5 × 10^-4 for the PS-stage). We run our analysis using the R (R Core Team, 2019) programming language; in particular, we use the glmmPQL function in the nlme (Pinheiro et al., 2019) package to fit our GLMM models and then the emmeans (Lenth, 2019) package to obtain p-values of contrasts between fixed effects. (An optimizer sketch follows the table.)
Experiment Setup | Yes | For the A3C loss we let the discounting parameter γ = 0.99 (except in the OH-stage, where γ = 0.8), the entropy weight β = 0.01, and the GAE parameter τ = 1. We use the ADAM optimizer (Kingma & Ba, 2015) with AMSGrad (Reddi et al., 2018), moving average parameters β1 = 0.99, β2 = 0.999, a learning rate of 10^-3 for VDR, and varying learning rates for the different cache stages (10^-4 for the E&M, OH, OM, and S stages, 5 × 10^-4 for the PS-stage). (A GAE sketch follows the table.)
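
The Open Datasets row references AI2-THOR as the interactive environment. As a hedged illustration of how such an environment is driven, here is a minimal sketch using the public ai2thor Python package; the scene name and action string are standard ai2thor identifiers, but the controller configuration the authors actually used is not specified in the quoted text and is an assumption here.

    # Minimal AI2-THOR interaction sketch (assumes `pip install ai2thor`).
    # The paper's exact controller settings are unknown; these are defaults.
    from ai2thor.controller import Controller

    controller = Controller(scene="FloorPlan1")  # one of THOR's kitchen scenes
    event = controller.step(action="MoveAhead")  # advance the agent one step
    print(event.metadata["agent"]["position"])   # agent's position after the step
    controller.stop()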
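The Dataset Splits row pins down a 20/5/5 per-scene-type split. The sketch below writes that rule out explicitly; the per-type base indices follow AI2-THOR's standard FloorPlan numbering (kitchens from 1, living rooms from 201, bedrooms from 301, bathrooms from 401), and mapping the quoted "first 20 / next five / last five" ordering onto that numbering is an assumption. The excluded foyers are not part of this numbering.

    # Sketch of the paper's 20/5/5 per-scene-type split (assumed to follow
    # AI2-THOR's standard FloorPlan numbering; foyers are excluded entirely).
    TYPE_BASES = {"kitchen": 1, "living_room": 201, "bedroom": 301, "bathroom": 401}

    def scene_splits(base):
        scenes = [f"FloorPlan{base + i}" for i in range(30)]
        return {"train": scenes[:20], "valid": scenes[20:25], "test": scenes[25:]}

    splits = {t: scene_splits(b) for t, b in TYPE_BASES.items()}
    assert len(splits["kitchen"]["train"]) == 20  # e.g. FloorPlan1..FloorPlan20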
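The optimizer settings quoted in the Software Dependencies and Experiment Setup rows map directly onto a standard PyTorch configuration. This is a sketch of those hyperparameters, not the authors' code; the placeholder network and variable names are invented for illustration.

    # ADAM with AMSGrad and the quoted hyperparameters (illustrative only).
    import torch

    model = torch.nn.Linear(4, 4)  # placeholder for the actual network
    vdr_optimizer = torch.optim.Adam(
        model.parameters(), lr=1e-3, betas=(0.99, 0.999), amsgrad=True
    )
    # Per-stage learning rates for the cache stages, as quoted:
    stage_lrs = {"E&M": 1e-4, "OH": 1e-4, "OM": 1e-4, "S": 1e-4, "PS": 5e-4}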
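Finally, the Experiment Setup row quotes a discount γ = 0.99 (0.8 in the OH stage) and a GAE parameter τ = 1. For reference, here is a textbook generalized advantage estimation computation (Schulman et al., 2016) using those constants; it is not extracted from the paper's implementation.

    # Textbook GAE computation illustrating the quoted gamma and tau values.
    def gae_advantages(rewards, values, gamma=0.99, tau=1.0):
        """rewards has length T; values has length T + 1 (last is a bootstrap)."""
        advantages, gae = [], 0.0
        for t in reversed(range(len(rewards))):
            delta = rewards[t] + gamma * values[t + 1] - values[t]
            gae = delta + gamma * tau * gae
            advantages.append(gae)
        return advantages[::-1]

    adv = gae_advantages(rewards=[0.0, 0.0, 1.0], values=[0.1, 0.2, 0.5, 0.0])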