Automatically Composing Representation Transformations as a Means for Generalization

Authors: Michael Chang, Abhishek Gupta, Sergey Levine, Thomas L. Griffiths

ICLR 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We show on a symbolic and a high-dimensional domain that our compositional approach can generalize to more complex problems than the learner has previously encountered, whereas baselines that are not explicitly compositional do not." |
| Researcher Affiliation | Academia | Michael B. Chang, Electrical Engineering and Computer Science, University of California, Berkeley, USA (mbchang@berkeley.edu); Abhishek Gupta, Electrical Engineering and Computer Science, University of California, Berkeley, USA (abhigupta@berkeley.edu); Sergey Levine, Electrical Engineering and Computer Science, University of California, Berkeley (svlevine@eecs.berkeley.edu); Thomas L. Griffiths, Psychology and Cognitive Science, Princeton University, USA (tomg@princeton.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/mbchang/crl |
| Open Datasets | Yes | Recognizing spatially transformed MNIST digits (LeCun et al., 1998). |
| Dataset Splits | Yes | "We randomly choose 16 of these 20 for training, 2 for validation, 2 for test, as shown in Figure 4 (center)." (See the split sketch after the table.) |
| Hardware Specification | No | The paper acknowledges "computing support from Amazon, NVIDIA, and Google" but does not specify exact hardware (GPU or CPU models, or particular cloud instance types). |
| Software Dependencies | No | The paper states that all learners are implemented in PyTorch (Paszke et al., 2017), but no version numbers for PyTorch or other dependencies are given. |
| Experiment Setup | Yes | The loss is backpropagated through the modules, which are trained with Adam (Kingma & Ba, 2014). The controller receives a sparse reward derived from the loss at the end of the computation and a small cost for each computational step, and is trained with proximal policy optimization (Schulman et al., 2017). A grid search gave k = 1024 and k = 256. Through an informal search whose heuristic was performance on the training set, the curriculum of CRL is updated every 10^5 episodes and the curriculum of the RNN every 5 × 10^4 episodes. The step penalty was chosen by a scale search over {−1, 0.1, 0.01, 0.001}; 0.01 balanced accuracy and computation time reasonably well during training. (See the training-loop sketch after the table.) |
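
The 16/2/2 split quoted in the Dataset Splits row is simple to make concrete. The Python sketch below is only an illustration under stated assumptions: the function name, the fixed seed, and the use of integer indices as stand-ins for the paper's 20 transformation combinations are hypothetical, not the authors' code.

```python
# Minimal sketch of the 16/2/2 split over 20 transformation combinations,
# with integer indices standing in for the actual combinations.
# (Function name and seed are illustrative, not from the paper.)
import random

def split_transformation_combos(combos, seed=0):
    """Randomly split 20 transformation combinations into train/val/test (16/2/2)."""
    combos = list(combos)
    assert len(combos) == 20
    rng = random.Random(seed)
    rng.shuffle(combos)
    return {
        "train": combos[:16],   # 16 combinations for training
        "val": combos[16:18],   # 2 for validation
        "test": combos[18:20],  # 2 for test
    }

splits = split_transformation_combos(range(20))
```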
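
The Experiment Setup row describes two coupled optimizations: the transformation modules are trained with Adam on the task loss backpropagated through the composed computation, while the controller is rewarded sparsely at the end of the computation, penalized 0.01 per step, and trained with PPO. The PyTorch sketch below is a minimal illustration of that structure, not the authors' implementation (which is at https://github.com/mbchang/crl); the class and function names, the stand-in architectures, the fixed episode length, and the batch-size-1 setup are all assumptions, and the PPO update itself is omitted.

```python
# Hypothetical sketch of the two coupled optimizations described above.
import torch
import torch.nn as nn

STEP_PENALTY = 0.01  # per-step computation cost reported in the paper


class Module(nn.Module):
    """One representation transformation (stand-in architecture)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)


class Controller(nn.Module):
    """Picks which module to apply at each step (stand-in architecture)."""
    def __init__(self, dim, n_modules):
        super().__init__()
        self.policy = nn.Linear(dim, n_modules)

    def forward(self, x):
        return torch.distributions.Categorical(logits=self.policy(x))


def run_episode(x, y, controller, modules, loss_fn, max_steps=5):
    """Compose modules for a few steps; return task loss and controller reward."""
    log_probs = []
    for _ in range(max_steps):
        dist = controller(x.detach())   # controller sees the current representation
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        x = modules[action.item()](x)   # apply the chosen transformation
    loss = loss_fn(x, y)
    # Sparse reward derived from the loss at the end of the computation,
    # minus a small cost for each computational step.
    reward = -loss.detach() - STEP_PENALTY * max_steps
    return loss, reward, torch.stack(log_probs)


# Usage sketch (batch size 1 for simplicity).
dim, n_modules = 8, 3
modules = nn.ModuleList([Module(dim) for _ in range(n_modules)])
controller = Controller(dim, n_modules)
adam = torch.optim.Adam(modules.parameters(), lr=1e-3)  # modules trained with Adam

x, y = torch.randn(1, dim), torch.randn(1, dim)
loss, reward, log_probs = run_episode(x, y, controller, modules, nn.MSELoss())
adam.zero_grad()
loss.backward()   # backpropagate the loss through the applied modules
adam.step()
# The controller itself is trained with PPO (Schulman et al., 2017) on
# `reward` and `log_probs`; that update would come from an off-the-shelf
# PPO implementation and is not shown here.
```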