Automatically Composing Representation Transformations as a Means for Generalization

Authors: Michael Chang, Abhishek Gupta, Sergey Levine, Thomas L. Griffiths

ICLR 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We show on a symbolic and a high-dimensional domain that our compositional approach can generalize to more complex problems than the learner has previously encountered, whereas baselines that are not explicitly compositional do not." |
| Researcher Affiliation | Academia | Michael B. Chang, Electrical Engineering and Computer Science, University of California, Berkeley, USA (mbchang@berkeley.edu); Abhishek Gupta, Electrical Engineering and Computer Science, University of California, Berkeley, USA (abhigupta@berkeley.edu); Sergey Levine, Electrical Engineering and Computer Science, University of California, Berkeley (svlevine@eecs.berkeley.edu); Thomas L. Griffiths, Psychology and Cognitive Science, Princeton University, USA (tomg@princeton.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/mbchang/crl |
| Open Datasets | Yes | Recognizing spatially transformed MNIST digits (LeCun et al., 1998). |
| Dataset Splits | Yes | "We randomly choose 16 of these 20 for training, 2 for validation, 2 for test, as shown in Figure 4 (center)." (See the split sketch after the table.) |
| Hardware Specification | No | The paper acknowledges "computing support from Amazon, NVIDIA, and Google" but does not specify exact hardware (GPU or CPU models, or particular cloud instance types). |
| Software Dependencies | No | The paper states that all learners are implemented in PyTorch (Paszke et al., 2017), but no version numbers for PyTorch or other dependencies are given. |
| Experiment Setup | Yes | The loss is backpropagated through the modules, which are trained with Adam (Kingma & Ba, 2014). The controller receives a sparse reward derived from the loss at the end of the computation and a small cost for each computational step, and is trained with proximal policy optimization (Schulman et al., 2017). A grid search gave k = 1024 and k = 256. Through an informal search whose heuristic was performance on the training set, the curriculum of CRL is updated every 10^5 episodes and the curriculum of the RNN every 5 × 10^4 episodes. The step penalty was chosen by a scale search over {−1, 0.1, 0.01, 0.001}; 0.01 balanced accuracy and computation time reasonably well during training. (See the training-loop sketch after the table.) |
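
The 16/2/2 split quoted in the Dataset Splits row is simple to make concrete. The Python sketch below is only an illustration under stated assumptions: the function name, the fixed seed, and the use of integer indices as stand-ins for the paper's 20 transformation combinations are hypothetical, not the authors' code.

```python
# Minimal sketch of the 16/2/2 split over 20 transformation combinations,
# with integer indices standing in for the actual combinations.
# (Function name and seed are illustrative, not from the paper.)
import random

def split_transformation_combos(combos, seed=0):
    """Randomly split 20 transformation combinations into train/val/test (16/2/2)."""
    combos = list(combos)
    assert len(combos) == 20
    rng = random.Random(seed)
    rng.shuffle(combos)
    return {
        "train": combos[:16],   # 16 combinations for training
        "val": combos[16:18],   # 2 for validation
        "test": combos[18:20],  # 2 for test
    }

splits = split_transformation_combos(range(20))
```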
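
The Experiment Setup row describes two coupled optimizations: the transformation modules are trained with Adam on the task loss backpropagated through the composed computation, while the controller is rewarded sparsely at the end of the computation, penalized 0.01 per step, and trained with PPO. The PyTorch sketch below is a minimal illustration of that structure, not the authors' implementation (which is at https://github.com/mbchang/crl); the class and function names, the stand-in architectures, the fixed episode length, and the batch-size-1 setup are all assumptions, and the PPO update itself is omitted.

```python
# Hypothetical sketch of the two coupled optimizations described above.
import torch
import torch.nn as nn

STEP_PENALTY = 0.01  # per-step computation cost reported in the paper


class Module(nn.Module):
    """One representation transformation (stand-in architecture)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)


class Controller(nn.Module):
    """Picks which module to apply at each step (stand-in architecture)."""
    def __init__(self, dim, n_modules):
        super().__init__()
        self.policy = nn.Linear(dim, n_modules)

    def forward(self, x):
        return torch.distributions.Categorical(logits=self.policy(x))


def run_episode(x, y, controller, modules, loss_fn, max_steps=5):
    """Compose modules for a few steps; return task loss and controller reward."""
    log_probs = []
    for _ in range(max_steps):
        dist = controller(x.detach())   # controller sees the current representation
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        x = modules[action.item()](x)   # apply the chosen transformation
    loss = loss_fn(x, y)
    # Sparse reward derived from the loss at the end of the computation,
    # minus a small cost for each computational step.
    reward = -loss.detach() - STEP_PENALTY * max_steps
    return loss, reward, torch.stack(log_probs)


# Usage sketch (batch size 1 for simplicity).
dim, n_modules = 8, 3
modules = nn.ModuleList([Module(dim) for _ in range(n_modules)])
controller = Controller(dim, n_modules)
adam = torch.optim.Adam(modules.parameters(), lr=1e-3)  # modules trained with Adam

x, y = torch.randn(1, dim), torch.randn(1, dim)
loss, reward, log_probs = run_episode(x, y, controller, modules, nn.MSELoss())
adam.zero_grad()
loss.backward()   # backpropagate the loss through the applied modules
adam.step()
# The controller itself is trained with PPO (Schulman et al., 2017) on
# `reward` and `log_probs`; that update would come from an off-the-shelf
# PPO implementation and is not shown here.
```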