Automatically Composing Representation Transformations as a Means for Generalization
Authors: Michael Chang, Abhishek Gupta, Sergey Levine, Thomas L. Griffiths
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show on a symbolic and a high-dimensional domain that our compositional approach can generalize to more complex problems than the learner has previously encountered, whereas baselines that are not explicitly compositional do not. |
| Researcher Affiliation | Academia | Michael B. Chang, Electrical Engineering and Computer Science, University of California, Berkeley, USA, mbchang@berkeley.edu; Abhishek Gupta, Electrical Engineering and Computer Science, University of California, Berkeley, USA, abhigupta@berkeley.edu; Sergey Levine, Electrical Engineering and Computer Science, University of California, Berkeley, svlevine@eecs.berkeley.edu; Thomas L. Griffiths, Psychology and Cognitive Science, Princeton University, USA, tomg@princeton.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/mbchang/crl |
| Open Datasets | Yes | recognizing spatially transformed MNIST digits (LeCun et al., 1998) |
| Dataset Splits | Yes | We randomly choose 16 of these 20 for training, 2 for validation, 2 for test, as shown in Figure 4 (center). |
| Hardware Specification | No | The paper mentions 'computing support from Amazon, NVIDIA, and Google' but does not specify exact hardware models (e.g., GPU, CPU models, or specific cloud instances with their specs). |
| Software Dependencies | No | All learners are implemented in PyTorch (Paszke et al., 2017), but no specific version number for PyTorch or other software dependencies is provided. |
| Experiment Setup | Yes | The loss is backpropagated through the modules, which are trained with Adam (Kingma & Ba, 2014). The controller receives a sparse reward derived from the loss at the end of the computation, and a small cost for each computational step. The model is trained with proximal policy optimization (Schulman et al., 2017). We found via a grid search k = 1024 and k = 256. Through an informal search whose heuristic was performance on the training set, we settled on updating the curriculum of CRL every 10^5 episodes and updating the curriculum of the RNN every 5 × 10^4 episodes. The step penalty was found by a scale search over {−1, 0.1, 0.01, 0.001}, and we found that 0.01 balanced accuracy and computation time to a reasonable degree during training. |
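
The setup quoted in the last row of the table combines two optimizers: the differentiable modules are trained by backpropagating the task loss with Adam, while the controller that chooses which module to apply is trained with reinforcement learning from a sparse end-of-episode reward minus a 0.01 per-step cost. The sketch below is a minimal illustration of that split, not the authors' released code (see their repository above). It assumes toy dimensions, a hypothetical HALT action, and random stand-in data, and it substitutes a simple REINFORCE-style update for the PPO update used in the paper.

```python
# Minimal sketch of the two-optimizer training scheme described in the
# Experiment Setup row. Sizes, the HALT action, and the toy data are
# illustrative assumptions; only the Adam/backprop training of modules,
# the sparse end-of-computation reward, and the 0.01 step cost come from
# the quoted setup. REINFORCE is used here as a stand-in for PPO.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, N_MODULES, MAX_STEPS, STEP_PENALTY = 32, 4, 5, 0.01  # toy sizes; 0.01 step cost from the paper

# Representation-transforming modules and a task head, trained with Adam (Kingma & Ba, 2014).
modules = nn.ModuleList(
    [nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU()) for _ in range(N_MODULES)]
)
head = nn.Linear(DIM, 10)  # hypothetical 10-way classification head
module_opt = torch.optim.Adam(
    list(modules.parameters()) + list(head.parameters()), lr=1e-3
)

# Controller: a policy over the modules plus a HALT action (index N_MODULES).
controller = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, N_MODULES + 1))
controller_opt = torch.optim.Adam(controller.parameters(), lr=1e-4)


def run_episode(x, y):
    """Compose modules under the controller; return task loss, log-probs, and step count."""
    h, log_probs, steps = x, [], 0
    for _ in range(MAX_STEPS):
        # The controller is trained with RL, so its input is detached from the module graph.
        dist = torch.distributions.Categorical(logits=controller(h.mean(dim=0).detach()))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        if action.item() == N_MODULES:  # HALT: stop composing transformations
            break
        h = modules[action.item()](h)
        steps += 1
    loss = F.cross_entropy(head(h), y)  # task loss at the end of the computation
    return loss, log_probs, steps


for episode in range(200):
    x = torch.randn(16, DIM)         # random batch standing in for real inputs
    y = torch.randint(0, 10, (16,))  # random labels standing in for real targets
    loss, log_probs, steps = run_episode(x, y)

    # Modules and head: backpropagate the task loss through the composed modules.
    module_opt.zero_grad()
    loss.backward()
    module_opt.step()

    # Controller: sparse reward from the final loss, plus a small cost per computational step.
    reward = -loss.detach() - STEP_PENALTY * steps
    policy_loss = -reward * torch.stack(log_probs).sum()  # REINFORCE stand-in for PPO
    controller_opt.zero_grad()
    policy_loss.backward()
    controller_opt.step()
```

Detaching the controller's input keeps the two optimization problems separate in this sketch: gradients from the task loss reach only the modules and head, while the policy-gradient update reaches only the controller, matching the paper's division between backpropagation-trained modules and an RL-trained controller.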