Guiding Policies with Language via Meta-Learning
Authors: John D. Co-Reyes, Abhishek Gupta, Suvansh Sanjeev, Nick Altieri, Jacob Andreas, John DeNero, Pieter Abbeel, Sergey Levine
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments analyze GPL in a partially observed object manipulation environment and a block pushing environment. |
| Researcher Affiliation | Academia | John D. Co-Reyes Abhishek Gupta Suvansh Sanjeev Nick Altieri Jacob Andreas John DeNero Pieter Abbeel Sergey Levine University of California, Berkeley jcoreyes@eecs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1: GPL meta-training algorithm. |
| Open Source Code | No | Our code and supplementary material will be available at https://sites.google.com/view/lgpl/home |
| Open Datasets | No | Environments are generated by sampling a goal object color, goal object shape, and goal square color which are placed at random locations in different random rooms. |
| Dataset Splits | No | We train on 1700 of these environments and reserve a separate set for testing. |
| Hardware Specification | No | The paper mentions 'computational resources from Amazon and NVIDIA' in the acknowledgements, but it does not provide specific hardware details like exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | We use Adam for optimization with a learning rate of 0.001. |
| Experiment Setup | Yes | We use Adam for optimization with a learning rate of 0.001. MLP(32, 32) specifies a multilayer perceptron with 2 layers, each of size 32. CNN((4, 2x2, 1), (4, 2x2, 1)) specifies a 2-layer convolutional neural network where each layer has 4 filters, 2x2 kernels, and stride 1. Unless otherwise stated, we use ReLU activations. (See the code sketch after this table.) |
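The architecture notation quoted in the Experiment Setup row maps directly onto standard deep-learning code. Below is a minimal sketch of that setup; PyTorch as the framework, the helper names `make_mlp` and `make_cnn`, and all input sizes are our assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim: int, hidden: int = 32) -> nn.Sequential:
    # MLP(32, 32): two fully connected layers of size 32 with ReLU activations.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
    )

def make_cnn(in_channels: int) -> nn.Sequential:
    # CNN((4, 2x2, 1), (4, 2x2, 1)): two convolutional layers, each with
    # 4 filters, 2x2 kernels, and stride 1, with ReLU activations.
    return nn.Sequential(
        nn.Conv2d(in_channels, 4, kernel_size=2, stride=1), nn.ReLU(),
        nn.Conv2d(4, 4, kernel_size=2, stride=1), nn.ReLU(),
    )

# Input sizes below are placeholders, not values from the paper.
policy_net = make_mlp(in_dim=16)
obs_encoder = make_cnn(in_channels=3)

# Adam with learning rate 0.001, as stated in the experiment setup.
params = list(policy_net.parameters()) + list(obs_encoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
```

Note that this sketch only reconstructs the layer notation and optimizer settings the table quotes; how these modules are composed into the GPL meta-training algorithm (Algorithm 1) is not specified here.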