Guiding Policies with Language via Meta-Learning

Authors: John D. Co-Reyes, Abhishek Gupta, Suvansh Sanjeev, Nick Altieri, Jacob Andreas, John DeNero, Pieter Abbeel, Sergey Levine

ICLR 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments analyze GPL in a partially observed object manipulation environment and a block pushing environment. |
| Researcher Affiliation | Academia | John D. Co-Reyes, Abhishek Gupta, Suvansh Sanjeev, Nick Altieri, Jacob Andreas, John DeNero, Pieter Abbeel, Sergey Levine. University of California, Berkeley. jcoreyes@eecs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1: GPL meta-training algorithm. |
| Open Source Code | No | Our code and supplementary material will be available at https://sites.google.com/view/lgpl/home |
| Open Datasets | No | Environments are generated by sampling a goal object color, goal object shape, and goal square color, which are placed at random locations in different random rooms. |
| Dataset Splits | No | We train on 1700 of these environments and reserve a separate set for testing. (A sampling-and-split sketch follows the table.) |
| Hardware Specification | No | The paper mentions 'computational resources from Amazon and NVIDIA' in the acknowledgements, but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | We use Adam for optimization with a learning rate of 0.001. |
| Experiment Setup | Yes | We use Adam for optimization with a learning rate of 0.001. MLP(32, 32) specifies a multilayer perceptron with 2 layers, each of size 32. CNN((4, 2x2, 1), (4, 2x2, 1)) specifies a 2-layer convolutional neural network where each layer has 4 filters, 2x2 kernels, and stride 1. Unless otherwise stated, we use ReLU activations. (A code sketch of this setup follows the table.) |
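
The Open Datasets and Dataset Splits rows describe how task environments are sampled and partitioned. The sketch below is a rough illustration only, not the authors' code: the color and shape palettes, the total of 2000 environments, and all identifiers are assumptions; only the 1700-environment training split comes from the quoted text.

```python
import random

# Illustrative sketch: sample environment specifications (goal object color,
# goal object shape, goal square color, random room layout) and hold out a
# test set.  Palettes, the total count, and the layout seed are assumptions.
COLORS = ["red", "green", "blue", "yellow"]
SHAPES = ["ball", "key", "box"]

def sample_environment(rng: random.Random) -> dict:
    """Sample one environment spec with a seed controlling room/object placement."""
    return {
        "goal_object_color": rng.choice(COLORS),
        "goal_object_shape": rng.choice(SHAPES),
        "goal_square_color": rng.choice(COLORS),
        "layout_seed": rng.randrange(2**31),
    }

rng = random.Random(0)
envs = [sample_environment(rng) for _ in range(2000)]
train_envs, test_envs = envs[:1700], envs[1700:]  # train on 1700, hold out the rest
```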
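The Experiment Setup row defines a compact notation for the network architectures. Below is a minimal sketch of that notation, assuming PyTorch (the framework is not stated in the quoted text) and placeholder input sizes; only the layer sizes, kernel and stride values, ReLU activations, and the Adam learning rate of 0.001 are taken from the row above.

```python
import torch
import torch.nn as nn

def mlp(in_dim: int, sizes=(32, 32)) -> nn.Sequential:
    """MLP(32, 32): two fully connected layers of size 32, ReLU after each."""
    layers, prev = [], in_dim
    for size in sizes:
        layers += [nn.Linear(prev, size), nn.ReLU()]
        prev = size
    return nn.Sequential(*layers)

def cnn(in_channels: int, specs=((4, 2, 1), (4, 2, 1))) -> nn.Sequential:
    """CNN((4, 2x2, 1), (4, 2x2, 1)): two conv layers, each with 4 filters,
    a 2x2 kernel, and stride 1, ReLU after each."""
    layers, prev = [], in_channels
    for filters, kernel, stride in specs:
        layers += [nn.Conv2d(prev, filters, kernel_size=kernel, stride=stride), nn.ReLU()]
        prev = filters
    return nn.Sequential(*layers)

# Placeholder input sizes; the actual observation/image dimensions are not given here.
state_encoder = mlp(in_dim=64)
image_encoder = cnn(in_channels=3)
params = list(state_encoder.parameters()) + list(image_encoder.parameters())
optimizer = torch.optim.Adam(params, lr=0.001)  # Adam with learning rate 0.001
```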