Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Guiding Policies with Language via Meta-Learning

Authors: John D. Co-Reyes, Abhishek Gupta, Suvansh Sanjeev, Nick Altieri, Jacob Andreas, John DeNero, Pieter Abbeel, Sergey Levine

ICLR 2019 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments analyze GPL in a partially observed object manipulation environment and a block pushing environment.
Researcher Affiliation | Academia | John D. Co-Reyes, Abhishek Gupta, Suvansh Sanjeev, Nick Altieri, Jacob Andreas, John DeNero, Pieter Abbeel, Sergey Levine (University of California, Berkeley)
Pseudocode | Yes | Algorithm 1: GPL meta-training algorithm.
Open Source Code | No | Our code and supplementary material will be available at https://sites.google.com/view/lgpl/home
Open Datasets | No | Environments are generated by sampling a goal object color, goal object shape, and goal square color, which are placed at random locations in different random rooms.
Dataset Splits | No | We train on 1700 of these environments and reserve a separate set for testing.
Hardware Specification | No | The paper mentions 'computational resources from Amazon and NVIDIA' in the acknowledgements, but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for the experiments.
Software Dependencies | No | We use Adam for optimization with a learning rate of 0.001.
Experiment Setup | Yes | We use Adam for optimization with a learning rate of 0.001. MLP(32, 32) specifies a multilayer perceptron with 2 layers, each of size 32. CNN((4, 2x2, 1), (4, 2x2, 1)) specifies a 2-layer convolutional neural network where each layer has 4 filters, 2x2 kernels, and stride 1. Unless otherwise stated, we use ReLU activations.
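The MLP(32, 32) notation quoted above can be sketched in plain NumPy; this is a minimal illustration of two hidden layers of size 32 with ReLU activations, not the authors' actual implementation (the input dimension of 10, the weight initialization, and all function names are assumptions for illustration only).

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # ReLU activation, the paper's stated default
    return np.maximum(0.0, x)

def init_mlp(in_dim, hidden=(32, 32)):
    """Initialize MLP(32, 32): two layers, each of size 32 (init scheme assumed)."""
    params = []
    dims = (in_dim,) + hidden
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        W = rng.normal(0.0, 0.1, size=(d_in, d_out))
        b = np.zeros(d_out)
        params.append((W, b))
    return params

def mlp_forward(x, params):
    # Apply each layer followed by ReLU
    h = x
    for W, b in params:
        h = relu(h @ W + b)
    return h

# Example: a batch of 5 inputs with an assumed input dimension of 10
params = init_mlp(10)
out = mlp_forward(rng.normal(size=(5, 10)), params)
print(out.shape)  # (5, 32)
```

In the paper's setup these parameters would be trained with Adam at a learning rate of 0.001; the optimizer step is omitted here for brevity.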