Guiding Pretraining in Reinforcement Learning with Large Language Models
Authors: Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, Jacob Andreas
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ELLM in the Crafter game environment and the Housekeep robotic simulator, showing that ELLM-trained agents have better coverage of common-sense behaviors during pretraining and usually match or improve performance on a range of downstream tasks. |
| Researcher Affiliation | Academia | 1 Department of Electrical Engineering and Computer Science, University of California, Berkeley, USA; 2 University of Washington, Seattle; 3 Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory; 4 Inria, Flowers Laboratory. |
| Pseudocode | Yes | Algorithm 1 ELLM Algorithm (a hedged sketch of the reward computation appears after this table) |
| Open Source Code | No | All code will be released soon, licensed under the MIT license (with Crafter, Housekeep licensed under their respective licenses). |
| Open Datasets | Yes | We evaluate ELLM in two complex environments: (1) Crafter, an open-ended environment in which exploration is required to discover long-term survival strategies (Hafner, 2021), and (2) Housekeep, an embodied robotics environment that requires common-sense to restrict the exploration of possible rearrangements of household objects (Kant et al., 2022). |
| Dataset Splits | No | No explicit mention of a validation dataset split or a methodology for creating one was found. Hyperparameter tuning was mentioned without specifying a validation set: 'In the Crafter environment, we swept over the following hyperparameters for the Oracle and Scratch (no-pretraining) conditions: learning rate, exploration decay schedule, and network update frequency.' (Section H). |
| Hardware Specification | Yes | We use NVIDIA TITAN Xps and NVIDIA GeForce RTX 2080 Tis, with 2-3 seeds per GPU and running at roughly 100k steps/hour. |
| Software Dependencies | No | The paper mentions several models and algorithms used (e.g., DQN, Sentence BERT, GPT-2, CLIP ViT-B-32, Codex), but does not provide specific version numbers for underlying software libraries or dependencies (e.g., 'Python 3.x', 'PyTorch 1.x'). |
| Experiment Setup | Yes | Table 6: DQN Hyperparameters lists: "Frame Stack 4", "γ .99", "Seed Frames 5000", "n-step 3", "batch size 64", "lr 6.25e-5", "target update τ 1.0", "ϵ-min 0.01", "update frequency 4" (Section H). These values are gathered into a config sketch below. |
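
The Pseudocode row above cites Algorithm 1 (ELLM). As a rough illustration only, here is a minimal sketch of an ELLM-style intrinsic reward, assuming (consistent with the Sentence BERT mention in the Software Dependencies row) that the agent is rewarded by the semantic similarity between a caption of its achieved transition and goals suggested by a language model. The names `ellm_style_reward`, `embed`, and `toy_embed`, and the 0.5 similarity threshold, are our own placeholders, not the authors' released implementation.

```python
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def ellm_style_reward(transition_caption: str,
                      suggested_goals: list[str],
                      embed,
                      threshold: float = 0.5) -> float:
    """Reward the agent when its achieved transition matches an LLM-suggested goal.

    `embed` is a hypothetical callable mapping a string to a fixed-size vector
    (e.g. a SentenceBERT encoder); `threshold` gates spurious matches.
    """
    caption_vec = embed(transition_caption)
    sims = [cosine_sim(caption_vec, embed(goal)) for goal in suggested_goals]
    best = max(sims, default=0.0)
    return best if best > threshold else 0.0


if __name__ == "__main__":
    # Toy bag-of-characters embedding, standing in for a real sentence encoder.
    def toy_embed(text: str) -> np.ndarray:
        vec = np.zeros(26)
        for ch in text.lower():
            if ch.isalpha():
                vec[ord(ch) - ord("a")] += 1.0
        return vec

    goals = ["chop a tree", "drink water", "attack a zombie"]
    print(ellm_style_reward("chop tree", goals, toy_embed))
```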
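
Likewise, the DQN hyperparameters quoted in the Experiment Setup row can be collected into a single configuration for reproduction bookkeeping. This is a hedged sketch: the dictionary name and key spellings are our own, while the values are copied verbatim from the quoted Table 6 (Section H).

```python
# Hypothetical grouping of the DQN hyperparameters quoted from Table 6 (Section H).
# Key names are our own; values follow the paper's table.
DQN_HPARAMS = {
    "frame_stack": 4,
    "gamma": 0.99,
    "seed_frames": 5000,
    "n_step": 3,
    "batch_size": 64,
    "learning_rate": 6.25e-5,
    "target_update_tau": 1.0,
    "epsilon_min": 0.01,
    "update_frequency": 4,
}
```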