Position: Foundation Agents as the Paradigm Shift for Decision Making

Authors: Xiaoqian Liu, Xingzhou Lou, Jianbin Jiao, Junge Zhang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also conduct a case study in robotic locomotion and game play by jointly pretraining on 5 tasks in the DeepMind Control (DMC) suite (Tassa et al., 2018) and 5 Atari video games (Bellemare et al., 2013) through autoregressive modeling. Visualization of the fine-tuning performance on a new locomotion task and a new game is shown in Figure 2. Details of our case study can be found in Appendix C." (A sketch of this kind of autoregressive trajectory objective appears after the table.)
Researcher Affiliation | Academia | 1 School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing, China; 2 Institute of Automation, Chinese Academy of Sciences, Beijing, China; 3 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; 4 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing, China.
Pseudocode | No | The paper does not include any explicit pseudocode blocks or algorithm listings; methods are described in prose and supported by figures and tables.
Open Source Code | No | The paper states, "Our code is based on https://github.com/microsoft/smart." This indicates that the authors built upon existing open-source code, but they do not explicitly state that the code specific to their own methodology or experiments is released or available.
Open Datasets | Yes | "We also conduct a case study in robotic locomotion and game play by jointly pretraining on 5 tasks in the DeepMind Control (DMC) suite (Tassa et al., 2018) and 5 Atari video games (Bellemare et al., 2013) through autoregressive modeling."
Dataset Splits | No | The paper states, "For fine-tuning, we randomly sample 10% trajectories from the full replay buffer of the SAC agent with diverse return distribution." While this describes a data split for fine-tuning, it does not define a separate validation split, distinct from training and testing, for hyperparameter tuning or model selection. (See the sampling sketch after the table.)
Hardware Specification | Yes | Our case: 25M parameters; pretraining took 8h on 4 RTX 4090 GPUs.
Software Dependencies | No | The paper mentions software components such as GPT-2 and the AdamW optimizer, and refers to the base repository https://github.com/microsoft/smart, but it does not specify exact version numbers for its software dependencies (e.g., the PyTorch version or specific library versions).
Experiment Setup | Yes | "We use the AdamW optimizer (Loshchilov & Hutter, 2017) with parameters β1 = 0.9, β2 = 0.95, and ϵ = 1e-8 for 1M steps. For pretraining, the learning rate is initially set to 1e-5 with linear warm-up and cosine schedule decay. We use a batch size of 512 and weight decay of 0.1 for pretraining. For fine-tuning, the learning rate is kept constant at 1e-4 with batch size 256 and dropout rate 0.1. During fine-tuning, we model and predict returns and rewards in addition to actions using the learning objective log P_ϕ(R_t, a_t, r_t | τ_{0:t-1}, o_t), and perform expert action inference as introduced in (Lee et al., 2022). We use context length L = 50 for both pretraining and fine-tuning." (See the configuration sketch after the table.)
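
The case-study rows above describe jointly pretraining on DMC tasks and Atari games "through autoregressive modeling." As a rough, non-authoritative illustration of that kind of objective, the sketch below computes a next-token prediction loss over flattened trajectory tokens; the token layout, the `model` interface (token ids in, logits out), and the tensor shapes are assumptions, not the paper's implementation.

```python
# Hedged sketch of an autoregressive trajectory-modeling loss.
# Assumes trajectories are already discretized into token ids of shape (B, T)
# and that `model` is a causally masked decoder returning (B, T, vocab) logits.
import torch
import torch.nn.functional as F

def autoregressive_loss(model: torch.nn.Module, token_ids: torch.Tensor) -> torch.Tensor:
    """Next-token prediction over flattened (observation, action, ...) tokens."""
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)  # (B, T-1, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```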
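For the fine-tuning split quoted under Dataset Splits ("randomly sample 10% trajectories from the full replay buffer"), a minimal sketch might look as follows; the replay-buffer representation (a plain list of trajectories) and the fixed seed are assumptions the paper does not specify.

```python
import random

def sample_finetune_trajectories(replay_buffer, fraction=0.1, seed=0):
    """Randomly draw a fraction (10% in the quote) of trajectories without replacement."""
    rng = random.Random(seed)  # fixed seed for repeatability; an assumption
    k = max(1, int(len(replay_buffer) * fraction))
    return rng.sample(replay_buffer, k)
```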
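The quoted Experiment Setup maps onto a standard PyTorch configuration. Below is a sketch wiring the quoted pretraining hyperparameters (AdamW with β1 = 0.9, β2 = 0.95, ϵ = 1e-8, peak learning rate 1e-5, weight decay 0.1, 1M steps) to a linear warm-up plus cosine-decay schedule; the warm-up length and the stand-in model are assumptions, since neither is given in the quote.

```python
import math
import torch

model = torch.nn.Linear(64, 64)  # stand-in for the GPT-2-style backbone (assumption)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,            # quoted pretraining peak learning rate
    betas=(0.9, 0.95),  # quoted β1, β2
    eps=1e-8,           # quoted ϵ
    weight_decay=0.1,   # quoted for pretraining
)

total_steps = 1_000_000  # "for 1M steps"
warmup_steps = 10_000    # assumption: the warm-up length is not stated

def lr_lambda(step: int) -> float:
    """Linear warm-up followed by cosine decay, as described in the quote."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

For fine-tuning, the quote instead keeps a constant learning rate of 1e-4 with batch size 256 and dropout 0.1.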