Generalization to New Sequential Decision Making Tasks with In-Context Learning

Authors: Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we use an illustrative example to show that naively applying transformers to sequential decision making problems does not enable in-context learning of new tasks. We then demonstrate how training on sequences of trajectories with certain distributional properties leads to in-context learning of new sequential decision making tasks. We investigate different design choices and find that larger model and dataset sizes, as well as more task diversity, environment stochasticity, and trajectory burstiness, all result in better in-context learning of new out-of-distribution tasks. By training on large diverse offline datasets, our model is able to learn new MiniHack and Procgen tasks without any weight updates from just a handful of demonstrations.
Researcher Affiliation | Collaboration | *Equal contribution. ¹AI at Meta, ²UCL. Correspondence to: Sharath Chandra Raparthy <sharathraparthy@gmail.com>.
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper.
Open Source Code | No | The paper does not explicitly state that the source code for its methodology is released, nor does it provide a direct link to a repository for the work; it only links to a third-party library it uses.
Open Datasets | Yes | For this reason, we decided to use MiniHack (Samvelyan et al., 2021) and Procgen (Cobbe et al., 2020).
Dataset Splits | Yes | Formally, we consider a set of tasks T split into disjoint sets T_train and T_test. ... For example, on Procgen we train on 12 of the games and test on the remaining 4. ... For each pretrained model, we conduct two types of evaluations: few-shot and zero-shot. ... This is repeated for L levels per task, and we aggregate the episodic return across all levels. ... Table 4: Train Tasks: Bigfish, Bossfight, Caveflyer, Chaser, Fruitbot, Dodgeball, Heist, Coinrun, Leaper, Miner, Starpilot, Maze; Test Tasks: Climber, Ninja, Plunder, Jumper
Hardware Specification | Yes | We use 8 GPUs, each with 80GB of RAM, leveraging PyTorch's Distributed Data Parallel (DDP) capabilities for training.
Software Dependencies | No | The paper mentions PyTorch and the AdamW optimizer but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | We train our models for 25 epochs in case of MiniHack and 100 epochs for Procgen environments. Table 5: List of hyperparameters used in our experiments
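The split and setup details above can be sketched in a few lines of Python. This is purely illustrative (the paper's code is not released, so the names and structure here are assumptions): it encodes the disjoint Procgen task split from Table 4 and the epoch counts quoted in the Experiment Setup row.

```python
# Illustrative sketch of reproducibility details quoted above; the paper's
# own code is not released, so all names here are assumptions.

# Table 4: disjoint Procgen task split (12 train games, 4 held-out test games).
TRAIN_TASKS = {
    "Bigfish", "Bossfight", "Caveflyer", "Chaser",
    "Fruitbot", "Dodgeball", "Heist", "Coinrun",
    "Leaper", "Miner", "Starpilot", "Maze",
}
TEST_TASKS = {"Climber", "Ninja", "Plunder", "Jumper"}

# The paper requires T_train and T_test to be disjoint.
assert TRAIN_TASKS.isdisjoint(TEST_TASKS)
assert len(TRAIN_TASKS) == 12 and len(TEST_TASKS) == 4

# Epoch counts quoted in the Experiment Setup row; the optimizer (AdamW) is
# named in the paper, but the remaining hyperparameters live in its Table 5.
EPOCHS = {"minihack": 25, "procgen": 100}

def epochs_for(benchmark: str) -> int:
    """Return the number of training epochs quoted for a benchmark."""
    return EPOCHS[benchmark]
```

Note that evaluation on the four held-out games is what makes the "new out-of-distribution tasks" claim testable: none of them appear in the training set.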