Cross-Episodic Curriculum for Transformer Agents

Authors: Lucy Xiaoyang Shi, Yunfan Jiang, Jake Grigsby, Linxi Fan, Yuke Zhu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The effectiveness of CEC is demonstrated in two representative scenarios: multi-task reinforcement learning with discrete control on DeepMind Lab, where the curriculum captures learning progression across individual and progressively more complex settings, and imitation learning from mixed-quality data for continuous control on RoboMimic, where the curriculum captures the improvement in demonstrators' expertise. (A hedged sketch of this cross-episodic data ordering appears after the table.)
Researcher Affiliation | Collaboration | Stanford University, The University of Texas at Austin, NVIDIA Research
Pseudocode | No | The paper includes mathematical equations for attention and loss functions but does not present any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code is open-sourced on the project website cec-agent.github.io to facilitate research on Transformer agent learning.
Open Datasets | Yes | We investigate the effectiveness of CEC in enhancing sample efficiency and generalization with two representative case studies. They are: 1) Reinforcement Learning (RL) on DeepMind Lab (DMLab) [5]... and 2) Imitation Learning (IL) from mixed-quality human demonstrations on RoboMimic [53].
Dataset Splits | Yes | For our method in RL settings, we compute the maximum success rate averaged across a sliding window over all test episodes to account for in-context improvement. The size of the sliding window equals one-quarter of the total test episodes. These values are averaged over 20 runs to constitute the final reporting metric. (A sketch of this metric appears after the table.) ... Table A.4: Experiment details on DMLab tasks. The 'Epoch' columns denote the exact training epochs with the best validation performance. We select these checkpoints for evaluation.
Hardware Specification | Yes | Training is performed on NVIDIA V100 GPUs.
Software Dependencies | No | The paper states: 'We implement all models in PyTorch [61] and adapt the implementation of Transformer-XL from VPT [4].' However, it does not specify version numbers for PyTorch or any other software dependencies, which are required for reproducibility.
Experiment Setup | Yes | We follow the best practice to train Transformer agents, including adopting the AdamW optimizer [49], learning rate warm-up and cosine annealing [48], etc. ... Table A.3: Hyperparameters used during training (two values per hyperparameter, corresponding to the two columns of Table A.3): Learning Rate 0.0005 / 0.0001; Warmup Steps 1000 / 0; LR Cosine Annealing Steps 100000 / N/A; Weight Decay 0.0 / 0.0. (A sketch of this optimizer and schedule appears after the table.)
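
The curriculum ordering referenced in the Research Type row can be sketched as follows. This is a minimal illustration, assuming each episode carries a scalar curriculum key (e.g., the policy's learning stage on DMLab or the demonstrator's expertise tier on RoboMimic); the function build_curriculum_context and the field names obs, actions, and level are hypothetical, not the authors' API.

```python
import torch

def build_curriculum_context(episodes):
    """Hypothetical sketch of cross-episodic curriculum data: sort episodes by a
    curriculum key (e.g., learning stage or demonstrator expertise) and
    concatenate their observation/action tokens into one Transformer context,
    so the model can read the improvement trend across episodes."""
    # episodes: list of dicts with "obs" (T, obs_dim), "actions" (T, act_dim),
    # and a scalar "level" (higher = later learning stage / better demonstrator)
    ordered = sorted(episodes, key=lambda ep: ep["level"])
    obs = torch.cat([ep["obs"] for ep in ordered], dim=0)
    actions = torch.cat([ep["actions"] for ep in ordered], dim=0)
    return obs, actions  # fed to a causal Transformer for action prediction
```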
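
The RL evaluation metric quoted in the Dataset Splits row can be reproduced from its description alone. The sketch below assumes success outcomes are stored as a (runs x test episodes) binary array; the function name and array layout are illustrative.

```python
import numpy as np

def windowed_max_success(success: np.ndarray) -> float:
    """Sketch of the reported RL metric: per run, average success over a
    sliding window whose width is one-quarter of the test episodes, take the
    maximum over window positions, then average those maxima across runs."""
    # success: shape (num_runs, num_test_episodes), entries in {0, 1}
    window = max(1, success.shape[1] // 4)
    per_run_max = []
    for run in success:
        # moving average over every window position
        means = np.convolve(run, np.ones(window) / window, mode="valid")
        per_run_max.append(means.max())
    return float(np.mean(per_run_max))
```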
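
The Experiment Setup row quotes the optimizer, schedule, and hyperparameters but not how they are wired together. Below is a minimal sketch assuming the first column of Table A.3 (learning rate 0.0005, 1000 warm-up steps, 100000 cosine-annealing steps, weight decay 0.0); make_optimizer and the LambdaLR-based schedule are illustrative, not the authors' implementation.

```python
import math
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def make_optimizer(model, lr=5e-4, warmup_steps=1000,
                   anneal_steps=100_000, weight_decay=0.0):
    """AdamW with linear warm-up followed by cosine annealing, matching the
    quoted training recipe (hyperparameter defaults from Table A.3, col. 1)."""
    opt = AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)

    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)               # linear warm-up
        progress = min(1.0, (step - warmup_steps) / max(1, anneal_steps))
        return 0.5 * (1.0 + math.cos(math.pi * progress))    # cosine decay to 0

    return opt, LambdaLR(opt, lr_lambda)
```

In a training loop, scheduler.step() would be called once per optimizer update so that the warm-up and annealing counters track gradient steps.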