Cross-Episodic Curriculum for Transformer Agents
Authors: Lucy Xiaoyang Shi, Yunfan Jiang, Jake Grigsby, Linxi Fan, Yuke Zhu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of CEC is demonstrated under two representative scenarios: one involving multi-task reinforcement learning with discrete control, as in DeepMind Lab, where the curriculum captures the learning progression in both individual and progressively complex settings; and the other involving imitation learning with mixed-quality data for continuous control, as in RoboMimic, where the curriculum captures the improvement in demonstrators' expertise. |
| Researcher Affiliation | Collaboration | Stanford University; The University of Texas at Austin; NVIDIA Research |
| Pseudocode | No | The paper includes mathematical equations for attention and loss functions but does not present any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code is open-sourced on the project website cec-agent.github.io to facilitate research on Transformer agent learning. |
| Open Datasets | Yes | We investigate the effectiveness of CEC in enhancing sample efficiency and generalization with two representative case studies: 1) Reinforcement Learning (RL) on DeepMind Lab (DMLab) [5]... and 2) Imitation Learning (IL) from mixed-quality human demonstrations on RoboMimic [53]. |
| Dataset Splits | Yes | For our methods in RL settings, we compute the maximum success rate averaged across a sliding window over all test episodes to account for in-context improvement. The size of the sliding window equals one-quarter of the total test episodes. These values are averaged over 20 runs to constitute the final reporting metric (a code sketch of this sliding-window metric is given below the table). ... Table A.4: Experiment details on DMLab tasks. The Epoch column denotes the exact training epochs with the best validation performance; we select these checkpoints for evaluation. |
| Hardware Specification | Yes | Training is performed on NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions: 'We implement all models in PyTorch [61] and adapt the implementation of Transformer-XL from VPT [4].' However, it does not specify version numbers for PyTorch or any other software dependencies, which are required for reproducibility. |
| Experiment Setup | Yes | We follow best practices to train Transformer agents, including adopting the AdamW optimizer [49], learning rate warm-up, and cosine annealing [48], etc. ... Table A.3: Hyperparameters used during training — Learning Rate: 0.0005 / 0.0001; Warmup Steps: 1000 / 0; LR Cosine Annealing Steps: 100000 / N/A; Weight Decay: 0.0 / 0.0 (one column of values per experimental setting). A sketch of this optimizer and schedule setup is given below the table. |
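
The sliding-window evaluation metric quoted under Dataset Splits can be sketched as follows. This is a minimal illustration assuming binary per-episode success flags; the function names and the example data are hypothetical and not taken from the released code.

```python
import numpy as np

def windowed_max_success_rate(successes: np.ndarray) -> float:
    """Maximum success rate over a sliding window whose size equals
    one-quarter of the total number of test episodes."""
    n = len(successes)
    window = max(1, n // 4)
    rates = [successes[i:i + window].mean() for i in range(n - window + 1)]
    return float(max(rates))

def final_metric(per_run_successes) -> float:
    """Average the windowed maximum over independent runs (20 in the paper)."""
    return float(np.mean([windowed_max_success_rate(s) for s in per_run_successes]))

# Example: 20 runs of 48 test episodes each, with 0/1 success flags.
runs = [np.random.randint(0, 2, size=48) for _ in range(20)]
print(final_metric(runs))
```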
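
The training recipe quoted under Experiment Setup (AdamW with learning-rate warm-up and cosine annealing) can be sketched in PyTorch as below. The placeholder model, the use of the first column of Table A.3 values, and the exact composition of linear warm-up followed by cosine decay are assumptions; only the hyperparameter values come from the paper.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(128, 128)  # placeholder for the Transformer agent

# Hyperparameters from the first column of Table A.3
lr, warmup_steps, cosine_steps, weight_decay = 5e-4, 1_000, 100_000, 0.0

optimizer = AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)

def lr_lambda(step: int) -> float:
    # Linear warm-up, then cosine annealing to zero (assumed composition;
    # the paper states only that warm-up and cosine annealing are used).
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = min(1.0, (step - warmup_steps) / cosine_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)
# A training loop would call optimizer.step() and scheduler.step() each iteration.
```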