Learning Versatile Skills with Curriculum Masking
Authors: Yao Tang, Zhihui Xie, Zichuan Lin, Deheng Ye, Shuai Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we show that CurrMask exhibits superior zero-shot performance on skill prompting tasks, goal-conditioned planning tasks, and competitive finetuning performance on offline RL tasks. Additionally, our analysis of training dynamics reveals that CurrMask gradually acquires skills of varying complexity by dynamically adjusting its masking scheme. |
| Researcher Affiliation | Collaboration | Shanghai Jiao Tong University; Tencent AI Lab |
| Pseudocode | Yes | Algorithm 1: Curriculum Masking; Algorithm 2: Block-wise Masking (hedged sketches of both appear after this table). |
| Open Source Code | No | The abstract states "Code is available at here.", which is a placeholder rather than an active link. Appendix B.1 notes that "Our CurrMask implementation is based on the MaskDP codebase" (https://github.com/FangchenLiu/MaskDP_public), but this is the codebase the authors built upon, not an open release of their own implementation. |
| Open Datasets | Yes | We evaluate our method on a set of environments from the DeepMind Control Suite (Tunyasuvunakool et al., 2020). |
| Dataset Splits | Yes | For zero-shot evaluation, we additionally construct a validation set for each environment using the same protocol but with different random seeds, following the setting in the prior work (Liu et al., 2022). |
| Hardware Specification | Yes | Utilizing a single RTX 3090 graphics card, the pretraining on the assembled datasets takes approximately 7-8 hours for 300k gradient steps. |
| Software Dependencies | No | The paper mentions that "Our CurrMask implementation is based on the MaskDP codebase", but it does not specify version numbers for this codebase or for any other key software libraries. |
| Experiment Setup | Yes | Table 4: Hyperparameters used for model training and evaluation. (Includes: # encoder layers, # decoder layers, # autoregressive transformer layers, # attention heads, context length, hidden dimension, mask ratio, block size, training optimizer, batch size, learning rate, # gradient steps, EXP3 ϵ, EXP3 γ, evaluation interval, etc.) |
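To make the pseudocode row concrete, below is a minimal sketch of block-wise masking (Algorithm 2), assuming tokens form a 1-D sequence that is masked at the granularity of contiguous blocks, with block size and mask ratio taken from Table 4. The function name, tie-breaking, and handling of a trailing partial block are our assumptions, not the paper's exact procedure.

```python
import numpy as np

def block_wise_mask(seq_len: int, block_size: int, mask_ratio: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Return a boolean mask (True = masked) covering roughly mask_ratio
    of the sequence, applied block by block.

    Hypothetical sketch of Algorithm 2 (Block-wise Masking); the paper's
    exact rounding and padding rules may differ.
    """
    num_blocks = int(np.ceil(seq_len / block_size))
    num_masked_blocks = int(round(mask_ratio * num_blocks))
    # Choose which blocks to mask uniformly at random.
    masked_blocks = rng.choice(num_blocks, size=num_masked_blocks, replace=False)
    mask = np.zeros(seq_len, dtype=bool)
    for b in masked_blocks:
        mask[b * block_size:(b + 1) * block_size] = True
    return mask

rng = np.random.default_rng(0)
print(block_wise_mask(seq_len=64, block_size=8, mask_ratio=0.5, rng=rng))
```

Masking whole blocks rather than independent tokens forces the model to reconstruct multi-step spans, which is what makes the masking scheme a knob the curriculum can tune.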
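The Table 4 entries "EXP3 ϵ" and "EXP3 γ" indicate that the curriculum (Algorithm 1) is driven by an EXP3 adversarial bandit over candidate masking schemes. The following is a minimal sketch of standard EXP3 under that assumption: each arm is a hypothetical (block size, mask ratio) pair, the reward is assumed to be a learning-progress signal in [0, 1], and the paper's ϵ parameter (not used here) may govern an additional exploration or smoothing term we cannot confirm from the quoted material.

```python
import numpy as np

class Exp3Curriculum:
    """EXP3 bandit over candidate masking schemes (arms).

    Hypothetical sketch of the curriculum in Algorithm 1; the paper's
    exact reward definition and use of its epsilon parameter may differ.
    """

    def __init__(self, num_arms: int, gamma: float, rng=None):
        self.num_arms = num_arms
        self.gamma = gamma          # exploration rate (Table 4: EXP3 gamma)
        self.weights = np.ones(num_arms)
        self.rng = rng or np.random.default_rng()

    def probs(self) -> np.ndarray:
        # Mix the weight distribution with uniform exploration.
        w = self.weights / self.weights.sum()
        return (1.0 - self.gamma) * w + self.gamma / self.num_arms

    def sample_arm(self) -> int:
        return int(self.rng.choice(self.num_arms, p=self.probs()))

    def update(self, arm: int, reward: float) -> None:
        # Importance-weighted reward estimate; reward assumed in [0, 1].
        x_hat = reward / self.probs()[arm]
        self.weights[arm] *= np.exp(self.gamma * x_hat / self.num_arms)

# Usage: pick a masking scheme per training step, then feed back a
# progress signal (e.g., drop in reconstruction loss) as the reward.
curriculum = Exp3Curriculum(num_arms=4, gamma=0.1)
arm = curriculum.sample_arm()
curriculum.update(arm, reward=0.3)
```

This matches the reported training dynamics: as easy masking schemes stop yielding progress, their rewards shrink and EXP3 shifts probability mass toward harder schemes, so the masking curriculum adjusts itself over the 300k gradient steps.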