Learning Versatile Skills with Curriculum Masking

Authors: Yao Tang, Zhihui Xie, Zichuan Lin, Deheng Ye, Shuai Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments, we show that CurrMask exhibits superior zero-shot performance on skill prompting tasks, goal-conditioned planning tasks, and competitive finetuning performance on offline RL tasks. Additionally, our analysis of training dynamics reveals that CurrMask gradually acquires skills of varying complexity by dynamically adjusting its masking scheme."
Researcher Affiliation | Collaboration | 1. Shanghai Jiao Tong University; 2. Tencent AI Lab
Pseudocode | Yes | Algorithm 1: Curriculum Masking; Algorithm 2: Block-wise Masking (illustrative sketches of both appear after this table)
Open Source Code | No | The abstract states "Code is available at here.", where "here" is a placeholder rather than an active link. Appendix B.1 notes that "Our CurrMask implementation is based on the MaskDP codebase" (https://github.com/FangchenLiu/MaskDP_public), but this refers to the codebase they built upon, not an open release of their own implementation.
Open Datasets | Yes | "We evaluate our method on a set of environments from the DeepMind Control Suite (Tunyasuvunakool et al., 2020)."
Dataset Splits | Yes | "For zero-shot evaluation, we additionally construct a validation set for each environment using the same protocol but with different random seeds, following the setting in prior work (Liu et al., 2022)."
Hardware Specification | Yes | "Utilizing a single RTX 3090 graphics card, the pretraining on the assembled datasets takes approximately 7-8 hours for 300k gradient steps."
Software Dependencies | No | The paper mentions that "Our CurrMask implementation is based on the MaskDP codebase", but it does not specify version numbers for that codebase or for any other key software libraries.
Experiment Setup | Yes | Table 4: Hyperparameters used for model training and evaluation (includes # encoder layers, # decoder layers, # autoregressive transformer layers, # attention heads, context length, hidden dimension, mask ratio, block size, training optimizer, batch size, learning rate, # gradient steps, EXP3 ε, EXP3 γ, evaluation interval, etc.)
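
The Pseudocode row references Algorithm 2 (Block-wise Masking). As a minimal sketch, assuming trajectories are token sequences and a "block" is a contiguous span of a fixed number of tokens, block-wise masking could look like the following. The function name, the rounding of the block count, and the handling of a ragged final block are our assumptions for illustration, not the authors' exact implementation:

```python
import numpy as np

def block_wise_mask(seq_len: int, block_size: int, mask_ratio: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Return a boolean mask over a token sequence where masked positions
    are grouped into contiguous blocks of `block_size` tokens.

    Illustrative only: the paper's Algorithm 2 may round the number of
    masked blocks or treat the sequence tail differently.
    """
    n_blocks = int(np.ceil(seq_len / block_size))
    n_masked_blocks = int(round(mask_ratio * n_blocks))
    # Choose which blocks to mask uniformly at random, without replacement.
    masked_blocks = rng.choice(n_blocks, size=n_masked_blocks, replace=False)
    mask = np.zeros(seq_len, dtype=bool)
    for b in masked_blocks:
        # Slicing clips automatically if the last block is shorter.
        mask[b * block_size : (b + 1) * block_size] = True
    return mask

rng = np.random.default_rng(0)
print(block_wise_mask(seq_len=64, block_size=8, mask_ratio=0.5, rng=rng))
```

Compared with masking tokens independently, masking whole blocks forces the model to infill multi-step spans, which is what makes the block size a meaningful axis for a curriculum.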
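Table 4 also lists EXP3 ε and EXP3 γ, indicating that the curriculum over masking schemes (Algorithm 1) is driven by an EXP3 adversarial bandit. Below is a minimal sketch of the standard EXP3 update with γ-mixing exploration; treating each (mask ratio, block size) configuration as an arm and feeding back a training-progress signal in [0, 1] as the reward is our assumption about how it plugs into the curriculum, and the exact role of the paper's ε parameter is not reproduced here:

```python
import numpy as np

class EXP3:
    """Standard EXP3 bandit over K arms (e.g., K candidate masking schemes).

    `gamma` mixes in uniform exploration; rewards must lie in [0, 1].
    How CurrMask defines the reward signal is not reproduced here.
    """
    def __init__(self, n_arms: int, gamma: float, rng: np.random.Generator):
        self.weights = np.ones(n_arms)
        self.gamma = gamma
        self.rng = rng

    def probabilities(self) -> np.ndarray:
        w = self.weights / self.weights.sum()
        return (1 - self.gamma) * w + self.gamma / len(self.weights)

    def select(self) -> int:
        return int(self.rng.choice(len(self.weights), p=self.probabilities()))

    def update(self, arm: int, reward: float) -> None:
        p = self.probabilities()[arm]
        # Importance-weighted reward estimate keeps the update unbiased
        # even though only the pulled arm's reward is observed.
        x_hat = reward / p
        self.weights[arm] *= np.exp(self.gamma * x_hat / len(self.weights))

bandit = EXP3(n_arms=4, gamma=0.1, rng=np.random.default_rng(0))
arm = bandit.select()           # pick a masking scheme for this batch
bandit.update(arm, reward=0.3)  # e.g., a normalized loss improvement
```

The adversarial-bandit formulation fits the curriculum setting because the "reward" of a masking scheme is non-stationary: as the model masters short blocks, the learning progress shifts toward longer ones, which matches the paper's observation that the masking scheme adjusts dynamically during training.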