Learning Versatile Skills with Curriculum Masking

Authors: Yao Tang, Zhihui Xie, Zichuan Lin, Deheng Ye, Shuai Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments, we show that CurrMask exhibits superior zero-shot performance on skill prompting tasks, goal-conditioned planning tasks, and competitive finetuning performance on offline RL tasks. Additionally, our analysis of training dynamics reveals that CurrMask gradually acquires skills of varying complexity by dynamically adjusting its masking scheme."
Researcher Affiliation | Collaboration | 1. Shanghai Jiao Tong University; 2. Tencent AI Lab
Pseudocode | Yes | Algorithm 1: Curriculum Masking; Algorithm 2: Block-wise Masking (illustrative sketches of both appear after this table)
Open Source Code | No | The abstract states "Code is available at here.", where "here" is a placeholder rather than an active link. Appendix B.1 notes that "Our CurrMask implementation is based on the MaskDP codebase" (https://github.com/FangchenLiu/MaskDP_public), but this refers to the codebase they built upon, not an open release of their own implementation.
Open Datasets | Yes | "We evaluate our method on a set of environments from the DeepMind Control Suite (Tunyasuvunakool et al., 2020)."
Dataset Splits | Yes | "For zero-shot evaluation, we additionally construct a validation set for each environment using the same protocol but with different random seeds, following the setting in prior work (Liu et al., 2022)."
Hardware Specification | Yes | "Utilizing a single RTX 3090 graphics card, the pretraining on the assembled datasets takes approximately 7-8 hours for 300k gradient steps."
Software Dependencies | No | The paper mentions that "Our CurrMask implementation is based on the MaskDP codebase", but it does not specify version numbers for that codebase or for any other key software libraries.
Experiment Setup | Yes | Table 4: Hyperparameters used for model training and evaluation (includes # encoder layers, # decoder layers, # autoregressive transformer layers, # attention heads, context length, hidden dimension, mask ratio, block size, training optimizer, batch size, learning rate, # gradient steps, EXP3 ε, EXP3 γ, evaluation interval, etc.)
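
The Pseudocode row references Algorithm 2 (Block-wise Masking). As a minimal sketch, assuming trajectories are token sequences and a "block" is a contiguous span of a fixed number of tokens, block-wise masking could look like the following. The function name, the rounding of the block count, and the handling of a ragged final block are our assumptions for illustration, not the authors' exact implementation:

```python
import numpy as np

def block_wise_mask(seq_len: int, block_size: int, mask_ratio: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Return a boolean mask over a token sequence where masked positions
    are grouped into contiguous blocks of `block_size` tokens.

    Illustrative only: the paper's Algorithm 2 may round the number of
    masked blocks or treat the sequence tail differently.
    """
    n_blocks = int(np.ceil(seq_len / block_size))
    n_masked_blocks = int(round(mask_ratio * n_blocks))
    # Choose which blocks to mask uniformly at random, without replacement.
    masked_blocks = rng.choice(n_blocks, size=n_masked_blocks, replace=False)
    mask = np.zeros(seq_len, dtype=bool)
    for b in masked_blocks:
        # Slicing clips automatically if the last block is shorter.
        mask[b * block_size : (b + 1) * block_size] = True
    return mask

rng = np.random.default_rng(0)
print(block_wise_mask(seq_len=64, block_size=8, mask_ratio=0.5, rng=rng))
```

Compared with masking tokens independently, masking whole blocks forces the model to infill multi-step spans, which is what makes the block size a meaningful axis for a curriculum.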
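Table 4 also lists EXP3 ε and EXP3 γ, indicating that the curriculum over masking schemes (Algorithm 1) is driven by an EXP3 adversarial bandit. Below is a minimal sketch of the standard EXP3 update with γ-mixing exploration; treating each (mask ratio, block size) configuration as an arm and feeding back a training-progress signal in [0, 1] as the reward is our assumption about how it plugs into the curriculum, and the exact role of the paper's ε parameter is not reproduced here:

```python
import numpy as np

class EXP3:
    """Standard EXP3 bandit over K arms (e.g., K candidate masking schemes).

    `gamma` mixes in uniform exploration; rewards must lie in [0, 1].
    How CurrMask defines the reward signal is not reproduced here.
    """
    def __init__(self, n_arms: int, gamma: float, rng: np.random.Generator):
        self.weights = np.ones(n_arms)
        self.gamma = gamma
        self.rng = rng

    def probabilities(self) -> np.ndarray:
        w = self.weights / self.weights.sum()
        return (1 - self.gamma) * w + self.gamma / len(self.weights)

    def select(self) -> int:
        return int(self.rng.choice(len(self.weights), p=self.probabilities()))

    def update(self, arm: int, reward: float) -> None:
        p = self.probabilities()[arm]
        # Importance-weighted reward estimate keeps the update unbiased
        # even though only the pulled arm's reward is observed.
        x_hat = reward / p
        self.weights[arm] *= np.exp(self.gamma * x_hat / len(self.weights))

bandit = EXP3(n_arms=4, gamma=0.1, rng=np.random.default_rng(0))
arm = bandit.select()           # pick a masking scheme for this batch
bandit.update(arm, reward=0.3)  # e.g., a normalized loss improvement
```

The adversarial-bandit formulation fits the curriculum setting because the "reward" of a masking scheme is non-stationary: as the model masters short blocks, the learning progress shifts toward longer ones, which matches the paper's observation that the masking scheme adjusts dynamically during training.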