RePreM: Representation Pre-training with Masked Model for Reinforcement Learning

Authors: Yuanying Cai, Chuheng Zhang, Wei Shen, Xuyun Zhang, Wenjie Ruan, Longbo Huang

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate the effectiveness of RePreM in various tasks, including dynamic prediction, transfer learning, and sample-efficient RL with both value-based and actor-critic methods. We conduct extensive experiments on Atari games [Bellemare et al. 2013] and the DeepMind Control Suite (DMControl) [Tassa et al. 2018]. We show that our pre-trained state encoder enables sample-efficient learning on several downstream tasks including dynamic prediction, transfer learning, and sample-efficient RL. For dynamic prediction, our encoder results in a smaller prediction error than the baselines, especially for long-horizon predictions. For transfer learning, we pre-train the encoder on the data from a set of 24 Atari games and successfully transfer the representation to unseen games. For sample-efficient RL, we evaluate the pre-trained representation with the 100k benchmark.
Researcher Affiliation | Collaboration | Yuanying Cai (1), Chuheng Zhang (2), Wei Shen (3), Xuyun Zhang (4), Wenjie Ruan (4,5), Longbo Huang (1)*; (1) IIIS, Tsinghua University, Beijing, China; (2) Microsoft Research Asia, Beijing, China; (3) Hulu, Beijing, China; (4) Macquarie University, Sydney, Australia; (5) University of Exeter, Exeter, UK
Pseudocode | No | The paper describes the model architecture and training process in detail but does not include a formal pseudocode block or algorithm listing.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | We conduct the following experiments on Atari [Bellemare et al. 2013] and DMControl [Tassa et al. 2018]. For Atari, as in many previous papers (e.g., [Kaiser et al. 2019]), we select a set of 26 games. For pre-training on Atari games, we collect the following three types of datasets with different qualities for each game: the Random dataset is collected by executing uniformly randomly sampled actions, each repeated for a number of consecutive steps sampled from a Geometric distribution with p = 1/3; the Weak dataset is collected from the first 1M transitions generated by DQN; the Mixed dataset is obtained by concatenating multiple checkpoints taken evenly throughout the training of DQN. ... For DMControl, we collect the offline dataset in a similar way to the procedure used to collect the Mixed dataset, using SAC [Haarnoja et al. 2018].
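The Random-dataset collection quoted above is concrete enough to sketch. Below is a minimal illustration assuming a Gymnasium-style Atari environment; the environment id, function name, and transition budget are illustrative assumptions, not details from the paper.

```python
import numpy as np
import gymnasium as gym  # assumed API; any Atari wrapper with the same interface works


def collect_random_dataset(env_id="ALE/Breakout-v5", num_transitions=100_000, p=1 / 3, seed=0):
    """Sketch of the 'Random' dataset: repeat a uniformly sampled action for a
    number of consecutive steps drawn from a Geometric(p) distribution (p = 1/3)."""
    env = gym.make(env_id)
    rng = np.random.default_rng(seed)
    obs, _ = env.reset(seed=seed)
    dataset = []
    while len(dataset) < num_transitions:
        action = env.action_space.sample()   # uniformly random action
        repeat = rng.geometric(p)            # number of consecutive steps ~ Geometric(1/3)
        for _ in range(repeat):
            next_obs, reward, terminated, truncated, _ = env.step(action)
            dataset.append((obs, action, reward, next_obs, terminated))
            obs = next_obs
            if terminated or truncated:
                obs, _ = env.reset()
                break
            if len(dataset) >= num_transitions:
                break
    env.close()
    return dataset
```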
Dataset Splits | No | The paper describes the collection of datasets (Random, Weak, Mixed) and mentions using the 'Atari-100k benchmark', where the agent interacts for '100k steps'. For dynamic prediction, it states 'We collect the training and testing samples using the same DQN policy'. However, it does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, and testing that would be needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions various models and algorithms such as ResNet, GTrXL, SimCLR, Rainbow, SAC, BERT, and DQN, but does not specify any software packages or libraries with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x) used for the implementation.
Experiment Setup | No | The paper describes aspects of the pre-training and task setup, such as the masking process (e.g., 'probability p', 'number of consecutive masked items is sampled from Unif(n)') and the use of 'L GTrXL blocks' and a 'larger encoder architecture'. However, it does not provide concrete hyperparameter values (e.g., learning rates, batch sizes, number of training epochs), optimizer settings, or other system-level training configurations needed for full reproducibility.
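For reference, the span-masking rule quoted above (mask with probability p, span length drawn from Unif(n)) can be sketched as follows. The function name and the values of p and n are illustrative assumptions, since the paper does not report the concrete hyperparameters.

```python
import numpy as np


def sample_span_mask(seq_len, p=0.15, n=5, rng=None):
    """Sketch of the masking rule: each position starts a masked span with
    probability p, and the span length is drawn uniformly from {1, ..., n}.
    p and n are illustrative placeholders, not values from the paper."""
    rng = rng or np.random.default_rng()
    mask = np.zeros(seq_len, dtype=bool)
    t = 0
    while t < seq_len:
        if rng.random() < p:                    # start a masked span at position t
            span = int(rng.integers(1, n + 1))  # span length ~ Unif({1, ..., n})
            mask[t:t + span] = True
            t += span
        else:
            t += 1
    return mask  # True marks trajectory items to replace with a mask token before encoding
```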