Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning

Authors: Donghu Kim, Hojoon Lee, Kyungmin Lee, Dongyoon Hwang, Jaegul Choo

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that pre-training objectives focused on learning task-agnostic features (e.g., identifying objects and understanding temporal dynamics) enhance generalization across different environments. Our findings, illustrated in Figure 2, show that pre-training methods aimed at learning task-agnostic features, such as extracting spatial characteristics from images and temporal dynamics from videos, enhance generalization across various distribution shifts. In this section, we present our main experimental results, as illustrated in Figures 4 and 5. In our ablation studies, we assess the effects of variations in data optimality, size, and model size on pre-training methods.
Researcher Affiliation | Academia | Donghu Kim*1, Hojoon Lee*1, Kyungmin Lee*1, Dongyoon Hwang1, Jaegul Choo1; 1KAIST. Correspondence to: Donghu Kim <quagmire@kaist.ac.kr>.
Pseudocode | No | The paper provides detailed textual descriptions of algorithms in Section 4 'Algorithms' and Appendix B.2 'Baseline Implementations', but it does not include structured pseudocode blocks or clearly labeled algorithm boxes.
Open Source Code | Yes | We publicize our codes, datasets, and model checkpoints at https://github.com/dojeon-ai/Atari-PB.
Open Datasets | Yes | We publicize our codes, datasets, and model checkpoints at https://github.com/dojeon-ai/Atari-PB. The dataset for Atari-PB is derived from the DQN-Replay Dataset (Agarwal et al., 2020).
Dataset Splits | No | The paper mentions fine-tuning with 50,000 frames/interactions and evaluating performance, but it does not specify explicit train/validation/test dataset splits or provide quantitative details for a dedicated validation set.
Hardware Specification | No | The paper discusses model architectures (e.g., ResNet-50) and training parameters, but it does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU models, or memory specifications.
Software Dependencies | No | The paper mentions software components like the 'AdamW optimizer' and 'Rainbow algorithm', but it does not provide specific version numbers for these or other software libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | Each model was pre-trained for 100 epochs using an AdamW optimizer (Loshchilov & Hutter, 2017) with a batch size of 512. We experimented with various learning rates, selecting from the range of {1e-3, 3e-4, ..., 3e-5, 1e-6}, and adjusted the weight decay within the range of {1e-4, 1e-5, 1e-6}. For fine-tuning, ... kept the backbone parameters frozen, while the neck and head components were re-initialized and then fine-tuned for 100 epochs. Detailed hyperparameters are listed in Table 16.
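
To make the quoted setup concrete, the sketch below shows one way the described pre-training and fine-tuning optimizer configuration could look in PyTorch. The module definitions (build_modules, the convolutional backbone, the 256-unit neck, the 18-way head) are illustrative placeholders rather than the Atari-PB implementation; only the AdamW optimizer, batch size 512, 100 epochs, the learning-rate and weight-decay grids, and the frozen-backbone fine-tuning protocol come from the quoted text, and the elided middle values of the learning-rate grid are left out.

```python
import torch
import torch.nn as nn

# Hyperparameter grids quoted in the row above. Only the listed endpoints of the
# learning-rate grid are reproduced; the paper's "..." values are not filled in.
LEARNING_RATES = [1e-3, 3e-4, 3e-5, 1e-6]
WEIGHT_DECAYS = [1e-4, 1e-5, 1e-6]
BATCH_SIZE = 512
EPOCHS = 100


def build_modules():
    """Toy backbone/neck/head for 4x84x84 Atari frames (placeholders, not Atari-PB's modules)."""
    backbone = nn.Sequential(nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(), nn.Flatten())
    neck = nn.Linear(32 * 20 * 20, 256)
    head = nn.Linear(256, 18)  # e.g. one output per Atari action
    return backbone, neck, head


def pretraining_optimizer(backbone, neck, head, lr, weight_decay):
    # Pre-training: all components are updated with AdamW for 100 epochs.
    params = (list(backbone.parameters())
              + list(neck.parameters())
              + list(head.parameters()))
    return torch.optim.AdamW(params, lr=lr, weight_decay=weight_decay)


def finetuning_setup(backbone, lr, weight_decay):
    # Fine-tuning: freeze the pre-trained backbone, re-initialize the neck and
    # head, and optimize only those fresh modules (again for 100 epochs).
    for p in backbone.parameters():
        p.requires_grad = False
    neck = nn.Linear(32 * 20 * 20, 256)  # re-initialized neck
    head = nn.Linear(256, 18)            # re-initialized head
    optimizer = torch.optim.AdamW(
        list(neck.parameters()) + list(head.parameters()),
        lr=lr, weight_decay=weight_decay,
    )
    return neck, head, optimizer


if __name__ == "__main__":
    backbone, neck, head = build_modules()
    opt = pretraining_optimizer(backbone, neck, head, lr=3e-4, weight_decay=1e-5)
    x = torch.randn(BATCH_SIZE, 4, 84, 84)
    logits = head(neck(backbone(x)))
    print(logits.shape)  # torch.Size([512, 18])
```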