Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning

Authors: Donghu Kim, Hojoon Lee, Kyungmin Lee, Dongyoon Hwang, Jaegul Choo

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that pre-training objectives focused on learning task-agnostic features (e.g., identifying objects and understanding temporal dynamics) enhance generalization across different environments. Our findings, illustrated in Figure 2, show that pre-training methods aimed at learning task-agnostic features, such as extracting spatial characteristics from images and temporal dynamics from videos, enhance generalization across various distribution shifts. In this section, we present our main experimental results, as illustrated in Figures 4 and 5. In our ablation studies, we assess the effects of variations in data optimality, size, and model size on pre-training methods.
Researcher Affiliation | Academia | Donghu Kim*1, Hojoon Lee*1, Kyungmin Lee*1, Dongyoon Hwang1, Jaegul Choo1; 1KAIST. Correspondence to: Donghu Kim <quagmire@kaist.ac.kr>.
Pseudocode | No | The paper provides detailed textual descriptions of algorithms in Section 4 'Algorithms' and Appendix B.2 'Baseline Implementations', but it does not include structured pseudocode blocks or clearly labeled algorithm boxes.
Open Source Code | Yes | We publicize our codes, datasets, and model checkpoints at https://github.com/dojeon-ai/Atari-PB.
Open Datasets | Yes | We publicize our codes, datasets, and model checkpoints at https://github.com/dojeon-ai/Atari-PB. The dataset for Atari-PB is derived from the DQN-Replay Dataset (Agarwal et al., 2020).
Dataset Splits | No | The paper mentions fine-tuning with 50,000 frames/interactions and evaluating performance, but it does not specify explicit train/validation/test dataset splits or provide quantitative details for a dedicated validation set.
Hardware Specification | No | The paper discusses model architectures (e.g., ResNet-50) and training parameters, but it does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU models, or memory specifications.
Software Dependencies | No | The paper mentions software components like the 'AdamW optimizer' and 'Rainbow algorithm', but it does not provide specific version numbers for these or other software libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | Each model was pre-trained for 100 epochs using an AdamW optimizer (Loshchilov & Hutter, 2017) with a batch size of 512. We experimented with various learning rates, selecting from the range of {1e-3, 3e-4, ..., 3e-5, 1e-6}, and adjusted the weight decay within the range of {1e-4, 1e-5, 1e-6}. For fine-tuning, ... kept the backbone parameters frozen, while the neck and head components were re-initialized and then fine-tuned for 100 epochs. Detailed hyperparameters are listed in Table 16.
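
To make the quoted setup concrete, the sketch below shows one way the described pre-training and fine-tuning optimizer configuration could look in PyTorch. The module definitions (build_modules, the convolutional backbone, the 256-unit neck, the 18-way head) are illustrative placeholders rather than the Atari-PB implementation; only the AdamW optimizer, batch size 512, 100 epochs, the learning-rate and weight-decay grids, and the frozen-backbone fine-tuning protocol come from the quoted text, and the elided middle values of the learning-rate grid are left out.

```python
import torch
import torch.nn as nn

# Hyperparameter grids quoted in the row above. Only the listed endpoints of the
# learning-rate grid are reproduced; the paper's "..." values are not filled in.
LEARNING_RATES = [1e-3, 3e-4, 3e-5, 1e-6]
WEIGHT_DECAYS = [1e-4, 1e-5, 1e-6]
BATCH_SIZE = 512
EPOCHS = 100


def build_modules():
    """Toy backbone/neck/head for 4x84x84 Atari frames (placeholders, not Atari-PB's modules)."""
    backbone = nn.Sequential(nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(), nn.Flatten())
    neck = nn.Linear(32 * 20 * 20, 256)
    head = nn.Linear(256, 18)  # e.g. one output per Atari action
    return backbone, neck, head


def pretraining_optimizer(backbone, neck, head, lr, weight_decay):
    # Pre-training: all components are updated with AdamW for 100 epochs.
    params = (list(backbone.parameters())
              + list(neck.parameters())
              + list(head.parameters()))
    return torch.optim.AdamW(params, lr=lr, weight_decay=weight_decay)


def finetuning_setup(backbone, lr, weight_decay):
    # Fine-tuning: freeze the pre-trained backbone, re-initialize the neck and
    # head, and optimize only those fresh modules (again for 100 epochs).
    for p in backbone.parameters():
        p.requires_grad = False
    neck = nn.Linear(32 * 20 * 20, 256)  # re-initialized neck
    head = nn.Linear(256, 18)            # re-initialized head
    optimizer = torch.optim.AdamW(
        list(neck.parameters()) + list(head.parameters()),
        lr=lr, weight_decay=weight_decay,
    )
    return neck, head, optimizer


if __name__ == "__main__":
    backbone, neck, head = build_modules()
    opt = pretraining_optimizer(backbone, neck, head, lr=3e-4, weight_decay=1e-5)
    x = torch.randn(BATCH_SIZE, 4, 84, 84)
    logits = head(neck(backbone(x)))
    print(logits.shape)  # torch.Size([512, 18])
```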