Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning
Authors: Donghu Kim, Hojoon Lee, Kyungmin Lee, Dongyoon Hwang, Jaegul Choo
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our findings, illustrated in Figure 2, show that pre-training methods aimed at learning task-agnostic features, such as extracting spatial characteristics from images and temporal dynamics from videos, enhance generalization across various distribution shifts. In this section, we present our main experimental results, as illustrated in Figures 4 and 5. In our ablation studies, we assess the effects of variations in data optimality, data size, and model size on pre-training methods. |
| Researcher Affiliation | Academia | Donghu Kim*1, Hojoon Lee*1, Kyungmin Lee*1, Dongyoon Hwang1, Jaegul Choo1; 1KAIST. Correspondence to: Donghu Kim <quagmire@kaist.ac.kr>. |
| Pseudocode | No | The paper provides detailed textual descriptions of algorithms in Section 4 'Algorithms' and Appendix B.2 'Baseline Implementations', but it does not include structured pseudocode blocks or clearly labeled algorithm boxes. |
| Open Source Code | Yes | We publicize our codes, datasets, and model checkpoints at https://github.com/dojeon-ai/Atari-PB. |
| Open Datasets | Yes | We publicize our codes, datasets, and model checkpoints at https://github.com/dojeon-ai/Atari-PB. The dataset for Atari-PB is derived from the DQN-Replay Dataset (Agarwal et al., 2020). |
| Dataset Splits | No | The paper mentions fine-tuning with 50,000 frames/interactions and evaluating performance, but it does not specify explicit train/validation/test dataset splits or provide quantitative details for a dedicated validation set. |
| Hardware Specification | No | The paper discusses model architectures (e.g., ResNet-50) and training parameters, but it does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions software components like the 'AdamW optimizer' and the 'Rainbow algorithm', but it does not provide specific version numbers for these or other software libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | Each model was pre-trained for 100 epochs using an AdamW optimizer (Loshchilov & Hutter, 2017) with a batch size of 512. We experimented with various learning rates, selecting from the range of {1e-3, 3e-4, ..., 3e-5, 1e-6}, and adjusted the weight decay within the range of {1e-4, 1e-5, 1e-6}. For fine-tuning, ... kept the backbone parameters frozen, while the neck and head components were re-initialized and then fine-tuned for 100 epochs. Detailed hyperparameters are listed in Table 16. A minimal sketch of this fine-tuning protocol is given after the table. |
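
The fine-tuning recipe quoted above (frozen backbone, re-initialized neck and head, AdamW) can be sketched in a few lines of PyTorch-style code. The module names `backbone`, `neck`, and `head`, and the specific learning rate and weight decay values, are illustrative assumptions drawn from the quoted search ranges, not the actual Atari-PB implementation.

```python
# Hypothetical sketch of the paper's fine-tuning protocol: freeze the pre-trained
# backbone, re-initialize the neck and head, and optimize them with AdamW.
import torch
import torch.nn as nn


def reinit_module(module: nn.Module) -> None:
    """Re-initialize every submodule that defines reset_parameters()."""
    for m in module.modules():
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()


def prepare_for_finetuning(backbone: nn.Module, neck: nn.Module, head: nn.Module):
    # Keep the pre-trained backbone frozen during fine-tuning.
    for p in backbone.parameters():
        p.requires_grad = False

    # Re-initialize the neck and head before fine-tuning them.
    reinit_module(neck)
    reinit_module(head)

    # AdamW over the trainable parameters; lr and weight_decay here are example
    # values taken from the hyperparameter ranges quoted in the table.
    trainable = list(neck.parameters()) + list(head.parameters())
    optimizer = torch.optim.AdamW(trainable, lr=3e-4, weight_decay=1e-5)
    return optimizer
```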