An Investigation into Pre-Training Object-Centric Representations for Reinforcement Learning
Authors: Jaesik Yoon, Yi-Fu Wu, Heechul Bae, Sungjin Ahn
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we investigate the effectiveness of OCR pre-training for image-based reinforcement learning via empirical experiments. For systematic evaluation, we introduce a simple object-centric visual RL benchmark and conduct experiments to answer questions such as "Does OCR pre-training improve performance on object-centric tasks?" and "Can OCR pre-training help with out-of-distribution generalization?". Our results provide empirical evidence for valuable insights... |
| Researcher Affiliation | Collaboration | ¹SAP ²Rutgers University ³ETRI ⁴KAIST. Correspondence to: Jaesik Yoon and Sungjin Ahn <mail@jaesikyoon.com and sjn.ahn@gmail.com>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The benchmark and source code are available on the project website: https://sites.google.com/view/ocrl/home. |
| Open Datasets | No | For pre-training on the 2D tasks, we generate a dataset with a varying number of objects of different shapes randomly placed in the scene. ... For the 3D task from the CausalWorld framework, we generate a dataset through a random policy on the task. No direct link or specific access information is provided for the generated datasets themselves, only the code to generate them. |
| Dataset Splits | Yes | The number of scenes used for training and validation are 1 million and 100,000, respectively. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using 'Stable Baselines3 library (Raffin et al., 2019)' but does not provide a specific version number for this or any other software dependency. |
| Experiment Setup | Yes | Detailed information about the architecture and hyperparameters is in Appendix A. For example, for VAE: 'Additional hyperparameters include a learning rate of 0.0001, a weight for the KL-term of 5, and a batch size of 128.' Also, for PPO: 'with a learning rate of 0.0003. Additional configurations were tuned across tasks and models. The steps per training were selected from 2048 or 8192, and the coefficient for the entropy term was selected from 0, 0.01, 0.03, 0.05, or 0.1.' Tables 17 and 18 provide further hyperparameter settings. (Hedged sketches of both configurations follow this table.) |
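
The "weight for the KL-term of 5" reported for the VAE corresponds to a β-VAE-style objective, where the KL divergence is scaled before being added to the reconstruction loss. Below is a minimal PyTorch sketch of such an objective; the MSE reconstruction term and the function name `vae_loss` are assumptions for illustration, since the excerpt does not specify the reconstruction likelihood used.

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar, kl_weight=5.0):
    """VAE objective with a weighted KL term (kl_weight=5 per Appendix A)."""
    # Per-sample reconstruction error; MSE is an assumption here.
    recon = F.mse_loss(recon_x, x, reduction="sum") / x.size(0)
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return recon + kl_weight * kl
```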
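The reported PPO settings map directly onto Stable Baselines3 constructor arguments. The sketch below instantiates one point from the stated search grid; the environment is a stand-in placeholder, since the benchmark's actual environments (available on the project website) are not named in the excerpt, and the paper's agents act on pre-trained OCR features of image observations rather than raw vector states.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Placeholder environment for a self-contained, runnable example.
env = gym.make("CartPole-v1")

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,  # reported PPO learning rate (0.0003)
    n_steps=2048,        # selected per task from {2048, 8192}
    ent_coef=0.01,       # selected per task from {0, 0.01, 0.03, 0.05, 0.1}
)
model.learn(total_timesteps=10_000)
```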