An Investigation into Pre-Training Object-Centric Representations for Reinforcement Learning
Authors: Jaesik Yoon, Yi-Fu Wu, Heechul Bae, Sungjin Ahn
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we investigate the effectiveness of OCR pre-training for image-based reinforcement learning via empirical experiments. For systematic evaluation, we introduce a simple object-centric visual RL benchmark and conduct experiments to answer questions such as "Does OCR pre-training improve performance on object-centric tasks?" and "Can OCR pre-training help with out-of-distribution generalization?". Our results provide empirical evidence for valuable insights... |
| Researcher Affiliation | Collaboration | ¹SAP ²Rutgers University ³ETRI ⁴KAIST. Correspondence to: Jaesik Yoon and Sungjin Ahn <mail@jaesikyoon.com and sjn.ahn@gmail.com>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The benchmark and source code are available on the project website: https://sites.google.com/view/ocrl/home. |
| Open Datasets | No | For pre-training on the 2D tasks, we generate a dataset with a varying number of objects of different shapes randomly placed in the scene. ... For the 3D task from the CausalWorld framework, we generate a dataset through a random policy on the task. No direct link or specific access information is provided for the generated datasets themselves, only the code to generate them. |
| Dataset Splits | Yes | The number of scenes used for training and validation are 1 million and 100,000, respectively. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using 'Stable Baselines3 library (Raffin et al., 2019)' but does not provide a specific version number for this or any other software dependency. |
| Experiment Setup | Yes | Detailed information about the architecture and hyperparameters is in Appendix A. For example, for VAE: 'Additional hyperparameters include a learning rate of 0.0001, a weight for the KL-term of 5, and a batch size of 128.' Also, for PPO: 'with a learning rate of 0.0003. Additional configurations were tuned across tasks and models. The steps per training were selected from 2048 or 8192, and the coefficient for the entropy term was selected from 0, 0.01, 0.03, 0.05, or 0.1.' Tables 17 and 18 provide further hyperparameter settings. (Hedged sketches of both configurations follow this table.) |
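
The "weight for the KL-term of 5" reported for the VAE corresponds to a β-VAE-style objective, where the KL divergence is scaled before being added to the reconstruction loss. Below is a minimal PyTorch sketch of such an objective; the MSE reconstruction term and the function name `vae_loss` are assumptions for illustration, since the excerpt does not specify the reconstruction likelihood used.

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar, kl_weight=5.0):
    """VAE objective with a weighted KL term (kl_weight=5 per Appendix A)."""
    # Per-sample reconstruction error; MSE is an assumption here.
    recon = F.mse_loss(recon_x, x, reduction="sum") / x.size(0)
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return recon + kl_weight * kl
```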
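The reported PPO settings map directly onto Stable Baselines3 constructor arguments. The sketch below instantiates one point from the stated search grid; the environment is a stand-in placeholder, since the benchmark's actual environments (available on the project website) are not named in the excerpt, and the paper's agents act on pre-trained OCR features of image observations rather than raw vector states.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Placeholder environment for a self-contained, runnable example.
env = gym.make("CartPole-v1")

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,  # reported PPO learning rate (0.0003)
    n_steps=2048,        # selected per task from {2048, 8192}
    ent_coef=0.01,       # selected per task from {0, 0.01, 0.03, 0.05, 0.1}
)
model.learn(total_timesteps=10_000)
```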