Privileged Sensing Scaffolds Reinforcement Learning
Authors: Edward S. Hu, James Springer, Oleh Rybkin, Dinesh Jayaraman
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the training performance and final task scores of each method. The normalized final median scores in the top left of Figure 6 are computed using the performance of the unprivileged Dreamer V3 baseline as a lower bound (0.0) and the performance of a privileged Dreamer V3 model that is trained and evaluated on o+ as the upper bound (1.0). See Appendix E for more info. |
| Researcher Affiliation | Academia | 1University of Pennsylvania 2UC Berkeley |
| Pseudocode | No | The paper describes the methodology textually and with diagrams (e.g., Figure 3, Figure 4) but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | To ensure reproducibility, we will release all code about Scaffolder, baselines, and Sensory Scaffolding Suite on the project website: https://penn-pal-lab.github.io/scaffolder/. |
| Open Datasets | No | The paper introduces a custom suite of 10 robotics-based tasks called the Sensory Scaffolding Suite (S3) for evaluation. It does not use or provide concrete access information for a publicly available, pre-existing dataset in the traditional sense. |
| Dataset Splits | No | The paper describes its evaluation protocol, including periodic evaluation during training, but does not specify validation dataset splits or cross-validation setups of the kind typical in supervised learning. The work is in reinforcement learning, where data is generated through environment interaction rather than drawn from a fixed dataset. |
| Hardware Specification | Yes | We train on Nvidia 2080ti, 3090, A10, A40, A6000, and L40 GPUs. |
| Software Dependencies | No | The paper mentions using Dreamer V3 and cleanrl's implementation of PPO, but does not provide specific version numbers for these or other software components. |
| Experiment Setup | Yes | We found that Dreamer V3 only needs tuning for two hyperparameters, the model size and update-to-data (UTD) ratio. We follow an easy guideline for tuning these: more complicated dynamics generally require larger models, and tasks with harder exploration require more data and fewer updates (low UTD). As a result, we found hyperparameter settings that work for tasks with similar properties, as seen in Table 1. |
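The Research Type row quotes the paper's score normalization: the unprivileged Dreamer V3 baseline anchors 0.0 and the privileged Dreamer V3 model (trained and evaluated on o+) anchors 1.0. A minimal sketch of that min-max normalization, using made-up placeholder scores (not the paper's actual results):

```python
def normalize_score(score: float, unprivileged: float, privileged: float) -> float:
    """Min-max normalize a method's raw task score between two anchors:
    the unprivileged baseline maps to 0.0, the privileged model to 1.0."""
    return (score - unprivileged) / (privileged - unprivileged)

# Hypothetical per-task returns, purely for illustration:
unpriv, priv = 120.0, 480.0   # unprivileged / privileged Dreamer V3 anchors
method_score = 300.0          # a candidate method's raw return

print(normalize_score(method_score, unpriv, priv))  # → 0.5
```

Per the quoted passage, Figure 6 aggregates these normalized values by taking the median across tasks; scores above 1.0 or below 0.0 would indicate a method outside the two-baseline bracket.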