Privileged Sensing Scaffolds Reinforcement Learning
Authors: Edward S. Hu, James Springer, Oleh Rybkin, Dinesh Jayaraman
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the training performance and final task scores of each method. The normalized final median scores in the top left of Figure 6 are computed using the performance of the unprivileged Dreamer V3 baseline as a lower bound (0.0) and the performance of a privileged Dreamer V3 model that is trained and evaluated on o+ as the upper bound (1.0). See Appendix E for more info. |
| Researcher Affiliation | Academia | 1University of Pennsylvania 2UC Berkeley |
| Pseudocode | No | The paper describes the methodology textually and with diagrams (e.g., Figure 3, Figure 4) but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | To ensure reproducibility, we will release all code about Scaffolder, baselines, and Sensory Scaffolding Suite on the project website: https://penn-pal-lab.github.io/scaffolder/. |
| Open Datasets | No | The paper introduces a custom suite of 10 robotics-based tasks called the Sensory Scaffolding Suite (S3) for evaluation. It does not use or provide concrete access information for a publicly available, pre-existing dataset in the traditional sense. |
| Dataset Splits | No | The paper describes its evaluation protocol, including periodic evaluation during training, but does not specify validation dataset splits or cross-validation setups of the kind typical in supervised learning. The work is in reinforcement learning, where data is generated through environment interaction rather than drawn from a fixed dataset. |
| Hardware Specification | Yes | We train on Nvidia 2080ti, 3090, A10, A40, A6000, and L40 GPUs. |
| Software Dependencies | No | The paper mentions using Dreamer V3 and cleanrl's implementation of PPO, but does not provide specific version numbers for these or other software components. |
| Experiment Setup | Yes | We found that Dreamer V3 only needs tuning for two hyperparameters, the model size and update-to-data (UTD) ratio. We follow an easy guideline for tuning these: more complicated dynamics generally require larger models, and tasks with harder exploration require more data and fewer updates (low UTD). As a result, we found hyperparameter settings that work for tasks with similar properties, as seen in Table 1. |
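The Research Type row quotes the paper's score normalization: the unprivileged Dreamer V3 baseline anchors 0.0 and the privileged Dreamer V3 model (trained and evaluated on o+) anchors 1.0. A minimal sketch of that min-max normalization, using made-up placeholder scores (not the paper's actual results):

```python
def normalize_score(score: float, unprivileged: float, privileged: float) -> float:
    """Min-max normalize a method's raw task score between two anchors:
    the unprivileged baseline maps to 0.0, the privileged model to 1.0."""
    return (score - unprivileged) / (privileged - unprivileged)

# Hypothetical per-task returns, purely for illustration:
unpriv, priv = 120.0, 480.0   # unprivileged / privileged Dreamer V3 anchors
method_score = 300.0          # a candidate method's raw return

print(normalize_score(method_score, unpriv, priv))  # → 0.5
```

Per the quoted passage, Figure 6 aggregates these normalized values by taking the median across tasks; scores above 1.0 or below 0.0 would indicate a method outside the two-baseline bracket.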