Weakly-Supervised Reinforcement Learning for Controllable Behavior

Authors: Lisa Lee, Ben Eysenbach, Russ R. Salakhutdinov, Shixiang (Shane) Gu, Chelsea Finn

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On a variety of challenging, vision-based continuous control problems, our approach leads to substantial performance gains, particularly as the complexity of the environment grows.
Researcher Affiliation | Collaboration | Carnegie Mellon University; Google Brain; Stanford University
Pseudocode | Yes | Algorithm 1 (Weakly-Supervised Control)
Open Source Code | No | The paper does not contain an explicit statement or a direct link indicating that the source code for the methodology described in this paper is publicly available. While it references a dataset with a GitHub link, this is not the paper's own implementation code.
Open Datasets | No | The paper states: 'To generate the Sawyer datasets shown in Fig. 2, we corrected the sampled factor combinations to be physically feasible before generating the corresponding image observations in the Mujoco simulator.' and 'Both the training and test datasets were generated from the same distribution, and each set consists of 256 or 512 images (see Table 6).' However, it does not provide concrete access information (link, DOI, or citation for public availability) for these generated datasets.
Dataset Splits | No | The paper states, 'Both the training and test datasets were generated from the same distribution, and each set consists of 256 or 512 images (see Table 6).' While it mentions training and test datasets, it does not specify any validation splits, percentages, or absolute counts for training/validation/test partitions.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. It only mentions the use of the MuJoCo simulator for generating datasets, which is software, not hardware.
Software Dependencies | No | The paper mentions software like 'Mujoco simulator', 'OpenAI Gym', and 'ManiSkill', but it does not provide specific version numbers for any of these components or other key software dependencies required for replication.
Experiment Setup | Yes | We use Soft Actor-Critic (SAC) [33] with the default hyperparameters from the official SAC codebase, with the exception of the learning rate, which we set to 3e-4, and the reward scale, which we set to 0.1. We also use a larger latent dimension of 64 instead of 10 and 20 for our VAE models in the Sawyer environments. We train each agent for 2M environment steps.
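The Experiment Setup row reports four concrete settings: learning rate 3e-4, reward scale 0.1, VAE latent dimension 64 (Sawyer environments), and 2M environment steps. The following is a minimal sketch, not the authors' code, that pins these values in one place and shows one common way a reward scale enters the SAC soft Q-target. The names SawyerRunConfig and soft_q_target are hypothetical, and the discount of 0.99 is an assumption because the quoted text does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SawyerRunConfig:
    """Hypothetical container for the settings quoted in the Experiment Setup row."""
    learning_rate: float = 3e-4       # reported
    reward_scale: float = 0.1         # reported
    vae_latent_dim: int = 64          # reported (Sawyer environments)
    total_env_steps: int = 2_000_000  # reported: 2M environment steps
    discount: float = 0.99            # ASSUMPTION: not given in the quoted text


def soft_q_target(reward: float, done: bool, next_value: float,
                  cfg: SawyerRunConfig) -> float:
    """Hypothetical helper: Bellman target for the soft Q-function.

    Assumes the reward scale multiplies the raw reward before bootstrapping:
    y = reward_scale * r + discount * (1 - done) * V(s').
    """
    return cfg.reward_scale * reward + cfg.discount * (1.0 - float(done)) * next_value


if __name__ == "__main__":
    cfg = SawyerRunConfig()
    print(soft_q_target(reward=1.0, done=False, next_value=2.0, cfg=cfg))
```

A reward scale below 1 shrinks the reward term relative to the entropy bonus, so it indirectly trades off exploitation against exploration; the value 0.1 above is the reported one, while everything else in the sketch is illustrative.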