Weakly-Supervised Reinforcement Learning for Controllable Behavior
Authors: Lisa Lee, Ben Eysenbach, Russ R. Salakhutdinov, Shixiang (Shane) Gu, Chelsea Finn
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On a variety of challenging, vision-based continuous control problems, our approach leads to substantial performance gains, particularly as the complexity of the environment grows. |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²Google Brain, ³Stanford University |
| Pseudocode | Yes | Algorithm 1 Weakly-Supervised Control |
| Open Source Code | No | The paper does not contain an explicit statement or a direct link indicating that the source code for the methodology described in this paper is publicly available. While it references a dataset with a GitHub link, this is not the paper's own implementation code. |
| Open Datasets | No | The paper states: 'To generate the Sawyer datasets shown in Fig. 2, we corrected the sampled factor combinations to be physically feasible before generating the corresponding image observations in the Mujoco simulator.' and 'Both the training and test datasets were generated from the same distribution, and each set consists of 256 or 512 images (see Table 6).' However, it does not provide concrete access information (link, DOI, or citation for public availability) for these generated datasets. |
| Dataset Splits | No | The paper states, 'Both the training and test datasets were generated from the same distribution, and each set consists of 256 or 512 images (see Table 6).' While it mentions training and test datasets, it does not specify any validation splits, percentages, or absolute counts for training/validation/test partitions. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. It only mentions the use of the Mujoco simulator for generating datasets, which is software, not hardware. |
| Software Dependencies | No | The paper mentions software like 'Mujoco simulator', 'OpenAI Gym', and 'ManiSkill', but it does not provide specific version numbers for any of these components or other key software dependencies required for replication. |
| Experiment Setup | Yes | We use Soft Actor-Critic (SAC) [33] with the default hyperparameters from the official SAC codebase, with the exception of the learning rate, which we set to 3e-4, and the reward scale, which we set to 0.1. We also use a larger latent dimension of 64 instead of 10 and 20 for our VAE models in the Sawyer environments. We train each agent for 2M environment steps. |
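
The values quoted in the Experiment Setup row can be collected into a small configuration object for anyone attempting a replication. The following is only a sketch: the class name, field names, and the `merge_with_defaults` helper are hypothetical, and the placeholder defaults are deliberately left empty because the row does not state what the SAC codebase defaults are.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup row (field names are hypothetical)."""
    algorithm: str = "SAC"
    learning_rate: float = 3e-4        # stated exception to the SAC defaults
    reward_scale: float = 0.1          # stated exception to the SAC defaults
    vae_latent_dim_sawyer: int = 64    # used for the Sawyer environments
    total_env_steps: int = 2_000_000   # "2M environment steps" per agent


def merge_with_defaults(defaults: dict, setup: ReportedSetup) -> dict:
    """Return a copy of `defaults` with the reported values written on top."""
    merged = dict(defaults)
    merged.update(asdict(setup))
    return merged


# Placeholder defaults: illustrative keys only, NOT taken from the SAC codebase.
placeholder_defaults = {"learning_rate": None, "reward_scale": None, "batch_size": None}
config = merge_with_defaults(placeholder_defaults, ReportedSetup())
```

Keys that the row does not mention (here `batch_size`) stay at whatever the supplied defaults contain, which keeps the reported overrides separate from values that would still have to be looked up elsewhere.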