Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Visual Reinforcement Learning with Imagined Goals
Authors: Ashvin V. Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation illustrates that our method substantially improves the performance of image-based reinforcement learning, can effectively learn policies for complex image-based tasks, and can be used to learn real-world robotic manipulation skills with raw image inputs. |
| Researcher Affiliation | Academia | Ashvin Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine; University of California, Berkeley |
| Pseudocode | Yes | Algorithm 1 RIG: Reinforcement learning with imagined goals |
| Open Source Code | No | The paper provides a link to videos ('sites.google.com/site/visualrlwithimaginedgoals') but does not explicitly state that the source code for the methodology is open-source or provide a link to a code repository. |
| Open Datasets | No | The paper uses simulated environments (MuJoCo) and refers to collecting images ('pretrained the VAE with 100 images. For other tasks, we used 10,000 images.') but does not provide concrete access information (link, DOI, formal citation) for any publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper mentions using a 'real-world robotic system' and a 'real-world Sawyer robotic arm' but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions software like MuJoCo, TD3, and β-VAE but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The encoder and decoder parameters, φ and ψ respectively, are jointly trained to maximize L(ψ, φ; s^(i)) = −β D_KL(q_φ(z|s^(i)) ‖ p(z)) + E_{q_φ(z|s^(i))}[log p_ψ(s^(i) | z)], where p(z) is some prior, which we take to be the unit Gaussian, D_KL is the Kullback-Leibler divergence, and β is a hyperparameter that balances the two terms. |
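The β-VAE objective quoted in the Experiment Setup row can be sketched in code. The snippet below is a minimal illustration, not the paper's implementation: it assumes a diagonal-Gaussian encoder q_φ(z|s) parameterized by a mean and log-variance per latent dimension, and uses the closed-form KL divergence to the unit-Gaussian prior p(z). The function names (`kl_to_unit_gaussian`, `beta_vae_objective`) are hypothetical.

```python
import math

def kl_to_unit_gaussian(mu, logvar):
    # Closed-form D_KL(N(mu, diag(exp(logvar))) || N(0, I)) for a
    # diagonal-Gaussian posterior, summed over latent dimensions.
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))

def beta_vae_objective(recon_log_lik, mu, logvar, beta=1.0):
    # L(psi, phi; s) = E_q[log p_psi(s|z)] - beta * D_KL(q_phi(z|s) || p(z)).
    # Training maximizes this quantity (equivalently, minimizes -L).
    return recon_log_lik - beta * kl_to_unit_gaussian(mu, logvar)

# Example: posterior matching the prior incurs zero KL penalty.
print(kl_to_unit_gaussian([0.0, 0.0], [0.0, 0.0]))  # 0.0
```

With β > 1 the KL term is weighted more heavily, pushing the posterior toward the prior at the cost of reconstruction quality, which is the trade-off the hyperparameter β balances in the quoted objective.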