Visual Reinforcement Learning with Imagined Goals
Authors: Ashvin V. Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation illustrates that our method substantially improves the performance of image-based reinforcement learning, can effectively learn policies for complex image-based tasks, and can be used to learn real-world robotic manipulation skills with raw image inputs. |
| Researcher Affiliation | Academia | Ashvin Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine (University of California, Berkeley) {anair17,vitchyr,mdalal,shikharbahl,stevenlin598,svlevine}@berkeley.edu |
| Pseudocode | Yes | Algorithm 1 RIG: Reinforcement learning with imagined goals (a hypothetical sketch of this loop appears after the table) |
| Open Source Code | No | The paper provides a link to videos ('sites.google.com/site/visualrlwithimaginedgoals') but does not explicitly state that the source code for the methodology is open-source or provide a link to a code repository. |
| Open Datasets | No | The paper uses simulated environments (MuJoCo) and refers to collecting images ('pretrained the VAE with 100 images. For other tasks, we used 10,000 images.') but does not provide concrete access information (link, DOI, formal citation) for any publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper mentions using a 'real-world robotic system' and a 'real-world Sawyer robotic arm' but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions software like MuJoCo, TD3, and β-VAE but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The encoder and decoder parameters, φ and ψ respectively, are jointly trained to maximize L(ψ, φ; s^(i)) = −β D_KL(q_φ(z ∣ s^(i)) ‖ p(z)) + E_{q_φ(z ∣ s^(i))}[log p_ψ(s^(i) ∣ z)], where p(z) is some prior, which we take to be the unit Gaussian, D_KL is the Kullback-Leibler divergence, and β is a hyperparameter that balances the two terms. (A PyTorch sketch of this objective also appears after the table.) |
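
For context on the Pseudocode row, below is a minimal, hypothetical numpy sketch of the RIG loop named in Algorithm 1: sample an "imagined" goal from the VAE prior, roll out a goal-conditioned policy in latent space, compute rewards as negative latent distances, and relabel stored transitions with new goals. Every component here (`encode`, `policy`, the toy environment transition, the relabeling rule) is an illustrative stand-in, not the authors' implementation; the paper itself trains a VAE encoder and a TD3 agent.

```python
# Hypothetical sketch of the RIG training loop (after Algorithm 1), numpy only.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM = 16, 4
PROJ = rng.normal(size=(LATENT_DIM, OBS_DIM))  # stand-in for the VAE encoder mean

def encode(obs):
    return PROJ @ obs  # e(s): map an observation to its latent embedding

def policy(z, z_goal):
    return np.clip(z_goal - z, -1.0, 1.0)  # placeholder goal-conditioned policy

replay = []
for episode in range(10):
    z_goal = rng.normal(size=LATENT_DIM)  # imagined goal sampled from the prior p(z)
    obs = rng.normal(size=OBS_DIM)        # stand-in for an initial image observation
    for t in range(20):
        z = encode(obs)
        a = policy(z, z_goal)
        obs = obs + 0.1 * rng.normal(size=OBS_DIM)  # toy environment transition
        z_next = encode(obs)
        # RIG's reward is the negative distance to the goal in latent space.
        r = -np.linalg.norm(z_next - z_goal)
        replay.append((z, a, r, z_next, z_goal))

# Relabeling: swap stored goals for latents reached elsewhere in the buffer
# (or fresh prior samples) and recompute rewards, so one trajectory supervises
# many goals. The paper then trains TD3 off-policy on these transitions.
relabeled = []
for (z, a, _, z_next, _) in replay:
    new_goal = replay[rng.integers(len(replay))][3]  # a latent state from the buffer
    relabeled.append((z, a, -np.linalg.norm(z_next - new_goal), z_next, new_goal))
print(f"{len(relabeled)} relabeled transitions ready for off-policy training")
```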
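
The β-VAE objective quoted in the Experiment Setup row has a standard closed form when the encoder is a diagonal Gaussian and the decoder is Bernoulli. The PyTorch sketch below illustrates that objective written as a loss to minimize (the negative of the maximized L); the function name, the decoder choice, and the toy shapes are assumptions for illustration, not the paper's code.

```python
# Hypothetical beta-VAE loss matching the quoted objective, in PyTorch.
import torch
import torch.nn.functional as F

def beta_vae_loss(recon_logits, s, mu, logvar, beta=1.0):
    """Per-batch negative ELBO; minimizing it maximizes L(psi, phi; s)."""
    batch = s.shape[0]
    # Reconstruction term E_q[log p_psi(s|z)] under an assumed Bernoulli decoder.
    recon = F.binary_cross_entropy_with_logits(recon_logits, s, reduction="sum") / batch
    # Closed-form D_KL(N(mu, sigma^2) || N(0, I)) for a diagonal-Gaussian encoder.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / batch
    return recon + beta * kl

# Toy usage with random tensors standing in for encoder/decoder outputs.
s = torch.rand(8, 784)                          # batch of flattened images in [0, 1]
mu, logvar = torch.zeros(8, 32), torch.zeros(8, 32)
recon_logits = torch.zeros(8, 784)
print(beta_vae_loss(recon_logits, s, mu, logvar, beta=5.0))
```

The β weight trades reconstruction fidelity against how closely the posterior matches the unit-Gaussian prior, which matters for RIG because goals are sampled from that prior.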