Visual Reinforcement Learning with Imagined Goals
Authors: Ashvin V. Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation illustrates that our method substantially improves the performance of image-based reinforcement learning, can effectively learn policies for complex image-based tasks, and can be used to learn real-world robotic manipulation skills with raw image inputs. |
| Researcher Affiliation | Academia | Ashvin Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine (University of California, Berkeley) {anair17,vitchyr,mdalal,shikharbahl,stevenlin598,svlevine}@berkeley.edu |
| Pseudocode | Yes | Algorithm 1 RIG: Reinforcement learning with imagined goals (a hypothetical sketch of this loop appears after the table) |
| Open Source Code | No | The paper provides a link to videos ('sites.google.com/site/visualrlwithimaginedgoals') but does not explicitly state that the source code for the methodology is open-source or provide a link to a code repository. |
| Open Datasets | No | The paper uses simulated environments (MuJoCo) and refers to collecting images ('pretrained the VAE with 100 images. For other tasks, we used 10,000 images.') but does not provide concrete access information (link, DOI, formal citation) for any publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper mentions using a 'real-world robotic system' and a 'real-world Sawyer robotic arm' but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions software like MuJoCo, TD3, and β-VAE but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The encoder and decoder parameters, φ and ψ respectively, are jointly trained to maximize L(ψ, φ; s^(i)) = −β D_KL(q_φ(z ∣ s^(i)) ‖ p(z)) + E_{q_φ(z ∣ s^(i))}[log p_ψ(s^(i) ∣ z)], where p(z) is some prior, which we take to be the unit Gaussian, D_KL is the Kullback-Leibler divergence, and β is a hyperparameter that balances the two terms. (A PyTorch sketch of this objective also appears after the table.) |
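
For context on the Pseudocode row, below is a minimal, hypothetical numpy sketch of the RIG loop named in Algorithm 1: sample an "imagined" goal from the VAE prior, roll out a goal-conditioned policy in latent space, compute rewards as negative latent distances, and relabel stored transitions with new goals. Every component here (`encode`, `policy`, the toy environment transition, the relabeling rule) is an illustrative stand-in, not the authors' implementation; the paper itself trains a VAE encoder and a TD3 agent.

```python
# Hypothetical sketch of the RIG training loop (after Algorithm 1), numpy only.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM = 16, 4
PROJ = rng.normal(size=(LATENT_DIM, OBS_DIM))  # stand-in for the VAE encoder mean

def encode(obs):
    return PROJ @ obs  # e(s): map an observation to its latent embedding

def policy(z, z_goal):
    return np.clip(z_goal - z, -1.0, 1.0)  # placeholder goal-conditioned policy

replay = []
for episode in range(10):
    z_goal = rng.normal(size=LATENT_DIM)  # imagined goal sampled from the prior p(z)
    obs = rng.normal(size=OBS_DIM)        # stand-in for an initial image observation
    for t in range(20):
        z = encode(obs)
        a = policy(z, z_goal)
        obs = obs + 0.1 * rng.normal(size=OBS_DIM)  # toy environment transition
        z_next = encode(obs)
        # RIG's reward is the negative distance to the goal in latent space.
        r = -np.linalg.norm(z_next - z_goal)
        replay.append((z, a, r, z_next, z_goal))

# Relabeling: swap stored goals for latents reached elsewhere in the buffer
# (or fresh prior samples) and recompute rewards, so one trajectory supervises
# many goals. The paper then trains TD3 off-policy on these transitions.
relabeled = []
for (z, a, _, z_next, _) in replay:
    new_goal = replay[rng.integers(len(replay))][3]  # a latent state from the buffer
    relabeled.append((z, a, -np.linalg.norm(z_next - new_goal), z_next, new_goal))
print(f"{len(relabeled)} relabeled transitions ready for off-policy training")
```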
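
The β-VAE objective quoted in the Experiment Setup row has a standard closed form when the encoder is a diagonal Gaussian and the decoder is Bernoulli. The PyTorch sketch below illustrates that objective written as a loss to minimize (the negative of the maximized L); the function name, the decoder choice, and the toy shapes are assumptions for illustration, not the paper's code.

```python
# Hypothetical beta-VAE loss matching the quoted objective, in PyTorch.
import torch
import torch.nn.functional as F

def beta_vae_loss(recon_logits, s, mu, logvar, beta=1.0):
    """Per-batch negative ELBO; minimizing it maximizes L(psi, phi; s)."""
    batch = s.shape[0]
    # Reconstruction term E_q[log p_psi(s|z)] under an assumed Bernoulli decoder.
    recon = F.binary_cross_entropy_with_logits(recon_logits, s, reduction="sum") / batch
    # Closed-form D_KL(N(mu, sigma^2) || N(0, I)) for a diagonal-Gaussian encoder.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / batch
    return recon + beta * kl

# Toy usage with random tensors standing in for encoder/decoder outputs.
s = torch.rand(8, 784)                          # batch of flattened images in [0, 1]
mu, logvar = torch.zeros(8, 32), torch.zeros(8, 32)
recon_logits = torch.zeros(8, 784)
print(beta_vae_loss(recon_logits, s, mu, logvar, beta=5.0))
```

The β weight trades reconstruction fidelity against how closely the posterior matches the unit-Gaussian prior, which matters for RIG because goals are sampled from that prior.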